
Technical Report:

NetApp A-SIS Deduplication

Deployment and Implementation Guide


Network Appliance, Inc. | Bill May, Product and Partner Engineering | 23 July 2007 | TR-3505

Second Revision

Abstract
This guide introduces the NetApp A-SIS deduplication technology and describes in detail how to
implement and utilize it.
It should prove useful for customers requiring assistance in understanding and architecting solutions
with A-SIS deduplication and NetApp storage systems.

Table of Contents

1 Introduction
1.1 Intended Audience
1.2 Purpose
1.3 Prerequisites and Assumptions
1.4 Document Conventions
2 Overview
2.1 NetApp Deduplication Technologies
2.1.1 SnapVault for NetBackup™
2.1.2 A-SIS Deduplication
2.2 Dense Volumes
2.3 A-SIS Features and Functions
2.3.1 General A-SIS Operational Considerations
3 Configuration and Operation
3.1 Requirements Overview
3.2 Installing and Licensing A-SIS
3.2.1 A-SIS Licensing in a Clustered Environment
3.3 Command Summary
3.4 A-SIS Quick Start Guide
3.5 Monitoring A-SIS Status
3.6 End-to-End A-SIS Configuration Example
3.7 Configuring A-SIS Schedules
4 Operating Characteristics
4.1 A-SIS Target Environment
4.2 A-SIS Performance
4.3 A-SIS Storage Savings
4.4 Additional A-SIS Considerations
4.4.1 Number of A-SIS Processes
4.4.2 A-SIS and Active/Active Configuration
4.4.3 A-SIS and Space Savings on Existing Data
4.4.4 A-SIS Best Practices
5 Common Problems and Troubleshooting
5.1 Licensing
5.2 Volume Sizes
5.3 Logs and Error Messages
5.4 Other Issues
5.5 Not Seeing Space Savings
5.6 Undeduplicating a Flexible Volume
5.7 Additional Reporting with “sis stat -l”
5.8 A-SIS and Reboots
6 A-SIS and Replication
6.1 Replicating an A-SIS Flexible Volume for DR
6.2 Replicating Primary Data to an A-SIS Flexible Volume


1 Introduction

1.1 Intended Audience


This technical report is designed for customers who seek education on the A-SIS deduplication
capability introduced in Data ONTAP® 7.2L1 and made generally available in Data ONTAP 7.2.2.
It will be most beneficial to those who are already familiar with NetApp hardware and software.

1.2 Purpose
The purpose of this paper is to present a guide for implementing NetApp A-SIS deduplication. It will
address step-by-step configuration examples, introduce known caveats and recommendations to
assist the reader in designing optimal solutions, and prepare the audience for performing
deployments of the technology in customer environments.
Its use is threefold:
• Provide detailed information to all interested parties.
• Educate prior to performing deployments.
• Serve as a reference for resolving issues that could arise.
This document is not:
• A sales guide (although some high-level thoughts are covered in the “Solutions Overview”
section)
• A competitive comparison
• A complete product design document

1.3 Prerequisites and Assumptions


For various details and procedures described in this document to be most useful to the reader, the
following assumptions are made:
• The reader has general knowledge of NetApp platforms and products, particularly in the area
of data protection.
• The reader has general knowledge of backup protection, data retention, and disaster
recovery solutions.

1.4 Document Conventions


• While “A-SIS deduplication” is the official product name, for brevity’s sake in this document
the acronym “A-SIS” alone will typically be used.


2 Overview
This section provides a quick overview of deduplication in general and then introduces what A-SIS
deduplication is and how it works at a high level.

2.1 NetApp Deduplication Technologies


Since its founding, NetApp has been an innovator in delivering storage solutions, and it continues to
invent capacity-optimizing technologies that reduce the cost of data storage. The following are
some of the basic products and features that deliver this value:
• Snapshot™ for disk- and network-efficient recovery copies
• SnapVault® for disk- and network-efficient backups
• FlexVol® for space-efficient volume provisioning
• FlexClone® for space-efficient test and development copies

While all these technologies offer the benefit of reducing the amount of required storage, in the
marketplace they are often not considered “deduplication” technologies when compared to solutions
offered by other vendors. That sentiment, while not entirely accurate, is understood, and NetApp
continues to expand its portfolio with several technologies for further deduplication of data. The
following subsections cover two of the solutions that are available as of the writing of this paper;
additional deduplication technologies are coming in both the short term and the more distant future.
Before delving into technical solutions, it makes sense to understand the value of deduplication to
customers. The primary advantage of data deduplication is that it conserves physical disk space
when storing data on disk. The average UNIX® or Windows® disk volume contains thousands of
duplicate data strings. Traditionally, when copies of these volumes are created, every duplicate data
string is also copied, resulting in an inefficient use of secondary storage. Deduplication helps to
remove this inefficiency and yields a more effective cost per gigabyte in the data center.

Figure 1) Reduced storage costs with deduplication.


2.1.1 SnapVault for NetBackup™


The first industry-recognized deduplication technology from NetApp was SnapVault for NetBackup,
which provides space savings similar to those provided by SnapVault to traditional NetBackup
environments. This solution integrates the NetApp secondary storage system as an optimized backup
repository for heterogeneous (not NetApp) primary storage. Its value is based on the assumption that
a file in the same data set and path but in different backups is likely to have a lot of blocks in
common.
Backups written to a NetApp storage unit utilize less disk space when compared to traditional disk
storage units. After an initial client backup is performed, the Network Appliance™ Write Anywhere
File Layout (WAFL®) file system saves only changed blocks when subsequent backups are
performed for the same client, providing single-instance storage (SIS) deduplication of the additional
backup images.
To NetBackup, the backup on the NetApp system looks like a standard NetBackup TAR image
backup, allowing most normal NetBackup operations (duplication, synthetics, vaulting, and so on) to
be performed. To end users, the backup on the NetApp system looks like a standard WAFL file
system, accessible through NFS and CIFS.
SnapVault for NetBackup (SV-NBU) was released as a joint solution as part of Data ONTAP 7.1 and
NetBackup 6.0, with a focus on data protection.

2.1.2 A-SIS Deduplication


Unlike SV-NBU, which performs block-level deduplication only for the same client/policy/directory/file,
and only for use with NetBackup, A-SIS deduplicates blocks anywhere in the active file system within
the entire flexible volume, regardless of how the data got there.
In its initial release, A-SIS primarily had a focus on data retention/archiving of file system data on
secondary storage NetApp systems. Substantial storage savings can be achieved with A-SIS
deduplication in some tier 2 primary storage environments as well.
While A-SIS is really part of a suite of deduplication technologies offered by NetApp, it is the sole
focus of the remainder of this paper.

2.2 Dense Volumes


Despite the introduction of less expensive ATA disk drives, one of the biggest challenges for disk-
based backup today continues to be the storage cost. There is a desire to reduce storage
consumption (and therefore storage cost per megabyte) by eliminating duplicated data through
sharing across files.
The core NetApp technology to accomplish this goal is the dense volume, a flexible volume that
contains shared data blocks. The NetApp Data ONTAP file system, WAFL, is a file system structure
that supports shared blocks in order to optimize storage space consumption. Basically, within one file
system tree there is the ability to have multiple references to the same data block, as shown in Figure
2.


Figure 2) Dense volumes.

To keep track of the many indirect blocks (“IND” in Figure 2) that point to it, each data block
has a reference count kept in the volume metadata. As additional indirect blocks point to the data
block, or existing ones stop pointing to it, this count is incremented or decremented accordingly.
When no indirect blocks point to a data block, it is released.
A-SIS uses dense volume technology to allow duplicate blocks anywhere in the flexible volume to be
deleted.
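
The reference counting described above can be illustrated with a small sketch. This is only a toy model under stated assumptions: the class and method names are hypothetical, and the real counts live in WAFL volume metadata rather than in a Python dictionary.

```python
class DenseVolume:
    """Toy model of shared data blocks with a per-block reference count."""

    def __init__(self):
        # block id -> number of indirect blocks currently pointing at it
        self.refcount = {}

    def add_reference(self, block_id):
        """An additional indirect block now points at block_id."""
        self.refcount[block_id] = self.refcount.get(block_id, 0) + 1

    def drop_reference(self, block_id):
        """An indirect block stops pointing at block_id.

        Returns True when the count reaches zero and the block is released.
        """
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            del self.refcount[block_id]  # no references left: release the block
            return True
        return False


vol = DenseVolume()
vol.add_reference("D1")   # first file references block D1
vol.add_reference("D1")   # a second file shares the same block
first_drop = vol.drop_reference("D1")    # still referenced once
second_drop = vol.drop_reference("D1")   # last reference gone, block released
```

In this model a shared block survives until the last reference to it is dropped, which is the behavior the paragraph above describes.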

2.3 A-SIS Features and Functions


A-SIS provides block-level deduplication within the entire flexible volume on NetApp NearStore®
storage systems. The depiction of how this works, at the highest level, is shown in Figure 3.

Figure 3) How A-SIS deduplication works.


Essentially, A-SIS only stores unique blocks in the flexible volume and creates a small amount of
additional metadata in the process. Notable features include:
• NetApp A-SIS deduplication operates with a high degree of granularity, at the block level.
• It operates on the active file system of the flexible volume. Snapshot copies created after
running A-SIS enjoy the same storage savings benefits.
• A-SIS is a background process that can be configured to run automatically, scheduled, or run
manually through the command-line interface.
• A-SIS is application transparent and therefore can be used for deduplication of data
originating from anywhere in the data center.
• A-SIS is enabled and managed using a simple command-line interface.
• A-SIS can also be enabled on flexible volumes that already contain data, and it can
deduplicate that existing data.

The remainder of this document goes into great detail on the operation of A-SIS, but in general the
following occurs:
Newly saved data on the NearStore is stored in blocks as usual by Data ONTAP. Each block
of data has a digital fingerprint, which is compared to all other fingerprints in the flexible
volume. If two fingerprints are found to be the same, a byte-for-byte comparison is done of all
bytes in the block, and, if there is an exact match between the new block and the existing
block on the flexible volume, the duplicate block is discarded and its disk space is reclaimed.
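
The fingerprint-then-verify idea in the paragraph above can be sketched as follows. This is a simplified illustration, not the A-SIS implementation: the 4 KB block size, the use of SHA-256 as the fingerprint, and the function name are assumptions made for the example, and the sketch processes blocks in a single pass, whereas A-SIS works as a background process on data that has already been written.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption for the example; the text does not state a block size


def deduplicate(blocks):
    """Store each unique block once.

    Returns (stored, pointers), where pointers[i] is the index in `stored`
    of the block that logical block i refers to.
    """
    stored = []            # unique blocks actually kept
    by_fingerprint = {}    # fingerprint -> indices into `stored`
    pointers = []
    for data in blocks:
        fingerprint = hashlib.sha256(data).digest()  # stand-in fingerprint
        match = None
        for idx in by_fingerprint.get(fingerprint, []):
            # Equal fingerprints are only a hint; confirm byte for byte.
            if stored[idx] == data:
                match = idx
                break
        if match is None:
            stored.append(data)
            match = len(stored) - 1
            by_fingerprint.setdefault(fingerprint, []).append(match)
        pointers.append(match)
    return stored, pointers


block_a = b"a" * BLOCK_SIZE
block_b = b"b" * BLOCK_SIZE
stored, pointers = deduplicate([block_a, block_b, block_a, block_a])
```

The byte-for-byte check is what makes the scheme safe: two different blocks that happen to share a fingerprint are never merged.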

2.3.1 General A-SIS Operational Considerations


A-SIS is enabled on a per flexible volume basis.
A-SIS can be enabled on any number of flexible volumes.
A-SIS can be run one of three ways:
• Scheduled on specific days and at specific times
• Manually via the command line
• Automatically, when 20% new data has been written to the volume
Only one A-SIS process runs on a flexible volume at a time.
Up to eight A-SIS processes can run concurrently on the same NetApp storage array.
A-SIS is supported in an active/active clustered failover configuration. For clarifying details, see the
“A-SIS and Active/Active Configuration” section.
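
The two concurrency rules above (one process per flexible volume, at most eight per storage system) can be pictured with a short sketch. How Data ONTAP actually queues requests is not described in this document, so the function name and the first-come, first-served ordering are assumptions for illustration only.

```python
MAX_CONCURRENT = 8  # limit stated above for one storage system


def admit(requests, active=()):
    """Split requested volumes into those that can run now and those that wait.

    `requests` is a list of volume paths in request order (assumed ordering).
    """
    active = list(active)
    pending = []
    for vol in requests:
        if vol in active or vol in pending:
            continue  # only one A-SIS process per flexible volume
        if len(active) < MAX_CONCURRENT:
            active.append(vol)
        else:
            pending.append(vol)  # waits until a running process finishes
    return active, pending


volumes = [f"/vol/dvol_{n}" for n in range(1, 11)]
running, waiting = admit(volumes)
```

With ten requests, eight volumes run and two wait; a waiting volume corresponds to the Pending status that sis status reports when a resource is not yet available.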


3 Configuration and Operation


This section discusses what is required to install A-SIS, how to configure it, and how to manage
it. Although this section covers some basics, in general it assumes both that the NetApp storage
system is already installed and running, and that the reader is familiar with basic NetApp Data
ONTAP administration.

3.1 Requirements Overview


Table 1 specifies the requirements for A-SIS.

Table 1) A-SIS requirements overview.


Hardware                      NearStore R200
                              FAS3020, FAS3040, FAS3050, FAS3070, FAS6030, FAS6070
                              FAS2020, FAS2050 (requires Data ONTAP 7.2.2L1)
                              IBM: N5200, N5300, N5500, N5600, N7600, N7800
Data ONTAP                    Data ONTAP 7.2.2 or later
Software                      nearstore_option license (for all platforms except R200)
                              a_sis license
Maximum Flexible Volume Size  FAS6070, N7800: 16TB
                              FAS6030, N7600: 10TB
                              FAS3070, N5600: 6TB
                              NearStore R200: 4TB
                              FAS3040, N5300: 3TB
                              FAS3050, N5500: 2TB
                              FAS3020, N5200: 1TB
                              FAS2050: 1TB
                              FAS2020: 0.5TB
Protocols                     All file-based and block-based protocols supported by Data ONTAP
Applications                  Refer to the “A-SIS Target Environment” section
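
When planning volumes, the size limits in Table 1 can be captured in a simple lookup. The sketch below only restates the table; the dictionary and function names are hypothetical, and the key "R200" stands for the NearStore R200.

```python
# Maximum A-SIS flexible volume size per platform, in TB, as listed in Table 1.
MAX_FLEXVOL_TB = {
    "FAS6070": 16, "N7800": 16,
    "FAS6030": 10, "N7600": 10,
    "FAS3070": 6, "N5600": 6,
    "R200": 4,
    "FAS3040": 3, "N5300": 3,
    "FAS3050": 2, "N5500": 2,
    "FAS3020": 1, "N5200": 1,
    "FAS2050": 1,
    "FAS2020": 0.5,
}


def within_asis_limit(platform, volume_size_tb):
    """Return True if a flexible volume of this size is within the platform limit."""
    return volume_size_tb <= MAX_FLEXVOL_TB[platform]


ok_on_r200 = within_asis_limit("R200", 0.2)       # a 200 GB volume on an R200
too_big_on_2020 = within_asis_limit("FAS2020", 1)  # 1 TB exceeds the 0.5 TB limit
```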

3.2 Installing and Licensing A-SIS


A-SIS is included in Data ONTAP and just needs to be licensed. Add the A-SIS license using the
following command, where <a_sis> stands for the A-SIS license code:
license add <a_sis>
If you want to run A-SIS on any of the FAS platforms, you will also need to add the
nearstore_option license, where <nearstore_option> stands for that license code:
license add <nearstore_option>


3.2.1 A-SIS Licensing in a Clustered Environment


A-SIS deduplication is a licensed option behind the NearStore option license. Hence, in a clustered
environment, both nodes must have the NearStore option and A-SIS licensed.

3.3 Command Summary


Table 2 provides a description of all A-SIS (related) commands. Cells that are shaded indicate those
commands that are only available via “priv set diag”.

Table 2) A-SIS command summary.


sis on <vol>                  Activates A-SIS on the flexible volume specified.
sis start -s <vol>            Begins deduplication process on the flexible volume specified.
                              Using the -s option tells the deduplication operation to scan
                              the flexible volume specified and process existing data.
                              This option should only be used upon initial configuration and
                              deduplication on a flexible volume.
sis start <vol>               Begins deduplication process on the flexible volume specified.
sis status [-l] <vol>         Returns current status of A-SIS for the specified flexible
                              volume.
                              The -l option causes a long listing to be displayed.
df -s <vol>                   Returns the value of A-SIS space savings in the active file
                              system for the specified flexible volume.
sis config [-s sched] <vol>   Creates an automated deduplication sched(ule).
                              The syntax follows the SnapVault syntax model.
                              When A-SIS is first enabled on a flexible volume, a default
                              schedule is configured, running it each day of the week at
                              midnight.
sis stop <vol>                Suspends the A-SIS deduplication process (if one is running)
                              on the flexible volume specified.
sis off <vol>                 Deactivates A-SIS on the flexible volume specified. This means
                              there will be no more change logging or deduplication
                              operations, but the flexible volume will remain a dense
                              volume, and the storage savings will be kept.
                              If this command is used, and then A-SIS is turned back on for
                              this flexible volume, the flexible volume will need to be
                              rescanned with the “sis start -s” command.
sis check <vol>               Verifies and updates the fingerprint database for the
                              specified flexible volume and includes purging stale
                              fingerprints.
sis stat <vol>                Displays the statistics of flexible volumes that have A-SIS
                              enabled.
sis undo <vol>                Converts an A-SIS-enabled flexible volume to a normal flexible
                              volume.

3.4 A-SIS Quick Start Guide


This section provides a quick run-through of the steps to configure and manage A-SIS.

Table 3) A-SIS quick overview.


                          New Flexible Volume        Flexible Volume with Existing Data
Flexible Volume           Create flexible volume.
Configuration
Enable A-SIS on           sis on <vol>
Flexible Volume
Initial Scan              Not applicable.            Scan/deduplicate the existing data.
                                                     sis start -s <vol>
Create, Modify, Delete    Delete or modify the default A-SIS schedule that was configured
Schedules (if not doing   when A-SIS was first enabled on the flexible volume or create
manually)                 desired schedule.
                          sis config [-s sched] <vol>
Manually Run A-SIS (if    sis start <vol>
not using schedules)
Monitor Status of A-SIS   sis status <vol>
Monitor Space Savings     df -s <vol>
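
The order of the steps in Table 3 can be summarized in a small helper that builds the command strings for either case. This is only a sketch: the function and parameter names are hypothetical, and it produces text to type at the console rather than contacting a storage system.

```python
def quick_start_commands(vol, has_existing_data, schedule=None, run_now=False):
    """Return the A-SIS commands from Table 3 in the order they would be run.

    vol               -- volume path, for example "/vol/VolPST"
    has_existing_data -- True if the volume held data before A-SIS was enabled
    schedule          -- schedule string for "sis config -s", or None to keep
                         the default schedule
    run_now           -- True to start a manual run instead of waiting
    """
    commands = [f"sis on {vol}"]
    if has_existing_data:
        commands.append(f"sis start -s {vol}")  # initial scan of existing data
    if schedule is not None:
        commands.append(f"sis config -s {schedule} {vol}")
    if run_now:
        commands.append(f"sis start {vol}")
    commands.append(f"sis status {vol}")  # monitor status
    commands.append(f"df -s {vol}")       # monitor space savings
    return commands


new_volume_plan = quick_start_commands("/vol/VolPST", False, schedule="-")
existing_volume_plan = quick_start_commands("/vol/data", True)
```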

3.5 Monitoring A-SIS Status


This section describes the meaning of various status messages about A-SIS. The “sis status”
command is the primary command used to report on the status of A-SIS for a specific flexible volume
or all the flexible volumes.


Below, from the sis man page, you see the various State, Status, and Progress messages that can
be returned when running sis status. Note that if you don’t provide a flexible volume name, the
status for all flexible volumes that have A-SIS enabled will be displayed.
toaster> sis status
Path           State      Status     Progress
/vol/dvol_1    Enabled    Idle       Idle for 10:45:23
/vol/dvol_2    Enabled    Pending    Idle for 15:23:41
/vol/dvol_3    Disabled   Idle       Idle for 37:12:34
/vol/dvol_4    Enabled    Active     25 GB Scanned
/vol/dvol_5    Enabled    Active     25 MB Searched
/vol/dvol_6    Enabled    Active     40 MB (20%) Done
/vol/dvol_7    Enabled    Active     30 MB Verified
/vol/dvol_8    Enabled    Active     10% Merged

The status of each flexible volume is interpreted as follows:
• dvol_1 is Idle. The last A-SIS operation on the flexible volume finished 10:45:23 ago.
• dvol_2 is Pending because of a resource limitation. The A-SIS operation on the flexible
volume will become Active when the resource is available.
• dvol_3 is Idle because the A-SIS operation is disabled on the flexible volume.
• dvol_4 is Active. The A-SIS operation is scanning the entire flexible volume (initiated
with “sis start -s”). So far, it has scanned 25GB of data.
• dvol_5 is Active. The operation is searching for duplicate data, and 25MB of data has already
been searched.
• dvol_6 is also Active. The operation has saved 40MB of data. This is 20% of the total
duplicate data found in the searching stage.
• dvol_7 is Active. It is verifying the metadata of processed data blocks. This process will
remove unused metadata.
• dvol_8 is Active. Verified metadata are being merged. This process will merge together all
verified metadata of processed data blocks to an internal format that supports fast sis
operation.
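
For scripted monitoring, the sis status listing shown earlier in this section can be split into fields. The sketch below assumes the whitespace-separated layout of that sample (path, state, status, then free-form progress text); the function name is hypothetical, and the output of other Data ONTAP releases may differ.

```python
SAMPLE = """\
Path State Status Progress
/vol/dvol_1 Enabled Idle Idle for 10:45:23
/vol/dvol_2 Enabled Pending Idle for 15:23:41
/vol/dvol_3 Disabled Idle Idle for 37:12:34
/vol/dvol_4 Enabled Active 25 GB Scanned
/vol/dvol_5 Enabled Active 25 MB Searched
/vol/dvol_6 Enabled Active 40 MB (20%) Done
/vol/dvol_7 Enabled Active 30 MB Verified
/vol/dvol_8 Enabled Active 10% Merged
"""


def parse_sis_status(text):
    """Return one dict per volume line; header and prompt lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 3)  # progress text may itself contain spaces
        if len(fields) < 4 or not fields[0].startswith("/vol/"):
            continue
        path, state, status, progress = fields
        rows.append({"path": path, "state": state,
                     "status": status, "progress": progress.strip()})
    return rows


rows = parse_sis_status(SAMPLE)
active_paths = [row["path"] for row in rows if row["status"] == "Active"]
```

A script could poll this listing and wait until a volume leaves the Active status before reading its space savings.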

The general flow of the phases A-SIS goes through and the correlating sis status messages
when actively running on a flexible volume are shown in Figure 4.


Figure 4) A-SIS status messages and their correlation to A-SIS phases.

For additional information, the -l option will display detailed status, as shown below.
toaster> sis status -l /vol/dvol_6
Path: /vol/dvol_6
State: Enabled
Status: Active
Progress: 41020 KB (20%) Done
Type: Regular
Schedule: sun-sat@0
Last Operation Begin: Thu Mar 24 13:30:00 PST 2005
Last Operation End: Fri Mar 25 00:34:16 PST 2005
Last Operation Size: 4732932 KB
Last Operation Error: -
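
As a worked example, the long listing above can be read into key/value pairs to compute how long the last operation ran. The sketch assumes the "Key: value" layout and the timestamp format of the sample; the function names are hypothetical, and the time zone token is ignored because both timestamps share it.

```python
from datetime import datetime

SAMPLE_LONG = """\
Path: /vol/dvol_6
State: Enabled
Status: Active
Progress: 41020 KB (20%) Done
Type: Regular
Schedule: sun-sat@0
Last Operation Begin: Thu Mar 24 13:30:00 PST 2005
Last Operation End: Fri Mar 25 00:34:16 PST 2005
Last Operation Size: 4732932 KB
Last Operation Error: -
"""


def parse_sis_status_long(text):
    """Parse 'Key: value' lines into a dict, splitting on the first colon only."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def parse_timestamp(stamp):
    """Parse e.g. 'Thu Mar 24 13:30:00 PST 2005', dropping weekday and zone."""
    _weekday, month, day, clock, _zone, year = stamp.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


info = parse_sis_status_long(SAMPLE_LONG)
elapsed = (parse_timestamp(info["Last Operation End"])
           - parse_timestamp(info["Last Operation Begin"]))
```

For the sample, the last operation began at 13:30:00 and ended at 00:34:16 the next day, so it ran for 11 hours, 4 minutes, and 16 seconds.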

3.6 End-to-End A-SIS Configuration Example


This section steps through the entire typical process of creating a flexible volume and configuring,
running, and monitoring A-SIS on it. (Note that steps are spelled out in detail, so it appears a lot
lengthier than it would be in the real world.)
In this example we want a place to archive a number of large PST files various users have created
and are maintaining. The destination NetApp storage system is called r200-rtp01, and it is assumed
that A-SIS has been licensed on this machine. As NetApp storage arrays are multiprotocol boxes, in
this example we’ll actually be using a UNIX server to copy the PST data.


1. Begin by creating a flexible volume (keeping in mind the maximum allowable volume size for the
platform, as specified in the requirements table at the beginning of this section).
r200-rtp01*> vol create VolPST aggr0 200g
Creation of volume 'VolPST' with size 200g on containing aggregate
'aggr0' has completed.

2. Now, as a best practice, we’ll disable scheduled Snapshot copies. An alternative to what’s shown
below would be to use the command “snap sched VolPST 0 0 0”.
r200-rtp01*> vol status VolPST
Volume   State    Status          Options
VolPST   online   raid_dp, flex
Containing aggregate: 'aggr0'
r200-rtp01*> vol options VolPST nosnap true
r200-rtp01*> vol status VolPST
Volume   State    Status          Options
VolPST   online   raid_dp, flex   nosnap=on
Containing aggregate: 'aggr0'
3. Now we’ll enable A-SIS on the flexible volume and verify that it’s turned on. The vol status
command will show a sis attribute for flexible volumes that have A-SIS turned on. (It can be a bit
confusing, since sis is also indicated for those flexible volumes that have been written to by
SnapVault for NetBackup.)

Note that there needs to be space available in the flexible volume for the sis on command to
complete successfully. That is, if the sis on command were attempted on a flexible volume that
already had data and was completely full, it would fail (since there is no room to create the
required metadata).

Note that after turning A-SIS on, Data ONTAP lets you know that if this were an existing flexible
volume that already contained data prior to A-SIS being enabled, you would want to run sis
start -s; in this example it’s a brand-new flexible volume, so that’s not necessary.

r200-rtp01*> sis on /vol/VolPST
SIS for "/vol/VolPST" is enabled.
Already existing data could be processed by running "sis start -s
/vol/VolPST".
r200-rtp01*> vol status VolPST
Volume   State    Status          Options
VolPST   online   raid_dp, flex   nosnap=on
                  sis
Containing aggregate: 'aggr0'


4. Another way to verify that A-SIS is enabled on the flexible volume is to just check the output from
running sis status on the flexible volume.
r200-rtp01*> sis status /vol/VolPST
Path State Status Progress
/vol/VolPST Enabled Idle Idle for 00:00:20

5. Next we’ll turn off the default A-SIS schedule. Since in this example the administrators will be
moving large quantities of PST files in as time permits, we’ll want to let them run A-SIS manually
at opportune times.
r200-rtp01*> sis config /vol/VolPST
Path Schedule
/vol/VolPST sun-sat@0
r200-rtp01*> sis config -s - /vol/VolPST
r200-rtp01*> sis config /vol/VolPST
Path Schedule
/vol/VolPST -

At this point, in our example, the administrator NFS-mounted the flexible volume to /testPSTs on a
Solaris™ host, sunv240-rtp01, and copied a large number of PST files from the users’ directories
into the new PST archive flexible volume. The result from the host perspective is shown below.
(The same thing could be accomplished by mapping a CIFS share to a Windows host.)
root@sunv240-rtp01 # pwd
/testPSTs
root@sunv240-rtp01 # df -k .
Filesystem kbytes used avail capacity Mounted on
r200-rtp01:/vol/VolPST
167772160 33388384 134383776 20% /testPSTs
The example continues with examining the flexible volume, running A-SIS deduplication, and
monitoring the status.

6. Use df -s to examine the storage consumed and the space savings provided. Note that no space
savings have been achieved by simply copying data to the flexible volume even though A-SIS is
turned on. What has happened is that all the blocks that have been written to this flexible volume
since A-SIS was turned on have had their fingerprints written to the change log file.
r200-rtp01*> df -s /vol/VolPST
Filesystem used saved %saved
/vol/VolPST/ 33388384 0 0%

7. Start A-SIS running on the flexible volume. This causes the change log to be processed,
fingerprints to be sorted and merged, and duplicate blocks to be found.
r200-rtp01*> sis start /vol/VolPST
The SIS operation for "/vol/VolPST" is started.


8. Use sis status to monitor the progress of A-SIS.
r200-rtp01*> sis status /vol/VolPST
Path           State      Status     Progress
/vol/VolPST    Enabled    Active     9211 MB Searched

r200-rtp01*> sis status /vol/VolPST
Path           State      Status     Progress
/vol/VolPST    Enabled    Active     11 MB (0%) Done

r200-rtp01*> sis status /vol/VolPST
Path           State      Status     Progress
/vol/VolPST    Enabled    Active     1692 MB (14%) Done

r200-rtp01*> sis status /vol/VolPST
Path           State      Status     Progress
/vol/VolPST    Enabled    Active     10 GB (90%) Done

r200-rtp01*> sis status /vol/VolPST
Path           State      Status     Progress
/vol/VolPST    Enabled    Active     11 GB (99%) Done

r200-rtp01*> sis status /vol/VolPST
Path           State      Status     Progress
/vol/VolPST    Enabled    Idle       Idle for 00:00:07

9. Once sis status indicates the flexible volume is once again in the Idle state, A-SIS has finished
running, and we can now check the space savings it provided in the flexible volume.
r200-rtp01*> df -s /vol/VolPST
Filesystem used saved %saved
/vol/VolPST/ 24072140 9316052 28%
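
The %saved figure in this output can be checked by hand. The formula below is inferred from the numbers in this example (saved space divided by the sum of used and saved space), not taken from a stated definition, so treat it as an assumption; the function name is hypothetical.

```python
def percent_saved(used_kb, saved_kb):
    """Share of the logical data (used + saved) that deduplication removed."""
    return 100.0 * saved_kb / (used_kb + saved_kb)


used_after = 24072140    # KB used after deduplication (df -s output above)
saved = 9316052          # KB saved (df -s output above)
used_before = 33388384   # KB used before deduplication (earlier df -s output)

pct = percent_saved(used_after, saved)           # about 27.9
logical_total = used_after + saved               # 33388192 KB
difference = used_before - logical_total         # 192 KB
```

The result, about 27.9%, rounds to the 28% reported. The sum of used and saved space comes to 33388192 KB, which is within 192 KB of the 33388384 KB that the volume consumed before deduplication.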

That’s all there is to it.


3.7 Configuring A-SIS Schedules


This section provides some specifics about configuring schedules with A-SIS.
The sis config command is used to configure and view A-SIS schedules for flexible volumes.
Usage syntax is shown below.
r200-rtp01*> sis help config
sis config [ [ -s schedule ] <path> | <path> ... ]
- Sets up, modifies, and retrieves the schedule of SIS
volumes.

Run with no arguments, sis config will return the schedules for all flexible volumes that have
A-SIS enabled. The example below shows the four different formats the reported schedules can have.
toaster> sis config
Path Schedule
/vol/dvol_1 -
/vol/dvol_2 23@sun-fri
/vol/dvol_3 auto
/vol/dvol_4 sat@6

The meaning of each of these schedule types is as follows.


• On flexible volume dvol_1 A-SIS is not scheduled to run.
• On flexible volume dvol_2 A-SIS is scheduled to run every day from Sunday to Friday at 11
p.m.
• On flexible volume dvol_3 A-SIS is set to auto schedule. This means A-SIS will be triggered
by the amount of new data written to the flexible volume, specifically when there are 20%
new fingerprints in the change log.
• On flexible volume dvol_4 A-SIS is scheduled to run at 6 a.m. on Saturday.

When the -s option is specified, the command will set up or modify the schedule on the specified
flexible volume. The schedule parameter can be specified in one of four ways:
[day_list][@hour_list]
[hour_list][@day_list]
-
auto

The day_list specifies which days of the week A-SIS should run. It is a comma-separated list of the
first three letters of the day: sun, mon, tue, wed, thu, fri, sat. The names are not case sensitive.
Day ranges such as mon-fri can also be given. The default day_list is sun-sat.
The hour_list specifies which hours of the day A-SIS should run on each scheduled day. The
hour_list is a comma-separated list of the integers from 0 to 23. Hour ranges such as 8-17 are
allowed.

Step values can be used in conjunction with ranges. For example, 0-23/2 means "every two hours."
The default hour_list is 0 (that is, midnight on the morning of each scheduled day).
If "-" is specified, there won't be a scheduled A-SIS operation on the flexible volume.
The “auto” schedule causes A-SIS to run on that flexible volume whenever there are 20% new
fingerprints in the change log. This check is done in a background process and occurs every minute.
When A-SIS is enabled on a flexible volume the first time, an initial schedule is assigned to the
flexible volume. This initial schedule is sun-sat@0, which means "once every day at midnight."
To configure the schedules shown earlier in this section, the following commands would be issued:
toaster> sis config -s - /vol/dvol_1
toaster> sis config -s 23@sun-fri /vol/dvol_2
toaster> sis config -s auto /vol/dvol_3
toaster> sis config -s sat@6 /vol/dvol_4
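
The schedule grammar described in this section can be made concrete with a short sketch. The Python below is an illustration only, not part of Data ONTAP: the function names (parse_schedule and its helpers) are hypothetical, and the handling of edge cases such as reversed ranges is an assumption, since the text above does not define them. It applies the defaults stated in this section (sun-sat for day_list, 0 for hour_list).

```python
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def _hour(text):
    """Convert an hour token to an int, enforcing the 0-23 range."""
    value = int(text)
    if not 0 <= value <= 23:
        raise ValueError(f"hour out of range: {text}")
    return value


def _expand(item, lookup):
    """Expand one item such as 'mon', 'mon-fri', '8-17', or '0-23/2'."""
    rng, _, step = item.partition("/")
    first, _, last = rng.partition("-")
    start = lookup(first)
    end = lookup(last) if last else start
    stride = int(step) if step else 1
    if stride < 1 or end < start:
        # Assumption: reversed ranges are rejected rather than wrapped.
        raise ValueError(f"invalid range or step: {item}")
    return list(range(start, end + 1, stride))


def _expand_list(text, lookup):
    """Expand a comma-separated list into sorted, unique integer values."""
    values = set()
    for item in text.split(","):
        values.update(_expand(item, lookup))
    return sorted(values)


def parse_schedule(spec):
    """Return None for '-', 'auto' for auto, else a dict of days and hours."""
    spec = spec.strip().lower()  # day names are not case sensitive
    if spec == "-":
        return None
    if spec == "auto":
        return "auto"
    left, _, right = spec.partition("@")
    days = hours = None
    for part in (left, right):
        if not part:
            continue
        if part[0].isdigit():  # a part starting with a digit is an hour_list
            if hours is not None:
                raise ValueError("two hour lists given")
            hours = part
        else:
            if days is not None:
                raise ValueError("two day lists given")
            days = part
    day_numbers = _expand_list(days or "sun-sat", DAYS.index)
    hour_numbers = _expand_list(hours or "0", _hour)
    return {"days": [DAYS[d] for d in day_numbers], "hours": hour_numbers}
```

For example, parse_schedule("23@sun-fri") yields the six days sun through fri with the single hour 23, and parse_schedule("sat@6") yields Saturday at hour 6, matching the interpretations given earlier in this section.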

4 Operating Characteristics
This section discusses where A-SIS makes sense and the behavior that you can expect.

4.1 A-SIS Target Environment


This section discusses where A-SIS is a good fit.
A-SIS supports flexible volumes that have data written to them using CIFS or NFS, or as LUNs
accessed using FCP/iSCSI. Regardless of how the data reached the NetApp storage system, A-SIS
will deduplicate it.
A-SIS was initially targeted to data retention/archival environments in its first release (Data ONTAP
7.2L1), focusing on archives of file data: for example, home directories, engineering
development, Microsoft® Office, e-mail archive, SharePoint, technical and general publications, and
so on.
Substantial benefit can be achieved in some tier 2 primary storage environments as well.
A-SIS is supported in disaster recovery configurations where SnapMirror® is used; see the
“Replication and SnapMirror” section for specific details.

4.2 A-SIS Performance


A-SIS is tightly integrated with Data ONTAP and the WAFL file structure. Because of this,
deduplication is performed with extreme efficiency. Complex hashing algorithms and look-up tables
are not required. Instead, A-SIS is able to leverage the internal characteristics of Data ONTAP to
create and compare digital fingerprints, redirect data pointers, and free up redundant data areas, all
with a minimal amount of performance impact.

4.3 A-SIS Storage Savings


While A-SIS can deduplicate any blocks in a flexible volume of the NetApp storage system, the
storage savings achieved can vary based on the data set.
A single A-SIS run on a single data set can yield storage savings anywhere from 10% to 90%,
with 30% to 50% being typical.
Where customers back up or archive the same data repeatedly, the realized storage savings
improve over time, reaching 20:1 (95%) and higher.
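
The relationship between a deduplication ratio such as 20:1 and a percentage such as 95% is simple arithmetic. The following Python sketch shows the conversion in both directions; the function names are hypothetical and chosen only for illustration.

```python
def ratio_to_percent_saved(ratio):
    """Convert a ratio such as 20 (meaning 20:1) to percent of space saved."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return (1 - 1 / ratio) * 100


def percent_saved_to_ratio(percent):
    """Convert percent of space saved to an N:1 ratio (returns N)."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in the range [0, 100)")
    return 100 / (100 - percent)
```

By the same arithmetic, the typical 30% to 50% savings quoted above correspond to ratios of roughly 1.4:1 to 2:1.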

4.4 Additional A-SIS Considerations


This section provides some discussion on other A-SIS-related topics. Some of this information may
be covered elsewhere, but it bears reiterating here.
First, refer to the A-SIS requirements table (Table 1) in the beginning of section 3 for specific
supported hardware and software and necessary licenses.

4.4.1 Number of A-SIS Processes


A maximum of eight A-SIS processes can be run at the same time on the same NearStore device.
• If another flexible volume is scheduled to have A-SIS run while eight A-SIS processes are
  already running, A-SIS for this additional flexible volume will be queued. For example, say a
  user sets a default schedule (sun-sat@0) for 10 A-SIS volumes. Eight will run at midnight,
  and the remaining two will be queued.
    • As soon as one of the eight current A-SIS processes completes, one of the queued
      ones will start, and when another A-SIS process completes, the second queued one
      will start.
    • Next time A-SIS is scheduled to run on these same 10 flexible volumes, a round-
      robin paradigm will be used so the same ones aren’t always the first ones run.
• For manually triggered A-SIS runs, if eight A-SIS processes are already running when a
  command is issued to start another one, the request will fail, and the operation will not be
  queued.
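
The queuing behavior described above can be modeled with a small simulation. This Python sketch is a toy model, not the Data ONTAP implementation: the class and method names are hypothetical, the first-in, first-out order of the queue is an assumption (the text says only that "one of the queued ones will start"), and the round-robin ordering across successive scheduled runs is not modeled.

```python
from collections import deque

MAX_CONCURRENT = 8  # limit stated in this section


class SisScheduler:
    """Toy model of scheduled versus manual A-SIS starts."""

    def __init__(self, limit=MAX_CONCURRENT):
        self.limit = limit
        self.running = []
        self.queued = deque()

    def start_scheduled(self, volume):
        """Scheduled runs are queued when the limit is reached."""
        if len(self.running) < self.limit:
            self.running.append(volume)
            return "running"
        self.queued.append(volume)
        return "queued"

    def start_manual(self, volume):
        """Manual runs fail outright when the limit is reached."""
        if len(self.running) >= self.limit:
            return "failed"
        self.running.append(volume)
        return "running"

    def complete(self, volume):
        """Finish a run and start the next queued volume, if any."""
        self.running.remove(volume)
        if self.queued:
            # Assumption: queued volumes start in first-in, first-out order.
            self.running.append(self.queued.popleft())
```

With 10 volumes scheduled at the same time, this model starts eight immediately and queues two; a manual start issued at that point returns "failed" and is not queued.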

4.4.2 A-SIS and Active/Active Configuration


A-SIS is supported with NetApp cluster services. Upon failover to the partner node, it behaves
as follows:
• Writes to the flexible volume will have fingerprints written to the change log.
• No sis administration operations or deduplication will function.
• Upon failback, normal A-SIS operations can continue and the updated change log is processed.
A-SIS deduplication is a licensed option behind the NearStore option license. Our best practice
recommendation is to have both nodes in an active/active configuration licensed with the NearStore
option and A-SIS.

4.4.3 A-SIS and Space Savings on Existing Data


A major benefit of A-SIS is that it can be used to deduplicate existing data on previously used flexible
volumes (after upgrading to Data ONTAP 7.2.2). It is completely realistic to assume that Snapshot
copies may exist. What happens when you run A-SIS in this case?
When you first run A-SIS on this flexible volume, the storage savings will probably be rather small or
even nonexistent because existing Snapshot copies are not deduplicated.
• Previous Snapshot copies will expire, and as they do some small savings will be realized, but
  they too will probably be pretty low.
• During this period of old Snapshot copies expiring, it is fair to assume new data is being
  created on the flexible volume and Snapshot copies being created.
• Thus the storage savings may stay rather flat (that is, very low).
• When the last Snapshot copy that was created before A-SIS was run is deleted, the storage
  savings should increase noticeably.

4.4.4 A-SIS Best Practices


This section contains general rules of thumb that might not have been covered elsewhere in this
document.

• If there is very little new data, run A-SIS infrequently, because it doesn't make sense to
  unnecessarily consume CPU resources. How often you run it will depend on the change rate
  of the data in the flexible volume.
• The best options are:
    • Use the auto mode so that A-SIS only runs when significant additional data has
      been written to each particular flexible volume (this will tend to naturally spread out
      when A-SIS runs).
    • Stagger A-SIS schedules for the flexible volumes so it runs on alternate days.
    • Run A-SIS manually.
• Run A-SIS before creating Snapshot copies, as this will ensure no undeduplicated data gets
  locked in Snapshot copies. If a Snapshot copy is created on a flexible volume before A-SIS
  has a chance to run/complete on that flexible volume, this could result in lower space
  savings.
• The Snapshot reserve should be greater than 0 if Snapshot copies are to be used. (An
  exception to this might be in a SAN environment, where often it is set to zero for thin
  provisioning of LUNs.)
• There must be some free space in the flexible volume to allow A-SIS to operate and create
  the metadata it requires. As necessary, flexible volumes can be resized, with no impact to
  data access, to accommodate this.
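
As one way to apply the staggering advice above, the following Python sketch generates sis config commands that place consecutive flexible volumes on two alternating day lists. It is an illustration under stated assumptions: the function name is hypothetical, the two day lists are an arbitrary choice, and the output relies only on the day_list@hour_list schedule format described in section 3.7. Because a week has seven days, the first group runs on both Saturday and Sunday.

```python
def staggered_commands(paths, hour=0):
    """Build 'sis config -s' commands that alternate volumes between day lists."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0-23")
    # Assumption: two groups are enough; more groups could be added if needed.
    day_sets = ["sun,tue,thu,sat", "mon,wed,fri"]
    return [
        f"sis config -s {day_sets[i % len(day_sets)]}@{hour} {path}"
        for i, path in enumerate(paths)
    ]
```

Called with /vol/dvol_1 and /vol/dvol_2, this produces one command scheduling dvol_1 on sun,tue,thu,sat@0 and one scheduling dvol_2 on mon,wed,fri@0.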

5 Common Problems and Troubleshooting


This section covers issues that have been known to come up when configuring and running A-SIS.

5.1 Licensing
Make sure A-SIS is properly licensed and, if the platform is not an R200, make sure the NearStore
option is also properly licensed:
fas3070-rtp01*> license

a_sis <license>
nearstore_option <license>

5.2 Volume Sizes


Adhere to the A-SIS volume size limits presented in the “Requirements Overview” section. If you
exceed them you will not be able to enable A-SIS on that volume.
Below is an example of the message displayed if the volume is too large to enable A-SIS.
london-fs3> sis on /vol/projects

Volume or maxfiles exceeded max allowed for SIS: /vol/projects

Also note that there needs to be free space available in the flexible volume for the “sis on”
command to complete successfully. If a flexible volume is full, A-SIS will not run. However, as noted
earlier, flexible volumes can be resized with no impact to data access to accommodate this.

5.3 Logs and Error Messages


New error log: /etc/log/sis
New error messages:
• Registry errors: Check if vol0 is full.
• Metafile op errors: Check if the A-SIS flexible volume is full.
• License errors: Check if the license is installed.
• Change log full error: Perform a “sis start” operation, which will empty the change log
  metafile when finished.

5.4 Other Issues


Refer to the Data ONTAP 7.2.2 release notes for complete information.

5.5 Not Seeing Space Savings


If you’ve run A-SIS on a flexible volume that you’re confident contains data that should deduplicate
well, yet you are not seeing any space savings, there’s a good chance a number of Snapshot copies
exist and are locking a lot of data. This especially tends to happen when people run A-SIS on existing
flexible volumes of data.
Use the “snap list” command to see what Snapshot copies exist and the “snap delete”
command to remove them. Alternatively, wait for the Snapshot copies to expire, and the space
savings will appear.

5.6 Undeduplicating a Flexible Volume


It is possible, and easy, to “undeduplicate” a flexible volume that has A-SIS enabled, by backing out
A-SIS and turning it back into a “regular” (non-dense) flexible volume. This can be done while the
flexible volume is online and is accomplished as described below.
Turn A-SIS off on the flexible volume. (Note that this command stops fingerprints from being written to
the change log as new data is written to the flexible volume. If this command is used, and then A-SIS
is turned back on for this flexible volume, the flexible volume will need to be rescanned with the
“sis start -s” command.)
sis off <flexvol>
Use the following command (see footnote 1) to recreate the duplicate blocks in the flexible volume.
sis undo <flexvol>
When this command completes, it will delete the fingerprint file and the change log files.
Below is an example of undeduplicating a flexible volume.
r200-rtp01*> df -s /vol/VolReallyBig2
/vol/VolReallyBig2/ 20568276 3768732 15%
r200-rtp01*> sis status /vol/VolReallyBig2
Path State Status Progress
/vol/VolReallyBig2 Enabled Idle Idle for 11:11:13
r200-rtp01*> sis off /vol/VolReallyBig2
SIS for "/vol/VolReallyBig2" is disabled.
r200-rtp01*> sis status /vol/VolReallyBig2
Path State Status Progress
/vol/VolReallyBig2 Disabled Idle Idle for 11:11:34
r200-rtp01*> sis undo /vol/VolReallyBig2
Wed Feb 7 11:13:15 EST [wafl.scan.start:info]: Starting SIS volume scan on
volume VolReallyBig2.
r200-rtp01*> sis status /vol/VolReallyBig2
Path State Status Progress
/vol/VolReallyBig2 Disabled Undoing 424 MB Processed
r200-rtp01*> sis status /vol/VolReallyBig2
No status entry found.
r200-rtp01*> df -s /vol/VolReallyBig2
Filesystem used saved %saved
/vol/VolReallyBig2/ 24149560 0 0%

Footnote 1: The undo option of the sis command is available only in diag mode, accessed using
the command “priv set diag”.

Note that if sis undo starts processing and there is then not enough space to undeduplicate, it will
stop, report a message about insufficient space, and leave the flexible volume dense. All data
is still accessible, but some block sharing is still occurring. Use “df -s” to understand how much free
space you really have and then either grow the flexible volume or delete data or Snapshot copies to
provide the needed free space.
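
When reading “df -s” output, it can help to see how the used, saved, and %saved columns relate. The Python sketch below assumes that %saved is the saved amount divided by the sum of used and saved. That formula is an assumption, not taken from this document, but it agrees with the sample output in section 5.6 (used 20568276, saved 3768732, reported as 15%). The function name is hypothetical.

```python
def percent_saved(used, saved):
    """Percent of space saved, assuming %saved = saved / (used + saved)."""
    total = used + saved
    if total == 0:
        return 0.0
    return 100.0 * saved / total
```

For the values shown before the undo operation, this gives about 15.5%, in line with the 15% reported; after the undo, saved is 0 and the result is 0%.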

5.7 Additional Reporting with “sis stat -l”


For additional status information, you can do “priv set diag” and use the “sis stat -l”
command for long detailed listings.

5.8 A-SIS and Reboots


If a NetApp storage system is rebooted while A-SIS is running, A-SIS will be in the “Idle” state
for that flexible volume after the reboot. When the next A-SIS processing for that flexible volume
starts, it will clean up any remaining intermediate metadata that was created by the previous
A-SIS operation.

6 A-SIS and Replication


Although there are substantial benefits to be achieved with A-SIS, a complete solution will most likely
also need to mirror the deduplicated data to another location for disaster recovery purposes.
Replication involving an A-SIS-enabled flexible volume is supported using NetApp SnapMirror in two
ways, as discussed in the next two subsections.

6.1 Replicating an A-SIS Flexible Volume for DR


An A-SIS flexible volume can be replicated to a secondary storage system (destination) using Volume
SnapMirror (VSM) as shown in Figure 5.

Figure 5) VSM of A-SIS flexible volume for disaster recovery.

Key points in this scenario are:


• The nearstore_option must be licensed on both the source and destination.
• A-SIS must be licensed at the primary location (source).
• A-SIS does not need to be licensed at the destination. However, if there is a situation in
  which the primary site is down and the secondary location becomes the new primary, A-SIS
  needs to be licensed for continued deduplication to occur. Thus, the best practice is to have
  A-SIS licensed at both locations.
• A-SIS is only enabled, run, and managed from the primary location.
• The flexible volume at the secondary location will “inherit” all the A-SIS attributes and storage
  savings through SnapMirror.
• Only unique blocks are transferred, so A-SIS deduplication reduces network bandwidth
  usage too.

6.2 Replicating Primary Data to an A-SIS Flexible Volume


A production primary flexible volume can be replicated to an A-SIS-enabled flexible volume on a
secondary storage system using Qtree SnapMirror (QSM), as shown in Figure 6.

Figure 6) QSM of production data to an A-SIS flexible volume.

Key points in this scenario are:


• The nearstore_option must be licensed on the destination.
• A-SIS is only licensed at the secondary location (destination).
• A-SIS is enabled, run, and managed on a flexible volume at the secondary location.
• A-SIS doesn’t yield any network bandwidth savings as QSM works at the logical layer.
• Storage savings benefit at the QSM destination is achieved by running A-SIS on the
  destination after QSM has finished transferring the data.

Network Appliance, Inc.

© 2007 Network Appliance, Inc. All rights reserved. Specifications subject to change without notice. NetApp, the Network Appliance logo, Data
ONTAP, FlexClone, FlexVol, NearStore, SnapMirror, SnapVault, and WAFL are registered trademarks and Network Appliance and Snapshot are
trademarks of Network Appliance, Inc. in the U.S. and other countries. Solaris is a trademark of Sun Microsystems, Inc. Windows and Microsoft are
registered trademarks of Microsoft Corporation. UNIX is a registered trademark of The Open Group. NetBackup is a trademark of Symantec
Corporation or its affiliates in the U.S. and other countries. All other brands or products are trademarks or registered trademarks of their respective
holders and should be treated as such.