
Data Domain

System Administration
Student Guide

Education Services
February 2014
Welcome to Data Domain System Administration.

Copyright © 1996, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014 EMC
Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR
WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC2, EMC, Data Domain, RSA, EMC Centera, EMC ControlCenter, EMC LifeLine, EMC OnCourse, EMC Proven, EMC Snap,
EMC SourceOne, EMC Storage Administrator, Acartus, Access Logix, AdvantEdge, AlphaStor, ApplicationXtender,
ArchiveXtender, Atmos, Authentica, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap,
AVALONidm, Avamar, Captiva, Catalog Solution, C-Clip, Celerra, Celerra Replicator, Centera, CenterStage, CentraStar,
ClaimPack, ClaimsEditor, CLARiiON, ClientPak, Codebook Correlation Technology, Common Information Model,
Configuration Intelligence, Configuresoft, Connectrix, CopyCross, CopyPoint, Dantz, DatabaseXtender, Direct Matrix
Architecture, DiskXtender, DiskXtender 2000, Document Sciences, Documentum, elnput, E-Lab, EmailXaminer,
EmailXtender, Enginuity, eRoom, Event Explorer, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony, Global File
Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, InfoMover, Infoscape, Infra, InputAccel, InputAccel
Express, Invista, Ionix, ISIS, Max Retriever, MediaStor, MirrorView, Navisphere, NetWorker, nLayers, OnAlert, OpenScale,
PixTools, Powerlink, PowerPath, PowerSnap, QuickScan, Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the
RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, Smarts, SnapImage, SnapSure, SnapView, SRDF, StorageScope,
SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX, TimeFinder, UltraFlex, UltraPoint,
UltraScale, Unisphere, VMAX, Vblock, Viewlets, Virtual Matrix, Virtual Matrix Architecture, Virtual Provisioning, VisualSAN,
VisualSRM, Voyence, VPLEX, VSAM-Assist, WebXtender, xPression, xPresso, YottaYotta, the EMC logo, and where
information lives, are registered trademarks or trademarks of EMC Corporation in the United States and other countries.

All other trademarks used herein are the property of their respective owners.

© Copyright 2014 EMC Corporation. All rights reserved. Published in the USA.

Revision Date: February 2014

This module focuses on Data Domain core technologies. It includes the following lessons:
• Data Domain Overview
• Deduplication Basics
• EMC Data Domain Stream-Informed Segment Layout (SISL™) Scaling Architecture Overview
• EMC Data Domain Data Invulnerability Architecture (DIA) Overview
• EMC Data Domain File Systems Introduction
• EMC Data Domain Protocols Overview
• EMC Data Domain Data Paths Overview
• EMC Data Domain Administration Interfaces

This module also includes knowledge checks and a lab, which enable you to test your knowledge.

This lesson is an introduction to the EMC Data Domain appliance. The first topic answers the
question: What is a Data Domain system?

EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery. An EMC Data Domain system can also be used for online storage with additional features
and benefits.

A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.

Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity.

Most Data Domain systems have a controller and multiple storage units.

EMC has several hardware offerings to suit a variety of environments, including:
• Small enterprise data centers and remote offices
• Midsized enterprise data centers
• Enterprise data centers
• Large enterprise data centers
• EMC Data Domain Expansion Shelves

Visit the Data Domain Hardware page on http://www.emc.com/ for specific models and
specifications.
http://www.emc.com > Products and Solutions > Backup and Recovery > EMC Data Domain >
Hardware

The latest Data Domain Operating System (DD OS) has several features and benefits, including:
• Support for leading backup, file archiving, and email archiving applications
• Simultaneous use of VTL, CIFS, NFS, NDMP, and EMC Data Domain Boost
• Inline write/read verification, continuous fault detection, and healing
• Conformance with IT governance and regulatory compliance standards for archived data

This lesson covers deduplication, which is an important technology that improves data storage by
providing extremely efficient data backups and archiving. This lesson also covers the different types
of deduplication (inline, post-process, file-based, block-based, fixed-length, and variable-length)
and the advantages of each type. The last topic in this lesson covers Data Domain deduplication
and its advantages.

Deduplication is similar to data compression, but it looks for redundancy of large sequences of
bytes. Sequences of bytes identical to those previously encountered and stored are replaced with
references to the previously encountered data.

This is all hidden from users and applications. When the data is read, the original data is provided
to the application or user.

Deduplication performance depends on the amount of data, bandwidth, disk speed, CPU, and
memory of the hosts and devices performing the deduplication.

When processing data, deduplication recognizes data that is identical to previously stored data.
When it encounters such data, deduplication creates a reference to the previously stored data, thus
avoiding storing duplicate data.

Deduplication typically uses hashing algorithms.

Hashing algorithms yield a unique value based on the content of the data being hashed. This value
is called the hash or fingerprint, and is much smaller in size than the original data.

Different data contents yield different hashes; each hash can be checked against previously stored
hashes.
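
This fingerprint-and-compare idea can be sketched in a few lines of Python (an illustration only, not the Data Domain implementation; the SHA-1 hash and in-memory index are assumptions made for the example):

    import hashlib

    fingerprints = {}                            # fingerprint -> location of stored data

    def store(segment, next_location):
        fp = hashlib.sha1(segment).hexdigest()   # the hash is much smaller than the data
        if fp in fingerprints:                   # seen before: reference, don't re-store
            return fingerprints[fp]
        fingerprints[fp] = next_location         # new content: store it once
        return next_location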

In file-based deduplication, only the original instance of a file is stored. Future identical copies of
the file use a small reference to point to the original file content. File-based deduplication is
sometimes called single-instance storage (SIS).

In this example, eight files are being deduplicated. The blue files are identical, but each has its own
copy of the file content. The grey files also have their own copy of identical content. After
deduplication there are still eight files. The blue files point to the same content, which is stored
only once on disk. This is similar for the grey files. If each file is 20 megabytes, file-based
deduplication has reduced the storage required from 160 megabytes to 40 megabytes.

File-based deduplication enables storage savings. It can be combined with compression
(representing the same data in fewer bits) for additional storage savings. It is popular in
desktop backups, and it can be more effective for data restores because it doesn’t need to
re-assemble files. It can be included in backup software, so an organization doesn’t have to
depend on vendor-specific disk hardware.

File-based deduplication results are often not as great as with other types of deduplication (such as
block- and segment-based deduplication). The most important disadvantage is that once a file is
modified, even slightly, it no longer deduplicates against previously backed-up copies of that file.

File-based deduplication stores an original version of a file and creates a digital signature for it
(using, for example, SHA-1, a standard hash algorithm). Future exact copies of the file are stored
as pointers to the original content rather than being stored again.
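
As a toy illustration of single-instance storage (hypothetical Python, assuming a whole-file SHA-1 signature as described above):

    import hashlib

    stored_content = {}                          # signature -> the single stored copy

    def backup_file(data):
        signature = hashlib.sha1(data).hexdigest()
        if signature not in stored_content:
            stored_content[signature] = data     # the first instance is stored
        return signature                         # later copies keep only this pointer

With eight 20-megabyte files but only two distinct contents, stored_content would hold just two entries, matching the 160-to-40-megabyte reduction described above.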

Fixed-length segment deduplication (also called block-based deduplication or fixed-segment
deduplication) is a technology that reduces data storage requirements by comparing incoming data
segments (also called fixed data blocks or data chunks) with previously stored data segments. It
divides data into segments of a single fixed length (for example, 4 KB, 8 KB, or 12 KB).

Fixed-length segment deduplication reads data and divides it into fixed-size segments. These
segments are compared to other segments already processed and stored. If the segment is
identical to a previous segment, a pointer is used to point to that previous segment.

In this example, the data stream is divided into a fixed length of four units. Small pointers to the
common content are assembled in the correct order to represent the original data. Each unique
data element is stored only once.

For data that is identical (does not change), fixed-length segment deduplication reduces storage
requirements.
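
The segmenting step itself is simple, as this Python sketch shows (illustrative only; the 4-byte segment size mirrors the 4-unit example above):

    def fixed_segments(data, size=4):
        # Cut the stream at every multiple of `size`, regardless of content.
        return [data[i:i + size] for i in range(0, len(data), size)]

    print(fixed_segments(b"ABCDEFGH"))    # [b'ABCD', b'EFGH']
    print(fixed_segments(b"XABCDEFGH"))   # [b'XABC', b'DEFG', b'H'] - one new byte shifts everything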

When data is altered, the segments shift, causing more segments to be stored. For example, when
you add a slide to a Microsoft PowerPoint deck, all subsequent blocks in the file are rewritten and
are likely to be considered different from those in the original file, so the deduplication effect is
less significant. Smaller blocks yield better deduplication than large ones, but deduplicating them
takes more resources.

In backup applications, the backup stream consists of many files. Backup streams are rarely
entirely identical, even when they are successive backups of the same file system. A single addition,
deletion, or change to any file changes the number of bytes in the new backup stream. Even if no
existing file has changed, adding a new file to the backup stream shifts the rest of the stream.
Fixed-length segment deduplication must then store large numbers of segments again because the
boundaries between the segments have moved.

Many hardware and software deduplication products use fixed-length segments for deduplication.

Variable-length segment deduplication evaluates data by examining its contents to look for the
boundary from one segment to the next. Variable-length segments are any number of bytes within
a range determined by the particular algorithm implemented.

Unlike fixed-length segment deduplication, variable-length segment deduplication examines the
contents of the backup or data stream to decide where each segment boundary falls.

When you apply variable-length segmentation to a data sequence, the segment sizes adapt to the
data itself. In this example, byte A is added to the beginning of the data. Only one new segment
needs to be stored, because the content that defines the boundaries of the remaining segments
was not altered.

Eventually variable-length segment deduplication finds the segments that have not changed, and
backs up fewer segments than fixed-length segment deduplication. Even for storing individual files,
variable-length segments have an advantage. Many files are very similar to, but not identical to,
other versions of the same file. Variable-length segments isolate the changes, find more identical
segments, and store fewer segments than fixed-length deduplication.
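
A simplified content-defined chunking sketch in Python (an illustration under assumed parameters, not the actual Data Domain algorithm; real implementations use rolling hashes and carefully tuned window, minimum, and maximum sizes):

    import zlib

    def variable_segments(data, window=4, mask=0x07, min_len=4, max_len=64):
        # Declare a boundary wherever a hash of the trailing `window` bytes has
        # its low bits equal to zero, so boundaries depend on content, not position.
        segments, start = [], 0
        for i in range(len(data)):
            if i - start < min_len:
                continue
            if (zlib.crc32(data[i - window:i]) & mask) == 0 or i - start >= max_len:
                segments.append(data[start:i])
                start = i
        segments.append(data[start:])
        return segments

Because the boundaries are derived from the bytes themselves, prepending data shifts every offset, but the same windows soon reappear and the same segments are found again.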

With post-process deduplication, files are written to disk first; then they are scanned,
deduplicated, and compressed.

Post-process deduplication should never interfere with the speed of the incoming backup data.

Post-process deduplication requires more I/O. It writes new data to disk and then reads the new
data back before it checks for duplicates. It requires an additional write to delete the duplicate data
and another write to update the hash table. If it can’t determine whether a data segment is a
duplicate or new, it requires another write (this happens about 5% of the time). It also requires
more disk space to:
• Initially capture the data.
• Store multiple pools of data.
• Provide adequate performance by distributing the data over a large number of drives.

Post-process deduplication runs as a separate processing task and can lengthen the time
needed to fully complete the backup.

In post-process deduplication, files are first written to disk in their entirety (they are buffered to a
large cache). After the files are written, the hard drive is scanned for duplicates and compressed. In
other words, with post-process deduplication, deduplication happens after the files are written to
disk.

With post-process deduplication, a data segment enters the appliance (as part of a larger stream of
data from a backup) and is written to disk in its entirety. Then a separate process (running
asynchronously and possibly from another appliance accessing the same disk) reads the block of
data to determine whether it is a duplicate. If it is a duplicate, it is deleted and replaced with a
pointer. If it is new, it is stored.

With Data Domain inline deduplication, incoming data is examined as soon as it arrives to
determine whether a segment (or block, or chunk) is new and unique or a duplicate of a segment previously
stored. Inline deduplication occurs in RAM before the data is written to disk. Around 99% of data
segments are analyzed in RAM without disk access.
In some cases, an inline deduplication process will temporarily store a small amount of data on disk
before it is analyzed. A very small amount of data is not identified immediately as either unique or
redundant. That data is stored to disk and examined again later against the previously stored data.

The process is shown in this slide, as follows:


• Inbound segments are analyzed in RAM.
• If a segment is redundant, a reference to the stored segment is created.
• If a segment is unique, it is compressed and stored.

Inline deduplication requires less disk space than post-process deduplication. There is less
administration for an inline deduplication process, as the administrator does not need to define
and monitor the staging space.

Inline deduplication analyzes the data in RAM, and reduces disk seek times to determine if the new
data must be stored.

When the deduplication occurs close to where data is created, it is often referred to as source-
based deduplication, whereas when it occurs near where the data is stored, it is commonly called
target-based deduplication.

EMC Data Domain Global Compression™ is the EMC Data Domain trademarked name for global
compression, local compression, and deduplication.

Global compression equals deduplication. It identifies previously stored segments and cannot be
turned off.

Local compression compresses segments before writing them to disk. It uses common, industry-
standard algorithms (for example, lz, gz, and gzfast). The default compression algorithm used by
Data Domain systems is lz.

Local compression is similar to zipping a file to reduce the file size. Zip is a file format used for data
compression and archiving. A zip file contains one or more files that have been compressed, to
reduce file size, or stored as is. The zip file format permits a number of compression algorithms.
Local compression can be turned off.
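
In Python, zlib gives the same flavor of lossless, industry-standard compression (zlib here stands in for the lz/gz/gzfast family; it is not the DD OS implementation):

    import zlib

    segment = b"backup data backup data backup data " * 50
    packed = zlib.compress(segment)              # local compression before writing
    assert zlib.decompress(packed) == segment    # lossless: the original is recoverable
    print(len(segment), "bytes ->", len(packed), "bytes")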

This lesson covers EMC Data Domain SISL™ Scaling Architecture.

EMC Data Domain SISL™ Scaling Architecture is also called:


• Stream-Informed Segment Layout (SISL) scaling architecture
• SISL scaling architecture
• SISL architecture
• SISL technology

SISL architecture helps to speed up Data Domain systems.

In this lesson, you learn more about SISL architecture, its advantages, and how it works.

SISL architecture provides fast and efficient deduplication:
• 99% of duplicate data segments are identified inline in RAM before they are stored to disk.
• System throughput increases directly as CPU performance increases.
• The disk footprint is reduced by minimizing disk access.

SISL does the following:
1. Segment
The data is broken into variable-length segments.
2. Fingerprint
Each segment is given a fingerprint, or hash, for identification.
3. Filter
The summary vector and segment locality techniques identify 99% of the duplicate
segments in RAM, inline, before storing to disk. If a segment is a duplicate, it is referenced
and discarded. If a segment is new, the data moves on to step 4.
4. Compress
New segments are grouped and compressed using common algorithms: lz, gz, gzfast (lz by
default).
5. Write
Data (segments, fingerprints, metadata, and logs) is written to containers, and the
containers are written to disk.
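
The five steps can be strung together as a sketch (hypothetical Python; the container list and SHA-1 fingerprints are simple stand-ins for the real on-disk structures and filtering techniques):

    import hashlib, zlib

    seen = {}           # step 3's in-RAM filter: fingerprint -> container index
    containers = []     # stand-in for the on-disk container log

    def ingest(segments):                        # step 1 (segmenting) done upstream
        for seg in segments:
            fp = hashlib.sha1(seg).digest()      # step 2: fingerprint
            if fp in seen:
                continue                         # step 3: duplicate, reference only
            containers.append((fp, zlib.compress(seg)))   # steps 4 and 5
            seen[fp] = len(containers) - 1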

This lesson covers EMC Data Domain Data Invulnerability Architecture (DIA), which is an important
EMC Data Domain technology that provides safe and reliable storage.

Data Invulnerability Architecture (DIA) is an important EMC Data Domain technology that provides
safe and reliable storage.

The EMC Data Domain operating system (DD OS) is built for data protection. Its elements comprise
an architectural design whose goal is data invulnerability. Four technologies within the DIA fight
data loss:
• End-to-end verification
• Fault avoidance and containment
• Continuous fault detection and healing
• File system recoverability

DIA helps to provide data integrity, recoverability, and extremely resilient, protective disk
storage. This keeps data safe.

The end-to-end verification check verifies all file system data and metadata. The end-to-end
verification flow is shown on this slide.
If something goes wrong, it is corrected through self-healing, or the system alerts you so the data
can be backed up again.

Since every component of a storage system can introduce errors, an end-to-end test is the simplest
way to ensure data integrity. End-to-end verification means reading data after it is written and
comparing it to what was sent to disk, proving that it is reachable through the file system to disk,
and proving that data is not corrupted.

When the DD OS receives a write request from backup software, it computes a checksum
over the constituent data. After analyzing the data for redundancy, it stores the new data segments
and all of the checksums. After the backup I/O completes and all data is synced to disk, the DD
OS verifies that it can read the entire file from the disk platter and through the Data Domain file
system, and that the checksums of the data read back match the checksums of the written data.

This ensures that the data on the disks is readable and correct and that the file system metadata
structures used to find the data are also readable and correct. This confirms that the data is correct
and recoverable from every level of the system. If there are problems anywhere, for example if a
bit flips on a disk drive, it is caught. Mostly, a problem is corrected through self-healing. If a
problem can’t be corrected, it is reported immediately, and a backup is repeated while the data is
still valid on the primary store.
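
In outline, the write path behaves like this sketch (hypothetical Python; the CRC and the read_back/sync methods stand in for the real checksums and file system reads):

    import zlib

    def verified_write(dev, data):
        checksum = zlib.crc32(data)    # computed when the write request arrives
        dev.write(data)
        dev.sync()                     # data must be on disk before verification
        if zlib.crc32(dev.read_back()) != checksum:
            raise IOError("read-back mismatch: report it and repeat the backup")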

Data Domain systems are equipped with a specialized log-structured file system that has important benefits.
• New data never overwrites existing data. (The system never puts existing data at risk.)
Traditional file systems often overwrite blocks when data changes, reusing the old block address. The Data
Domain file system writes only to new blocks. This isolates any incorrect overwrite (a software bug problem) to
only the newest backup data. Older versions remain safe.
As shown in this slide, the container log never overwrites or updates existing data. New data is written to new
containers. Old containers and references remain in place and safe even when software bugs or hardware faults
occur when new backups are stored.

• There are fewer complex data structures.


In a traditional file system, there are many data structures (for example, free block bit maps and reference counts)
that support fast block updates. In a backup application, the workload is primarily sequential writes of new data.
Because a Data Domain system is simpler, it requires fewer data structures to support it. As long as the Data
Domain system can keep track of the head of the log, new writes never overwrite old data. This design simplicity
greatly reduces the chances of software errors that could lead to data corruption.

• The system includes non-volatile RAM (NVRAM) for fast, safe restarts.
The system includes a non-volatile RAM (NVRAM) write buffer into which it puts all data not yet safely on disk. The
file system leverages the security of this write buffer to implement a fast, safe restart capability.
The file system includes many internal logic and data structure integrity checks. If a problem is found by one of
these checks, the file system restarts. The checks and restarts provide early detection and recovery from the kinds
of bugs that can corrupt data. As it restarts, the Data Domain file system verifies the integrity of the data in the
NVRAM buffer before applying it to the file system and thus ensures that no data is lost due to a power outage.
For example, in a power outage, the old data could be lost and a recovery attempt could fail. For this reason, Data
Domain systems never update just one block in a stripe. Following the no-overwrite policy, all new writes go to
new RAID stripes, and those new RAID stripes are written in their entirety. The verification-after-write ensures
that the new stripe is consistent (there are no partial stripe writes). New writes don’t put existing backups at risk.
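
The no-overwrite principle reduces the bookkeeping to almost nothing, as in this sketch (illustrative only):

    container_log = []                       # append-only: entries are never updated

    def write_new_data(container):
        container_log.append(container)      # new data always goes to new containers
        return len(container_log) - 1        # only the head of the log must be tracked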

Continuous fault detection and healing provide an extra level of protection within the Data Domain
operating system. The DD OS detects faults and recovers from them continuously. Continuous fault
detection and healing ensures successful data restore operations.

This is the flow for continuous fault detection and healing:
1. The Data Domain system periodically rechecks the integrity of the RAID stripes and
container logs.
2. The Data Domain system uses RAID system redundancy to heal faults. RAID 6 is the
foundation of Data Domain systems’ continuous fault detection and healing. Its dual-parity
architecture offers advantages over conventional architectures, including RAID 1 (mirroring)
and the RAID 3, RAID 4, and RAID 5 single-parity approaches.

RAID 6:
• Protects against two disk failures.
• Protects against disk read errors during reconstruction.
• Protects against the operator pulling the wrong disk.
• Guarantees RAID stripe consistency, even during power failure, without reliance on
NVRAM or an uninterruptable power supply (UPS).
• Verifies data integrity and stripe coherency after writes.

By comparison, after a single disk fails in other RAID architectures, any further
simultaneous disk errors cause data loss. A system whose focus is data protection must
include the extra level of protection that RAID 6 provides.

3. During every read, data integrity is re-verified.


4. Any errors are healed as they are encountered.

To ensure that all data returned to the user during a restore is correct, the Data Domain file
system stores all of its on-disk data structures in formatted data blocks. These are
self-identifying and covered by a strong checksum. On every read from disk, the system first
verifies that the block read from disk is the block expected. It then uses the checksum to
verify the integrity of the data. If any issue is found, it asks RAID 6 to use its extra level of
redundancy to correct the data error. Because the RAID stripes are never partially updated,
their consistency is ensured and thus so is the ability to heal an error when it is discovered.

Continuous error detection works well for data being read, but it does not address issues
with data that may be unread for weeks or months before being needed for a recovery. For
this reason, Data Domain systems actively re-verify the integrity of all data every week in an
ongoing background process. This scrub process finds and repairs defects on the disk before
they can become a problem.

The EMC Data Domain Data Invulnerability Architecture (DIA) file system recovery is a feature that
reconstructs lost or corrupted file system metadata. It includes file system check tools.
If a Data Domain system does have a problem, DIA file system recovery ensures that the system is
brought back online quickly.
This slide shows DIA file system recovery:
• Data is written in a self-describing format.
• The file system can be recreated by scanning the logs and rebuilding it from metadata
stored with the data.

In a traditional file system, consistency is not checked. Data Domain systems check through initial
verification after each backup to ensure consistency for all new writes. The usable size of a
traditional file system is often limited by the time it takes to recover the file system in the event of
some sort of corruption.
Imagine running fsck on a traditional file system with more than 80 TB of data. The reason the
checking process can take so long is the file system needs to sort out the locations of the free
blocks so new writes do not accidentally overwrite existing data. Typically, this entails checking all
references to rebuild free block maps and reference counts. The more data in the system, the
longer this takes.

In contrast, since the Data Domain file system never overwrites existing data and doesn’t have
block maps and reference counts to rebuild, it has to verify only the location of the head of the log
to safely bring the system back online and restore critical data.

This lesson covers the Data Domain file system. The Data Domain file system includes:
• /ddvar (administrative files)
• MTrees (file storage)

Data Domain system administrative files are stored in /ddvar. This directory stores system core and
log files, generated support upload bundles, compressed core files, and .rpm upgrade packages.

• The NFS directory is /ddvar.
• The CIFS share is \ddvar.

The ddvar file structure keeps administrative files separate from storage files.

You cannot rename or delete /ddvar, nor can you access all of its sub-directories, such as the core
sub-directory.
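
For example, a Linux backup host could mount /ddvar over NFS as follows (the hostname dd01 and the mount point are placeholders):

# mount -t nfs dd01:/ddvar /mnt/ddvar

From a Windows host, the same files are reachable through the CIFS share \\dd01\ddvar.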

The MTree (Managed Tree) file structure is the destination for deduplicated data. It is also the root directory
for deduplicated data. It comes pre-configured for NFS export as /backup. You configure directory export
levels to separate and organize backup files in the MTree file system.
The MTree file structure:
• Uses compression.
• Implements data integrity.
• Reclaims storage space with file-system cleaning. You will learn more about file-system cleaning
later in this course.
MTrees provide more granular space management and reporting. This allows for finer management of
replication, snapshots, and retention locking. These operations can be performed on a specific MTree rather
than on the entire file system. For example, you can configure directory export levels to separate and
organize backup files.
Although a Data Domain system supports a maximum of 100 MTrees, system performance might degrade
rapidly if more than 14 MTrees are actively engaged in read or write streams (With DD OS 5.3 and 5.4 the
DD990, DD890, and DD880 series appliances will support up to 32 active MTrees). The degree of
degradation depends on overall I/O intensity and other file-system loads. For optimum performance, you
should limit the number of simultaneously active MTrees to a maximum of 14 or 32, depending on which
model is used. Whenever possible, it is best to aggregate operations on the same MTree into a single
operation.
You can add subdirectories to MTree directories. You cannot add anything to the /data directory. You can
change only the col1 subdirectory. The backup MTree (/data/col1/backup) cannot be deleted or renamed. If
MTrees are added, they can be renamed and deleted. You can replicate directories under /backup.
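
For example, MTrees can be listed and created from the CLI with the mtree command (the MTree name below is a placeholder; see the DD OS Command Reference for the full syntax):

# mtree list
# mtree create /data/col1/finance-backups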

This lesson covers Data Domain protocols.

Five protocols can be used to connect to a Data Domain appliance:
• NFS
This protocol allows Network File System (NFS) clients access to Data Domain system
directories and MTrees.
• CIFS
This protocol allows Common Internet File System (CIFS) clients access to Data Domain
system directories and MTrees.
• VTL
The virtual tape library (VTL) protocol enables backup applications to connect to and
manage Data Domain system storage as if it were a tape library. All of the functionality
generally supported by a physical tape library is available with a Data Domain system
configured as a VTL. The movement of data from a system configured as a VTL to a physical
tape library is managed by backup software (not by the Data Domain system). The VTL
protocol is used with Fibre Channel (FC) networking.
• DD Boost
The DD Boost protocol enables backup servers to communicate with storage systems
without the need for Data Domain systems to emulate tape. There are two components to
DD Boost: one component that runs on the backup server and another component that
runs on a Data Domain system.
• NDMP
If the VTL communication between a backup server and a Data Domain system is through
NDMP (Network Data Management Protocol), no Fibre Channel (FC) is required. When you
use NDMP, the initiator and port functionality does not apply.

This lesson covers Data Domain data paths, which include NFS, CIFS, DD Boost, NDMP, and VTL over
Ethernet or Fibre Channel.

This lesson also covers where a Data Domain system fits into a typical backup environment.

Data Domain systems connect to backup servers as storage capacity to hold large collections of backup data.
This slide shows how a Data Domain system integrates non-intrusively into an existing storage environment.
Often a Data Domain system is connected directly to a backup server. The backup data flow from the clients
is simply redirected to the Data Domain device instead of to a tape library.
Data Domain systems integrate non-intrusively into typical backup environments and reduce the amount of
storage needed to back up large amounts of data by performing deduplication and compression on data
before writing it to disk. The data footprint is reduced, making it possible for tapes to be partially or
completely replaced.
Depending on an organization’s policies, a tape library can be either removed or retained.
An organization can replicate and vault duplicate copies of data when two Data Domain systems have the
Data Domain Replicator software option enabled.
One option (not shown) is that data can be replicated locally with the copies stored onsite. The smaller data
footprint after deduplication also makes WAN vaulting feasible. As shown in the slide, replicas can be sent
over the WAN to an offsite disaster recovery (DR) location.
WAN vaulting can replace the process of rotating tapes from the library and sending the tapes to a vault by
truck.
If an organization’s policies dictate that tape must still be made for long-term archival retention, data can
flow from the Data Domain system back to the server and then to a tape library.
Often the Data Domain system is connected directly to the backup server. The backup data flow is
redirected from the clients to the Data Domain system instead of to tape. If tape needs to be made for long-
term archival retention, data flows from the Data Domain system back to the server and then to tape,
completing the same flow that the backup server was doing initially. Tapes come out in the same standard
backup software formats as before and can go off-site for long-term retention. If a tape must be retrieved, it
goes back into the tape library, and the data flows back through the backup software to the client that
needs it.

A data path is the path that data travels from the backup (or archive) servers to a Data Domain
system.

Ethernet supports the NFS, CIFS, FTP, NDMP, and DD Boost protocols that a Data Domain system
uses to move data.

In the data path over Ethernet (a family of computer networking technologies), backup and archive
servers send data from clients to Data Domain systems on the network via TCP/IP or UDP/IP (sets
of communication protocols for the internet and other networks).

You can also use a direct connection between a dedicated port on the backup or archive server and
a dedicated port on the Data Domain system. The connection between the backup (or archive)
server and the Data Domain system can be Ethernet or Fibre Channel, or both if needed. This slide
shows the Ethernet connection.

When Data Domain Replicator is licensed on two Data Domain systems, replication is enabled
between the two systems. The Data Domain systems can be either local, for local retention, or
remote, for disaster recovery. Data in flight over the WAN can be secured using VPN. Physical
separation of the replication traffic from backup traffic can be achieved by using two separate
Ethernet interfaces on a Data Domain system. This allows backups and replication to run
simultaneously without network conflicts. Since the Data Domain OS is based on Linux, it needs
additional software to work with CIFS. Samba software enables CIFS to work with the
Data Domain OS.

A data path is the path that data travels from the backup (or archive) servers to a Data Domain
system.

Fibre Channel supports the VTL and DD Boost protocols that a Data Domain system uses to move
data.

If the Data Domain virtual tape library (VTL) option is licensed, and a VTL FC HBA is installed on the
Data Domain system, the system can be connected to a Fibre Channel storage area network
(SAN). The backup or archive server sees the Data Domain system as one or multiple VTLs with up
to 512 virtual linear tape-open (LTO-1, LTO-2, LTO-3, or LTO-4) tape drives and 20,000 virtual slots
across up to 100,000 virtual cartridges.

This lesson covers Data Domain administration interfaces, which include the System Manager,
which is the graphical user interface (GUI), and the command line interface (CLI).

The EMC Data Domain command line interface (CLI) enables you to manage Data Domain systems.

You can do everything from the CLI that you can do from the System Manager.

After the initial configuration, use the SSH or Telnet (if enabled) utilities to access the system
remotely and open the CLI.

The DD OS 5.4 Command Reference Guide provides information for using the commands to
accomplish specific administration tasks. Each command also has an online help page that gives the
complete command syntax. Help pages are available at the CLI using the help command. Any Data
Domain system command that accepts a list (such as a list of IP addresses) accepts entries
separated by commas, by spaces, or both.

With the Data Domain System Manager (formerly the Data Domain Enterprise Manager), you can manage
one or more Data Domain systems. You can monitor and add systems from the System Manager. (To add a
system you need a sysadmin password.) You can also view cumulative information about the systems you’re
monitoring.
A Data Domain system should be added to, and managed by, only one System Manager.
You can access the System Manager from many browsers:
• Microsoft Internet Explorer™
• Google Chrome™
• Mozilla Firefox™
The Summary screen presents a status overview of, and cumulative information for, all managed systems in
the DD Network devices list and summarizes key operating information. The System Status, Space Usage,
and Systems panes provide key factors to help you recognize problems immediately and to allow you to drill
down to the system exhibiting the problem.

The tally of alerts and the charts of disk space that the System Manager presents enable you to
spot problems quickly.
Click the plus sign (+) next to the DD Network icon in the sidebar to expose the systems being managed by
the System Manager.

The System Manager includes tabs to help you navigate your way through administrative tasks. To access
the top- and sub-level tabs, shown in this slide, you must first select a system. In the lower pane on the
screen, you can view information about the system you selected. In this slide, a system has been selected,
and you can view details about it.

This lab covers the lab environment setup required for this class.

This lab covers the steps necessary to access a Data Domain system.

EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery with high-speed, inline deduplication. An EMC Data Domain system can also be used for
online storage with additional features and benefits.

A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.
With an Ethernet connection, the system can be accessed using the NDMP, DD Boost, CIFS and NFS
protocols. The Fibre Channel connection supports the VTL protocol.

EMC Data Domain implements deduplication in a special hardware device. Most Data Domain
systems have a controller and multiple storage units.

Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity. Data Domain systems use non-volatile random access
memory (NVRAM) to protect unwritten data. NVRAM is used to hold data not yet written to disk.
Holding data like this ensures that data is not lost in a power outage.

This module covers basic administrative tasks on a Data Domain system. It includes the
following lessons:
• Verifying Hardware
• Managing System Access
• Introduction to Monitoring a Data Domain System
• Licensed Features
• Upgrading a Data Domain System

As part of initially setting up your Data Domain system, you should verify that your hardware
is installed and configured correctly. This lesson covers verifying your hardware.

The initial configuration of the Data Domain system can be done using the config setup
command in the CLI or using the Data Domain System Manager Configuration Wizard.

The System Manager Configuration Wizard provides a graphical user interface (GUI) that
includes configuration options. After a network connection is configured, you can use the
System Manager Configuration Wizard to modify or add configuration data. The
Configuration Wizard performs an “initial” configuration—it does not cover all configuration
options; it configures what is needed for the most basic system setup. After the initial
configuration, you can use the System Manager or CLI commands to change or update the
configuration.

The command line version of the configuration wizard covers the following areas: Licenses,
Network, File System, System, CIFS, NFS, DD Boost, and VTL. The GUI version covers Licenses,
GDA (Global Deduplication Array), Network, File System, System, CIFS, NFS, DD Boost, and
VTL.

You can configure or skip any section, but all sections must be done in order. After
completing the Configuration Wizard, reboot the Data Domain system.

Note: The file system configuration is not described here. Default values are acceptable to
most sites.

To launch the System Manager Configuration Wizard:


1. From the System Manager, click Maintenance.
2. Click the More Tasks menu.
3. Select Launch Configuration Wizard.
4. Follow the Configuration Wizard prompts.
You must follow the configuration prompts. You can’t select an item to configure from
the left navigation pane. You are prompted to submit your configuration changes as
you move through the wizard. You can also quit the wizard during your configuration.

You can also use the config setup command on a single node or in a GDA to change
configuration settings for the system, network, file system, CIFS, NFS, and licenses. Press
Enter to cycle through the selections. You will be prompted to confirm any changes. Choices
include Save, Cancel, and Retry:

# config setup
Enter essential configuration values

Note: This command option is unavailable on systems using Retention Lock Compliance. Use
the System Manager to change configuration settings.

After your Data Domain system is installed, you should verify that you have the correct
model number, DD OS version, and serial number to ensure that they match what you
ordered.
You can use the system show command in the command line interface (CLI) to view
system options.

# system show modelno
Display the hardware model number of a Data Domain system.

# system show version
Display the Data Domain OS version and build identification number.

# system show uptime
Display the file system uptime, the time since the last reboot, the number of users,
and the average load.

# system show serialno
Display the system serial number.

# system show all
Show all system information.

You can also view this information in the Data Domain System Manager by selecting
Maintenance > System.

After your Data Domain system is installed, you should verify that your storage is operational.

From the command line, you can use the storage show and disk show commands to display
information about file system storage.

# storage show {all | summary | tier {active | archive}}
Display information about file system storage. All users may run this command option.

Output includes the number of disk groups working normally and the number of
degraded disk groups. Details on disk groups undergoing, or queued for,
reconstruction, are also shown when applicable. The abbreviation N/A in the column
Shelf Capacity License Needed indicates the enclosure does not require a capacity
license, or that part of the enclosure is within a tier and the required capacity license
for the entire enclosure has been accounted for.

# disk show state
Display state information about all disks in an enclosure (a Data Domain system or an
attached expansion shelf), or LUNs in a Data Domain gateway system using storage
area network (SAN) storage.

Columns in the output display the disk state for each slot number by enclosure ID, the
total number of disks by disk state, and the total number of disks.

If a RAID disk group reconstruction is underway, columns for the disk identifier,
progress, and time remaining are also shown.

# disk show hardware
Display disk hardware information.

# disk show failure-history
Display a list of serial numbers of failed disks in the Data Domain system.

In the System Manager, the Hardware > Storage tab provides a way of organizing the Data
Domain system storage so disks can be viewed by usage type (Active, Archive, Failed, and so
on), operational status, and location. This includes internal system storage and systems
configured with external disk shelves. The status and inventory are shown for all enclosures,
disks, and RAID groups. The system is automatically scanned and inventoried so all storage is
shown here.

The status of a storage system can be:


• Normal: System operational (green). All disks in the system are in good condition.
• Warning: System operational (yellow). The system is operational, but there are
problems that need to be corrected. Warnings may result from a degraded RAID
group, the presence of foreign storage, or failed or absent disks.
• Error: System non-operational (red). The system is not operational.

Disks in the active tier are currently marked as usable by the Data Domain file system.
Sections are organized by disks in use and disks not in use. If the optional archive feature is
installed you can expand your view of the disk use in the active tier from the Storage Status
Overview pane. You can view both disks in use and disks not in use.

Usable enclosures are those that aren’t incorporated into the file system yet. The Usable
Enclosures section enables you to view the usable disks within the expansion shelves on a
Data Domain system. You can also view the details of individual disks including the disk ID,
model, disk size, disk count, license needed, failed disks, serial number and temperature
status.

If there are any unusable disks, whether failed, foreign or absent, they will be displayed in
this section.

• Failed: The number of failed disks.


• Foreign: The number of foreign disks. The foreign state indicates that the disk
contains valid Data Domain file system data and alerts the administrator to the
presence of this data to make sure it is attended properly. This commonly happens
during chassis swaps, or when new shelves are added to an active system.
• Absent: The number of absent disks.

The Failed/Foreign/Absent Disks section enables you to view failed, foreign, and absent
Disks. You can also view the details of individual disks, including the disk name, slot, status,
size, manufacturer/model, firmware and serial number.

Sometimes it may be necessary to physically locate a disk on a Data Domain system. You can
locate, or beacon, a disk to easily identify where it is located in the enclosure. You can
beacon a disk from either the command line or System Manager.

The disk beacon command causes the LED that signals normal operation to flash on the
target disk.

# disk beacon <enclosure-id>.<disk-id>
Cause the LED that signals normal operation to flash on the target disk. Press Ctrl-C to
stop the flash. To check all disks in an enclosure, use the enclosure beacon command
option.
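
For example, to flash the LED of the disk in slot 11 of enclosure 2:

# disk beacon 2.11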

In the System Manager, the Disks view lists all the system disks in a scrollable table with the
following information.
• Disk: The disk identifier. It can be:
• The enclosure and disk number (in the form Enclosure.Slot).
• A gateway disk (devn).
• A LUN.
• Status: The status of the disk (for example In Use, Spare).
• Manufacturer/Model: The manufacturer’s model designation. The display may include
a model ID or RAID type or other information, depending on the vendor string sent by
the storage array.
• Firmware: The firmware level used by the third-party physical disk storage controller.
• Serial Number: The manufacturer’s serial number for the disk.

The Disks tab enables you to see the status of all disks and details on individual disks. Use the
radio buttons to select how the disks are viewed: by all disks, or by tier, or by disk group.

To locate (beacon) a disk (for example, when a failed disk needs to be replaced):
1. Click Hardware > Storage > Disks.
2. Select a disk from the Disks table and click Beacon.
• The Beaconing Disk dialog window appears on screen.
• The LED light on the physical disk begins flashing.
3. Click Stop to stop the LED from beaconing.

There are several commands that can be used to check different aspects of the chassis.

# enclosure show all [<enclosure>]
Show all enclosure environmentals.

# system show stats
Show system statistics for the time period since the last reboot.

# system show hardware
Display information about slots, vendors, and other hardware.
In the System Manager, the Hardware > Chassis tab provides a block drawing of the chassis and its
components—disks, fans, power supplies, NVRAM, CPUs, Memory, etc. The components that appear depend
on the Data Domain system model.

From here you can view the status of the following components by hovering your mouse over them:
• NVRAM
• PCI slots
• SAS
• Power supply
• PS fan
• Riser expansion
• Temperature
• Fans

• Front and back chassis views

This lesson covers user privileges, administration access, and user administration.

To enhance security, each user can be assigned a different role. Roles enable you to restrict
system access to a set of privileges. A Data Domain system supports the following roles:
• Admin - Allows you to administer, that is, configure and monitor, the entire Data
Domain system.
• User - Allows you to monitor Data Domain systems and perform the fast copy
operation.
• Security - In addition to the user role privileges, allows you to set up security officer
configurations and manage other security officer operators.
• Backup-operator - In addition to the user role privileges, allows you to create
snapshots, import and export tapes to a VTL library and move tapes within a VTL
library.
• Data-access - Intended for DD Boost authentication, an operator with this role cannot
monitor or configure a Data Domain system.

Note: The available roles display based on the user’s role. Only the Sysadmin user can create
the first security officer. After the first security officer is created, only security officers can
create or modify other security officers. Sysadmin is the default admin user and cannot be
deleted or modified.

In the System Manager, you can use the System Settings > Access Management > Local Users
tab, to create and manage users.

Managing users enables you to name the user, grant them privileges, make them active,
disabled or locked, and find out if and when they were disabled. You can also find out the
user’s last login location and time.

To create new users in the System Manager, follow these steps:
1. Click the System Settings > Access Management > Local Users tabs.
The Local Users view appears.
2. Click the Create button to create a new user.
The Create User dialog box appears.
3. Enter the following information in the General Tab:
• User – The user ID or name.
• Password – The user password. Set an initial password (the user can change it
later).
• Verify Password – The user password, again.
• Role – The role assigned to the user.
4. Enter the following information in the Advanced Tab:
• Minimum Days Between Change – The minimum number of days between
password changes that you allow a user. Default is 0.
• Maximum Days Between Change – The maximum number of days between
password changes that you allow a user. Default is 99999.
• Warn Days Before Expire – The number of days to warn the users before their
password expires. Default is 7.
• Disable Days After Expire – The number of days after a password expires to
disable the user account. Default is Never.
• Disable account on the following date – Check this box and enter a date
(mm/dd/yyyy) when you want to disable this account. Also, you can click the
calendar to select a date.
5. Click OK.

To enable or disable users, follow these steps:


1. Click the System Settings > Access Management > Local Users tabs.
The Local Users view appears.
2. Click one or more user names from the list.
3. Click either the Enable or Disable button to enable or disable user accounts.
The Enable or Disable User dialog box appears.
4. Click OK and Close.

You can also use the command line interface to create and manage users.

# user add <user> [role {admin | security | user | backup-operator | data-access}]
  [min-days-between-change <days>] [max-days-between-change <days>]
  [warn-days-before-expire <days>] [disable-days-after-expire <days>]
  [disable-date <date>]
Add a new user. If no role is specified, the system defaults to the user role.

# user enable <user> [disable-date <date>]
Enable a user's account.

# user disable <user>
Disable a user's account.

# user show list
Show all known users.

# user show active
Show the currently logged-in users.
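For example, to add a backup operator with a 90-day password-change policy and then confirm the account exists, you might run something like the following sketch (the username and values are illustrative only):

# user add jsmith role backup-operator max-days-between-change 90
# user show list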

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 18
As an administrator, you need to view and configure services that provide administrator and
user access to a Data Domain system. The services include:
• Telnet: Provides access to a Data Domain system through a Telnet connection.
• FTP/FTPS: Provides access to a Data Domain system through an FTP or FTPS connection.
• HTTP/HTTPS: Provides access to a Data Domain system through an HTTP connection, an HTTPS connection, or both.
• SSH: Provides access to a Data Domain system through an SSH connection.
• SCP: Provides access to securely copy files to and from a Data Domain system.

Managing administration access protocols enables you to view and manage how other
administrators and users access a Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 19
From the command line interface (CLI), the adminaccess command allows remote hosts to use the FTP, FTPS, Telnet, HTTP, HTTPS, SCP, and SSH administrative protocols on the Data Domain system.

# adminaccess enable {http | https | ftp | ftps | telnet | ssh | scp | all}
Enable administrative access.

# adminaccess disable {http | https | ftp | ftps | telnet | ssh | scp | all}
Disable administrative access.

# adminaccess {http | https | ftp | ftps | telnet | ssh} add <host-list>
Add an HTTP, HTTPS, FTP, FTPS, Telnet, or SSH host.

# adminaccess show
Show service status and host lists.
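As an illustration, the following sequence disables Telnet, enables HTTPS, restricts HTTPS access to a single management host, and verifies the result (the hostname is a placeholder):

# adminaccess disable telnet
# adminaccess enable https
# adminaccess https add admin01.example.com
# adminaccess show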

To provide administrative access to a Data Domain system using the System Manager:
1. On the Access Management page, select the protocol you wish to configure by
placing a check in the box next to the protocol in the Services list and click the
Services button.
The Configure Access dialog box appears.
2. To enable access, click the Allow Access checkbox for the protocol(s) you wish to
enable.
For HTTP/HTTPS and SCP/SSH you can configure both protocols in the same dialog
box. However, for FTP/FTPS only one can be active at a time. Enabling FTP will
disable FTPS and vice versa.
3. Determine how the hosts connect:
 To allow complete access, click the Allow all hosts to connect radio button.
 To configure specific hosts, click the Limit Access to the following systems
radio button, and click the appropriate icon in the Allowed Hosts pane. A
hostname can be a fully qualified hostname or an IP address.
 To add a host, click the plus button (+). Enter the hostname, and click OK.
 To modify a hostname, click the checkbox of the hostname in the Hosts
list, and click the edit button (pencil). Change the hostname, and click OK.
 To remove a hostname, click the checkbox of the hostname in the Hosts
list, click the minus button (-), and click OK.
4. Click OK.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 20
Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 21
Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 22
This lesson covers the basics of monitoring a Data Domain system, including log file
locations, settings and alerts.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 23
The Data Domain system logs system status messages hourly. Log files can be bundled and
sent to Data Domain Support to provide the detailed system information that aids in
troubleshooting any system issues that may arise.

The Data Domain system log file entries contain messages from the alerts feature,
autosupport reports, and general system messages. The log directory is /ddvar/log. The
/ddvar/log folder includes files related to troubleshooting. Only relevant files or folders
are listed on this slide. The /ddvar folder contains other log files that you cannot view
through log commands or from the System Manager.

Every Sunday at 3 a.m., the Data Domain system automatically opens new log files and
renames the previous files with an appended number of 1 through 9, such as messages.1.
Each numbered file is rolled to the next number each week. For example, at the second
week, the file messages.1 is rolled to messages.2. If a file messages.2 already
existed, it rolls to messages.3. An existing messages.9 is deleted when messages.8
rolls to messages.9.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 24
You can use the following commands from the CLI to list and view Data Domain log files:
# log list [debug]
List top-level or debug files in the log directory.

# log view [<filename>]
View the system log or another log file.

# log watch [<filename>]
Watch the system log or another log file in real time.
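For example, to review last week's system messages and then follow the current log in real time, you might run the following (the filenames follow the weekly rotation scheme described earlier):

# log view messages.1
# log watch messages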

You can also use the System Manager to view the system log files in /ddvar/log.
However, you cannot view the log files in the /ddvar/log/debug folder using the System
Manager.
1. Go to the Maintenance > Logs tab
2. Click the file you want to view.

To view all Data Domain system log files, you can create a /ddvar share (CIFS) or mount the
/ddvar folder (NFS).

Contents of listed log files:


• messages: Messages from the alerts, autosupport reports, and general system
messages
• space.log: Messages about disk space used by Data Domain system components and
data storage, and messages from the cleaning process
• ddfs.info: Debugging information created by the file system processes
• vtl.info: VTL information messages
• perf.log: Performance statistics used by Data Domain support staff for system tuning
• cifs.log: CIFS information messages
• join_domain.log: Active directory information messages
• ost.log: System information related to DD Boost
• messages.engineering: Engineering-level messages related to the system
• kern.info: Kernel information messages

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 25
Autosupport reports and alert messages help solve and prevent potentially crippling Data
Domain system problems.

Autosupport alert files provide timely notification of significant issues. Autosupport sends
system administrators, as well as Data Domain Support (when configured), a daily report of
system information and consolidated status output from a number of Data Domain system
commands and entries from various log files. Included in the report are extensive and
detailed internal statistics and log information to aid Data Domain Support in identifying and
debugging system problems.

Autosupport reports are sent by email as simple text. Autosupport report distribution can be
scheduled, with the default time being 6:00 a.m.

During normal operation, a Data Domain system may produce warnings or encounter failures of which administrators must be informed immediately. This communication is performed by means of an alert.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 26
Alerts are sent out to designated individuals or groups so appropriate actions can be taken promptly. Alerts are sent as email in two forms: one is an immediate email for an individual alert, sent to subscribers configured in the notification settings. The other is a cumulative Daily Alert Summary email of the alerts logged on the Current Alerts page. These summaries are sent daily at 8:00 a.m. and report any critical events that might be occurring on the system.

Autosupport reports and alert messages:


• Report the system status and identify potential system problems
• Provide daily notification of the system’s condition
• Send email notifications to specific recipients for quicker, targeted responses
• Supply critical system data to aid support case triage and management

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 27
Each autosupport report is sent as plain text and can be rather large, depending on your system configuration.

The autosupport file contains a great deal of information on the system. The file includes
general information, such as the DD OS version, System ID, Model Number and Uptime, as
well as information found in many of the log files.

Autosupport logs are stored in the Data Domain system in /ddvar/support. Autosupport
contents include:
• system ID
• uptime information
• system command outputs
• runtime parameters
• logs
• system settings
• status and performance data
• debugging information

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 28
By default, the full autosupport report is emailed daily at 6:00 a.m. A second report, the
autosupport alert summary, is sent daily at 8:00 a.m.

A Data Domain system, if configured, can send autosupport reports via SMTP to the autosupport data warehouse within EMC. Data Domain captures these reports and stores them by Data Domain serial number in the data warehouse for reference when needed for troubleshooting that system. Autosupport reports are also a useful resource for Data Domain Technical Support to assist in researching any cases opened against the system.

Within the EMC Data Domain support portal (http://support.emc.com or http://my.datadomain.com), you can access and view autosupports, alert messages, and alert summaries sent by a Data Domain system. Only systems sending autosupport information to Data Domain are presented through the support portal.

The autosupport function also sends alert messages to report anomalous behaviors, such as reboots, serious warnings, a failed disk, a failed power supply, or a system that is nearly full. For more serious issues, such as system reboots and failed hardware, these messages can be configured to be sent to Data Domain and to automatically create cases so that Support can proactively take action on your behalf.

Autosupport requires the SMTP service to be active on the Data Domain system, pointing to a valid email server with a connection path to the Internet.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 29
In the System Manager, you can add, delete, or edit email subscribers by clicking Configure in
the Autosupport Mailing List Subscribers area of the Autosupport tab.

Autosupport subscribers receive daily detailed reports. Using SMTP, autosupports are sent to
Data Domain Technical Support daily at 6 a.m. local time. This is the default setting.

You can view any of the collected autosupport reports in the Autosupport Report file listing by clicking the file name. You are then prompted to download the file locally. For convenience, open the file for reading in a standard web browser.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 30
You can also use the command line interface (CLI) to configure autosupport.

# support notification enable {autosupport | alerts | all}
Enables the sending of the Daily Alert Summary and the Autosupport Report to Data Domain Support.

# support notification disable {autosupport | alerts | all}
Disables the sending of the Daily Alert Summary and the Autosupport Report to Data Domain Support.

# autosupport add {alert-summary | asup-detailed} emails <email-list>
Adds entries to the email list for the Daily Alert Summary or the Autosupport Report.

# autosupport del {alert-summary | asup-detailed} emails <email-list>
Deletes entries from the email list for the Daily Alert Summary or the Autosupport Report.

# autosupport set schedule {alert-summary | asup-detailed} {[{daily | <days>} <time>] | never}
Schedules the Daily Alert Summary or the Autosupport Report. For either report, the most recently configured schedule overrides the previously configured schedule.

# autosupport show {all | alert-summary | asup-detailed}
Displays the autosupport configuration.

# autosupport show schedule [alert-summary | asup-detailed]
Displays the schedules for the Daily Alert Summary and the Autosupport Report.
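As a sketch, the following adds a recipient for the detailed report, moves the alert summary to 8:00 a.m., and verifies the schedules (the email address is a placeholder, and the HHMM form shown for the <time> argument is an assumption; confirm the format in the command reference):

# autosupport add asup-detailed emails admin@example.com
# autosupport set schedule alert-summary daily 0800
# autosupport show schedule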

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 31
Alerts are notification messages generated by a Data Domain system if an undesirable event
occurs.

A configured Data Domain system sends an alert immediately via email to its list of subscribers. Higher-level alerts can be sent automatically to EMC Data Domain Support for tracking.

If Data Domain Support receives a copy of the message, then depending on the nature of the event, a support case is generated, and a Technical Support Engineer proactively tries to resolve the issue as soon as possible.

• Alerts contain a short description of the problem.
• Alerts have a separate email distribution list.
• On receipt of an alert, Data Domain creates a support case.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 32
Alert notification groups allow flexibility in notifying the responsible parties who maintain the Data Domain system. Individual subscribers can be targeted for specific types of alerts.
Instead of sending alerts to every subscriber for every type of problem, a sysadmin can
configure groups of contacts related to types of issues. For example, you can create an
environment alert notification group for team members who are responsible for data center
facilities, and power to the system. When the system creates a specific, environment-related
alert, only those recipients for that class of alerts are contacted.

System administrators can also set groups according to the seriousness of the alert.

Set alert notification groups in Status > Alerts > Notifications tab.

After a group is created, you can configure the Class Attributes pane to modify the types and
severity of the alerts this group should receive. In the Subscribers pane, you can modify a list
of recipient email addresses belonging to this group.

You can also use the command line interface (CLI) to configure alert notification lists.

# alerts notify-list create
Creates a notification list and subscribes to events belonging to the specified list of classes and severity levels.

# alerts notify-list add
Adds to a notification list and subscribes to events belonging to the specified list of classes and severity levels.

# alerts notify-list del
Deletes members from a notification list: a list of classes or a list of email addresses.

# alerts notify-list destroy
Destroys a notification list.

# alerts notify-list reset
Resets all notification lists to the factory default.

# alerts notify-list show
Shows the notification lists' configuration.

# alerts notify-list test
Sends a test notification to an alerts notify-list.
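A hedged sketch of the environment-group example described above (the group name, class keyword, and address are illustrative; check the alerts notify-list command reference for the exact argument forms in your DD OS release):

# alerts notify-list create environment-team class environment
# alerts notify-list add environment-team emails dc-ops@example.com
# alerts notify-list test environment-team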

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 33
When troubleshooting problems, Data Domain Customer Support may ask for a support bundle, which is a gzipped tar archive of log files plus a README file that includes identifying autosupport headers. To create a support bundle, use the following procedure:
1. Navigate to Maintenance > Support in the System Manager.
2. Select More Tasks > Generate Support Bundle.
3. Click the link to download the bundle.
4. Email the file to Data Domain support at support@datadomain.com.

Note: If the bundle is too large to be emailed, use the EMC/Data Domain support site to
upload the bundle.

You can also generate support bundles from the command line:

# support bundle create {files-only <file-list> | traces-only} [and-upload [transport {http|https}]]
Compress the listed files into a bundle and upload it if specified.

# support bundle create default [with-files <file-list>] [and-upload [transport {http|https}]]
Compress the default and listed files into a bundle and upload it if specified.
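For instance, to build a default bundle and upload it over HTTPS in one step, a command along these lines could be used:

# support bundle create default and-upload transport https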

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 34
The Simple Network Management Protocol (SNMP) is an open-standard protocol for
exchanging network management information, and is a part of the Transmission Control
Protocol/Internet Protocol (TCP/IP) protocol suite. SNMP provides a tool for network
administrators to monitor and manage network-attached devices, such as Data Domain
systems, for conditions that warrant administrator attention.

In typical SNMP usage, one or more administrative computers, called managers, have the task of monitoring or managing a group of hosts or devices on a computer network. Each managed system runs a software component called an agent that reports information via SNMP to the manager.

Essentially, SNMP agents expose management data on the managed systems through object
IDs (OIDs). The protocol also permits active management tasks, such as modifying and
applying a new configuration, through remote modification of these variables. In the case of
Data Domain systems, active management tasks are not supported. The data contained in
the OIDs are called variables, and are organized in hierarchies. These hierarchies, and other
metadata (such as type and description of the variable), are described by Management
Information Bases (MIBs).

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 35
An SNMP agent residing on the Data Domain system transmits OID traps: messages from the system indicating a change of system state in the form of a very basic OID code (for example, 1.3.6.1.4.1.19746.2.0.1). The management system, running the SNMP daemon, interprets the OID through the Data Domain MIB and generates the alert message on the SNMP management console (for example, powerSupplyFailedAlarm).

DD OS supports two forms of SNMP authentication, each in a different SNMP version. In SNMP version 2 (v2), each SNMP management host and agent belongs to an SNMP community: a collection of hosts grouped together for administrative purposes. Which computers belong to the same community is generally, but not always, determined by the physical proximity of the computers. Communities are identified by the names you assign them. A community string can be thought of as a password shared by SNMP management consoles and managed computers. Set hard-to-guess community strings when you install the SNMP service. There is little security, as none of the data is encrypted.

SNMP version 3 (v3) uses individual users instead of communities, with per-user authentication (MD5 or SHA1) and AES or DES privacy.

When an SNMP agent receives a message from the Data Domain system, the community
string or user authentication information contained in the packet is verified against the
agent's list of acceptable users or community strings. After the name is determined to be
acceptable, the request is evaluated against the agent's list of access permissions for that
community. Access can be set to read-only or read-write. System status information can be
captured and recorded for the system that the agent is monitoring.

You can integrate the Data Domain management information base into SNMP monitoring
software, such as EMC NetWorker or Data Protection Advisor. Refer to your SNMP
monitoring software administration guide for instructions on how to integrate the MIB into
your monitoring software and for recommended practices. SNMP management systems
monitor the system by maintaining an event log of reported traps.

You can download the Management Information Base (MIB) file from the System Manager by
navigating to System Settings > General Configuration > SNMP and clicking the Download
MIB file button. You can also download the MIB files from the /ddvar/snmp directory via
FTP, FTPS, CIFS or NFS.

Install the MIB file according to the instructions of your management server.

The default port that is open when SNMP is enabled is port 161. Traps are sent out through
port 162.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 36
Configure either SNMP V3 or V2C in the same window. Follow the instructions for your
SNMP management software to ensure proper set-up and communication between the
management console and the Data Domain system.

You can also use the command line to enable and configure SNMP on a Data Domain system.

# snmp enable
Enable SNMP.

# snmp add ro-community <community-string-list> [hosts <host-list>]
Add a list of SNMP read-only community strings.

# snmp add rw-community <community-string-list> [hosts <host-list>]
Add a list of SNMP read-write community strings.

# snmp add trap-host <host-name-list>[:port] [version {v2c | v3}] [{community <community> | user <user>}]
Add a list of hosts to receive SNMP traps.

# snmp show config [version {v2c | v3}]
Show SNMP configuration.

# snmp show ro-communities
Show the SNMP read-only community strings.

# snmp show rw-communities
Show the SNMP read-write community strings.

# snmp show trap-hosts [version {v2c | v3}]
Show the SNMP trap receiver hosts.

# snmp status
Show SNMP status.
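A minimal v2c sketch, assuming an illustrative community string, monitored host, and trap receiver (none of these are defaults):

# snmp add ro-community dd-mon-string hosts 192.168.1.50
# snmp add trap-host nms.example.com:162 version v2c community dd-mon-string
# snmp enable
# snmp status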

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 37
# snmp user add <user-name> access {read-only | read-write}
  [authentication-protocol {MD5 | SHA1} authentication-key <auth-key>
  [privacy-protocol {AES | DES} privacy-key <priv-key>]]
Add an SNMPv3 user.

# snmp user show <user-name>
Display an SNMPv3 user.
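For example, an SNMPv3 read-only user with SHA1 authentication and AES privacy might be added as follows (the user name is illustrative; supply your own key values in place of the placeholders):

# snmp user add ddmonitor access read-only authentication-protocol SHA1 authentication-key <auth-key> privacy-protocol AES privacy-key <priv-key>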

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 38
Some log messages can be sent from the Data Domain system to other systems. DD OS uses
syslog to publish log messages to remote systems.

• In a Data Domain system, the remote logging feature uses UDP port 514.
• You can configure a Data Domain system to send system messages to a remote syslog
server.
• A Data Domain system exports a set of facility.priority selectors for its log files. For information on managing the selectors and receiving messages on a third-party system, see your vendor-supplied documentation for the receiving system.
• The log host commands manage the process of sending log messages to another
system.

On a Data Domain system, syslog can be configured only through the command line interface (CLI).

# log host add <host>
Adds a system to the list that receives Data Domain system log messages.

# log host del <host>
Removes a system from the list that receives Data Domain system log messages.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 39
# log host enable
Enables sending log messages to other systems.

# log host disable
Disables sending log messages to other systems.

# log host show
Displays the list of systems that receive log messages and the log status (enabled or disabled).

Configure syslog by doing the following:


• Obtain the IP address of the remote logging device receiving the Data Domain system
log information.
• Use the log command to configure remote logging.
• Ensure that UDP port 514 is open and available on the remote log device.
• Enable remote logging with the log host enable command.
• Add a syslog server using the log host add [serverIP] command.
• Check the configuration using the log host show command.
• If you need to disable the syslog for any reason, use the log host disable command.
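Putting those steps together, a minimal configuration might look like this (the server IP address is a placeholder):

# log host enable
# log host add 192.168.1.20
# log host show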

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 40
Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 41
Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 42
Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 43
Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 44
This lesson covers the basics of adding licensed features to, and removing optional licenses
from, a Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 45
• DD Boost - Allows a system to use the Boost interface on a Data Domain system.
• Replication - Adds the Data Domain Replicator for replication of data from one Data
Domain system to another.
• Retention Lock Governance - Protects selected files from modification and
premature deletion, that is, deletion before a specified retention period has expired.
• Retention Lock Compliance - Allows you to meet the strictest data retention
requirements from regulatory standards such as SEC17a-4.
• VTL (Virtual Tape Library) - Allows backup software to see a Data Domain system as a
tape library.
• Encryption of Data at Rest - Allows data on system drives or external storage to be
encrypted while being saved, and then locked before moving to another location.
• Expansion Storage - Allows the upgrade of capacity for the Data Domain system.
Enables either the upgrade of a 9-disk DD510/DD530 to 15 disks, or the upgrade of a
7-disk DD610/DD630 to 12 disks.
• Shelf Capacity - Allows ES30 and ES20 (purchased for use with DD OS 5.1) external
shelves to be added to the Data Domain system for additional capacity.
• DD Extended Retention (formerly DD Archiver) - Provides long-term backup
retention on the DD860 and DD990 platforms.
• Nearline - Identifies systems deployed for archive and nearline workloads.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 46
You can check which licenses are enabled on your Data Domain system using the System
Manager.

1. In the Navigation pane, expand the DD Network and select a system.
2. Click the System Settings > Licenses tabs.

The Feature Licenses pane appears, showing the list of license keys and features.

You can also use the command line interface (CLI) to check which licenses are enabled by
using the license show command. If the local argument is included in the option, output
includes details on local nodes only.

To add a feature license using the System Manager:


1. In the Feature Licenses pane, click Add Licenses.
The Add Licenses dialog box displays.
2. In the License Key text box, type or paste one or more license keys, each on its own
line or separated by a space or comma (and they will be automatically placed on a
new line).
3. Click Add.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 47
The added licenses display in the Added license list. If there are errors, they will be shown in
the error license list. Click a license with an error to edit the license, and click Retry Failed
License(s) to retry the key. Otherwise, click Done to ignore the errors and return to the
Feature Licenses page.

You can also add one or more licenses for features and storage capacity using the command
line interface (CLI). Include dashes when entering the license codes. This command option
may run on a standalone Data Domain system or on the master controller of a Global
Deduplication Array.

# license add <license-code> [<license-code> ...]

Example
# license add ABCD-DCBA-AABB-CCDD BBCC-DDAA-CCAB-AADD-DCCB-BDAC-E5
Added "ABCD-DCBA-AABB-CCDD" : REPLICATION feature
Added "BBCC-DDAA-CCAB-AADD-DCCB-BDAC-E5" : CAPACITY-ARCHIVE feature for 6TiB capacity ES20

To remove one or more feature licenses using the System Manager:


1. In the Feature Licenses pane, click a checkbox next to one or more licenses you wish
to remove and click Delete Selected Licenses.
2. In the Warning dialog box, verify the license(s) to delete and click OK.
The licenses are removed from the license list.

You can also use the command line interface (CLI) to delete one or more software option
licenses. In a GDA configuration, run this command on the master controller.

Security officer authorization is required to delete licenses from Retention Lock Compliance
systems only.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 48
You can also use the license del command to remove licenses from the command line.

Example
# license del EEFF-GGHH-JJII-LLKK MMNN-OOQP-NMPQ-PMNM STXZ-ZDYS-GSSG-BBAA
License code "EEFF-GGHH-JJII-LLKK" deleted.
License code "MMNN-OOQP-NMPQ-PMNM" deleted.
License code "STXZ-ZDYS-GSSG-BBAA" deleted.

If you need to remove all licenses at once using the command line interface (CLI), you can use the license reset command. This command option requires security officer authorization when removing licenses from Retention Lock Compliance systems. Licenses cannot be reset on a Global Deduplication Array.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 49
Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 50
Upon completion of this module, you should be able to describe the upgrade process for a
Data Domain system.

This lesson covers the following topics:


• Preparing for a DD OS upgrade
• Using release notes to prepare for an upgrade
• Performing the upgrade process

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 51
There are two basic release types:
1. DA (Directed Availability)
A DA release has completed all internal testing, as well as testing at selected
customer sites. A DA release is provided to a limited number of receptive customers
and is primarily used to help customers who want to start looking at new features.

DA releases are not available to all Data Domain system owners as a general
download. They can be obtained only through the appropriate EMC Data Domain
Sales or Support team approvals.
2. GA (General Availability)
A GA release is available as a download on the Data Domain Support website and is
intended for production use by all customers. Any customer running an earlier Data
Domain operating system release, GA release or non-GA release, should upgrade to
the latest GA release.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 52
To ensure consistency in how we introduce our software, all release types move through the
DA and GA progression in a similar fashion. This allows customers to evaluate the releases
using similar standards. Data Domain recommends that you track Data Domain OS releases
deployed in your backup environment. It is important that the backup environment run the
most current, supported releases. Minimize the number of different deployed release
versions in the same environment. As a general rule, you should upgrade to the latest GA
release of a particular release family. This ensures you are running the latest version that has
achieved our highest reliability status.

When DA status releases are made available for upgrade, carefully consider factors such as
the backup environment, the feature improvements that are made to the release, and the
potential risks of implementing releases with less customer run-time than a GA release.
Depending on these factors, it might make sense to wait until a release reaches GA status.

Any upgrade packages available to your organization, regardless of where they are in the release cycle, can be downloaded from the EMC/Data Domain support site.

There is no down-grade path to a previous version of the Data Domain operating system (DD
OS). The only method to revert to a previous DD OS version is to destroy the file system and
all the data contained therein, and start with a fresh installation of your preferred DD OS.

Caution: REVERTING TO A PREVIOUS DD OS VERSION DESTROYS ALL DATA ON THE DATA DOMAIN SYSTEM.

Before upgrading:
• Read all pertinent information contained in the release notes for the given upgrade
version.
• If you have questions or need additional information about an upgrade, contact EMC
Data Domain Support before upgrading for the best advice on how to proceed.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 53
It is not always essential, but it is wise, to keep a Data Domain system on the current version of the OS. With the newest version of the Data Domain operating system, you can be sure that you have access to all the features and capabilities your system has to offer.

• When you add newer Data Domain systems to your backup architecture, a newer
version of DD OS is typically required to support hardware changes – such as remote-
battery NVRAM, or when adding the newer ES30 expansion shelf.
• Data Domain Support recommends that systems paired in a replication configuration
all have the same version of DD OS.
• Administrators upgrading or changing backup host software should always check the
minimum DD OS version recommended for a version of backup software in the
Backup Compatibility Guide. This guide is available in the EMC Data Domain support
portal. Often, newer versions of backup software are supported only with a newer
version of DD OS. Always use the version of the Data Domain operating system
recommended by the backup software used in your backup environment.
• No software is free of flaws, and EMC Data Domain works continuously to improve
the functionality of the DD OS. Each version release has complete Release Notes that
identify bug fixes by number and what was fixed in the version.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 54
An upgrade to release 5.4 can be performed only from systems using release families 5.2 or 5.3. Typically when upgrading DD OS, you should upgrade only two release families at a time (5.0 to 5.2, or 5.2 to 5.4). If you are more than two release families behind, contact EMC Data Domain Support for advice on the intermediate versions to use for your stepped upgrade.

Make sure you allocate appropriate system downtime to perform the upgrade. Set aside enough time to shut down processes prior to the upgrade and for spot-checking the upgraded system after completing the upgrade. The actual upgrade should take no longer than 45 minutes. Adding the time to shut down processes and to check the upgraded system, the complete upgrade might take 90 minutes or more. Double this time if you are upgrading more than two release families.

For replication users: Do not disable replication on either side of the replication pair. After it
is back online, replication automatically resumes service.

You should upgrade the destination (replica) before you upgrade the source Data Domain
system.

Be sure to stop any client connections before beginning the upgrade.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 55
When you have the new DD OS upgrade package downloaded locally, you can upload it to
the Data Domain system with the Data Domain System Manager:
1. Click Upload Upgrade Package and browse your local system until you find the
upgrade package you downloaded from the support portal.
2. Click OK.

The file transfers to the Data Domain system. The file is now in the list of available upgrade
packages.

To perform a system upgrade using the System Manager:


1. Select the upgrade package you want to use from the list of available upgrade
packages.
2. Click Perform System Upgrade.
The upgrade proceeds.

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 56
To perform a system upgrade from the command line, upload the upgrade package via FTP, FTPS, CIFS, or NFS to the /ddvar/releases folder and use the system upgrade [precheck] <file> command.

When the upgrade is complete, the system reboots automatically. You then need to log in to the Data Domain System Manager to resume administrative control of the Data Domain system.

# system upgrade <file>
Upgrade the software from the specified <file> in the /ddvar/releases directory. If precheck is used, the command only performs checking.

# system upgrade history
Report the upgrade history.

# system upgrade status
Report the current upgrade status.
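For example, assuming a package named ddos-5.4.x.rpm has been placed in /ddvar/releases (the filename is hypothetical), a cautious sequence would be to run the precheck first and then the upgrade itself:

# system upgrade precheck ddos-5.4.x.rpm
# system upgrade ddos-5.4.x.rpm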

Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 57
Copyright © 2014 EMC Corporation. All rights reserved Module 2: Basic Administration 58
This module focuses on managing network interfaces. It includes the following lessons:
• Configuring Network Interfaces
• Link Aggregation
• Link Failover
• VLAN and IP Alias Interfaces

This module also includes knowledge checks and a lab, which enable you to test your
knowledge.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 1
This lesson covers configuring network interfaces. To do this, you need to know how to
manage network settings and routes, and how to create and configure static routes.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 2
In the Data Domain System Manager, navigate to Hardware > Network > Interfaces to view
and configure network settings.

The Network view provides a means to:


• Configure network interfaces so the Data Domain system is available for management
and backup activities over a network.
• Configure network interfaces to maximize throughput and be highly available.
• Name the Data Domain system in the network environment and resolve the names of
other systems in the environment.
• Isolate backup and near-line traffic in shared network environments.
• View all the network-related settings.
• Troubleshoot and diagnose network issues.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 3
The Interfaces table presents the following information:
• Interface: Shows the name of each interface associated with the selected Data Domain system. Physical interface names start with eth. Virtual interface names start with veth.
• Enabled: Indicates whether or not the interface is enabled. Select Yes to enable the
interface and connect it to the network. Select No to disable the interface and
disconnect it from the network.
• DHCP: Indicates if the interface is configured to use DHCP. Shows Yes, No, or N/A.
• IP Address: Shows the IP address associated with the interface. The address is used
by the network to identify the interface. If the interface is configured through DHCP,
an asterisk appears after this value.
• Netmask: Shows the netmask associated with the interface. Uses the standard IP
network mask format. If the interface is configured through DHCP, an asterisk appears
after this value.
• Link: Indicates whether or not the interface currently has a live Ethernet connection
(set to either Yes or No).
• Additional Info: Lists additional settings for the interface, such as the bonding mode.

Intelligent Platform Management Interface (IPMI)


• Yes/No: Indicates if IPMI health and management monitoring is configured for the
interface.
• View IPMI Interfaces: Links to the Maintenance > IPMI configuration tab.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 4
You can also use the command line interface (CLI) to configure and manage physical and
virtual interfaces, DHCP, DNS, IP addresses, and display network information and status.

# net show settings
Display Ethernet interface settings.

# net show all
Display all networking information, including IPv4 and IPv6 addresses.

# net show config [<ifname>]
Display the configuration for a specific Ethernet interface.

# net show {domainname | searchdomains}
Display the domain name or search domains used for email sent by a Data Domain system.

# net show dns
Display a list of DNS servers used by the Data Domain system. The final line in the output shows if the servers were configured manually or by DHCP.

# net show hardware
Display Ethernet port hardware information.

# net show stats [ipversion {ipv4 | ipv6}] [all | interfaces | listening | route | statistics]
Display network statistics.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 5
You can configure an Ethernet interface by the command line or by using the System
Manager.

To configure an Ethernet interface using the command line, use the following commands:

# net config <ifname> {[[<ipaddr>] [netmask <mask>] [dhcp {yes|no}]] | [<ipv6addr>]}
  {[autoneg] | [duplex {full | half} speed {10|100|1000|10000}]} [up | down]
  [mtu {<size> | default}]
Configure an Ethernet interface.

# net enable <ifname>
Enable an Ethernet interface.

# net show config [<ifname>]
Display the configuration for the Ethernet interface.
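For example, a sketch of manually configuring and verifying a single port (the interface name and addresses are illustrative):

# net config eth0a 192.168.10.23 netmask 255.255.255.0
# net enable eth0a
# net show config eth0a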

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 6
To configure an Ethernet interface using the System Manager:
1. From the Navigation pane, select the Data Domain system to configure.
2. Select Hardware > Network > Interfaces tab.
3. Select an interface to configure.
4. Click Configure.
The Configure Interface dialog box appears.
5. Determine how the interface IP address is to be set:
 Use DHCP to assign the IP address. In the IP Settings pane, click the Obtain using
DHCP radio button.
 Specify the IP settings manually. In the IP Settings pane, click the Manually
configure IP Address radio button.
The IP Address and Netmask fields become active.
 Enter an IP Address.
The Internet Protocol (IP) address is the numerical label assigned to the
interface, for example, 192.168.10.23.
 Enter a Netmask address.
The netmask is the subnet portion of the IP address assigned to the interface.
The format is typically 255.255.255.###, where the ### are the values that
identify the interface.
6. Specify the speed and duplex settings.
The speed and duplex settings define the rate of data transfer through the interface.
Select one of these options:
 Autonegotiate Speed/Duplex: Select this option to allow the network interface
card to autonegotiate the line speed and duplex setting for an interface.
 Manually Configure Speed/Duplex: Select this option to manually set an
interface data transfer rate. Select the speed and duplex from the drop-down lists.
 Duplex options are Unknown, half-duplex or full-duplex.
 The speed options listed are limited to the capabilities of the hardware device.
Options are Unknown, 10Mb, 100Mb, 1000Mb, and 10Gb.
 Half-duplex is available only for 10Mb and 100Mb speeds.
 1000Mb and 10Gb line speeds require full-duplex.
 Optical interfaces require the Autonegotiate option.
 Copper interface default is 10Mb. If a copper interface is set to 1000Mb or
10Gb line speed, duplex must be full-duplex.
7. Specify the maximum transfer unit (MTU) size for the physical (Ethernet) interface.
Supported values are from 350 to 9014. For 100 Base-T and gigabit networks, 1500 is
the standard default.
8. Click Default to return the MTU setting to the default value.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 7
9. Ensure that all of your network components support the size set with the MTU
option.
10. Optionally, select the Dynamic DNS Registration option.
 Dynamic domain name system (DDNS) is the protocol that allows machines on a
network to communicate with, and register their IP address on, a DNS server.
 The DDNS must be registered to enable this option. Refer to “Registering a DDNS”
in the DD OS 5.4 Administration Guide for additional information. This option
disables DHCP for this interface.
11. Click Next.
The Configure Interface Settings summary page appears. The values listed reflect the
new system and interface state, which are applied when you click Finish.
12. Click Finish.
13. Click OK.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 8
To manage hardware settings, go to the Hardware tab, select the Network tab, then select
the Settings tab. From the Settings tab, you can view and edit the host settings, domain list,
host mappings, and DNS list.

The Settings view enables you to manage Network settings in one place without having to
execute multiple commands.

The Network view presents status and configuration information about the system Ethernet
interfaces. It contains the Interfaces view, Settings view, and Routes view.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 9
Use the Hardware > Network > Settings view to view and configure network settings. This
includes network parameters such as the hostname, domain name, search domains, host
mapping, and the DNS list.

• Host Settings
 Host Name: The hostname of the selected Data Domain system.
 Domain Name: The fully-qualified domain name associated with the selected
Data Domain system.
• Search Domain List
 Search Domain: A list of search domains used by the Data Domain system. The
Data Domain system applies the search domain as a suffix to the hostname.
• Hosts Mapping
 IP Address: IP address of the host to resolve.
 Host Name: Hostnames associated with the IP address.
• DNS List
 DNS IP Address: Current DNS IP addresses associated with the selected Data
Domain system. An asterisk (*) indicates the addresses were assigned through
DHCP.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 10
You can also view and configure network settings using the command line:

# net set {domainname <local-domain-name> | searchdomains <search-domain-list>}
Set the domain name or search domains.

# net set dns <ipv4-ipv6-list>
Set the DNS server list.

# net set hostname <host>
Set the hostname.

# net show {domainname | searchdomains}
Display the domain name or search domains.

# net show all
Display all networking information.

# net show dns
Display the DNS server list.

# net show hardware
Display Ethernet port information.

# net show hostname
Display the hostname.
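As an illustration, the following sets a hostname and two DNS servers and then verifies the DNS list (all values are placeholders):

# net set hostname dd01.example.com
# net set dns 192.168.1.10 192.168.1.11
# net show dns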

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 11
Data Domain systems do not generate or respond to any of the network routing management protocols (RIP, IGRP/EIGRP, and BGP) in any way. The only routing implemented on a Data Domain system is based on the internal route table, where the administrator may define a specific network or subnet used by a physical interface (or interface group).

Data Domain systems use source-based routing, which means outbound network packets
that match the subnet of multiple interfaces will be routed over only the physical interface
from which they originated.

In the Routes view, you can view and manage network routes without having to execute
many commands.

To set the default gateway:


1. Click the Hardware > Network > Routes tabs.
2. Click Edit in the Default Gateway area.
The Configure Default Gateway dialog box appears.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 12
3. Choose how the gateway address is set. Either:
 Select the Use DHCP value radio button for setting the gateway.
The Dynamic Host Configuration Protocol (DHCP) indicates if the gateway is
configured using the value from the DHCP server.
 Or, select the Manually Configure radio button.
The gateway address box becomes available.
4. Enter the gateway address in the Gateway field.
5. Click OK.
The system processes the information and returns you to the Routes tab.

To create Static Routes:


1. From the Navigation pane, select the Data Domain system to configure.
2. Click the Hardware > Network > Routes tabs.
3. Click Create in the Static Routes area.
The Create Routes dialog box appears.
4. Select an interface to configure for the static route.
 Click the checkboxes of the interface(s) whose route you are configuring.
 Click Next.
5. Specify the Destination. Select either of the following.
The Network Address and Netmask.
 Click the Network radio button.
 Enter destination information, by providing the destination network address and
netmask.
Note: This is not the IP of any interface. The interface is selected in the initial
dialog, and it is used for routing traffic.
The hostname or IP address of the host destination.
 Click the Host radio button.
 Enter the hostname or IP address of the destination host to use for the route.
6. Optionally, change the gateway for this route.
 Click the checkbox, Specify different gateway for this route.
 Enter a gateway address in the Gateway field.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 13
7. Review changes, and click Next.
The Create Routes > Summary page appears. The values listed reflect the new
configuration.
8. Complete the action, and click Finish.
Progress messages display. When changes are applied, the message indicates
Completed. Click OK to close the dialog.
The new route specification is listed in the Route Spec list.

You can also use the command line to view and manage routes on a Data Domain System. An
added routing rule appears in the Kernel IP routing table and in the Data Domain system
Route Config list, a list of static routes that are reapplied at each system boot.

# route add [ipversion {ipv4 | ipv6}] <route spec>
Add a routing rule.

# route set gateway {<ipaddr> | <ipv6addr>}
Set the default gateway.

# route show config
Display the configured static routes in the Route Config list.

# route show table [ipversion {ipv4 | ipv6}]
Display all entries in the Kernel IP routing table.
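For example, to set a default gateway and add a static route to a remote subnet (the addresses are illustrative, and the <route spec> shown assumes the standard route-command syntax; confirm the exact form in the command reference):

# route set gateway 192.168.1.1
# route add -net 172.16.20.0 netmask 255.255.255.0 gw 192.168.1.254
# route show table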

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 14
Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 15
This lesson covers link aggregation. First you learn about link aggregation. Then, you create a
virtual interface for link aggregation.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 16
Using multiple Ethernet network cables, ports, and interfaces (links) in parallel, link aggregation increases network throughput across a LAN or LANs, up to the maximum speed the system can support.

Data processing can thus become faster than when data is sent over individual links. For
example, you can enable link aggregation on a virtual interface (veth1) to two physical
interfaces (eth0a and eth0b) in the link aggregation control protocol (LACP) mode and hash
XOR-L2. Link aggregation evenly splits network traffic across all links or ports in an
aggregation group. It does this with minimal impact to the splitting, assembling, and
reordering of out-of-order packets.

Aggregation can occur between two directly attached systems (point-to-point and physical or
virtual). Normally, aggregation is between the local system and the connected network
device or system. A Data Domain system is usually connected to a switch or router.
Aggregation is handled between the IP layer (L3 and L4) and the MAC layer (L2) network driver. Link aggregation performance is impacted by the following:
• Switch speed: Normally the switch can handle the speed of each connected link, but
it may lose some packets if all of the packets are coming from several ports that are
concentrated on one uplink running at maximum speed. In most cases, this means
you can use only one switch for port aggregation coming out of a Data Domain
system. Some network topologies allow for link aggregation across multiple switches.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 17
• The quantity of data the Data Domain system can process.
• Out-of-order packets: A network program must put out-of-order packets back in their
original order. If the link aggregation mode allows the packets to be sent out of order,
and the protocol requires that they be put back to the original order, the added
overhead may impact the throughput speed enough that the link aggregation mode
causing the out-of-order packets should not be used.
• The number of clients: In most cases, either the physical or OS resources cannot
drive data at multiple Gbps. Also, due to hashing limits, you need multiple clients to
push data at multiple Gbps.
• The number of streams (connections) per client can significantly impact link
utilization depending on the hashing used.
• A Data Domain system supports two aggregation methods that you set up manually on both sides: round robin and balance-xor.

Requirements
• Links can be part of only one group.
• Aggregation is only between two systems.
• All links in a group must have the same speed.
• All links in a group must be either half-duplex or full-duplex.
• No changes to the network headers are allowed.
• You must have a unique address across aggregation groups.
• Frame distribution must be predictable and consistent.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 18
To create a link aggregation virtual interface:
1. Make sure your switch supports aggregation.
2. Select the Hardware tab, then the Interfaces tab.
3. Disable the physical interface where you want to add the virtual interface by selecting
the interface and selecting No from the Enabled menu.
4. From the Create menu, select Virtual Interface.
The Create Virtual Interface dialog box appears.
5. Specify a virtual interface name in the veth text box.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 19
6. Enter a virtual interface name in the form vethx, where x is a unique ID (typically one
or two digits).
A typical virtual interface name with VLAN and IP alias is veth56.3999.199. The
maximum length of the full name is 15 characters. Special characters are not allowed.
Numbers must be between 0 and 9999.
 From the General tab, specify the bonding mode by selecting type from the
Bonding Type list.

In this example, aggregate is selected. The registry setting can be different from
the bonding configuration. When you add interfaces to the virtual interface, the
information is not sent to the bonding module until the virtual interface is
brought up. Until that time, the registry and the bonding driver configuration are
different. Specify a bonding mode compatible with the system requirements to
which the interfaces are directly attached. The available modes are:
 Round robin: Transmits packets in sequential order from the first available link
through the last in the aggregated group.
 Balanced: Sends data over the interfaces as determined by the selected hash
method. All associated interfaces on the switch must be grouped into an
EtherChannel (trunk).
 LACP: Is similar to Balanced, except for the control protocol that communicates
with the other end and coordinates what links, within the bond, are available. It
provides heartbeat failover.
7. Select an interface to add to the aggregate configuration by clicking the checkbox
corresponding to the interface.
8. Click Next.
The Create Virtual Interface veth name dialog box appears.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 20
To create a link aggregation virtual interface (continued):
9. Enter an IP address.
10. Enter a netmask address.
The netmask is the subnet portion of the IP address assigned to the interface. The format is
usually 255.255.255.XXX, where XXX is the value that identifies the interface. If you do not
specify a netmask, the Data Domain system uses the netmask format as determined by the
TCP/IP address class (A, B, C) that you are using.
11. Specify the speed and duplex options by selecting either Autonegotiate Speed/Duplex or
Manually Configure Speed/Duplex.
The combination of the speed and duplex settings defines the rate of data transfer through
the interface.
12. Select Autonegotiate Speed/Duplex to allow a NIC to auto-negotiate the line speed and
duplex setting for an interface.
13. Select Manually Configure Speed/Duplex if you want to manually set an interface data
transfer rate.
Duplex options are half-duplex or full-duplex. Speed options are limited to the capabilities of
the hardware. Ensure that all of your network components support the size set with this
option.
14. Optionally, select Dynamic Registration (also called DDNS). The dynamic DNS (DDNS) protocol enables machines on a network to communicate with, and register their IP addresses on, a DNS server. The DDNS must be registered to enable this option.
Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 21
15. Click Next.
The Create Virtual Interface Settings summary appears.
16. Ensure that the values listed are correct.
17. Click Finish.
18. Click OK.

Several commands can be used from the command line interface (CLI) to set up and
configure link aggregation on a Data Domain system:

# net aggregate add <virtual-ifname> interfaces <physical-ifname-list>
Enables aggregation on a virtual interface by specifying the physical interfaces and
mode. Choose the mode compatible with the requirements of the system to which
the ports are attached.

# net aggregate modify <virtual-ifname> [mode {roundrobin |
  balanced hash {xor-L2 | xor-L3L4 | xor-L2L3} | lacp hash
  {xor-L2 | xor-L3L4 | xor-L2L3} [rate {fast | slow}]}]
  [up {<time> | default}] [down {<time> | default}]
Changes the aggregation configuration on a virtual interface by specifying the physical
interfaces and mode. Choose the mode compatible with the requirements of the
system to which the ports are directly attached.

# net aggregate show
Displays basic information on the aggregate setup.
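
As a hypothetical end-to-end example (the names veth1, eth0a, and eth0b are placeholders,
not defaults), you might create an LACP aggregate and then verify it:

# net aggregate add veth1 interfaces eth0a eth0b
# net aggregate modify veth1 mode lacp hash xor-L3L4 rate slow
# net aggregate show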

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 22
Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 23
This lesson covers link failover. You learn what link failover does and then you learn how to
create a virtual interface for link failover on a Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 24
A virtual interface may include both physical and virtual interfaces (for example, virtual
interfaces configured for link aggregation) as members.

Link failover improves network stability and performance by keeping backups operational
when an individual link fails.

Link failover is supported by a bonding driver on a Data Domain system. The bonding driver
checks the carrier signal on the active interface every 0.9 seconds. If the carrier signal is lost,
the active interface is changed to another standby interface. An Address Resolution Protocol
(ARP) packet is sent to indicate that the data must flow to the new interface. The standby
interface can be:
• On the same switch
• On a different switch
• Directly connected

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 25
Specifications
• Only one interface in a group can be active at a time.
• Data flows over the active interface. Non-active interfaces can receive data.
• You can specify a primary interface. If you do specify a primary interface, it is the
active interface if it is available.
• Bonded interfaces can go to the same or different switches.
• You do not have to configure a switch to make link failover work.
• For a 1 GbE interface, you can put two or more interfaces in a link failover bonding
group.
• The bonding interfaces can be:
 On the same card
 Across cards
 Between a card and an interface on the motherboard
• Link failover is independent of the interface type. For example, copper and optical can
be failover links if the switches support the connections.
• For a 10 GbE interface, you can put only two interfaces in a failover bonding group.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 26
To create a virtual interface for Link Failover:
1. Go to Hardware > Network > Interfaces.
2. Select the Create pull-down menu.
3. Choose Virtual Interface.
4. Enter the virtual interface ID.
5. Select General.
6. Enter the bonding information.
7. Select the interface(s) for bonding.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 27
Continued from previous slide:
8. Click Next.
9. Enter the IP address and Netmask for the virtual interface.
10. Set the Speed/Duplex, and MTU settings.
11. Click Next.
12. Verify that the information in the settings dialog is correct.
13. Click Finish.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 28
The command line interface (CLI) can also be used to create and modify link failover.

# net failover add <virtual-ifname> interfaces <ifname-list>
  [primary <ifname>] [up {<time> | default}] [down {<time> | default}]
Adds network interfaces to a failover interface.

# net failover modify <virtual-ifname> [primary {<ifname> | none}]
  [up {<time> | default}] [down {<time> | default}]
Modifies the primary network interface for a failover interface. A down interface
must be up for the specified amount of time before it is designated as up. An up
interface must be down for the specified amount of time before it is designated as
down. A primary interface cannot be removed from failover. To remove a primary
interface, use the argument primary none.

# net failover show
Displays all failover interfaces. This command shows what is configured at the
bonding driver. To see what is in the registry, use the net show settings command
option. The registry settings may be different from the bonding configuration. When
interfaces are added to the virtual interface, the information is not sent to the
bonding module until the virtual interface is brought up. Until that time, the registry
and the bonding driver configuration differ.
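
A minimal hypothetical sequence (veth2, eth2a, and eth2b are placeholder names) might
look like the following, with eth2a preferred whenever it is available:

# net failover add veth2 interfaces eth2a eth2b primary eth2a
# net failover show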

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 29
Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 30
This lesson covers virtual local area network (VLAN) and Internet protocol (IP) alias
interfaces. You learn more about these interfaces and how they differ. Also, you learn how
to enable and disable them using the Enterprise Manager.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 31
VLANs and IP aliases are two methods of managing network traffic.

• VLANs provide the segmentation services normally provided by routers in LAN
configurations.
• VLANs address issues such as scalability, security, and network management.
• Routers in VLAN topologies provide broadcast filtering, security, address
summarization, and traffic-flow management.
• Switches may not bridge IP traffic between VLANs as doing so would violate the
integrity of the VLAN broadcast domain.

By using VLANs, one can control traffic patterns and react quickly to relocations. VLANs
provide the flexibility to adapt to changes in network requirements and allow for simplified
administration.

Partitioning a local network into several distinct segments in a common infrastructure
shared across VLAN trunks can provide a very high level of security with great flexibility at a
comparatively low cost. Quality of Service schemes can optimize traffic on trunk links.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 32
VLANs could be used in an environment to provide easier access to local networks, to allow
for easy administration, and to prevent disruption on the network.

IP aliasing is associating more than one IP address with a network interface. With IP
aliasing, one node on a network can have multiple connections to a network, each serving a
different purpose.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 33
A VLAN tag is the VLAN or IP alias ID.

VLAN tag insertion (VLAN tagging) enables you to create multiple VLAN segments.

You get VLAN tags from a network administrator. In a Data Domain system, you can have up
to 4096 VLAN tags. You can create a new VLAN interface from either a physical interface or a
virtual interface. The recommended total number that can be created is 80, although it is
possible to create up to 100 interfaces before the system is affected.

You may add your Data Domain system to a VLAN because the switch port it is connected to
may be a member of multiple VLANs, and you want the most direct path to the DD client
(backup software) for minimum latency.

To create a VLAN tag from the command line, use the following command:

# net create interface {<physical-ifname> | <virtual-ifname>}
  {vlan <vlan-id> | alias <alias-id>}
Create a VLAN interface or an IP Alias.
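
For example, to tag a physical interface with VLAN ID 150 (eth0a and the ID are
hypothetical values; use the tag your network administrator assigns):

# net create interface eth0a vlan 150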

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 34
To create a VLAN tag from the System Manager:
1. Navigate to Hardware > Network > Interfaces.
2. Click Create, and select the VLAN Interface option.
The Create VLAN Interface dialog box appears.
3. Specify a VLAN ID by entering a number in the ID field.
The range of a VLAN ID is between 1 and 4095.
You get the VLAN tag from your system administrator.
4. Enter an IP address.
The Internet Protocol (IP) address is the numerical label assigned to the interface.
For example, 192.168.10.23.
5. Enter a netmask address.
The netmask identifies the subnet portion of the IP address assigned to the interface. The
format is typically 255.255.255.XXX. If you do not specify a netmask, the Data Domain
system uses the default netmask as determined by the TCP/IP address class (A, B, C) you are
using.
6. Specify the MTU settings. Specifying the MTU settings sets the maximum transmission
unit (MTU) size for the physical (or Ethernet) interface. Supported values are from
350 to 9014. For 100 Base-T and gigabit networks, 1500 is the standard default. Click
the Default button to return this setting to the default value. Ensure that all of your
network components support the size set with this option.
7. Specify the dynamic DNS Registration option.
Dynamic DNS (DDNS) is the protocol that allows machines on a network to
communicate with, and register their IP address on, a domain name system (DNS)
server. The DDNS must be registered to enable this option.
8. Click Next.
The Create VLAN Interface Settings summary page appears. The values listed reflect
the new system and interface state.
9. Click Finish.
10. Click OK.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 35
You can create a new IP Alias interface from a physical interface, a virtual interface, or a
VLAN. When you do this, you are telling the interface the IP Subnet(s) to which it belongs.
This is done because the interface is connected to multiple IP subnets.

The recommended total number of IP Aliases, VLAN, physical, and virtual interfaces that can
exist on the system is 80, although it is possible to have up to 100 interfaces.

To create an IP Alias from the command line, use the following command:

# net create interface {<physical-ifname> | <virtual-ifname>}
  {vlan <vlan-id> | alias <alias-id>}
Create a VLAN interface or an IP Alias.
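
For example, to add a second address to a virtual interface (veth1 and alias ID 1 are
hypothetical values):

# net create interface veth1 alias 1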

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 36
To create an IP Alias from the System Manager, do the following:
1. Navigate to Hardware > Network > Interfaces.
2. Click the Create menu and select the IP Alias option.
The Create IP Alias dialog box appears.
3. Specify an IP Alias ID by entering a number in the eth0a field.
The ID must be from 1 to 4094, inclusive.
4. Enter an IP Address.
The Internet Protocol (IP) Address is the numerical label assigned to the interface. For
example, 192.168.10.23
5. Enter a Netmask address.
The Netmask is the subnet portion of the IP address assigned to the interface.

The format is typically 255.255.255.000. If you do not specify a netmask, the Data
Domain system uses the netmask format as determined by the TCP/IP address class
(A,B,C) you are using.
6. Specify Dynamic DNS Registration option.
Dynamic DNS (DDNS) is the protocol that allows machines on a network to
communicate with, and register their IP address on, a Domain Name System (DNS)
server. The DDNS must be registered to enable this option.
7. Click Next.
The Create IP Alias Interface Settings summary page appears. The values listed reflect
the new system and interface state.
8. Click Finish and OK.

Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 37
Copyright © 2014 EMC Corporation. All rights reserved Module 3: Managing Network Interfaces 38
This module focuses on connecting to a Data Domain appliance using the CIFS and NFS
protocols.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 1
In many cases, as part of the initial Data Domain system configuration, CIFS clients were
configured to access the ddvar and MTree directories. This module describes how to modify
these settings and how to manage data access using the Enterprise Manager and cifs
command.

This lesson covers the following topics:


• Data Access for CIFS
• Enabling CIFS Services
• Creating a CIFS Share
• Accessing a CIFS Share
• Monitoring CIFS

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 2
Common Internet File System (CIFS) clients can have access to the system directories on the Data
Domain system. The /data/col1/backup directory is the default destination directory for compressed
backup server data. The /ddvar directory contains Data Domain system core and log files.

Clients, such as backup servers that perform backup and restore operations with a Data Domain
system, need access to at least the /data/col1/backup directory. Clients that have administrative
access also need to be able to access the /ddvar directory to retrieve core and log files.

The Common Internet File System (CIFS) operates as an application-layer network protocol. It is
mainly used for providing shared access to files, printers, serial ports, and miscellaneous
communication between nodes on a network.

When you configure CIFS, your Data Domain system is able to communicate with Microsoft Windows clients.

To configure a CIFS share, you must:


1. Configure the workgroup mode, or configure the active directory mode.
2. Give a descriptive name for the share.
3. Enter the path to the target directory (for example, /data/col1/mtree1).

The cifs command enables and disables access to a Data Domain system from media servers and
other Windows clients that use the CIFS protocol.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 3
After configuring client access, enable CIFS services, which allow the client to access the
system using the CIFS protocol.

1. For the Data Domain system selected in the Enterprise Manager Navigation pane,
click Data Management > CIFS.
2. In the CIFS Status area, click Enable.

The hostname for the Data Domain system that serves as the CIFS server was set during the
system’s initial configuration.

A Data Domain system’s hostname should match the name assigned to its IP address, or
addresses, in the DNS table. Otherwise, there might be problems when the system attempts
to join a domain, and authentication failures can occur. If you need to change the Data
Domain system’s hostname, use the net set hostname command, and also modify the
system’s entry in the DNS table.

When the Data Domain system acts as a CIFS server, it takes the hostname of the system. For
compatibility, it also creates a NetBIOS name. The NetBIOS name is the first component of
the hostname in all uppercase letters. For example, the hostname jp9.oasis.local is truncated
to the NetBIOS name JP9. The CIFS server responds to both names.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 4
From the command line, you can use the cifs enable command to enable CIFS services.

# cifs enable
Enable the CIFS service and allow CIFS clients to connect to the Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 5
The Enterprise Manager Configure Authentication dialog box allows you to set the
authentication parameters that the Data Domain system uses for working with CIFS.

The Data Domain system can join the active directory (AD) domain or the NT4 domain, or be
part of a workgroup (the default). If you did not use the Enterprise Manager’s Configuration
Wizard to set the join mode, use the procedures in this section to choose or change a mode.

The Data Domain system must meet all Active Directory requirements, such as a clock time
that differs by no more than five minutes from that of the domain controller.

The workgroup mode means that the Data Domain system authenticates CIFS clients using
local user accounts defined on the Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 6
You can also set authentication for CIFS shares using the command line interface (CLI):

# cifs set authentication active-directory <realm> {[<dc1> [<dc2> ...]] | *}
Set authentication to the Active Directory. The realm must be a fully qualified name.
Use commas, spaces, or both to separate entries in the domain controller list.
Security officer authorization is required for systems with Retention Lock Compliance.

Note: Data Domain recommends using the asterisk to set all controllers instead of
entering them individually.

When prompted, enter a name for a user account. The type and format of the name
depend on whether the user is inside or outside the company domain.

For user Administrator inside the company domain, enter the name only:
administrator.

For user JaneDoe in a non-local, trusted domain, enter the username and domain:
jane.doe@trusteddomain.com. The account in the trusted domain must have
permission to join the Data Domain system to your company domain.

If DDNS is enabled, the Data Domain system automatically adds a host entry to the
DNS server. It is not necessary to create the entry manually when DDNS is enabled.

If you set the NetBIOS hostname using the command cifs set nb-hostname, the entry
is created for the NetBIOS hostname only, not the system hostname. Otherwise, the
system hostname is used.

# cifs set authentication workgroup <workgroup>
Set the authentication mode to workgroup for the specified workgroup name.
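
For example, to join the oasis.local realm mentioned earlier in this module, using the
recommended asterisk for the domain controller list:

# cifs set authentication active-directory oasis.local *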

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 7
When creating shares, you must assign client access to each directory separately and remove
access from each directory separately. For example, a client can be removed from /ddvar
and still have access to /data/col1/backup.

Note: If Replication is to be implemented, a Data Domain system can receive backups from
both CIFS clients and NFS clients as long as separate directories are used for each. Do not mix
CIFS and NFS data in the same directory.

To share a folder using the CIFS protocol on a Data Domain system:


1. From the Navigation pane, select a Data Domain system to configure shares.
2. Click Data Management > CIFS tabs to navigate to the CIFS view.
3. Ensure authentication has been configured.
4. On the CIFS client, set shared directory permissions or security options.
5. On the CIFS view, click the Shares tab.
6. Click Create.
The Create Shares dialog box appears.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 8
7. In the Create Shares dialog box, enter the following information:
 Share Name: A descriptive name for the share.
 Directory Path: The path to the target directory (for example,
/data/col1/backup/dir1).
 Comment: A descriptive comment about the share.
8. Add a client by clicking the plus sign (+) in the Clients area. The Client dialog box
appears.
9. Enter the name of the client in the Client text box and click OK. No blank or tab
(white space) characters are allowed. Repeat this step for each client that you
need to configure.
10. To modify a User or Group name, in the User/Group list, click the checkbox of the
user or group and click edit (pencil icon) or delete (X).
11. To add a user or group, click (+), and in the User/Group dialog box, select the Type
radio button for User or Group, and enter the user or group name.

You can also use the command line to set up CIFS shares:

# cifs share create <share> path <path> {max-connections <max-connections> |
  clients <clients> | browsing {enabled | disabled} |
  writeable {enabled | disabled} | users <users> | comment <comment>}
Creates a new share.
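
A hypothetical invocation (the share name, path, and client are placeholders):

# cifs share create backup2 path /data/col1/backup/dir1 clients srv1.example.com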

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 9
From a Windows Client, you can access CIFS shares on a Data Domain system either from a
Windows Explorer window or at the DOS prompt (Run menu).

From a Windows Explorer window:


1. Select Map Network Drive.
2. Select a Drive letter to assign the share.
3. Enter the DD system to connect to and the share name (\\<DD_Sys>\<Share>),
for example, \\host1\backup.
4. Check the box Connect using a different username, if necessary.
5. Click Finish.
If Connect using a different username was checked, you are prompted for your Data
Domain username and password.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 10
From the DOS Prompt or Run menu, enter:
> net use drive: \\<DD_Sys>\<Share> /USER:<DD_Username>

You will be prompted for the password to your Data Domain user account.

For example, enter:


> net use H: \\DDSystem\backup /USER:dd02

This command maps the backup share from Data Domain system DDSystem to drive H on the
Windows system and gives the user named dd02 access to the \\DDSystem\backup directory.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 11
The CIFS tab of the Data Domain Enterprise Manager provides information about the
configuration and status of CIFS shares.

Easily viewable are the number of open connections, open files, the connection limit, and the
open-files limit per connection. Click the Connection Details link to view details about active
connections to the CIFS shares.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 12
You can also use the command line interface (CLI) to view details and statistics about CIFS
shares.

# cifs show active
Display all active CIFS clients.

# cifs show clients
Display all allowed CIFS clients for the default /ddvar administrative share and the
default /backup data share.

# cifs show config
Display the CIFS configuration.

# cifs show detailed-stats
Display statistics for every individual type of SMB operation, display CIFS client
statistics, and print a list of operating systems with their client counts.

The list counts the number of different IP addresses connected from each operating
system. In some cases, the same client may use multiple IP addresses.

The output for CIFS Client Type shows Miscellaneous clients, where Yes means the
displayed list of clients is incomplete and No means the list is complete, and Maximum
connections, where the value is the maximum number of connections since the last
reset.

# cifs show stats
Show CIFS statistics.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 13
Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 14
This lesson covers the configuration and monitoring of NFS exports on a Data Domain
system.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 15
The Network File System (NFS) is a distributed file system protocol originally developed by
Sun Microsystems in 1984. It allows a user on a client computer to access files over a
network in a manner similar to how local storage is accessed. NFS, like many other protocols,
builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. The
Network File System is an open standard defined in RFCs, allowing anyone to implement the
protocol.

Network File System (NFS) clients can have access to the system directories or MTrees on the
Data Domain system.
• The /ddvar directory contains Data Domain system core and log files.
• The /data/col1 path is the top-level destination when using MTrees for
compressed backup server data.

Clients, such as backup servers that perform backup and restore operations with a Data
Domain System, need to mount an MTree under /data/col1. Clients that have
administrative access need to mount the /ddvar directory to retrieve core and log files.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 16
To configure an NFS export:
1. Select a Data Domain system in the left navigation pane.
2. Go to Data Management > NFS > Exports.
3. Click Create.
4. Enter a path name for the export.
5. In the Clients area, select an existing client or click the plus (+) icon to create a client.
The Create NFS Exports dialog box appears.
6. Enter a server name in the text box:
 Enter fully qualified domain names, hostnames, or IP addresses.
 A single asterisk (*) as a wild card indicates that all backup servers are used as
clients.
 Clients given access to the /data/col1/backup directory have access to the
entire directory.
 A client given access to a subdirectory of /data/col1/backup has access
only to that subdirectory.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 17
 A client can be a(n):
 fully-qualified domain hostname
 IP address
 IP address with either a netmask or length
 NIS netgroup name with the prefix @, or an asterisk (*) wildcard with a
domain name, such as *.yourcompany.com
7. Select the checkboxes of the NFS options for the client.
 Read-only permission
 Default requires that requests originate on a port that is less than
IPPORT_RESERVED (1024).
 Map requests from UID or GID 0 to the anonymous UID or GID
 Map all user requests to the anonymous UID or GID.
 Use default anonymous UID or GID.

The nfs command enables you to add NFS clients and manage access to a Data Domain
system. It also enables you to display status information, such as verifying that the NFS
system is active, and the time required for specific NFS operations.

# nfs add <path> <client-list> [(<option-list>)]
Add NFS clients that can access the Data Domain system. A client can be a fully
qualified domain hostname, class-C IP addresses, IP addresses with netmasks or
length, an NIS netgroup name with the prefix @, or an asterisk wildcard for the
domain name, such as *.yourcompany.com.

An asterisk by itself means no restrictions. A client added to a subdirectory under
/backup has access only to that subdirectory.

The <option-list> is comma or space separated and enclosed by parentheses. If no
option is specified, the default options are rw, root_squash, no_all_squash, and
secure.

In GDA configurations, only /ddvar is exported. The export of /data shares is not
supported.

# nfs enable
Allow all NFS-defined clients to access the Data Domain system.
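
For example (the hostnames are hypothetical; *.yourcompany.com follows the wildcard form
described above):

# nfs add /ddvar admin1.yourcompany.com
# nfs add /data/col1/backup *.yourcompany.com
# nfs enable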

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 18
You can use the Data Domain Enterprise Manager to monitor NFS client status and NFS
configuration:
1. Click Data Management.
2. Click the NFS tab.
The top pane shows the operational status of NFS, for example, NFS is currently
active and running.

You can also use the command line interface (CLI) to monitor NFS client status and statistics.

# nfs show active
List clients active in the past 15 minutes and the mount path for each.

# nfs show clients
List NFS clients allowed to access the Data Domain system and the mount path and
NFS options for each.

# nfs show detailed-stats
Display NFS cache entries and status to facilitate troubleshooting.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 19
# nfs show histogram
Display NFS operations in a histogram. Users with user role permissions may run this
command.

# nfs show port
Display NFS port information. Users with user role permissions may run this
command.

# nfs show stats
Display NFS statistics.

# nfs status
Enter this option to determine if the NFS system is operational. When the file system
is active and running, the output shows the total number of NFS requests since the
file system started, or since the last time the NFS statistics were reset.

Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 20
Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 21
Copyright © 2014 EMC Corporation. All rights reserved Module 4: CIFS and NFS 22
In this module, you learn about managing data with a Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 1
This lesson covers configuring and monitoring MTrees for storing backups within a Data Domain file
system.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 2
MTrees (Management Trees) are used to provide more granular management of data so that
different types of data, or data from different sources, can be managed and reported on
separately. Various backup operations are directed to individual MTrees. For example, you can
configure directory export levels and quotas to separate and manage backup files by department.

Before MTrees were implemented, subdirectories under a single /backup directory were created
to keep different types of data separate. Data from different sources, departments, or locales were
backed up to separate subdirectories under /backup but all subdirectories were subject to the
same permissions, policies, and reporting.

With MTrees enabled, data can now be backed up to separately managed directory trees, MTrees.
A static MTree, /backup, is still created by the file system, but cannot be removed or renamed.
Additional MTrees can be configured by the system administrator under /data/col1 (col stands
for collection). You can still create a subdirectory under any MTree, but it will be subject to the
same permissions, policies, and reporting as the MTree in which it resides.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 3
Although a Data Domain system supports a maximum of 100 MTrees, system performance on
many models might degrade rapidly if more than 14* MTrees are actively engaged in read or write
streams. The degree of degradation depends on overall I/O intensity and other file system loads.
For optimum performance, constrain the number of simultaneously active MTrees. Whenever
possible, aggregate operations on the same MTree into a single operation.

Regular subdirectories can be configured under /data/col1/backup as allowed in prior
versions of DD OS. Subdirectories can also be configured under any other configured MTree.
Although you can create additional directories under an MTree, the Data Domain system
recognizes and reports on the cumulative data contained within the entire MTree.

You cannot add data or directories to /data. You can add MTrees only under /data/col1.
/data/col1 and /backup cannot be deleted or renamed.

* With DD OS 5.3 and 5.4 the DD990, DD890, and DD880 series appliances support up to 32 active
MTrees.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 4
Increased granular reporting of space and deduplication rates – in the case where different
departments or geographies back up to the same Data Domain system, each department or
geography can have its own independent storage location.

The term, snapshot, is a common industry term denoting the ability to record the state of a storage
device or a portion of the data being stored on the device, at any given moment, and to preserve
that snapshot as a guide for restoring the storage device, or portion thereof. Snapshots are used
extensively as a part of the Data Domain data restoration process. With MTrees, snapshots can be
managed at a more granular level.

Retention lock is an optional feature used by Data Domain systems to securely retain saved data
for a given length of time, protecting it from accidental or malicious deletion. The retention lock
feature can now be applied at the MTree level.

Another major benefit is the ability to limit the logical (pre-comp) space used by a specific MTree
through quotas.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 5
NFS and CIFS can access /data and all of the MTrees beneath /data/col1 by configuring normal
CIFS shares and NFS exports.

VTL and DD Boost have special storage requirements within the MTree structure and are discussed
in later modules.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 6
MTree quotas allow you to set limits on the amount of logical, pre-comp space used by individual
MTrees. Quotas can be set for MTrees used by CIFS, NFS, VTL, or DD Boost data.

There are two types of quotas:


• Soft limit: When this limit is reached, an alert is generated through the system, but
operations continue as normal.
• Hard limit: When this limit is reached, any data in the process of being backed up to this
MTree fails. An alert is also generated through the system, and an out of space error
(EMOSP for VTL) is reported to the backup application. To resume backup operations after
data within an MTree reaches a hard limit quota, you must either delete sufficient content
in the MTree, increase the hard limit quota, or disable quotas for the MTree.

You can set a soft limit, a hard limit, or both soft and hard limits. Quotas work using the amount
of logical space (pre-comp, not physical space) allocated to an individual MTree. The smallest quota
that can be set is 1 MiB.

An administrator can set the storage space restriction for an MTree to prevent it from consuming
excess space.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 7
To create an MTree in the System Manager:
1. Click Data Management > MTree > Create.
A Create MTree dialog appears.
2. Type the name of the MTree you are creating in the MTree name field.
3. Click OK to complete the MTree creation.

MTree quotas can be set at the same time that an MTree is created, or they can be set after
creating the MTree. Quotas can be set and managed using the System Manager or the CLI. The
advantage of MTree operations is that quotas can be applied to a specific MTree as opposed to the
entire file system.

When the MTree is created, it appears in the list of MTrees alphabetically by name.

As data fills the MTree, Data Domain System Manager displays, graphically and by percentage,
usage against the quota hard limit. You can view this display at Data Management > MTree. The
MTree display presents the list of MTrees, quota hard limits, daily and weekly pre-comp and
post-comp amounts, and ratios.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 8
The following commands can be used to create and manage MTrees from the command line:

# mtree create <mtree-path> [quota-soft-limit <n> {MiB|GiB|TiB|PiB}]
  [quota-hard-limit <n> {MiB|GiB|TiB|PiB}]
Creates an MTree.

# mtree delete <mtree-path>
Deletes an MTree.

# mtree undelete <mtree-path>
Undeletes an MTree.

# mtree list [<mtree-path>]
Lists the MTrees.
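
For example, to create an MTree for an HR department with a hard quota (the HR name
follows the example used later in this module; the 2 TiB limit is hypothetical):

# mtree create /data/col1/HR quota-hard-limit 2 TiB
# mtree list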

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 9
The Data Management > Quota page shows the administrator how many MTrees have no soft or
hard quotas set, and for MTrees with quotas set, the percentage of pre-compressed soft and hard
limits used.

The entire quota function is enabled or disabled from the Quota Settings window. Quotas for
existing MTrees are set by selecting the Configure Quota button.

You can also create and manage quotas from the command line:

# quota disable
Disables quota function

# quota enable
Enables quota function

# quota reset {all | mtrees <mtree-list> | storage-units <storage-unit-list>}
  [soft-limit] [hard-limit]
Resets quota limits to none.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 10
# quota set {all | mtrees <mtree-list> | storage-units <storage-
unit-list>} {soft-limit <n> {MiB|GiB|TiB|PiB} | hard-limit
<n> {MiB|GiB|TiB|PiB} | soft-limit <n> {MiB|GiB|TiB|PiB}
hard-limit <n> {MiB|GiB|TiB|PiB}}
Sets quota limits

# quota show {all | mtrees <mtree-list> | storage-units <storage-unit-list>}
Lists quotas for MTrees and storage-units.

# quota status
Shows status for quota function
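
A hypothetical sequence using the HR MTree from the previous example (the limits are
placeholders):

# quota enable
# quota set mtrees /data/col1/HR soft-limit 1 TiB hard-limit 2 TiB
# quota show mtrees /data/col1/HR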

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 11
Data Domain systems not only provide improved control over backups using MTrees; they also
provide data monitoring at the MTree level.

Under Data Management > MTree a summary tab provides an at-a-glance view of all configured
MTrees, their quota hard limits (if set), pre- and post-comp usage, as well as compression ratios for
the last 24 hours, the last 7 days, and current weekly average compression. Select an MTree, and
the Summary pane presents current information about the selected MTree.

Note: The information on this summary page can be delayed by up to 10-15 minutes.

For real-time monitoring of MTrees and quotas, the following commands can be used from the
command prompt:

# mtree show compression <mtree-path> [tier {active | archive}]
  [summary | daily | daily-detailed] [last <n> {hours | days | weeks | months} |
  start <date> [end <date>]]
Show MTree compression statistics.

# quota show {all | mtrees <mtree-list> | storage-units <storage-unit-list>}
List quotas for MTrees and storage-units.
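
For example, to review a week of compression statistics for a hypothetical HR MTree:

# mtree show compression /data/col1/HR last 7 days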

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 12
A Data Domain system provides control through individual MTree organization. You can also
monitor system usage at the same MTree level.

Under Data Management > MTree you find a summary tab providing an at-a-glance view of all
configured MTrees, their quota hard limits (if set), pre- and post-comp usage, as well as
compression ratios for the last 24 hours, the last 7 days, and current weekly average compression.

Below the list of MTrees, the MTree Summary pane shows at-a-glance the settings associated with
the selected MTree. In this pane, you can also perform the following on the selected MTree:
• Rename the MTree
• Configure quotas, hard and soft
• Create an NFS export

On the same display below the summary pane, you can also find panes that monitor MTree
replication, snapshots and retention lock for the selected MTree. This course covers the MTree
replication pane and the retention lock pane in a later module.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 13
You can control the snapshot schedules associated with the selected MTree. You can also see, at
a glance, the total number of snapshots collected, expired, and unexpired, as well as the oldest,
newest, and next scheduled snapshot.

The Space Usage tab displays a graph representing the amount of space used in the selected MTree
over the selected duration (7, 30, 60, or 120 days).

The Daily Written tab shows a graph depicting the amount of space written in the selected MTree
over a selected duration (7, 30, 60, or 120 days).

Note: You must have the most current version of Adobe Flash installed and enabled with your web
browser in order to view these reports.

The related pre-, post-, and total compression factors over the same time period are also reported.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 14
If a quota-enabled MTree fills with data, the system will generate soft and hard limit alerts when a
soft or hard limit in a specific MTree is reached.

• Soft limit: When this limit is reached, an alert is generated through the system, but
operations continue as normal.

• Hard limit: When this limit is reached, any data in the process of being backed up to this
MTree fails. An alert is also generated through the system, and an out of space error
(EMOSP for VTL) is reported to the backup application. To resume backup operations after
data within an MTree reaches a hard limit quota, you must either delete sufficient content
in the MTree, increase the hard limit quota, or disable quotas for the MTree.

These alerts are reported in the Data Domain System Manager > Status > Summary > Alerts pane
in the file system alerts. Details are reported in the Status > Alerts > Current Alerts and Alerts
History tabs. When an alert is reported, you see the status as "posted." After the alert is resolved,
you see the status as "cleared."

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 15
Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 16
This lesson covers snapshot operations and their use in a Data Domain file system.
You will have a chance to configure and create a snapshot on a Data Domain system in a structured
lab.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 17
Snapshot is a common industry term denoting the ability to record the state of a storage device or
a portion of the data being stored on the device, at any given moment, and to preserve that
snapshot as a guide for restoring the storage device, or portion thereof. A snapshot primarily
creates a point-in-time copy of the data. Snapshot copy is done instantly and made available for
use by other applications such as data protection, data analysis and reporting, and data replication
applications. The original copy of the data continues to be available to the applications without
interruption, while the snapshot copy is used to perform other functions on the data.

Snapshots provide an excellent means of data protection. The trend towards using snapshot
technology comes from the benefits that snapshots deliver in addressing many of the issues that
businesses face. Snapshots enable better application availability, faster recovery, and easier back
up management of large volumes of data.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 18
Snapshot benefits:
• Snapshots initially do not use many system resources.
Note: Snapshots continue to place a hold on all data they reference even when the backups
have expired.
• Snapshots are useful for saving a copy of MTrees at specific points in time – for instance,
before a Data Domain OS upgrade – which can later be used as a restore point if files need
to be restored from that specific point in time. Use the snapshot command to take an image
of an MTree, to manage MTree snapshots and schedules, and to display information about
the status of existing snapshots.
• You can schedule multiple snapshot schedules at the same time or create them individually
as you choose.

The maximum number of snapshots allowed to be stored on a Data Domain system is 750 per
MTree. You will receive a warning when the number of snapshots reaches 90% of the allowed
number (675-749) in a given MTree. An alert is generated when you reach the maximum snapshot
count.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 19
A snapshot saves a read-only copy of the designated MTree at a specific point in time where it can
later be used as a restore point if files need to be restored from that specific point in time.

In a snapshot, only the pointers to the production data being copied are recorded at a specific
point in time. In this case, 22:24 GMT. The copy is extremely quick and places minimal load on the
production systems to copy this data.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 20
When changes occur to the production data (in this case 1 & 2 are no longer part of the file) and
additional data is written (5 & 6), then the file system removes the pointers to the original data no
longer in use and adds pointers to the new data. The original data (1 & 2) is still stored, allowing
the snapshot pointers to continue to point to the data as saved at the specific point in time. Data is
not overwritten, but changed data is added to the system, and new pointers are written for
production file 1.

When production data is changed, additional blocks are written, and pointers are changed to
access the changed data. The snapshot maintains pointers to the original, point-in-time data. All
data remains on the system as long as pointers reference the data.

Snapshots are a point-in-time view of a file system. They can be used to recover previous versions
of files, and also to recover from an accidental deletion of files.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 21
As an example, snapshots for the MTree named “backup” are created in the system directory
/data/col1/backup/.snapshot. Each directory under /data/col1/backup also has a
.snapshot directory with the name of each snapshot that includes the directory. Each MTree has
the same type of structure, so an MTree named HR would have a system directory
/data/col1/HR/.snapshot, and each subdirectory in /data/col1/HR would have a
.snapshot directory as well.

Use the snapshot feature to take an image of an MTree, to manage MTree snapshots and
schedules, and to display information about the status of existing snapshots.

Note: If only /data is mounted or shared, the .snapshot directory is not visible. The .snapshot
directory is visible when the MTree itself is mounted.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 22
To create a Snapshot in the System Manager:
1. Go to Data Management > Snapshots.
2. Select an MTree from the Selected MTree dropdown list.
 If snapshots are listed, you can search by using a search term in the Filter By Name or
Year field.
 You can modify the expiration date, rename a snapshot or immediately expire any
number of selected snapshots from the Snapshots pane.
3. Click Create.
A snapshot Create dialog appears.
4. Name the snapshot, and set an expiration date. If you do not set a date, the snapshot will
not release the data to which it is pointing until you manually remove the snapshot.

You can perform modify, rename, and delete actions using the same interface in the Snapshots tab.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 23
You can also create and manage snapshots using the command line:

# snapshot expire <snapshot> mtree <mtree-path> [retention {<date> | <period> | forever}]
Sets or resets the retention time of a snapshot, or expires a snapshot. If you want to expire
the snapshot immediately, use the snapshot expire operation with no options. An expired
snapshot remains available until the next file system clean operation.

# snapshot rename <snapshot> <new-name> mtree <mtree-path>
Renames a snapshot.

# snapshot list mtree <mtree-path>
Displays a list of snapshots of a specific MTree. The display shows the snapshot name, the
amount of pre-compression data, the creation date, the retention date, and the status. The
status may be blank or expired.

# snapshot create <snapshot> mtree <mtree-path> [retention {<date> | <period>}]
Creates a snapshot.
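
For example, to snapshot a hypothetical HR MTree before an upgrade (the snapshot name
and the 30-day retention period are placeholders):

# snapshot create pre-upgrade mtree /data/col1/HR retention 30days
# snapshot list mtree /data/col1/HR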

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 24
In the System Manager, create a schedule for a series of snapshots by doing the following:
1. From the Schedules tab, click Create.
2. Follow the Snapshot Schedule Wizard to define a name, naming pattern, the schedule for
recurring snapshot events, and the retention period before the snapshots expire.
A summary window appears allowing you to approve the schedule.
3. Click Finish to confirm the schedule.
4. A warning dialog appears allowing you to add MTree(s) to the schedule. Click OK to add one
or more MTrees to the schedule.
5. Select one or more MTrees from the Available MTree(s) list.
6. Click Add to add the MTree(s) to the Selected MTree(s) list.
7. Click OK.

Snapshots occur as scheduled. Scheduled snapshots appear in the list below the Schedules tab.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 25
You can also create and manage snapshot schedules using the command line:

# snapshot schedule create <name> [mtrees <mtree-list>] [days <days>]
  time <time>[,<time>...] [retention <period>] [snap-name-pattern <pattern>]
# snapshot schedule create <name> [mtrees <mtree-list>] [days <days>]
  time <time> every <mins> [retention <period>] [snap-name-pattern <pattern>]
# snapshot schedule create <name> [mtrees <mtree-list>] [days <days>]
  time <time>-<time> [every {<hrs> | <mins>}] [retention <period>]
  [snap-name-pattern <pattern>]
Schedules when snapshots are taken.

# snapshot schedule del <name> mtrees <mtree-list>
Removes MTrees from a snapshot schedule.

# snapshot schedule destroy [<name> | all]
Deletes a snapshot schedule, or all snapshot schedules.

# snapshot schedule modify <name> [days <days>] time <time>[,<time>...]
  [retention <period>] [snap-name-pattern <pattern>]
# snapshot schedule modify <name> [days <days>] time <time>
  every {<mins> | none} [retention <period>] [snap-name-pattern <pattern>]
# snapshot schedule modify <name> [days <days>] time <time>-<time>
  [every {<hrs> | <mins> | none}] [retention <period>] [snap-name-pattern <pattern>]
Modifies the existing snapshot schedule.

# snapshot schedule reset
Deletes all snapshot schedules.

# snapshot schedule show [<name> | mtrees <mtree-list>]
Shows all snapshot schedules.
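
A hypothetical nightly schedule for an HR MTree (the name, time, and retention values are
placeholders; the formats follow the syntax above):

# snapshot schedule create nightly-hr mtrees /data/col1/HR time 0300 retention 7days
# snapshot schedule show nightly-hr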

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 26
Immediately below the MTree list, in the summary pane, you can view the Snapshot pane that
monitors snapshots for the selected MTree.

The Snapshots pane in the MTree summary page allows you to see at-a-glance, the total number of
snapshots collected, expired, and unexpired, as well as the oldest, newest, and next scheduled
snapshot within a given MTree.

You can associate configured snapshot schedules with the selected MTree name. Click Assign
Snapshot Schedules, select a schedule from the list of snapshot schedules, and click OK to assign
it. You can create additional snapshot schedules if needed.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 27
Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 28
This lesson covers fast copy operations and their use in a Data Domain file system. Topics include:
• Fast copy definition, use, and benefits.
• Basic fast copy operations: creation, schedule, and expiration.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 29
Fast copy is a function that makes an alternate copy of your backed-up data on the same Data
Domain system. Fast copy is very efficient because it duplicates only the pointers to data, using
the DD OS snapshot function, with only 1% to 2% of overhead needed to write the new pointers
to the original data.

Sometimes, access to production backup data is restricted. Fast copy makes all fast-copied data
readable and writable, making this operation handy for data recovery from backups.

The difference between snapshots and fast-copied data is that the fast copy duplicate is not a
point-in-time duplicate. Any changes that are made during the data copy, in either the source or
the target directories, are not duplicated in the fast copy.

Note that a fast copy is a read/write copy of the source as it existed when the copy was made,
while a snapshot is read only.

Fast copy makes a copy of the pointers to data segments and structure of a source to a target
directory on the same Data Domain system. You can use the fast copy operation to retrieve data
stored in snapshots. In this example, the /HR MTree contains two snapshots in the /.snapshot
directory. One of these snapshots, 10-31-2012, is fast copied to /backup/Recovery. Only pointers to
the actual data are copied, adding a 1% to 2% increase in actual used data space. All of the
referenced data is readable and writable. If the /HR MTree or any of its contents is deleted, no data
referenced in the fast copy is deleted from the system.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 30
To perform a fast copy from the System Manager:
1. Navigate to Data Management > File System > More Tasks > Fast Copy.
2. Enter the data source and the destination (target location).
3. Enter the pathname for the directory where the data to be copied resides.
If you want to copy a snapshot created in the HR MTree, to a destination named, HRCopy in
the /backup MTree, use the path to the given snapshot as the source and the full path to
the directory, HRCopy, in the destination field.
Specifying a non-existent directory creates that directory. Be aware that the destination
directory must be empty or the fast copy operation will fail. You can choose to overwrite
the contents of the destination by checking that option in the Fast Copy dialog window.

You can also perform a fast copy from the command line:
# filesys fastcopy source <src> destination <dest>
Copies a file or directory tree from a Data Domain system source directory to a destination
on the Data Domain system.
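
For example, using the snapshot and recovery directory from the scenario above:

# filesys fastcopy source /data/col1/HR/.snapshot/10-31-2012 destination
/data/col1/backup/Recovery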

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 31
The fast copy operation can be used as part of a data recovery workflow using a snapshot.
Snapshot content is not viewable from a CIFS share or NFS mount, but a fast copy of the snapshot
is fully viewable. From a fast copy on a share or a mount, you can recover lost data without
disturbing normal backup operations and production files.

Fast copy makes a destination equal to the source, but not at a particular point in time. The source
and destination may not be equal if either is changed during the copy operation.

When backup data expires, a fast copy directory prevents the Data Domain system from recovering
the space held by the expired data because the data is flagged by the fast copy directory as
in-use. This data must be manually identified and deleted to free up space, and space reclamation
(file system cleaning) must be run to regain the data space held by the fast copy.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 32
Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 33
This lesson covers Data Domain file system cleaning, also called garbage collection.
Topics include the purpose and use of file system cleaning, scheduling, configuring, and running the
file system cleaning operation.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 34
When your backup application (such as NetWorker or NetBackup) expires backups, the associated
data is marked by the Data Domain system for deletion. However, the expired data is not deleted
immediately by the Data Domain system; it is removed during the cleaning operation. While the
data is not immediately deleted, the path name is. This results in unclaimed segment space that is
not immediately available.

File system cleaning is the process by which storage space is reclaimed from stored data that is no
longer needed. For example, when retention periods on backup software expire, the backups are
removed from the backup catalog, but space on the Data Domain system is not recovered until file
system cleaning is completed.

Depending on the amount of space the file system must clean, file system cleaning can take from
several hours to several days to complete. During the cleaning operation, the file system is available
for all normal operations including backup (write) and restore (read).

Although cleaning uses a significant amount of system resources, cleaning is self-throttling and
gives up system resources in the presence of user traffic.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 35
Data invulnerability requires that data be written only into new, empty containers – data already
written in existing containers cannot be overwritten. This requirement also applies to file system
cleaning. During file system cleaning, the system reclaims space taken up by expired data so you
can use it for new data.

The example in this figure refers to dead and valid segments. Dead segments are segments in
containers no longer needed by the system, for example, claimed by a file that has been deleted
and was the only or final claim to that segment, or any other segment/container space deemed
not needed by the file system internally. Valid segments contain unexpired data used to store
backup-related files. When files in a backup are expired, pointers to the related file segments are
removed. Dead segments are not allowed to be overwritten with new data since this could put
valid data at risk of corruption. Instead, valid segments are copied forward into free containers to
group the remaining valid segments together. When the data is safe and reorganized, the original
containers are returned to the available disk space.

Since the Data Domain system uses a log structured file system, space that was deleted must be
reclaimed. The reclamation process runs automatically as a part of file system cleaning.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 36
During the cleaning process, a Data Domain system is available for all normal operations, including
accepting data from backup systems.

Cleaning does require a significant amount of system processing resources and might take several
hours, or under extreme circumstances days, to complete even when undisturbed. Cleaning applies
a set processing throttle of 50% when other operations are running, sharing the system resources
with other operations. The throttling percentage can be manually adjusted up or down by the
system administrator.

File system cleaning can be scheduled to meet the needs of your backup plan. The default schedule
runs every Tuesday at 6 a.m. The default CPU throttle is 50%, which allocates half of the CPU
resources to the cleaning process and half to all other processes. Increasing the throttle amount
increases the resources dedicated to the cleaning process and decreases the resources available to
other running processes.
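
For example, to reduce the impact of cleaning on other workloads, you could lower the throttle
from the command line (the value shown is illustrative):

# filesys clean set throttle 30

With this setting, cleaning is limited to roughly 30% of system resources while other operations are
running; when the system is otherwise idle, cleaning still runs at full speed.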

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 37
Using the Data Domain System Manager, navigate to Data Management > File System > Start
Cleaning.

This action begins an immediate cleaning session.

A window displays an informational alert describing the possible performance impact during
cleaning, and a field to set the percentage of throttle for the cleaning session.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 38
Schedule file system cleaning to start when the period of high activity ends, and the competition
for resources is minimal or non-existent.

To schedule file system cleaning using the Data Domain System Manager, navigate to Data
Management > File System > Configuration > Clean Schedule.

You see a window with three options for scheduling file system cleaning:
• Default: Tuesday at 6 a.m. with 50% throttle.
Note: The throttle setting affects cleaning only when the system is servicing other user
requests. When there are no user requests, cleaning always runs at full throttle. For
example, if throttle is set to 70%, the system uses 100% of the system resources and
throttles down to 70% of resources when the system is handling other user requests.
• No Schedule: Cleaning occurs only when manually initiated.
• Custom Clean Schedule: Configurable with weekly-based or monthly-based settings. A
weekly schedule runs cleaning at the same time every day, or only on the selected days of
the week.

Click OK to set the schedule you have selected.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 39
You can also set file system cleaning and monitor cleaning from the command line:

# filesys clean reset {schedule | throttle | all}
Resets the clean schedule to the default of Tuesday at 6 a.m. (tue 0600), the default throttle
of 50 percent, or both.

# filesys clean set schedule {never | daily <time> | <day(s)> <time> | biweekly <day> <time> | monthly <day(s)> <time>}
Sets the schedule for the clean operation to run automatically. Default is Tuesday at 6 a.m.

# filesys clean set throttle <percent>
Sets the clean operations to use a lower level of system resources when the Data Domain
system is busy. At zero percent, cleaning runs slowly or not at all, depending on how busy
the system is.

# filesys clean show config
Displays settings for file system cleaning.

# filesys clean show schedule
Displays the current date and time for the clean schedule.

# filesys clean show throttle
Displays the throttle setting for cleaning.

# filesys clean start
Starts the clean process manually.

# filesys clean status
Displays the status of the clean process.

# filesys clean stop
Stops the clean process.

# filesys clean watch
Monitors the clean process.
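
As a brief illustrative session combining the commands above, you could start cleaning manually
and then monitor its progress:

# filesys clean start
# filesys clean watch

Exiting the watch display does not stop the cleaning operation itself, which continues until it
completes or is stopped with filesys clean stop.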

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 40
Considerations and suggested practices:
• You should schedule cleaning for times when system traffic is lowest.
• Cleaning is a file system operation that impacts overall system performance while it runs.
Adjusting the cleaning throttle higher than 50% consumes more system resources during
the cleaning operation and can potentially slow down other system processes.
• Any operation that shuts down the Data Domain file system or powers off the device (a
system power-off, reboot, or filesys disable command) stops the clean operation. File
system cleaning does not automatically continue when the Data Domain system or file
system restarts.
• Encryption and gz compression require much more time than normal to complete cleaning,
as all existing data needs to be read, decompressed, and compressed again.
• Expiring files from your backup does not guarantee that space will be freed after cleaning. If
active pointers exist to any segments related to the data you expire, such as snapshots or
fast copies, those data segments are still considered valid and will remain on the system
until all references to those segments are removed.
• Daily file system cleaning is not recommended as overly frequent cleaning can lead to
increased file fragmentation. File fragmentation can result in poor data locality and, among
other things, higher-than-normal disk utilization. If the retention period of your backups is
short, you might be able to run cleaning more often than once weekly. The more frequently
the data expires, the more frequently file system cleaning can operate. Work with EMC Data
Domain Support to determine the best cleaning frequency under unusual circumstances.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 41
• If your system is approaching full capacity, do not change the cleaning schedule to
increase cleaning cycles. A higher frequency of cleaning cycles might reduce the
deduplication factor, thus reducing the logical capacity of the Data Domain system and
causing the same stored data to consume more space.
Instead, manually remove unneeded data or reduce the retention periods set by your
backup software to free additional space. Run cleaning per the schedule after data on the
system has been expired.
If you encounter a system full (100%) or near full (90%) alert, and you are unable to free up
space before the next backup, contact Support as soon as possible.
• If cleaning runs while replication operations lag behind, cleaning may not be able to
complete. This condition requires either breaking and resynchronizing replication after
cleaning has completed, or allowing replication to catch up (for example, by increasing
network link speed or by writing less new data to the source directory).

Note: It is good practice to run a cleaning operation after the first full backup to a Data Domain
system. The initial local compression on a full backup is generally a factor of 1.5 to 2.5. An
immediate cleaning operation gives additional compression by another factor of 1.15 to 1.2 and
reclaims a corresponding amount of disk space.
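
As a rough illustration of these factors (numbers hypothetical): if a 10 TiB first full backup achieves
a local compression factor of 2.0, it occupies about 5 TiB on disk. An immediate cleaning operation
that adds another factor of 1.2 reduces that to about 5 / 1.2 ≈ 4.2 TiB, reclaiming roughly 0.8 TiB of
disk space.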

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 42
Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 43
This lesson covers how to monitor Data Domain file system space usage.

Topics include the factors that affect the rate at which space is consumed on the system and
monitoring the space used and rate of consumption on the system.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 44
When a disk-based deduplication system such as a Data Domain system is used as the primary
destination storage device for backups, sizing must be done appropriately. Presuming the correctly
sized system is installed, it is important to monitor usage to ensure data growth does not exceed
system capacity.

The factors affecting how fast data on a disk grows on a Data Domain system include:
• The size and number of data sets being backed up: An increase in the number of backups,
or in the amount of data being backed up and retained, will cause space usage to
increase.
• The compressibility of data being backed up: Pre-compressed data formats do not
compress or deduplicate as well as non-compressed files and thus increase the amount of
space used on the system.
• The retention period specified in the backup software: The longer the retention period,
the larger the amount of space required.

If any of these factors increase above the original sizing plan, your backup system could easily
overrun its capacity.

There are several ways to monitor the space usage on a Data Domain system to help prevent
system full conditions.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 45
The File System Summary tab is under the Data Management tab in the Data Domain System
Manager.

The window displays an easy-to-read dashboard of current space usage and availability. It also
provides an up-to-the-minute indication of the compression factor.

The Space Usage section shows two panes.


The first pane shows the amount of disk space available and used by file system components,
based on the last cleaning.

/data:post-comp shows:
• Size (GiB): The amount of total physical disk space available for data.
• Used (GiB): The actual physical space used for compressed data. Warning messages go to
the system log, and an email alert is generated when the use reaches 90%, 95%, and 100%.
At 100%, the Data Domain system accepts no more data from backup hosts.
• Available (GiB): The total amount of space available for data storage. This figure can change
because an internal index may expand as the Data Domain system fills with data. The index
expansion takes space from the Available GiB amount.
• Cleanable (GiB): The estimated amount of space that could be reclaimed if a cleaning
operation were run.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 46
The /ddvar line is the space reserved for system operations such as log files and upgrade tar files.
It is not a part of the data storage total.

The second Space Usage pane shows the compression factors:


• Currently Used: The amounts currently in use by the file system.
• Written in Last 24 Hours: The compression activity over the last day.

For both of these areas, the following is shown:


• Pre-Compression (GiB*): Data written before compression
• Post-Compression (GiB*): Storage used after compression
• Global-Comp Factor: Pre-Compression / (Size after global compression)
• Local-Comp Factor: (Size after global compression) / Post- Compression
• Total-Comp Factor: Pre-Compression / Post-Compression
• Reduction %: [(Pre-Compression - Post-Compression) / Pre-Compression] * 100

* The gibibyte is a standards-based binary multiple (prefix gibi, symbol Gi) of the byte, a unit of
digital information storage. The gibibyte unit symbol is GiB. 1 gibibyte = 2^30 bytes =
1,073,741,824 bytes = 1024 mebibytes.

Note: It is important to know how these compression statistics are calculated and what they are
reporting to ensure a complete understanding of what is being reported.

You can also monitor the space usage and compression from the command line:

# filesys show space
Displays the space available to, and used by, file system resources.

# filesys show compression [<filename>] [last <n> {hours | days}] [no-sync]
# filesys show compression [tier {active | archive}] summary | daily | daily-detailed {[last <n> {hours | days | weeks | months}] | [start <date> [end <date>]]}
Displays the space used by, and compression achieved for, files and directories in the file
system.
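
For example, an illustrative invocation of the command above to summarize compression activity
over the past week:

# filesys show compression last 7 days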

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 47
The Space Usage view contains a graph that displays a visual representation of data usage for the
system.

This view is used to monitor and analyze daily activities on the Data Domain system.
• Roll over a point on a graph line to display a box with data at that point (as shown in the
slide).
• Click Print (at the bottom on the graph) to open the standard Print dialog box.
• Click Show in a new window to display the graph in a new browser window.

The lines of the graph denote measurement for:


• Pre-comp Written—The total amount of data sent to the Data Domain system by backup
servers. Pre-compressed data on a Data Domain system is what a backup server sees as the
total uncompressed data held by a Data Domain system-as-storage unit. Shown with the
Space Used (left) vertical axis of the graph.
• Post-comp Used—The total amount of disk storage in use on the Data Domain system.
Shown with the Space Used (left) vertical axis of the graph.
• Comp Factor—The amount of compression the Data Domain system has performed with
the data it received (compression ratio). Shown with the Compression Factor (right) vertical
axis of the graph.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 48
The bottom of the screen also displays all three measurements when a point is rolled over on the
graph.

Note: In this example, 16.9 GiB was ingested while only 643.5 MiB was used to store the data for a
total compression factor of 26.8x.
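
Working through the arithmetic with the formulas from the previous page: 643.5 MiB is about
0.63 GiB, so Total-Comp Factor = Pre-Compression / Post-Compression = 16.9 / 0.63 ≈ 26.8, and
Reduction % = [(16.9 - 0.63) / 16.9] * 100 ≈ 96%.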

The view can be set to various durations between 7 and 120 days.

You can also monitor the file system from the command line:

# filesys show compression [<filename>] [last <n> {hours | days}] [no-sync]
# filesys show compression [tier {active | archive}] summary | daily | daily-detailed {[last <n> {hours | days | weeks | months}] | [start <date> [end <date>]]}
Displays the space used by, and compression achieved for, files and directories in the file
system.

# filesys show space
Shows the file system space report.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 49
The Space Consumption view contains a graph that displays the space used over time, shown in
relation to total system capacity.

With the Capacity option unchecked (see circled on the slide), the scale is reduced from TiB to GiB
in order to present a clear view of space used. In this example, only 2.1 GiB post-comp has been
stored with a 7.5 TiB capacity. See the next slide to see the consumption view with the capacity
indicator.

This view is useful to note trends in space availability on the Data Domain system, such as changes
in space availability and compression in relation to cleaning processes.
• Roll over a point on a graph line to display a box with data at that point.
• Click Print (at the bottom on the graph) to open the standard Print dialog box.
• Click Show in a new window to display the graph in a new browser window.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 50
The lines of the graph show measurements for:
• Capacity (not shown) — The total amount of disk storage available for data on the Data
Domain system. The amount is shown on the Space Used (left) vertical axis of the graph.
Clicking the Capacity checkbox changes the view of space between GiB and TiB. The
capacity of the example system is 7.5 TiB, so the capacity line does not appear in this
smaller view.
• Post-comp (as shown in the larger shaded area in the graph) — The total amount of disk
storage in use on the Data Domain system. This is shown with the Space Used (left) vertical
axis of the graph.
• Comp Factor (as shown in the slide as a single black line on the graph) — The amount of
compression the Data Domain system has performed with the data it received
(compression ratio). This is shown on the Compression Factor (right) vertical axis of the
graph.
• Cleaning — A grey vertical line appears on the graph each time a file system cleaning
operation was started. Roll over a data line representing cleaning to see the date and time
cleaning was started and the duration of the process.
• Data Movement (not shown) — The amount of disk space moved to the archiving storage
area (if the Archive license is enabled).

You can change the interval of time represented on the graph by clicking a different duration, up to
120 days. 30 days is the default duration.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 51
When the capacity option is checked, the display scales to TiB, and a line at the maximum capacity
of 7.5 TiB appears.

When you roll over the capacity line, an indicator will show the capacity details as shown in this
screenshot.

Notice that at this scale, the 666.0 MiB Post-Comp data mark on February 5 does not show on the
graph.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 52
The Daily Written view contains a graph that displays a visual representation of data that is written
daily to the system over a period of time, selectable from 7 to 120 days. The data amounts are
shown over time for pre- and post-compression amounts.

It is useful to see data ingestion and compression factor results over a selected duration. You
should be able to notice trends in compression factor and ingestion rates.

It also provides totals for global and local compression amounts, and pre-compression and post-
compression amounts:
• Roll over a point on a graph line to display a box with data at that point.
• Click Print (at the bottom on the graph) to open the standard Print dialog box.
• Click Show in a new window to display the graph in a new browser window.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 53
The lines of the graph show measurements for:
• Pre-Comp—The total amount of data written to the Data Domain system by backup hosts.
Pre-compressed data on a Data Domain system is what a backup host sees as the total
uncompressed data held by a Data Domain system-as-storage-unit.
• Post-Comp—The total amount of data written to the Data Domain system after
compression has been performed, as shown in GiBs.
• Total Comp—The total amount of compression the Data Domain system has performed
with the data it received (compression ratio). Shown with the Total Compression Factor
(right) vertical axis of the graph.

You can change the interval of time represented on the graph by clicking a different duration, up to
120 days. 30 days is the default duration.

Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 54
Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 55
Copyright © 2014 EMC Corporation. All rights reserved Module 5: File System and Data Management 56
Replication of deduplicated, compressed data offers the most economical approach to the
automated movement of data copies to a safe site using minimum WAN bandwidth. This
ensures fast recovery in case of loss of the primary data, the primary site, or the secondary
store.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 1
This lesson is an overview of Data Domain replication types and topologies, configuration, and
seeding replication.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 2
Data Domain systems are used to store backup data onsite for a short period such as 30, 60 or
90 days, depending on local practices and capacity. Lost or corrupted files are recovered easily
from the onsite Data Domain system since it is disk-based, and files are easy to locate and read
at any time.

In the case of a disaster that destroys the onsite data, the offsite replica is used to restore
operations. Data on the replica is immediately available for use by systems in the disaster
recovery facility. When a Data Domain system at the main site is repaired or replaced, the data
can be recovered using a few simple recovery configuration and initiation commands.

You can quickly move data offsite (with no delays in copying and moving tapes). You don’t have
to complete replication for backups to occur. Replication occurs in real time.

Replication typically consists of a source Data Domain system (which receives data from a
backup system), and one or more destination Data Domain systems.

Replication duplicates backed-up data over a WAN after it has been deduplicated and
compressed. Replication creates a logical copy of the selected source data post-deduplication,
and only sends any segments that do not already exist on the destination. Network demands
are reduced during replication because only unique data segments are sent over the network.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 3
Replication provides a secondary copy replicated (usually) to an offsite location for:
• Disaster recovery
• Remote office data protection
• Multiple site tape consolidation

After you configure replication between a source and destination, only new data written to the
source is automatically replicated to the destination. Data is deduplicated at the source and at
the destination. All offsite replicated data is recoverable online, reducing the amount of time
needed for recovering from data loss.

The replication process is designed to deal with network interruptions common in the WAN
and to recover gracefully with very high data integrity and resilience. This ensures that the data
on the replica is in a state usable by applications – a critical component for optimizing the
utility of the replica for data recovery and archive access.

A Data Domain system is able to perform normal backup and restore operations and
replication, simultaneously.

Replication is a software feature that requires an additional license. You need a replicator
license for both the source and destination Data Domain systems.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 4
A defined replication source and destination together are called a “pair.” A source or a
destination in the replication pair is referred to as a context. The context is defined on both the
source and destination Data Domain systems paired for replication.

A replication context can also be termed a “replication stream,” and although the use case is
quite different, the stream resource utilization within a Data Domain system is roughly
equivalent to a read stream (for a source context) or a write stream (for a destination context).

The number of replication streams per system depends upon the processing power of the Data
Domain system on which they are created. Smaller systems can handle no more than 15 source
and 20 destination streams, while the most powerful Data Domain systems can handle over 200
streams.
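
On the command line, a replication context is identified by a URL that encodes the replication
type and path; for example (hostnames hypothetical):

col://dd-dest.example.com
dir://dd-dest.example.com/backup/dir1
mtree://dd-dest.example.com/data/col1/mtree1

These forms correspond to the collection, directory, and MTree replication types described in the
pages that follow.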

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 5
Data Domain supports various replication topologies in which data flows from a source to a
destination directory over a LAN or WAN.

• One-to-one replication
The simplest type of replication is from a Data Domain source system to a Data Domain
destination system, otherwise known as a one-to-one replication pair. This replication
topology can be configured with directory, MTree, or collection replication types.

• Bi-directional replication
In a bi-directional replication pair, data from a directory or MTree on System A is
replicated to System B, and from another directory or MTree on System B to System A.

• One-to-many replication
In one-to-many replication, data flows from a source directory or MTree on System A
to several destination systems. You could use this type of replication to create more
than two copies for increased data protection, or to distribute data for multi-site usage.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 6
• Many-to-one replication
In many-to-one replication, whether MTree or directory, data flows from several
source systems to a single destination system. This type of replication can
be used to provide data recovery protection for several branch offices at the corporate
headquarters IT systems.

• Cascaded replication
In a cascaded replication topology, a source directory or MTree is chained among three
Data Domain systems. The last hop in the chain can be configured as collection, MTree,
or directory replication, depending on whether the source is directory or MTree.

For example, the first DD system replicates one or more MTrees to a second DD system,
which then replicates those MTrees to a final DD system. The MTrees on the second DD
system are both a destination (from the first DD system) and a source (to the final DD
system). Data recovery can be performed from the non-degraded replication pair
context.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 7
Data Domain Replicator software offers four replication types that leverage the different logical
levels of the system – described in the previous slide – for different effects.

• Collection replication: This performs whole-system mirroring in a one-to-one topology,
continuously transferring changes in the underlying collection, including all of the
logical directories and files of the Data Domain file system. This type of replication is
very simple and requires fewer resources than other types; therefore it can provide
higher throughput and support more objects with less overhead.
• Directory replication: A subdirectory under /backup/ and all files and directories below
it on a source system replicates to a destination directory on a different Data Domain
system. This transfers only the deduplicated changes of any file or subdirectory within
the selected Data Domain file system directory.
• MTree replication: This is used to replicate MTrees between Data Domain systems. It
uses the same WAN deduplication mechanism as used by directory replication to avoid
sending redundant data across the network. The use of snapshots ensures that the data
on the destination is always a point-in-time copy of the source with file consistency,
while reducing replication churn, thus making WAN use more efficient. Replicating
individual directories under an MTree is not permitted with this type.
• A fourth type, managed replication, belongs to Data Domain Boost operations and is
discussed later in this course.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 8
Collection replication replicates the entire /data/col1 area from a source Data Domain system
to a destination Data Domain system. Collection replication uses the logging file system
structure to track replication. Transferring data in this way means simply comparing the heads
of the source and destination logs, and catching up, one container at a time, as shown in this
diagram. If collection replication lags behind, it continues until it catches up.

The Data Domain system to be used as the collection replication destination must be empty
before configuring replication. Once replication is configured, the destination system is
dedicated to receiving data only from the source system.

With collection replication, all user accounts and passwords are replicated from the source to
the destination. If the Data Domain system is a source for collection replication, snapshots are
also replicated.

Collection replication is the fastest and lightest type of replication offered by the DD OS. There
is no ongoing negotiation between the systems regarding what to send. Collection replication
is mostly unaware of the boundaries between files. Replication operates on segment locality
containers, which are sent after they are closed.

Because there is only one collection per Data Domain system, this is specifically an approach to
system mirroring. Collection replication is the only form of replication used for true disaster
recovery. The destination system cannot be shared for other roles. It is read-only and shows
data only from one source. After the data is on the destination, it is immediately visible for
recovery.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 9
Collection replication replicates the entire /data/col1 area from a source Data Domain
system to a destination Data Domain system. This is useful when all the contents being written
to the DD system need to be protected at a secondary site.

The Data Domain system to be used as the collection replication destination must be empty
before configuring replication. The destination immediately offers all backed up data, as a read-
only mirror, after it is replicated from the source.

Snapshots cannot be created on the destination of a collection replication because the
destination is read-only.

With collection replication, all user accounts and passwords are replicated from the source to
the destination.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 10
Data Domain Replicator software can be used with the optional Encryption of Data at Rest
feature, enabling encrypted data to be replicated using collection replication. Collection
replication requires the source and target to have the exact same encryption configuration
because the target is expected to be an exact replica of the source data. In particular, the
encryption feature must be turned on or off at both source and target and if the feature is
turned on, then the encryption algorithm and the system passphrases must also match. The
parameters are checked during the replication association phase. During collection replication,
the source system transmits the encrypted user data along with the encrypted system
encryption key. The data can be recovered at the target, because the target machine has the
same passphrase and the same system encryption key.

Collection replication topologies can be configured in the following ways.

• One-to-One Replication: This topology can be used with collection replication, where
the entire collection (/data/col1) from a source Data Domain system is mirrored to a
destination Data Domain system. Other than receiving data from the source, the
destination is a read-only system.
• Cascaded Replication: In a cascaded replication topology, directory replication is
chained among three or more Data Domain systems. The last system in the chain can
be configured as collection replication. Data recovery can be performed from the non-
degraded replication pair context.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 11
With directory replication, a replication context pairs a directory under
/data/col1/backup (and all files and directories below it) on a source system with a
destination directory on a different system. During replication, deduplication is preserved, since
data segments that already reside on the destination system are not resent across the
network. The destination directory is read-only, and it can coexist on the same system with
other replication destination directories, replication source directories, and other local
directories, all of which share deduplication in that system’s collection.

The directory replication process is triggered by a file closing on the source. In cases where file
closures are infrequent, Data Domain Replicator forces the data transfer periodically.

If the Data Domain system is a source for directory replication, snapshots within that directory
are not replicated. You must create and replicate snapshots separately.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 12
During directory replication, a Data Domain system can perform normal backup and restore
operations. A destination Data Domain system must have available storage capacity that is at
least the post-compressed size of the expected maximum size of the source directory. In a
directory replication pair, the destination is always read-only. In order to write to the
destination outside of replication, you must first break replication.

When replication is initialized, a destination directory is created automatically if it does not
already exist. After replication is initialized, ownership and permissions of the destination
directory are always identical to those of the source directory.

Directory replication can receive backups from both CIFS and NFS clients, but cannot mix
CIFS and NFS data in the same directory.

Directory replication supports encryption and retention lock.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 13
Directory replication can be configured in the following ways:
• One-to-One Replication: The simplest type of replication is from a Data Domain source
system to a Data Domain destination system.
• Bi-Directional Replication: In a bi-directional replication pair, data from a source
directory on one system is replicated to a destination directory on the other system,
and data from a source directory on that system is replicated to a destination directory
on the first system. This topology can be used only with directory replication.
• Many-to-One Replication: In many-to-one replication, data flows from several source
directory contexts to a single destination system. This type of replication occurs, for
example, when several branch offices replicate their data to the corporate
headquarters IT systems.
• One-To-Many Replication: In a one-to-many replication, multi-streamed optimization
maximizes replication throughput per context.
• Cascaded Replication: In a cascaded replication topology, directory replication is
chained among three or more Data Domain systems. Data recovery can be performed
from the non-degraded replication pair context.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 14
MTree replication enables the creation of disaster recovery copies of MTrees, identified by the
/data/col1/<mtree-name> pathname, at a secondary location. A Data Domain system can simultaneously
be the source of some replication contexts and the destination for other contexts. The Data
Domain system can also receive data from backup and archive applications while it is
replicating data.

One fundamental difference between MTree replication and directory replication is the
method used for determining what needs to be replicated between the source and destination.
MTree replication creates periodic snapshots at the source and transmits the differences
between two consecutive snapshots to the destination. At the destination Data Domain
system, the latest snapshot is not exposed until all of the data for that snapshot is received.
This ensures the destination is always a point-in-time image of the source Data Domain system.
In addition, files do not appear out of order at the destination. This provides file-level
consistency, simplifying recovery procedures. It also reduces recovery time objectives (RTOs).
Users are also able to create a snapshot at the source Data Domain system for application
consistency (for example, after a completion of a backup), which is replicated on the
destination where the data can be used for disaster recovery.

MTree replication shares some common features with directory replication. It uses the same
WAN deduplication mechanism as used by directory replication to avoid sending redundant
data across the network. It also supports the same topologies that directory replication
supports. Additionally, you can have directory and MTree contexts on the same pair of systems.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 15
The destination of the replication pair is read-only.

The destination must have sufficient available storage to avoid replication failures.

CIFS and NFS clients should not be used within the same MTree.

MTree replication replicates data for an MTree specified by the /data/col1/<mtree-name>
pathname; the destination MTree is specified in the same way.

Some replication command options with MTree replication may target a single replication pair
(source and destination directories) or may target all pairs that have a source or destination on
the Data Domain system.

MTree replication is usable with encryption and Data Domain Retention Lock Compliance on an
MTree-level at the source that is replicated to the destination.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 16
• A destination Data Domain system must have available storage capacity that is at least
the post-compressed size of the expected maximum size of the source MTree.
• A destination Data Domain system can receive backups from both CIFS clients and NFS
clients as long as they are separate.
• MTree replication can receive backups from both CIFS and NFS clients – each in their
own replication pair. (But not in the same MTree.)
• When replication is initialized, a destination MTree is created automatically – it cannot
already exist.
• After replication is initialized, ownership and permissions of the destination MTree are
always identical to those of the source MTree.
• At any time, due to differences in global compression, the source and destination
MTree can differ in size.
• MTree replication supports 1-to-1, bi-directional, one-to-many, many-to-one, and
cascaded replication topologies.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 17
Replication is a major feature that takes advantage of the MTree structure on the Data Domain
system. The structure and flexibility of MTrees provide greater control over which data is
replicated. Careful planning of your data layout allows the greatest flexibility when managing
data under an MTree structure.

MTree replication works only at the MTree level. If you want to implement MTree replication,
you must move data from the existing directory structure within the /backup MTree to a new
or existing MTree, and create a replication pair using that MTree.

For example, suppose that a Data Domain system has shares mounted in locations under
/backup as shown in the directory-based layout.

If you want to use MTree replication for your production (prod) data, but are not interested in
replicating any of the development (dev) data, the data layout can be modified to create two
MTrees: /prod and /dev, with two directories within each of them. The old shares would
then be deleted and new shares created for each of the four new subdirectories under the two
new MTrees. This would look like the structure shown in the MTree-based layout.

The Data Domain system now has two new MTrees and four shares, as before. You can set up
MTree replication for the /prod MTree to replicate all of your production data and not set up
replication for the /dev MTree as you are not interested in replicating your development data.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 18
If the source Data Domain system has a high volume of data prior to configuring replication,
the initial replication seeding can take some time over a slow link. To expedite the initial
seeding, you can bring the destination system to the same location as the source system to use
a high-speed, low-latency link.

After data is initially seeded using the high-speed network, you then move the system back to
its intended location.

After data is initially seeded, only new data is sent from that point onwards.

All replication topologies are supported for this process, which is typically performed using
collection replication.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 19
This lesson shows how to configure replication using DD System Manager, including low-
bandwidth optimization (LBO), encryption over wire, using a non-default connection port, and
setting replication throttle.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 20
To create a replication pair in the System Manager:
1. Navigate to Replication > Summary and click Create Pair.
2. Select the type of replication you want to configure: Directory, Collection or MTree
using the Replication Type dropdown menu.
3. Select the source system hostname from the Source System dropdown menu. Enter the
hostname of the source system, if it is not listed.
4. Select the destination system hostname from the Destination System menu. Enter the
hostname of the destination system, if it is not listed.
5. Enter the source path in the Source Path field.
6. Enter the destination path in the Destination Path field.
Notice that the source and destination paths change depending on the type of
replication chosen. Since directory replication is chosen here, the paths begin with
/backup. If MTree replication is chosen, the paths begin with /data/col1; for
collection replication, the path simply identifies the entire system.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 21
You can also configure replication from the command line:

# replication add source <source> destination <destination> [low-bw-optim {enabled | disabled}] [encryption {enabled | disabled}] [propagate-retention-lock {enabled | disabled}] [ipversion {ipv4 | ipv6}]
Creates a replication pair.

# replication break {<destination> | all}
Removes the source or destination Data Domain system from a replication pair.

# replication disable {<destination> | all}
Disables replication.

# replication enable {<destination> | all}
Enables replication.

# replication initialize <destination>
Initializes replication on the source (configure both source and destination first).

# replication modify <destination> {source-host | destination-host} <new-host-name>
# replication modify <destination> connection-host <new-host-name> [port <port>]
# replication modify <destination> low-bw-optim {enabled | disabled}
# replication modify <destination> encryption {enabled | disabled}
# replication modify <destination> ipversion {ipv4 | ipv6}
Modifies the connection host, hostname, encryption, LBO, and IP version settings of a
context.

# replication option reset {bandwidth | delay | listen-port | default-sync-alert-threshold}
Resets the specified replication option to its default value.

# replication option set {bandwidth | default-sync-alert-threshold | delay | listen-port} <value>
Sets replication options such as bandwidth, default sync alert threshold, delay, and listen
port.
Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 22
Low bandwidth optimization (LBO) is an optional mode that enables remote sites with limited
bandwidth to replicate and protect more of their data over existing networks. LBO:
• Can optionally reduce WAN bandwidth utilization.
• Is useful if file replication is being performed over a low-bandwidth WAN link.
• Provides additional compression during data transfer.
• Is recommended only for file replication jobs that occur over WAN links with less than 6
Mb/s of available bandwidth. Do not use this option if maximum file system write
performance is required.

LBO can be applied on a per-context basis to all file replication jobs on a system.

Additional tuning might be required to improve LBO functionality on your system. Use the
bandwidth and network-delay settings together to calculate the proper TCP buffer size, and set
the replication bandwidth for greater compatibility with LBO.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 23
Delta compression is a global compression algorithm that is applied after identity filtering. The
algorithm looks for previous similar segments using a sketch-like technique that sends only the
difference between previous and new segments. In this example, new segment S16 is similar to
the previously stored segment S1. The source can ask the destination if it already has S1. If it
does, the source needs to transfer only the delta (or difference) between S1 and S16. If the
destination doesn’t have S1, the source can send the full segment data for S16 and the full
missing segment data for S1.

Delta comparison reduces the amount of data to be replicated over low-bandwidth WANs by
eliminating the transfer of redundant data found with replicated, deduplicated data. This
feature is typically beneficial to remote sites with lower-performance Data Domain models.

Replication without deduplication can be expensive, requiring either physical transport of tapes
or high capacity WAN links. This often restricts it to being feasible for only a small percentage
of data that is identified as critical and high value.

Reductions through deduplication make it possible to replicate everything across a small WAN
link. Only new, unique segments need to be sent. This reduces WAN traffic down to a small
percentage of what is needed for replication without deduplication. These large factor
reductions make it possible to replicate over a less-expensive, slower WAN link or to replicate
more than just the most critical data.

As a result, the lag is as small as possible.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 24
LBO is enabled on a per-context basis. LBO must be enabled on both the source and
destination Data Domain systems. If the source and destination have incompatible LBO
settings, LBO will be inactive for that context. This feature is configurable in the Create
Replication Pair settings in the Advanced Tab.

To enable LBO on a Data Domain system using the System Manager:


1. In the System Manager, navigate to Replication > Summary.
2. Click the Create Pair to create a new replication pair or select a replication pair from
the list and click Modify Settings.
3. Click the Advanced tab and select the checkbox Use Low Bandwidth Optimization.
4. Click OK when finished.

Key points of LBO:


• Must be enabled on both source and destination.
• Can be monitored through the Data Domain System Manager.
• Encrypted replication uses the ADH-AES256-SHA cipher suite.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 25
You can also enable LBO from the command line:

# replication add source <source> destination <destination> [low-bw-optim {enabled | disabled}] [encryption {enabled | disabled}] [propagate-retention-lock {enabled | disabled}] [ipversion {ipv4 | ipv6}]
Adds a replication pair.

# replication modify <destination> low-bw-optim {enabled | disabled}
Modifies the source or destination host name, the connection host, or the context
attributes.
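
For example (context URL hypothetical), to enable LBO on an existing context – remembering that
the setting must match on both the source and destination systems:

# replication modify dir://dd-dest.example.com/backup/prod low-bw-optim enabled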

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 26
Encryption over wire or live encryption is supported as an advanced feature to provide further
security during replication. This feature is configurable in the Create Replication Pair settings in
the Advanced tab.

To enable encrypted file replication on a Data Domain system using the System Manager:
1. In the System Manager, navigate to Replication > Summary.
2. Click Create Pair to create a new replication pair or select a replication pair from the list
and click Modify Settings.
3. Click the Advanced tab and select the checkbox Enable Encryption Over Wire.
4. Click OK when finished.

It is important to note, when configuring encrypted file replication, that it must be enabled on
both the source and destination Data Domain systems. Encrypted replication uses the ADH-
AES256-SHA cipher suite and can be monitored through the Data Domain System Manager.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 27
You can also enable encryption over wire from the command line:

# replication add source <source> destination <destination> [low-bw-optim {enabled | disabled}] [encryption {enabled | disabled}] [propagate-retention-lock {enabled | disabled}] [ipversion {ipv4 | ipv6}]
Adds a replication pair.

# replication modify <destination> encryption {enabled | disabled}
Modifies the source or destination host name, the connection host, or the context
attributes.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 28
The source system transmits data to a destination system listen port. As a source system can
have replication configured for many destination systems (each of which can have a different
listen port), each context on the source can configure the connection port to the
corresponding listen port of the destination.

To change the connection or listen port on a Data Domain system using the System Manager:
1. In the System Manager, navigate to Replication > Summary.
2. Click Create Pair to create a new replication pair or select a replication pair from the list
and click Modify Settings.
3. Click the Advanced tab and select the checkbox Use Non-default Connection Host.
4. Change the listen Port to a new value.
5. Click OK when finished.

You can also use the command line to set the connection or listen port:

# replication option set listen-port <value>
Sets the listen port for the Data Domain system.

# replication modify <destination> connection-host <new-host-name> [port <port>]
Modifies the source or destination host name, the connection host, or the context
attributes.
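
An illustrative sequence (port value and context URL hypothetical): set a non-default listen port
on the destination, then point the source context at it:

# replication option set listen-port 2061
# replication modify dir://dd-dest.example.com/backup/prod connection-host dd-dest.example.com port 2061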

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 29
The Throttle Settings area shows the current settings for:
• Temporary Override: If configured, shows the throttle rate, or 0, which means all
replication traffic is stopped.
• Permanent Schedule: Shows the times and days of the week on which scheduled
throttling occurs.

To add throttle settings:


1. Click the Replication > Advanced Settings tabs, and click Add Throttle Setting.
The Add Throttle Setting dialog box appears.
2. Set the days of the week that throttling is active by clicking the checkboxes next to the
days.
3. Set the time that throttling starts with the Start Time selectors for the hour, minute and
A.M./P.M.

In the Throttle Rate area, do one of the following:


• Click Unlimited to set no limit.
• Enter a number in the text entry box (for example, 20000) and select the rate unit from
the drop-down menu (bps, Bps, Kibps, or KiBps).
• Select the 0 Bps (Disabled) option to disable all replication traffic.

Then click OK to set the schedule.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 30
The new schedule is shown in the Throttle Settings Permanent Schedule area. Replication runs
at the given rate until the next scheduled change or until a new throttle setting forces a
change.

You can also use the command line to enable and modify throttle settings:

# replication throttle add <sched-spec> <rate>
Adds a throttle schedule.

# replication throttle del <sched-spec>
Deletes a throttle schedule.

# replication throttle reset {current | override | schedule | all}
Resets (to default) the throttle configuration.

# replication throttle set current <rate>
Sets a current override.

# replication throttle set override <rate>
Sets a permanent override.

# replication throttle show [KiB]
Shows the throttle configuration.
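
A hypothetical example of these commands (the exact rate syntax can vary by DD OS release; the
units match those listed for the GUI above):

# replication throttle add mon 0800 20000KiBps
# replication throttle show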

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 31
This lesson covers the following two reports: the Replication Summary report and the
Replication Status report.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 32
Data Domain System Manager allows you to generate reports to track space usage on a Data
Domain system for a period of up to two years back. In addition, you can generate reports to
help understand replication progress. You can view reports on file systems daily and
cumulatively, over a period of time.
Access the Reports view by selecting the Reports stack in the left-hand column of the Data
Domain System Manager beneath the listed Data Domain systems.
The Reports view is divided into two sections. The upper section allows you to create various
space usage and replication reports. The lower section allows you to view and manage saved
reports.
The reports display historical data, not real-time data. After the report is generated, the charts
remain static and do not update.
The replication status report includes the status of the current replication job running on the
system. This report provides a snapshot of what is happening for all replication contexts, to
help you understand the overall replication status on a Data Domain system.

The replication summary report includes network-in and network-out usage for all
replication, in addition to per-context levels on the system during the specified duration. This
report is used to analyze network utilization during the replication process, to help understand
the overall replication performance on a Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 33
The replication status report generates a summary of all replication contexts on a given Data
Domain system with the following information:
• ID: The context number or designation for a particular context. The context number is
used for identification; 0 is reserved for collection replication, and directory replication
numbering begins at 1.
• Source > Destination: The path between both Data Domain systems in the context.
• Type: The type of replication context: Directory, MTree, or Collection.
• Status: Error or Normal.
• Sync as of Time: Time and date stamp of the most recent sync.
• Estimated Completion: The estimated time at which the current replication operation
should be complete.
• Pre-Comp Remaining: The amount of storage remaining pre-compression (applies only
to collection contexts).
• Post-Comp Remaining: The amount of storage remaining post-compression (applies
only to directory, MTree, and collection contexts).

If an error exists in a reported context, a section called “Replication Context Error Status” is
added to the report. It includes the ID, source/destination, the type, the status, and a
description of the error.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 34
The last section of the report is the Replication Destination Space Availability, showing the
destination system name and the total amount of storage available in GiB.

Related CLI command:

# replication show performance {<obj-spec-list> | all} [interval <sec>] [count <count>]
Displays current replication activity.
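
For example, to sample replication activity for all contexts every five seconds, ten times (values
illustrative):

# replication show performance all interval 5 count 10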

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 35
Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 36
Onsite Data Domain systems are typically used to store backup data onsite for short periods
such as 30, 60, or 90 days, depending on local practices and capacity. Lost or corrupted files are
recovered easily from the onsite Data Domain system since it is disk-based, and files are easy
to locate and read at any time.

In the case of a disaster destroying onsite data, the offsite replica is used to restore operations.
Data on the replica is immediately available for use by systems in the disaster recovery facility.
When a Data Domain system at the main site is repaired or replaced, the data can be
recovered using a few simple recovery configuration and initiation commands.

If something occurs that makes the source replication data inaccessible, the data can be
recovered from the offsite replica. Either collection or directory replicated data can be
recovered to the source. For collection replication, the destination context must be fully
initialized for the recovery process to be successful. For directory replication, recover a
selected data set if it becomes necessary to recover one or more replication pairs.

Note: If a recovery fails or must be terminated, the replication recovery can be aborted.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 37
For directory replication:
1. Go to Replication > More Tasks > Start Recovery.
2. Select the replication type.
3. In the Recovery Details section, select the system to recover to.
4. In the Recovery Details section, select the system to recover from.
5. Select the appropriate context if more than one is listed.
6. Click OK.

Note: A replication recover cannot be performed on a source context whose path is the source
path for other contexts; the other contexts first need to be broken and then resynchronized
after the recovery is complete.

If a recovery fails or must be terminated, the replication recovery can be aborted; restart the
recovery on the source as soon as possible afterward. To abort a recovery:
1. Click the More menu and select Abort Recover. The Abort Recover dialog box appears,
showing the contexts that are currently performing recovery.
2. Click the checkbox of one or more contexts to abort from the list.
3. Click OK.
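
Related CLI command (shown with a hypothetical destination; substitute the destination
specification of your own context):

# replication recover <destination>
Recovers data on the source from the destination replica. For example:
# replication recover dir://dd02.example.com/backup/eng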

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 38
Resynchronization is the process of recovering (or bringing back into sync) the data between a
source and destination replication pair after a manual break in replication. The replication pair
is resynchronized so both endpoints contain the same data.

Resynchronization can be used:


• To convert a collection replication to directory replication.
This is useful when the system is to be a source directory for cascaded replication. A
conversion is started with a replication resynchronization that filters all data from the
source Data Domain system to the destination Data Domain system. This implies that
seeding can be accomplished by first performing a collection replication, then breaking
collection replication, then performing a directory replication resynchronization.
• To re-create a context that was lost or deleted.
• When a replication destination runs out of space and the source system still has data to
replicate.

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 39
To resynchronize a replication pair:
1. Break existing replication by selecting the source Data Domain system, and choosing
Replication. Select the context to break, and select Delete Pair and click OK.
2. From either the source or the destination replication system, click the More menu and
select Start Resync. The Start Resync dialog box appears.
3. Select the source system hostname from the Source System menu.
4. Select the destination system hostname from the Destination System menu.
5. Enter the directory path in the Source text box.
6. Enter the directory path in the Destination text box.
7. Click OK.

This process adds the context back to both the source and destination DDRs and starts the
resync process. The resync process can take between several hours and several days,
depending on the size of the system and current load factors.

You can also run the resynchronization process from the command line:

# replication resync <destination>


Resynchronize replication between the source and destination (configure both source
and destination first)
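
As a hypothetical example, for a directory context whose destination is the /backup/eng
directory on the system dd02.example.com:

# replication resync dir://dd02.example.com/backup/eng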

Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 40
Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 41
Copyright © 2014 EMC Corporation. All rights reserved Module 6: Data Replication and Recovery 42
In this module, you learn about things to consider when planning, configuring, and managing
a virtual tape library (VTL).

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 1
In this lesson, you become familiar with the virtual tape library (VTL) environment that is
configurable on a Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 2
In some environments, the Data Domain system is configured as a virtual tape library (VTL).
This practice may be motivated by the need to leverage existing backup policies that were
built using a strategy of physical tape libraries. Using a VTL can be an intermediate step in a
longer range migration plan toward disk-based media for backup. It might also be driven by
the need to minimize the effort to recertify a system to meet compliance needs.

A Fibre Channel HBA-equipped host connecting to an FC SAN can ultimately connect to a
Fibre Channel HBA-equipped Data Domain system. When properly zoned, the host can send
its backups via the VTL protocol directly to the Data Domain system as if the Data Domain
system were an actual tape library complete with drives, robot, and tapes.

This host can be a Windows, Linux, UNIX, Solaris, IBM i, NetApp, or VNX system, or any NAS
device that supports a Fibre Channel card.

Virtual tape libraries emulate the physical tape equipment and function. Virtual tape drives
are accessible to backup software in the same way as physical tape drives. Once drives are
created in the VTL, they appear to the backup software as SCSI tape drives. A virtual tape
library appears to the backup software as a SCSI robotic device accessed through standard
driver interfaces.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 3
When disaster recovery is needed, pools and tapes can be replicated to a remote Data
Domain system using the Data Domain replication process and later archived to tape.

Data Domain systems support backups over the SAN via Fibre Channel HBA. The backup
application on the backup host manages all data movement to and from Data Domain
systems. The backup application also directs all tape creation. Data Domain replication
operations manages virtual tape replication, and vaulting. The Data Domain System Manager
is used to configure and manage tape emulations.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 4
NDMP (Network Data Management Protocol) is an open-standard protocol for enterprise-
wide backup of heterogeneous network-attached storage. NDMP was co-invented by
Network Appliance and PDC Software (acquired by Legato Systems, Inc., and now part of
EMC).

Data Domain systems support backups using NDMP over TCP/IP via standard Ethernet as an
alternate method. This offers a VTL solution for remote office/back office use.

Data servers configured only with Ethernet can also back up to a Data Domain VTL when
used with an NDMP tape server on the Data Domain system. The backup host must also be
running NDMP client software to route the server data to the related tape server on the Data
Domain system.

When a backup is initiated, the host tells the server to send its backup data to the Data
Domain VTL tape server. Data is sent via TCP/IP to the Data Domain system where it is
captured to virtual tape and stored.

While this process can be slower than Fibre Channel speeds, a Data Domain system can
function as an NDMP tape server in an NDMP environment over IP.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 5
A Data Domain virtual tape library (VTL) offers a simple integration, leveraging existing
backup policies. A Data Domain VTL can leverage existing backup policies in a backup system
currently using a strategy of physical tape libraries.

Any Data Domain system running VTL can also run other backup operations using NAS,
NDMP, and DD Boost simultaneously.

A Data Domain VTL eliminates the use of tape and the accompanying tape-related issues
(large physical storage requirement, off-site transport, high time to recovery, and tape shelf
life) for the majority of restores. Compared to normal tape technology, a Data Domain VTL
provides resilience in storage through the benefits of Data Invulnerability Architecture (DIA)
(end-to-end verification, fault avoidance and containment, continuous fault detection and
healing, and file system recoverability).

Compared to physical tape libraries, Data Domain systems configured for VTL simplify and
speed up backups through the use of deduplication technology. Backups are also faster
because a virtual tape does not need to wind, rewind, or position to a particular spot.
Robotic movement of tapes is also eliminated, which speeds up the overall performance of
the tape backup.

Disk-based network storage provides a shorter RTO by eliminating the need for handling,
loading, and accessing tapes from a remote location.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 6
Different tape library products may package some components in different ways, and the
names of some elements may differ among products, but the fundamental function is
basically the same. The Data Domain VTL configuration includes tape libraries, tapes,
cartridge access ports, and barcodes.

• Access Group (VTL Group)
A collection (list) of initiator worldwide port names (WWPNs) or initiator names and
the drives and changers they are allowed to access. It is the equivalent of LUN
masking. For multiple hosts to use the same devices, the Data Domain Storage
System requires you to create different access groups for each host. A group consists
of exactly one host (initiator), one or more target FC ports on the Data Domain
Storage System, and one or more devices. The Data Domain Storage System does not
permit multiple hosts to access the same group.

• Barcode
A unique ID for a virtual tape. Barcodes are assigned when the user creates the virtual
tape cartridge.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 7
• CAP
An abbreviation for cartridge access port. A CAP enables the user to deposit and
withdraw volumes in an autochanger without opening the door to the autochanger.
In a VTL, a CAP is the emulated tape enter/eject point for moving tapes to or from a
library.
Also called: mail slot.

• Changer (Tape Backup Medium Changer)
The device that handles the tape between a tape library and the tape drive. In the
virtual tape world, the system creates an emulation of a specific type of changer.

Although no tapes are physically moved within the Data Domain VTL system, the
virtual tape backup medium changer must emulate the messages your backup
software expects to see when tapes are moved to and from the drives. Selecting and
using the incorrect changer model in your VTL configuration causes the system to
send incorrect messages to the backup software, which can cause the VTL system to
fail.

• Initiator
Any Data Domain Storage System client’s HBA WWPN. An initiator name is an alias
that maps to a client’s WWPN.

• Library
A collection of magnetic tape cartridges used for long-term data backup. A virtual
tape library emulates a physical tape library with tape drives, changer, CAPs, and slots
(cartridge slots).
Also called: autoloader, tape silo, tape mount, tape jukebox, vault.

• Pool
A collection of tapes that maps to a directory on a file system, used to replicate tapes
to a destination.

Note: Data Domain pools are not the same as backup software pools. Most backup
software, including EMC NetWorker, has its own pooling mechanism.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 8
• Slot
A storage location within a library. For example, a tape library has one slot for each
tape that the library can hold.

• Tape
A cartridge holding magnetic tape used to store data long term. Tapes are virtually
represented in a system as grouped data files. The user can export/import from a
vault to a library, and move within a library across drives, slots, and CAPs.
Also called: cartridge.

• Tape Drive
The device that records backed-up data to a tape cartridge. In the virtual tape world,
this drive still uses the same Linear Tape-Open (LTO) technology standards as physical
drives with the following capacities:
 LTO-1: 100 GB per tape
 LTO-2: 200 GB per tape
 LTO-3: 400 GB per tape
 LTO-4: 800 GB per tape

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 9
• There are additional generations of LTO, but only LTO-1, -2, -3, and -4 are currently
supported by Data Domain. Each drive operates as a single data stream on your
network.

• Vault
A holding place for tapes not currently in any library. Tapes in the vault eventually
have to be inserted into the tape library before they can be used.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 10
In this lesson, you become familiar with the evaluation process to determine the capacity
and throughput requirements of a Data Domain system.

This lesson is intended to be a simplified overview of Data Domain VTL configuration
planning. Typically, any production Data Domain system running VTL has been assessed,
planned, and configured by Data Domain implementation experts prior to installation and
production.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 11
In setting up a virtual tape library (VTL) on a Data Domain system, you configure parameters
in the environment to structure the number and size of elements within each library. The
parameters you choose are dictated by the tape technology and library you are emulating.
Efficiencies are dictated by the processing power and storage capacity of the Data Domain
restorer being used as the VTL system. Larger, faster systems allow more streams to write to
a higher number of virtual tape drives, thus providing faster virtual tape backups.

Libraries: All systems are currently limited to a maximum of 64 libraries (64 concurrently
active VTL instances on each Data Domain system).

Drives: Up to 540 tape drives are supported, depending on the Data Domain model. A
DD6xx model can have a maximum of 64 drives. A DD890 model can have a maximum of
256 drives.

Note: Although a DD890 can be configured with up to 256 tape devices, the system is limited
to a maximum of 180 concurrent streams. Drives beyond the 180-stream limit can still be
configured for provisioning per backup policies.

Initiators: A maximum of 92 initiator names or WWPNs can be added to a single access
group.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 12
Slots: The maximum numbers of slots in the library are:
• 32,000 slots per library
• 64,000 slots per system
• The system automatically adds slots to keep the number of slots equal to or greater
than the number of drives.

CAPs: The maximum numbers of cartridge access ports (CAPs) are:
• 100 CAPs per library
• 2000 CAPs per system

Tapes: Can be configured to 4000 GiB per tape.

Note: The information presented on this slide indicates some of the maximum capacities for
the various features in a Data Domain VTL configuration. Your backup host may not support
these capacities. Refer to your backup host software support for correct sizing and capacity
to fit your software.

Understand that the Data Domain VTL is scalable and should accommodate most
configurations. Standard practice suggests creating only as many tape cartridges as needed
to satisfy backup requirements, and enough slots to hold the number of tapes you create.
Creating additional slots is not a problem. The key to good capacity planning is to avoid
provisioning far beyond the system's needs and to add capacity as needed.

For further information about the definitions and ranges of each parameter, consult the DD
OS 5.4 System Administration Guide and the most current VTL Best Practices Guide. Both are
available through the Data Domain Support Portal.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 13
As you plan your VTL configuration, be sure to give special consideration to the following:
• VTL License
VTL is a licensed feature of the Data Domain system. Only one license is needed to
back up to a Data Domain configured for VTL.

• Fibre Channel Hardware Considerations
There are many 4 Gb and 8 Gb Fibre Channel port solutions for target mode Fibre
Channel attachment. All connections to these ports should be via a Fibre Channel
switch or direct attachment of a device. Check the DD OS 5.2 Backup Compatibility
Guide found in the Data Domain Support Portal to see if a specific Fibre Channel HBA
card is supported. The DD OS 5.2 Backup Compatibility Guide indicates which driver
and DD OS versions are required.

• Fibre Channel Switch Compatibility
Data Domain systems can be connected to hosts through FC switches or directors.
When adding or changing a switch/director, consult the DD OS 5.2 Backup
Compatibility Guide found in the Data Domain Support Portal to determine
compatibility and the firmware, DD OS version, and type of support (VTL, IBM i, or
gateway) it offers prior to installation and use.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 14
When you establish fabric zones via FC switches, the best way to avoid problems with VTL
configurations is to include only one initiator and one target port in one zone. Avoid having
any other targets or initiators in any zones that contain a gateway target HBA port.

The following recommendations apply when connecting the Data Domain system to a
backup host via Fibre Channel:
• Only initiators that need to communicate with a particular set of VTL target ports on a
Data Domain system should be zoned with that Data Domain system.
• The host-side FC port must be dedicated to Data Domain VTL devices.
• All host-side FC HBAs should be upgraded to the latest driver version for the OS being
used. If you are uncertain about compatibility with your FC HBAs installed in an
application server and operating as initiators for VTL, consult the DD OS 5.2 Backup
Compatibility Guide, available on the Support Portal – or contact Support for
assistance and advice.

The following recommendations apply to target HBAs:


• Consider spreading the backup load across multiple FC ports on the Data Domain
system in order to avoid bottlenecks on a single port.
• Verify the speed of each FC port on the switch to confirm that the port is configured
for the desired rate.
• Set secondary ports to None unless explicitly necessary for your particular
configuration.

Number of Slots and Drives for a Data Domain VTL Configuration


• In a physical tape library setting, multiplexing – interleaving data from multiple
clients onto a single tape drive simultaneously – is a method of gaining efficiency.
Multiplexing was useful for clients with slow throughput, since a single client could
not send data fast enough to keep the tape drive busy.

With Data Domain VTL, multiplexing causes existing data to land on a Data Domain
system in a different order each time a backup is performed. Multiplexing makes it
nearly impossible for a system to recognize repeated segments, thus ruining
deduplication efficiency. Do not enable multiplexing on your backup host software
when writing to a Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 15
• To increase throughput efficiency and maintain deduplication-friendly data, establish
multiple data streams from your client system to the Data Domain system. Each
stream will require writing to a separate virtual drive.

• The number of slots and drives in a VTL are governed by the number of simultaneous
backup and restore streams that are expected to run. Drive counts are also
constrained by the configuration and overall performance limits of your particular
Data Domain system. Slot counts are typically based on the number of tapes used
over a retention policy cycle.

• Data Domain Space Management Considerations:


It is important to note that the same considerations for capacity planning also apply when
you are planning a VTL environment. Space management considerations include:
• The size of your backups: The larger the overall amount you need to back up, the
more time should be allotted to perform the backups. Using multiple drives and data
streams should be a consideration. The more powerful your Data Domain system, the
greater number of concurrent streams you can employ.
• The source data type: How many files are you backing up? If you are backing up
larger files, perhaps you should consider using larger capacity tapes.
• Retention periods and data space: How long do you need to hold on to your
backups? You cannot recover the data space used by a tape if the tape is still holding
unexpired data. This can be a problem if you are managing smaller file sets on large
tapes. Smaller tapes give you more flexibility when dealing with smaller data sets.
Expired media is not available for space reclamation (file system cleaning) until the
volume is also relabeled. Relabeling the expired tape volume places it in a state that
allows the space reclamation process to dereference and subsequently delete the
unique blocks associated with the backups on that volume.
You may want to use a backup script, using backup software commands, to force
relabeling of volumes as they expire. Some backup software always uses a blank
tape in preference to one with existing data, and if there are a lot of unnecessary
tapes, space reclamation will be inefficient.
• Replication: Replication and VTL operations require substantial resources and will
complete faster if they are run separately. It is good practice to run VTL and
replication operations separately.

Be sure to work closely with your EMC implementation team to properly size, configure, and
test your VTL system design before running it in a production backup scenario.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 16
Choosing the optimal size of tapes for your needs depends on multiple factors, including the
specific backup application being used, and the characteristics of the data being backed up.
In general, it’s better to use a larger number of smaller capacity tapes than a smaller number
of large capacity tapes, in order to control disk usage and prevent system full conditions.

When choosing a tape size, you should also consider the backup application being used. For
instance, Hewlett Packard Data Protector supports only LTO-1/200 GB capacity tapes.

Data Domain systems support LTO-1, LTO-2, LTO-3, and LTO-4 formats.
• LTO-1: 100 GB per tape
• LTO-2: 200 GB per tape
• LTO-3: 400 GB per tape
• LTO-4: 800 GB per tape

If the data you are backing up is large, (over 200 GB, for example), you may want larger-sized
tapes since some backup applications are not able to span across multiple tapes.

The strategy of using smaller tapes across many drives gives your system greater throughput
by using more data streams between the backup host and Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 17
Larger capacity tapes pose a risk of system-full conditions. It is more difficult to expire and
reclaim the space of data held on a larger tape than on smaller tapes. A larger tape can hold
more backups, making it potentially harder to expire because it might contain a current
backup. Expired tapes are not deleted, and the space occupied by a tape is not reclaimed
until it is relabeled, overwritten, or deleted. Consider a situation in which 30% of your data
is held on a 1 TB tape: you could expire half of that tape's data space (500 GB) and still not
be able to reclaim any of it while the tape holds unexpired data.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 18
All backups on a tape must expire, by policy or manually, before the cartridge can be
relabeled and its space made available for reuse. If backups with different retention policies
exist on a single piece of media, the youngest image prevents file system cleaning and reuse
of the tape. You can avoid this condition by initially creating and using smaller tape
cartridges – in most cases, tapes in the 100 GB to 200 GB range.

Unless you are backing up larger files, backing up smaller files to larger-sized tapes
contributes to this issue because it takes longer to fill a cartridge with data. Using a larger
number of smaller-sized tapes reduces the chances of a few young files preventing the
cleaning of older data on a larger tape.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 19
When deciding how many tapes to create for your VTL configuration, remember that
creating more tapes than you actually need might cause the system to fill up prematurely
and cause unexpected system-full conditions. In most cases, backup software uses blank
tapes before recycling tapes. It is a good idea to start with a tape count whose total capacity
is less than twice the available space on the Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 20
When a tape is created, a logical, eight-character barcode is assigned that is a unique
identifier of a tape. When creating tapes, the administrator must provide the starting
barcode. The barcode must start with six numeric or uppercase alphabetic characters (from
the set {0-9, A-Z}). The barcode may end with a two-character tag for the supported LTO-1,
LTO-2, LTO-3, and LTO-4 tape types.

A good practice is to use either two or three of the first characters as the identifier of the
group in which the tapes belong. If you use two characters as the identifier, you can then use
four numbers in sequence to number up to 10,000 tapes. If you use three characters, you are
able to sequence only 1000 tapes.

Note: If you specify the tape capacity when you create a tape through the Data Domain
System Manager, you will override the two-character tag capacity specification.
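
As a hypothetical example of this scheme, a starting barcode of AB0000L3 creates tapes
AB0000L3, AB0001L3, AB0002L3, and so on: the two-character identifier AB marks the
group, the four digits allow up to 10,000 tapes in sequence, and the L3 tag denotes LTO-3
(400 GB) capacity unless a capacity is specified explicitly.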

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 21
In this lesson, you see the steps you would take to create a library and tapes, and set the
logical interaction between the host initiators and their related access groups.

Basic NDMP tape server configuration with a Data Domain VTL library and a brief overview of
VTL support for IBM i products are also presented.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 22
The System Manager Configuration Wizard walks you through the initial VTL configuration,
using the VTL configuration module. Typically, the Configuration Wizard is run initially by the
EMC installation team in your environment.

To open the System Manager Configuration Wizard, go to the System Manager, and select
Maintenance > More Tasks > Launch Configuration Wizard.

Navigate to the VTL configuration, and click No until you arrive at the VTL Protocol
configuration section. Select Yes to configure VTL.

The wizard steps you through library, tape, initiator, and access group configuration.

Manual configuration is also possible. Manually configuring the tape library and tapes,
importing tapes, configuring physical resources, setting initiators, and creating VTL access
groups are covered in the following slides.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 23
Libraries identify the changer, the drives, the drives’ associated slots and CAPs, and tapes to
be used in a VTL configuration.
• To create a library outside of the configuration manager, go to Data Management >
VTL
• Click the Virtual Tape Libraries stack > More Tasks menu > Library > Create…

Pictured here is the Create Library window in the Data Domain System Manager.

If the VTL is properly planned ahead of time, you should know the values to enter when
creating a library.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 24
Keep in mind the capacities and scalability of the elements configured when creating a
library (see the earlier slide on capacity and scalability).
1. Check the backup software application documentation on the Data Domain support
site for the model name you should use with your application. Typically, Restorer-
L180 is used only with Symantec NetBackup and BackupExec software. TS3500 is
used with various backup applications and various OS versions. If you intend to use
TS3500 as your changer emulator, check the DD OS 5.4 Backup Compatibility Guide to
be sure TS3500 is supported with your selected OS version and backup application.

2. Click OK.
The new library appears under the Libraries icon in the VTL Service stack. Options
configured above appear as icons under the library. Clicking the library displays the
configuration details in the informational pane.

You can also use the command line to manage VTL functionality:
# vtl add
Creates/adds a tape library.

# vtl enable
Enables VTL subsystem.

# vtl disable
Closes all libraries and shuts down the VTL process.
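
As a hypothetical example (verify the exact options in the DD OS Command Reference):

# vtl add VTL1 model TS3500 slots 100 caps 2
Creates a library named VTL1 that emulates a TS3500 changer with 100 slots and 2 CAPs.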

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 25
To create tapes:
1. Select the Virtual Tape Library stack, then click the library for which you want to
create tapes. In this case the library titled “VTL” is selected.
2. From the More Tasks menu (not pictured), select Tapes > Create…
The Create Tapes pane appears as shown in this slide.

Refer to your implementation planning to find the number, capacity, and starting barcode for
your tape set.
• A VTL supports up to 100,000 tapes, and the tape capacity can be up to 4000 GiB.
• You can use the System Manager or command line interface to create tapes.
• You can create tapes from within a library, a vault, or a pool.

To add tapes using the command line, use the following command:

# vtl tape add


Adds one or more virtual tapes and inserts them into the vault. Optionally, associates
the tapes with an existing pool for replication.
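
As a hypothetical example (verify the exact options in the DD OS Command Reference):

# vtl tape add A00000L3 count 20 capacity 400 pool Default
Creates 20 tapes of 400 GiB each, starting at barcode A00000L3, in the Default pool.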

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 26
When tapes are created, they are added into the vault. From the vault, tapes can be
imported, exported, moved, searched, and removed. Importing moves existing tapes from
the vault to a library slot, drive, or cartridge access port (CAP). The number of tapes you can
import at one time is limited by the number of empty slots in the library.

To import tapes:
1. Select Data Management > VTL > VTL Service > Libraries.
2. Select a library and view the list of tapes, or click More Tasks and select Tapes >
Import…
3. Enter the search criteria about the tapes you want to import and click Search.
4. Select the tapes to import from the search results.

or

1. Select Data Management > VTL > VTL Service > Libraries.
2. Select the tapes to import by clicking the checkbox next to a tape or barcode, or
select all by clicking the top of the checkbox column. Only tapes showing Vault as
their location can be imported.
3. Click Import from Vault.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 27
You can also use the command line interface to import and export tapes:

# vtl import
Moves existing tapes from the vault into a slot, drive, or cartridge access port (CAP).

# vtl export
Removes tapes from a slot, drive, or cartridge access port (CAP) and sends them to
the vault.
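
As a hypothetical example (verify the exact options in the DD OS Command Reference):

# vtl import VTL1 barcode A00000L3 count 20 element slot
Moves the 20 tapes starting at barcode A00000L3 from the vault into slots of the library
VTL1.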

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 28
There are four steps to configuring the physical resources used for VTL communication:
1. Enable the endpoints (HBA ports) to be used with your VTL configuration.
2. Verify that the SAN switch is connected and zoned properly between the host and
the Data Domain system.
3. Locate and set the alias of the initiators in the Physical Resources stack in the Data
Domain System Manager.
4. Configure the VTL access groups.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 29
To enable endpoints using the System Manager:
1. Navigate to Hardware > Fibre Channel > Physical resources.
2. Select the endpoint(s) you wish to enable, and then select More Tasks > Endpoints >
Enable.
3. Select the endpoint(s) you wish to enable in the Enable Endpoints dialog and click
Next.
4. Click Next again.
5. Click Close when the Enable Endpoints Status displays Completed.

You can also manage endpoints using the command line:

# vtl port disable


Disables a single Fibre Channel port or all Fibre Channel ports in the list.

# vtl port enable


Enables a single Fibre Channel port or all Fibre Channel ports in the list.
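
For example, to enable all ports (a specific port list can be given in place of all):

# vtl port enable all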

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 30
An initiator is any Data Domain Storage System client’s HBA worldwide port name (WWPN)
that belongs to the backup host. An initiator name is an alias that maps to a client’s WWPN.
The Data Domain system interfaces with the initiator for VTL activity. Initiator aliases are
useful because it is easier to reference a name than an eight-pair WWPN number when
configuring access groups.

For instance, you might have a host server with the name HP-1, and you want it to belong to
a group HP-1. You can name the initiator coming from that host server as HP-1. You can then
create an access group also named HP-1 and ensure that the associated initiator has the
same name.

To set the alias of an initiator:


1. Navigate to Hardware > Fibre Channel > Physical Resources > Initiators.
2. Select the initiator you want to alias.
3. Click the modify (pencil) icon.
4. Set the name for the Initiator in the Name field.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 31
To set the name for an endpoint:
1. Navigate to Hardware > Fibre Channel > Physical Resources > Endpoints.
2. Select an endpoint to configure, and click the Configure button.
3. Set the name for the Endpoint in the Name field.

You can also use the command line to configure initiators and endpoints:

# vtl initiator set alias


Adds an initiator alias

# vtl initiator show initiator


Shows configured initiators.

# vtl initiator reset alias


Removes an initiator alias.
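
As a hypothetical example, to alias the WWPN of the host HP-1 (the WWPN shown is
illustrative; use the WWPN reported for your initiator):

# vtl initiator set alias HP-1 wwpn 21:00:00:e0:8b:9d:0b:e8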

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 32
A VTL access group (or VTL group) is created to manage a collection of initiator WWPNs or
aliases and the drives and changers they are allowed to access. Access group configuration
allows initiators in backup applications to read and write data only to the devices included in
the access group list. An access group may contain multiple initiators (a maximum of 128),
but an initiator can exist in only one access group. A maximum of 512 initiators can be
configured for a Data Domain system.

A default access group exists named TapeServer, to which you can add devices that support
NDMP-based backup applications. Configuration for this group is discussed in the next slide.

Access groups are similar to LUN masking. They allow clients to access only selected LUNs
(media changers or virtual tape drives) on a system through assignment. A client set up for
an access group can access only those devices in the access group to which it is assigned.

Note: Avoid making access group changes on a Data Domain system during active backup or
restore jobs. A change may cause an active job to fail. The impact of changes during an active
job depends on a combination of backup software and host configurations.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 33
To create an access group in the Data Domain System Manager:
1. Navigate to Data Management > VTL > Access Groups > Groups > More Tasks > Group
> Create.
2. In the Configuration window, name the access group, and select the initiators to add
to it.
3. Click Next.
A window appears in which you can add devices by selecting the library, choosing
from a list of devices, and identifying the LUN number, as well as the primary and
secondary (failover) ports each device should use.

You can also use the command line to manage VTL Access Groups:

# vtl group add


Adds an initiator or a device to a group.

# vtl group create


Creates a group.

# vtl group del


Removes an initiator or device from a group.

# vtl group destroy


Destroys a group.

# vtl group modify


Modifies a device in a group.

# vtl group rename


Renames a group.

# vtl group show


Shows configured groups.

# vtl group use


Switches the ports in use in a group or library to the primary or secondary port list.
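
As a hypothetical example that mirrors the HP-1 scenario from the previous lesson (verify
the exact options in the DD OS Command Reference):

# vtl group create HP-1
# vtl group add HP-1 initiator HP-1
# vtl group add HP-1 vtl VTL1 changer lun 0
# vtl group add HP-1 vtl VTL1 drive 1 lun 1
This creates the group, adds the host initiator to it, and exposes the changer and first drive
of library VTL1 as LUNs 0 and 1.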

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 34
The Initiators tab of the Access Group shows the initiator alias and its related WWPN that is
grouped to the LUNs listed in the LUNs tab.

This tells the administrator that the host associated with this initiator can see the changers
and drives listed in the LUNs tab.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 35
When configured for NDMP over TCP/IP, a Data Domain system starts an NDMP tape server.
NDMP tape servers are accessed via the standard NDMP protocol. For more details see
http://ndmp.org.

The host server must have NDMP client software installed and running. This client software
is used to remotely access the Data Domain VTL.

Devices assigned to the access group TapeServer on the Data Domain system can be
accessed only by the NDMP tape server.

The NDMP tape server on the Data Domain system converts this data to tape I/O, and writes
to the Data Domain VTL.

An NDMP user is associated with the configuration for authentication purposes. DD OS users
can be used, but their passwords travel over the network in plain text. The ndmpd service
adds its own user and can enable password encryption for added security.

The top level CLI command is ndmpd.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 36
The following steps configure an NDMP tape server on the Data Domain system.
1. Enable the NDMP daemon by typing the CLI command # ndmpd enable.
2. Verify that the NDMP daemon sees the devices created in the TapeServer access
group by entering the command # ndmpd show devicenames. The VTL device
names appear in a table as shown in this slide.
Note: You must first create a VTL per the instructions discussed earlier in this module,
and assign the access group TapeServer, before performing this step.
3. Add an NDMP user for the ndmpd service. Enter the command
# ndmpd user add ndmp.
4. When prompted, enter and verify the password for this user.
5. Verify the created user by entering the command, # ndmpd user show. The
username appears below the command.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 37
(Continued from previous slide)

6. Check the options for the ndmpd daemon. Enter the command # ndmpd option
show all. A table showing the names of the options appears as shown in this slide.
Note that the authentication value is set to text, meaning your authentication to
the ndmp daemon is transmitted as plain text: a possible security risk.
7. Set the ndmpd service authentication to MD5. Enter the command # ndmpd
option set authentication md5.
8. Verify the service.
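
For reference, the command sequence from these steps is:

# ndmpd enable
# ndmpd show devicenames
# ndmpd user add ndmp
# ndmpd option set authentication md5
# ndmpd option show all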

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 38
The IBM power systems utilize a hardware abstraction layer, commonly referred to as the
Licensed Internal Code (LIC), between the operating system and the physical hardware. All
peripheral equipment must emulate IBM equipment, including IBM tape libraries and
devices, when presented to the operating system.

Additionally, the hardware drivers used by these systems are embedded in the LIC and IBM i
operating system. LIC PTFs, or program temporary fixes, are IBM's method of updating and
activating the drivers. In most cases, hardware configuration settings cannot be manually
configured; because only IBM equipment, or equipment that emulates IBM equipment, is
attached, only fixed configuration settings are required.

Fibre Channel devices can be connected directly to the host (direct attach) through an
arbitrated loop (FC-AL) topology or through a switched fabric (FC-SW) topology. Note that
the Data Domain VTL supports only switched fabric for connectivity.
adapters or IOAs (input/output adapters) can negotiate at speeds of 2 Gbps, 4 Gbps, and 8
Gbps in an FC-SW environment without any configuration on the operating system other
than plugging in the cable at the host. Fibre Channel IOPs and IOAs are typically installed by
an IBM business partner.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 39
Virtual Libraries
Data Domain VTL supports one type of library configuration for IBM i use: an IBM
TS3500 configured with IBM LTO3 virtual tape drives. Virtual library management is done
from the Virtual Tape Libraries tab. From Virtual Tape Libraries > More Tasks > Library >
Create, you can set the number of virtual drives and the number of slots.

A special VTL license that supports IBM i use is required. This special license supports other
VTL configurations as well, but the standard VTL license does not directly support IBM i
configurations.

IBM i virtual libraries are not managed any differently from other operating systems. Once
the library and tapes are created, they are managed by BRMS (IBM's tape management on
the i), through other IBM i native command access, or by third-party tape management
systems. The only library supported on IBM i is the TS3500, with LTO3 drives. The library
and tapes must be created after you add the IBM i license to the DD system so that the
IBM i configuration is correct.

Refer to the Virtual Tape Library for IBM System i Integration Guide, available in the support
portal, for current configuration instructions and best practices when using VTL in an IBM i
environment.

Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 40
Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 41
Copyright © 2014 EMC Corporation. All rights reserved Module 7: Tape Library and VTL Concepts 42
This module discusses how DD Boost incorporates several features to significantly reduce
backup time and manage replicated data for easier access in data recovery operations.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 1


EMC Data Domain Boost extends the optimization capabilities of Data Domain systems for
other EMC environments, such as Avamar and NetWorker, as well as Greenplum, Quest
vRanger, Oracle RMAN, Symantec NetBackup, and Backup Exec.

In this lesson, you will get an overview of the DD Boost functionality and the features that
make up this licensed addition to the Data Domain operating system.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 2


This slide lists the three basic features of DD Boost.
• A private protocol that is more efficient than CIFS or NFS. DD Boost has a private,
efficient data transfer protocol with options to increase efficiencies.
• Distributed segment processing (DSP). An optional feature to DD Boost shares
portions of the deduplication process with the application host, improving data
throughput.

DSP distributes parts of the deduplication process to the NetWorker storage node
using the embedded DD Boost Library (or, for other backup applications, using the DD
BOOST plug-in), moving some of the processing normally handled by the Data
Domain system to the application host. The application host performs a comparison
of the data to be backed up with the library and looks for any unique segments. Thus
it sends only unique segments to the Data Domain system.

Benefits of DSP include:
• Increased throughput
• Reduced load on the Data Domain system
• Reduced bandwidth utilization
• Reduced load on the storage node/backup host

• Managed file replication. An optional feature of DD Boost, managed file replication
offers a replication environment where the application host is both aware of
replication and can control it.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 3


• DD Boost provides systems with centralized replication awareness and management.
Using this feature, known as Managed File Replication, backups written to one Data
Domain system can be replicated to a second Data Domain system under the
management of the application host. The application host catalogs and tracks the
replica, making it immediately accessible for recovery operations. Administrators can
use their backup application to recover duplicate copies directly from a replica Data
Domain system.

Benefits of managed file replication include:


• Quicker access to recovery. All backups and clones are cataloged in your
backup application on your server.
• Full administrative control of all backups and replicas through the backup
software.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 4


Advanced load balancing and link failover via interface groups
To improve data transfer performance and increase reliability, you can create a group
interface using the advanced load balancing and link failover feature. Configuring an interface
group creates a private network within the Data Domain system, composed of the IP
addresses designated as a group. Clients are assigned to a single group by specifying a client
name (client.emc.com) or wildcard name (*.emc).

Benefits include:
• Potentially simplified installation management
• A system that remains operational through loss of individual interfaces
• Potentially higher link utilization
• In-flight jobs that fail over to healthy links, so jobs continue uninterrupted from the
point of view of the backup application.

Virtual synthetics
DD Boost in DD OS 5.2 supports optimized synthetic backups when integrated with backup
software. Currently, EMC NetWorker and Symantec NetBackup are the only supported
software applications using this feature.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 5


Optimized synthetic backups reduce processing overhead associated with traditional
synthetic full backups. Just like a traditional backup scenario, optimized synthetic backups
start with an initial full backup followed by incremental backups throughout the week.
However, the subsequent full backup requires no data movement between the application
server and Data Domain system. The second full backup is synthesized using pointers to
existing segments on the Data Domain system. This optimization reduces the frequency of
full backups, thus improving recovery point objectives (RPO) and enabling single step
recovery to improve recovery time objectives (RTO). In addition, optimized synthetic backups
further reduce the load on the LAN and application host.

Benefits include:
• Reduces the frequency of full backups
• Improves RPO and RTO
• Reduces load on the LAN and application host

Both low bandwidth optimization and encryption of managed file replication data are
optional replication features, and both are supported with DD Boost enabled.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 6


DD Boost currently supports interoperability with the listed products on various backup host
platforms and operating systems. The interoperability matrix is both large and complex. To
be certain a specific platform and operating system is compatible with a version of DD Boost,
consult the EMC DD Boost Compatibility Guide found in the Support Portal at
https://support.emc.com.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 7


To store backup data using DD Boost, the Data Domain system exposes user-created disk
volumes called storage units (SUs) to a DD Boost-enabled application host. In this example,
an administrator created an SU named “exchange_su.” As the system completes the SU
creation, an MTree is created, and the file, /.ddboost is placed within the created MTree.
Creating additional storage units creates additional MTrees under /data/col1 each with
its own /.ddboost file within. Access to the SU is OS independent. Multiple applications
hosts, when configured with DD Boost, can use the same SU on a Data Domain system as a
storage server.

Storage units can be monitored and controlled just as any data managed within an MTree.
You can set hard and soft quota limits and receive reports about MTree content.

Note: Storage units cannot be used with anything but a DD Boost replication context.
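
Related CLI commands (a hedged sketch; on some DD OS releases storage-unit creation
also requires a user argument):

# ddboost enable
# ddboost storage-unit create exchange_su
# ddboost storage-unit show
Enables DD Boost, creates the storage unit used in this example, and lists the configured
storage units.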

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 8


If you recall, the deduplication on a Data Domain system is a five-step process where the
system:
• Segments data to be backed up
• Creates fingerprints of segment data
• Filters the fingerprints and notes references to previously stored data
• Compresses unique, new data to be stored
• Writes the new data to disk

In normal backup operations, the backup host has no part in the deduplication process.
When backups run, the backup host sends all backup data, and the Data Domain system
performs the entire deduplication process on all of the data.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 9


Distributed segment processing (DSP) shares deduplication duties with the backup host. With
DSP enabled, the backup host:
1. Segments the data to be backed up
2. Creates fingerprints of segment data and sends them to the Data Domain system
3. Optionally compresses data to be backed up
4. Sends only the requested unique data segments to the Data Domain system

The Data Domain system:


1. Filters the fingerprints sent by the backup host and requests data not previously
stored
2. Notes references to previously stored data and writes new data

The deduplication process is the same whether DSP is enabled or not. With DSP enabled, the
backup host splits the data into 4-12 KB segments. A fingerprint (or segment ID) is created
for each segment. Each segment ID is sent over the network to the Data Domain system to
filter. The filter determines whether the segment ID is new or a duplicate. The segment IDs
are checked against segment IDs already on the Data Domain system. Segment IDs that
match existing segment IDs are referenced and discarded, while the Data Domain system
tells the backup host which segment IDs are unmatched (new).

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 10


Unmatched or new segments are compressed using common compression techniques, such
as lz, gz, or gzfast. This is also called local compression. The compressed segments are sent
to the Data Domain system and written with the associated fingerprints, metadata, and
logs.

The main benefits of DSP are:


• More efficient CPU utilization.
• Improved utilization of network bandwidth. Less data throughput is required to send
with each backup.
• Less time to restart failed backup jobs. If a job fails, the data already sent to the Data
Domain system does not need to be sent again – reducing the load on the network
and improving the overall throughput for the failed backups upon retry.
• Distribution of the workload between the Data Domain system and the DD Boost
aware application.

DD Boost can operate with DSP either enabled or disabled. DSP must be enabled or disabled
on a per-system basis; individual backup clients cannot be configured differently than the
Data Domain system.
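
Related CLI command (a hedged sketch):

# ddboost option set distributed-segment-processing enabled
Enables DSP on the Data Domain system; use disabled to turn it off, and # ddboost
option show to verify the current setting.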

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 11


The network bandwidth requirements are significantly reduced because only unique data is
sent over the LAN to the Data Domain systems.

Consider DSP only if your application hosts can accommodate the additional processing
required by their share of the DSP workflow.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 12


DD Boost integration enables the backup application to manage file replication between two
or more Data Domain systems configured with DD Boost software. It is a simple process to
schedule Data Domain replication operations and keep track of backups for both local and
remote sites. In turn, recovery from backup copies at the central site is also simplified
because all copies are tracked in the backup software catalog.

The Data Domain system uses a wide area network (WAN)-efficient replication process for
deduplicated data. The process can be optimized for WANs, reducing the overall load on the
WAN bandwidth required for creating a duplicate copy.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 13


This example shows managed file replication with DD Boost. The example is specific to an EMC
NetWorker environment. Symantec and other backup applications using DD Boost will manage
replication in a similar manner.

In this environment, a backup server is sending backups to a local Data Domain system. A remote
Data Domain system is set up for replication and disaster recovery of the primary site.
• The NetWorker storage node initiates the backup job and sends data to the Data Domain
system. Backup proceeds.
• The Data Domain system signals that the backup is complete.
• Information about the initial backup is updated in the NetWorker media database.
• The NetWorker storage node initiates replication of the primary backup to the remote Data
Domain system through a clone request.
• Replication between the local and remote Data Domain systems proceeds.
• When replication completes, the NetWorker storage node receives confirmation of the
completed replication action.
• Information about the clone copy of the data set is updated in the NetWorker media
database.
Replicated data is now immediately accessible for data recovery using the NetWorker media
database.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 14


While it is acceptable for both standard MTree replication and managed file replication to
operate on the same system, be aware that managed file replication can be used only with
MTrees established with DD Boost storage units. MTree replication can be used only with
CIFS and NFS data.

You also need to be mindful not to exceed the limit of 100 MTrees on a system. The
100 MTree limit is a count of both standard MTrees and MTrees created as DD Boost storage
units.

Also remember to remain below the maximum total number of replication pairs (contexts)
recommended for your particular Data Domain systems.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 15


For Data Domain systems that require multiple 1 GbE links to obtain full system
performance, it is necessary to set up multiple backup servers on the Data Domain systems
(one per interface) and target the backup policies to different servers to spread the load on
the interfaces. Using the DD Boost interface groups, you can improve performance on 1 Gb
Ethernet ports.

The Advanced Load Balancing and Link Failover feature allows for combining multiple
Ethernet links into a group. Only one of the interfaces on the Data Domain system is
registered with the backup application. DD Boost software negotiates with the Data Domain
system on the interface registered with the backup application to obtain an interface to send
the data. The load balancing provides higher physical throughput to the Data Domain system
compared to configuring the interfaces into a virtual interface using Ethernet-level
aggregation.

The links connecting the backup hosts and the switch that connects to the Data Domain
system are placed in an aggregated failover mode. A network-layer aggregation of multiple 1
GbE or 10 GbE links is registered with the backup application and is controlled on the backup
server.

This configuration provides end-to-end network failover functionality. Any of the available
aggregation technologies can be used between the backup servers and the switch.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 16


An interface group is configured on the Data Domain system as a private network used for
data transfer. The IP address must be configured on the Data Domain system and its interface
enabled. If an interface (or a NIC that has multiple interfaces) fails, all of the in-flight jobs to
that interface transparently fail-over to a healthy interface in the interface group (ifgroup).
Any jobs started subsequent to the failure are routed to the healthy interfaces. You can add
public or private IP addresses for data transfer connections.

Distributed segment processing (DSP) is not affected by DD Boost application-level groups.

Note: Do not use 1GbE and 10GbE connections in the same interface group.
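
Related CLI commands (a hedged sketch assuming the default interface group of this DD OS
release; the IP address is illustrative):

# ddboost ifgroup add interface 192.168.1.10
# ddboost ifgroup enable
# ddboost ifgroup show config
Adds an IP address to the interface group, enables the group, and displays its configuration.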

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 17


A synthetic full or synthetic cumulative incremental backup is a backup assembled from
previous backups. Synthetic backups are generated from one previous, traditional full or
synthetic full backup, and subsequent differential backups or a cumulative incremental
backup. (A traditional full backup means a non-synthesized, full backup.) A client can use the
synthesized backup to restore files and directories in the same way that a client restores
from a traditional backup.

During a traditional full backup, all files are copied from the client to a media server and the
resulting image set is sent to the Data Domain system. The files are copied even though
those files may not have changed since the last incremental or differential backup. During a
synthetic full backup, the previous full backup and the subsequent incremental backups on
the Data Domain system are combined to form a new, full backup. The new, full synthetic
backup is an accurate representation of the client’s file system at the time of the most recent
full backup.

Because processing takes place on the Data Domain system under the direction of the
storage node, or media server, instead of the client, virtual synthetic backups help to reduce
the network traffic and client processing. Client files and backup image sets are transferred
over the network only once. After the backup images are combined into a synthetic backup,
the previous incremental and/or differential images can be expired.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 18


The virtual synthetic full backup is a scalable solution for backing up remote offices with
manageable data volumes and low levels of daily change. If clients experience a high daily
rate of change, the incremental or differential backups become too large; in that case, a
virtual synthetic backup is no more helpful than a traditional full backup. To ensure good restore
performance, it is recommended that you create a traditional full backup every two months,
presuming a normal weekly full and daily incremental backup policy.

The virtual synthetic full backup is the combination of the last full (synthetic or full) backup
and all subsequent incremental backups. It is time-stamped as occurring one second after
the latest incremental. It does NOT include any changes to the backup selection since the
latest incremental.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 19


Synthetic backups can reduce the load on an application server and the data traffic between
an application server and a media server. They can also reduce the traffic between the media
server and the Data Domain system by performing the virtual synthetic backup assembly on
the Data Domain system.

You might want to consider using virtual synthetic backups when:


• Your backups are small, and localized, so that daily incrementals are small (<10% of a
normal, full backup).
• The Data Domain system you are using has a large number of disks (>10).
• Data restores are infrequent.
• Your intention is to reduce the amount of network traffic between the application
server, the media servers and the Data Domain system.
• Your media servers are burdened and might not handle DSP well.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 20


It might not be appropriate to use virtual synthetic backups when:
• Daily incremental backups are large or highly distributed (incrementals are >15% of a
full backup).
• You are backing up large, non-file system data (such as databases).
• Data restores are frequent.
• The Data Domain system is small or has few disks.
• Your media server handles DSP well.

Restore performance from a synthetic backup is typically worse than from a standard full
backup due to poor data locality.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 21


DD Boost over FC enables new use cases via the Fibre Channel transport:
• Leverages existing FC infrastructure.

Using FC as the transport is transparent to the backup application:

• All other DD Boost application-level features still run, including Symantec AIR (Auto
Image Replication) and optimized synthetic backups.

DD Boost over FC presents logical storage units (LSUs) to the backup application and
removes a number of limitations inherent to tape and VTL:
• Enables concurrent reads and writes, which are not allowed on a virtual tape.
• The backup image, rather than the virtual tape cartridge, is the smallest unit of
replication or expiration, which results in more efficient space management.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 22


Simplified Management
• No access group limitations; simple configuration using very few access groups.
• Manage backup images, as opposed to tape cartridges.

Advanced Load Balancing and Failover

• Path management, load balancing, and failover are handled by the plug-in and DD OS.
• No need for expensive multipathing I/O (MPIO) software.

Replication still runs over IP networks.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 23


EMC Data Domain Boost integrates with many EMC, and a growing number of third-party,
applications.

This lesson discusses how DD Boost integrates with EMC NetWorker and Symantec
NetBackup.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 24


The DD Boost feature is built into the Data Domain operating system. Unlock the DD Boost
feature on each Data Domain system with a separate license key. If you do not plan to use
managed file replication, the destination Data Domain system does not require a DD Boost
license.

For EMC, Oracle, and Quest users, the Data Domain Boost library is already included in
recent versions of their software. Before enabling DD Boost on Symantec Backup Exec and
NetBackup, a special OST plug-in must be downloaded and installed on the backup host. The
plug-in contains the appropriate DD Boost Library for use with compatible Symantec product
versions. Consult the most current DD Boost Compatibility Guide to verify compatibility with
your specific software and Data Domain operating system versions. Both the compatibility
guide and versions of OpenStorage (OST) plug-in software are available through the EMC
Data Domain support portal at: http://support.emc.com.

A second destination Data Domain system licensed with DD Boost is needed when
implementing centralized replication awareness and management.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 25


Data Domain Boost configuration is the same for all backup environments:

On each of the Data Domain systems:


• License DD Boost on the Data Domain system(s): System Settings > Licenses > Add
Licenses.
• Enable DD Boost on all Data Domain systems: Data Management > DD Boost > DD
Boost Status > Enable.
• Set a backup host as a client by hostname (the configuration does not accept IP
addresses in this case). Define a Data Domain local user as the DD Boost User: Data
Management > DD Boost > DD Boost User > Modify.
• Create at least one storage unit. You must create one or more storage units for each
Data Domain system enabled for DD Boost: Data Management > DD Boost > Storage
Units > Create Storage Unit.
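
For reference, the same steps can be performed from the CLI. A minimal sketch (the license
code, user name, and storage-unit name are placeholders):

# license add <license-code>
Add the DD Boost license

# ddboost enable
Enable DD Boost

# ddboost set user-name ddboost-user
Set the DD Boost user

# ddboost storage-unit create SU1
Create a storage unit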

Network Note
Enable the following ports:
• UDP 2049 (enables NFS communication)
• TCP 2051 (enables file replication communication)
• TCP 111 (enables RPC portmapper services communication)

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 26


The following are optional configuration parameters:
• Configure distributed segment processing: DD Boost > Activities > Distributed
Segment Processing Status > Enable (default)/Disable.
Note: DSP is enabled by default.
• Configure advanced load balancing and link failover: DD Boost > Activities > Interface
Group Status > Configure (then Enable).
• Enable low-bandwidth optimization: DD Boost > Active File Replications > Low
Bandwidth Optimization status > Disable (default)/Enable.
Note: Low-bandwidth optimization is disabled by default.
• Enable encrypted optimized duplication: DD Boost > Active File Replications > File
Replication Encryption status > Disable (default)/Enable.
Note: Encrypted optimized duplication is disabled by default.
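
The equivalent CLI settings, as a brief sketch (assuming the DD OS 5.x ddboost
file-replication options):

# ddboost option set distributed-segment-processing enabled
Enable distributed segment processing

# ddboost file-replication option set low-bw-optim enabled
Enable low-bandwidth optimization for file replication

# ddboost file-replication option set encryption enabled
Enable encrypted optimized duplication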

For the backup host:


• License the backup software for DD Boost as required by the software manufacturer.
• Create devices and pools through the management console/interface.
• Configure backup policies and groups to use the Data Domain system for backups
with DD Boost.
• Configure clone or duplicate operations to use Data Domain managed replication
between Data Domain systems.

On the Network:
• Open the following ports if you plan to use any of the related features through a
network firewall:
• UDP 2049 (enables NFS communication)
• TCP 2051 (enables file replication communication)
• TCP 111 (enables RPC portmapper services communication)

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 27


Enable DD Boost by navigating in the Data Domain Enterprise Manager to Data Management
> DD Boost > Settings.

In the example on the slide, note that the current DD Boost status is disabled. Click the
Enable button to enable DD Boost on a system.

You can also enable and manage DD Boost from the command line interface:

# ddboost enable
Enable DD Boost

# ddboost show connections
Show DD Boost connections

# ddboost show stats [interval <seconds>] [count <count>]
Show DD Boost statistics

# ddboost status
Show DD Boost enable or disable status

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 28


To add or change a DD Boost user for the system, click the Modify button. In the Modify DD
Boost User window, select from an existing user or add a new user, give them a password
and assign them a role (the data-access or admin roles should be used for the DD Boost
user). In order to change the DD Boost user on a Data Domain system, DD Boost must first be
disabled.

In the Allowed Clients field, click the green plus button to add a new client that you are
allowing to access DD Boost on the system. Add the client name as a domain name, since IP
addresses are not allowed.

You can also add users and clients using the command line:

# ddboost set user-name <user-name>
Set the DD Boost user

# ddboost access add clients <client-list>
Add clients to the DD Boost access list

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 29


Create a storage unit by navigating to Data Management > DD Boost > Storage Units >
Create.

Note: The Storage Unit Details section is new in DD OS 5.2. It provides a useful summary of a
storage unit, including its file count, compression ratio, status, and quota settings.

Name the storage unit and set any quota settings you wish. Be aware that these quota
settings are not enforced unless MTree quotas are enabled.
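
MTree quota enforcement is a separate, system-wide setting; a brief sketch using the DD OS
quota commands:

# quota enable
Enable MTree quota enforcement

# quota status
Show whether quotas are enabled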

The command line can also be used to create and manage storage units:

# ddboost storage-unit create <storage-unit-name>
  [quota-soft-limit <n> {MiB|GiB|TiB|PiB}]
  [quota-hard-limit <n> {MiB|GiB|TiB|PiB}]
Create a storage unit, optionally setting quota limits

# ddboost storage-unit delete <storage-unit-name>
Delete a storage unit

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 30


# ddboost storage-unit show [compression] [<storage-unit-name>]
List the storage units and the images in a storage unit

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 31


To enable or disable distributed segment processing, bandwidth optimization for file
replication, and file replication encryption, click More Tasks > Set Options.

You can also set DD Boost options from the command line:

# ddboost option reset {distributed-segment-processing | virtual-synthetics | fc}
Reset DD Boost options

# ddboost option set distributed-segment-processing {enabled | disabled}
Enable or disable distributed segment processing for DD Boost

# ddboost option set virtual-synthetics {enabled | disabled}
Enable or disable virtual synthetics for DD Boost

# ddboost option show [distributed-segment-processing | virtual-synthetics | fc]
Show DD Boost options

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 32


DD Boost over Fibre Channel can be configured in the System Manager from Data
Management > DD Boost > Fibre Channel.

You can also configure and manage DD Boost over Fibre Channel from the command line:

# ddboost option set fc {enabled | disabled}
Enable or disable Fibre Channel for DD Boost

# ddboost fc dfc-server-name set <server-name>
Set the DD Boost Fibre Channel server name

# ddboost fc dfc-server-name show
Show the DD Boost Fibre Channel server name

# ddboost fc group add <group-name> initiator <initiator-spec>

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 33


# ddboost fc group add <group-name> device-set [count <count>] [endpoint {all | none | <endpoint-list>}]
Add initiators or DD Boost devices to a DD Boost FC group

# ddboost fc group create <group-name>
Create a DD Boost FC group

# ddboost fc group show list [<group-spec>] [initiator <initiator-spec>]
List configured DD Boost FC groups

# ddboost fc show detailed-stats
Show detailed DD Boost Fibre Channel statistics

# ddboost fc show stats [endpoint <endpoint-spec>] [initiator <initiator-spec>] [interval <interval>] [count <count>]
Show DD Boost Fibre Channel statistics

# ddboost fc status
Show DD Boost Fibre Channel status
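
Putting these commands together, a worked example (the group and initiator names are
hypothetical):

# ddboost option set fc enabled
# ddboost fc group create backup-grp
# ddboost fc group add backup-grp initiator initiator-1
# ddboost fc group add backup-grp device-set count 4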

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 34


In this lab, you configure DD Boost on your Data Domain system and verify the configuration
using EMC NetWorker.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 35


In this lesson, we discuss the use of various backup applications with DD Boost.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 36


Data Domain Boost software provides integration between Data Domain storage systems and
NetWorker software. It provides NetWorker with visibility into the properties and capabilities
of the Data Domain system, control of the backup images stored in the system, and efficient
wide area network (WAN) replication to remote Data Domain systems.

DD Boost for NetWorker has two components, the DD Boost library that is integrated into the
storage node and the DD Boost server that runs on the Data Domain system. The DD Boost
library is provided as the NetWorker Data Domain Device Type and provides the following
key enhancements for disk-based data protection strategies:
• Simplifies device setup and configuration by using wizards
• Increases aggregate backup throughput
• Provides NetWorker clone-controlled replication – Backup cloning available using
EMC Data Domain Replicator, which provides network-efficient replication that is
controlled, monitored, and cataloged by the NetWorker software
• Integrates NetWorker Advanced reporting of the Data Domain systems
• Provides recovery of replicated backup images in their entirety or at a granular level
via the NetWorker user interface
• Tape Consolidation – Using the NetWorker clone-controlled replication functionality,
backup images can be moved to a centralized location where they can be cloned to
tape

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 37


Data Domain systems can be used as storage for Avamar backup data. Backup data is sent
directly from the client to the Data Domain system using DD Boost technology. Backups can
then be managed through the Avamar system. This can provide faster backup and recovery,
especially for large active databases. Data Domain integration is supported for file system
data, NDMP data, Lotus Domino, DB2, Microsoft Exchange VSS, Hyper-V VSS, Microsoft SQL
Server, Microsoft SharePoint VSS, Oracle, SAP with Oracle, Sybase, and VMware image
backup and restore.

Maintenance activities that are performed on the Avamar server are also performed on any
data stored on the Data Domain system. This means that a backup that has expired or been
deleted on the Avamar server is also deleted from the Data Domain system. Avamar garbage
collection, checkpoints, rollbacks, HFS checks, and replication trigger similar processes on the
Data Domain system. These maintenance activities are discussed in more detail later in this
course.

Integrating Avamar with Data Domain provides a few additional features. When performing
image level backups of virtual machines, a virtual machine may be booted from the backup
data without the need to perform a restore. This feature is called “instant access” because
the virtual machine can be used instantly. The VM runs off of Data Domain storage.
Eventually, the VM can be moved back to VMware storage using vMotion.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 38


Single-node and AVE servers have the option of performing backups of checkpoint data to a
Data Domain system. This provides disaster recovery without the need for a second Avamar
server. In the event of a disaster, the checkpoint can be restored to a replacement Avamar
server, and the data can then be restored.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 39


EMC Data Domain Boost for Recovery Manager enables database servers to communicate
with Data Domain systems in an optimized way, without the need to use a backup
application, and to improve performance while reducing data transferred over the LAN. In
the context of Oracle RMAN, there are two components to the software:
• An RMAN plug-in that you install on each database server. This plug-in includes the
DD Boost libraries for communicating with the DD Boost service running on the Data
Domain system.
• The DD Boost server that runs on Data Domain systems.

RMAN sets policies that control when backups and replications occur. Administrators
manage backup, replication, and restores from a single console and can use all of the
features of DD Boost, including WAN-efficient replicator software. RMAN manages all files
(collections of data) in the catalog, even those created by the DD system.

The DD system exposes pre-made disk volumes called storage units to a DD Boost-enabled
database server. Multiple database servers can use the same storage unit provided they
have the DD Boost plug-in installed. Additionally, each database server can run a different
operating system provided it is supported by Data Domain.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 40


OpenStorage software provides API-based integration between Data Domain storage systems
and NetBackup. The API gives NetBackup visibility into the properties and capabilities of the
Data Domain storage system, control of the backup images stored in the system, and wide
area network (WAN) efficient replication to remote Data Domain storage systems.

DD Boost for NetBackup has two components. The DD Boost Library is embedded in the Data
Domain Boost plug-in that runs on NetBackup media servers. The DD Boost Server is built
into DD OS and runs on a Data Domain system. The two components integrate seamlessly
across the IT infrastructure. Together, NetBackup, DD Boost-enabled Data Domain
deduplication storage systems, and the Symantec NetBackup OpenStorage Disk Option
provide the following key enhancements for disk-based data protection strategies:

• NetBackup optimized duplication – Backup image duplication using EMC Data Domain
Replicator, providing network-efficient replication that can be controlled, monitored,
and cataloged by NetBackup.
• Integrated NetBackup reporting of Data Domain replication job status.
• Recovery of replicated backup images in their entirety or at a granular level via the
NetBackup user interface.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 41


• NetBackup media server load balancing, eliminating the need to manually divide
client backups across NetBackup media servers utilizing OpenStorage storage units.
• Tape consolidation – Backup images from remote locations and branch offices can be
replicated to a centralized location.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 42


Prior to DD Boost, Backup Exec media servers would send all data to a Data Domain system
for deduplication processing. With the distributed segment processing feature of DD Boost,
parts of the deduplication process are distributed to the media server, enabling it to send
only unique data segments to a Data Domain system. This increases the aggregate
throughput and reduces the amount of data transferred over the network.

In addition to performance improvements and network bandwidth benefits, the reduction in
data transferred over the network also decreases CPU utilization on the media servers, since
sending data is significantly more CPU intensive than the distributed deduplication process.

The combination of a Data Domain system and DD Boost for Backup Exec creates an
optimized connection to provide a tightly integrated solution. DD Boost for Backup Exec
offers operational simplicity by enabling the media server to manage the connection
between the backup application and one or more Data Domain systems.

Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 43


Copyright © 2014 EMC Corporation. All rights reserved Module 8: DD Boost 44
In this module, you learn about security and protecting your data with a Data Domain
system.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 1
As data ages and becomes seldom used, EMC recommends moving this data to archive
storage where it can still be accessed, but no longer occupies valuable storage space.

Unlike backup data, which is a secondary copy of data for shorter-term recovery purposes,
archive data is a primary copy of data and is often retained for several years. In many
environments, corporate governance and/or compliance regulatory standards can mandate
that some or all of this data be retained “as-is.” In other words, the integrity of the archive
data must be maintained for specific time periods before it can be deleted.

The EMC Data Domain Retention Lock (DD Retention Lock) feature provides immutable file
locking and secure data retention capabilities to meet both governance and compliance
standards of secure data retention. DD Retention Lock ensures that archive data is retained
for the length of the policy with data integrity and security.

This lesson presents an overview of Data Domain Retention Lock, its configuration and use.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 2
EMC Data Domain Retention Lock is an optional, licensed software feature that allows
storage administrators and compliance officers to meet data retention requirements for
archive data stored on an EMC Data Domain system. For files committed to be retained, DD
Retention Lock software works in conjunction with the application’s retention policy to
prevent these files from being modified or deleted during the application’s defined retention
period, for up to 70 years. It protects against data management accidents, user errors and
any malicious activity that might compromise the integrity of the retained data. The
retention period of a retention-locked file can be extended, but not reduced.

After the retention period expires, files can be deleted, but cannot be modified. Files that are
written to an EMC Data Domain system, but not committed to be retained, can be modified
or deleted at any time.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 3
DD Retention Lock comes in two, separately licensed, editions:
• DD Retention Lock Governance edition maintains the integrity of the archive data
with the assumption that the system administrator is generally trusted, and thus any
actions taken by the system administrator are valid as far as the data integrity of the
archive data is concerned.
• DD Retention Lock Compliance edition is designed to meet strict regulatory
compliance standards such as those of the United States Securities and Exchange
Commission. When DD Retention Lock Compliance is installed and deployed on an
EMC Data Domain system, it requires additional authorization by a Security Officer for
system functions to safeguard against any actions that could compromise data
integrity.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 4
The capabilities built into Data Domain Retention Lock are based on governance and
compliance archive data requirements.

Governance archive data requirements:


Governance standards are considered to be lenient in nature – allowing for flexible control of
retention policies, but not at the expense of maintaining the integrity of the data during the
retention period. These standards apply to environments where the system administrator is
trusted with his administrator actions.

The storage system has to securely retain archive data per corporate governance standards
and must meet the following requirements:
• Allow archive files to be committed for a specific period of time during which the
contents of the secured file cannot be deleted or modified.
• Allow for deletion of the retained data after the retention period expires.
• Allow for ease of integration with existing archiving application infrastructure through
CIFS and NFS.
• Provide flexible policies, such as extending the retention period of a secured file or
reverting the locked state of an archived file.
• Allow replication of both the retained archive files and the retention period attribute
to a destination site to meet the disaster recovery (DR) needs for archived data.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 5
Compliance archive data requirements:
Securities and Exchange Commission (“SEC”) rules define compliance standards for archive
storage to be retained on electronic storage media, which must meet certain conditions:
• Preserve the records exclusively in a non-writeable, non-erasable format.
• Verify automatically the quality and accuracy of the storage media recording process.
• Serialize the original, and any duplicate units of storage media, and the time-date for
the required retention period for information placed on the storage media.
• Store, separately from the original, a duplicate copy of the record on an SEC-approved
medium for the time required.

Data Domain Retention Lock Governance edition maintains the integrity of the archive data
with the assumption that the system administrator is trusted, and that any actions they take
are valid to maintain the integrity of the archive data.

Data Domain Retention Lock Compliance edition is designed to meet regulatory compliance
standards such as those set by the SEC for records retention (SEC 17a-4(f)). Additional
security authorization is required to manage the manipulation of retention periods, as well
as the renaming of MTrees designated for retention lock.

Note: DD Retention Lock software cannot be used with EMC Data Domain GDA models or
with the DD Boost protocol. Attempts to apply retention lock to MTrees containing files
created by DD Boost will fail.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 6
As discussed in the Basic Administration module, a security privilege can be assigned to user
accounts:
• In the Enterprise Manager when user accounts are created.
• In the CLI when user accounts are added.
This security privilege is in addition to the user and admin privileges.

A user assigned the security privilege is called a security officer.


The security officer can enable, via the CLI, a setting called the runtime authorization policy.

Updating or extending retention periods, and renaming MTrees, requires the use of the
runtime authorization policy. When enabled, runtime authorization policy is invoked on the
system for the length of time the security officer is logged in to the current session.

Runtime authorization policy, when enabled, authorizes the security officer to provide
credentials, as part of a dual authorization with the admin role, to set up and modify both
retention lock compliance features and data encryption features, as you will learn later in
this module.
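
As a brief CLI sketch, creating a security officer account and enabling the policy might look
like this (so1 is a placeholder user name):

# user add so1 role security
Create a user with the security role

# authorization policy set security-officer enabled
Enable the runtime authorization policy (requires security officer credentials)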

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 7
• Enable DD Retention Lock Governance, Compliance, or both on the Data Domain
system. (You must have a valid license for DD Retention lock Governance and/or
Compliance.)
• Enable MTrees for governance or compliance retention locking using the System
Manger or CLI commands.
• Commit files to be retention locked on the Data Domain system using client-side
commands issued by an appropriately configured archiving or backup application,
manually, or using scripts.
• (Optional) Extend file retention times or delete files with expired retention periods
using client-side commands.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 8
After an archive file has been migrated onto a Data Domain system, it is the responsibility of
the archiving application to set and communicate the retention period attribute to the Data
Domain system. The archiving application sends the retention period attribute over standard
industry protocols.

The retention period attribute used by the archiving application is the last access time
(atime). DD Retention Lock software allows granular management of retention periods on a
file-by-file basis. As part of the configuration and administrative setup process of the DD
Retention Lock software, a minimum and maximum time-based retention period for each
MTree is established. This ensures that the atime retention expiration date for an archive file
is not set below the minimum, or above the maximum, retention period.

The archiving application must set the atime value, and DD Retention Lock enforces it,
preventing any modification or deletion of files under retention on the Data Domain
system. For example, Symantec Enterprise Vault retains records for a user-specified amount
of time. When Enterprise Vault retention is in effect, these documents cannot be modified or
deleted on the Data Domain system. When that time expires, Enterprise Vault can be set to
automatically dispose of those records.
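
For example, from an NFS client, a script can retention-lock a file by setting its atime to the
desired expiration date (the mount path and date here are hypothetical):

client# touch -a -t 202012310000 /mnt/ddr/archive/records.dat
Set the file's atime to the retention expiration date, locking the file until that date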

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 9
Locked files cannot be modified on the Data Domain system even after the retention period
for the file expires. Files can be copied to another system and then be modified. Archive data
retained on the Data Domain system after the retention period expires is not deleted
automatically. An archiving application must delete the remaining files, or they must be
removed manually.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 10
You can configure DD Retention Lock Governance using the Enterprise Manager or by using
CLI commands. Enterprise Manager provides the capability to modify the minimum and
maximum retention period for selected MTrees. In the example above, the Modify dialog is
for the MTree /data/col1/IT.

To configure retention lock:


1. Select the system in the navigation pane.
2. Select Data Management > MTree.
3. Select the MTree you want to edit with DD Retention Lock.
4. Go to the Retention Lock pane at the bottom of the window.
5. Click Edit.
6. Check the box to enable retention lock.
7. Enter the retention period or select Default.
8. Click OK.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 11
Related CLI commands:
# mtree retention-lock disable mtree
Disables the retention-lock feature for the specified MTree.

# mtree retention-lock enable mtree
Enables the retention-lock feature for the specified MTree.
Note: You cannot rename non-empty folders or directories within a retention-locked
MTree; however, you can rename empty folders or directories and create new ones.

# mtree retention-lock reset
Resets the minimum or maximum retention period for the specified MTree to its
default value.

# mtree retention-lock revert
Reverts the retention lock for all files on a specified path.

# mtree retention-lock set
Sets the minimum or maximum retention period for the specified MTree.

# mtree retention-lock show
Shows the minimum or maximum retention period for the specified MTree.

# mtree retention-lock status mtree
Shows the retention-lock status for the specified MTree. Possible values are enabled,
disabled, and previously enabled.
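
A worked example, setting the retention periods for the MTree used earlier (the exact period
syntax may vary slightly by DD OS release):

# mtree retention-lock set min-retention-period 12hr mtree /data/col1/IT
# mtree retention-lock set max-retention-period 5year mtree /data/col1/IT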

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 12
The DD Retention Lock Compliance edition meets the strict requirements of regulatory
standards for electronic records, such as SEC 17a-4(f), and other standards that are practiced
worldwide.

DD Retention Lock Compliance, when enabled on an MTree, ensures that all files locked by
an archiving application for a time-based retention period cannot be deleted or overwritten
under any circumstances until the retention period expires. This is achieved using multiple
hardening procedures:
• Requiring dual sign-on for certain administrative actions. Before engaging DD
Retention Lock Compliance edition, the System Administrator must create a Security
Officer role. The System Administrator can create the first Security Officer, but only
the Security Officer can create other Security Officers on the system.
Some of the actions requiring dual sign-on are:
• Extending the retention periods for an MTree.
• Renaming the MTree.
• Deleting the Retention Lock Compliance license from the Data Domain system.
• Securing the system clock from illegal updates
If the system clock is skewed more than 15 minutes or more than 2 weeks in a year,
the file system will shut down and can be resumed only by providing Security Officer
credentials.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 13
• Completely disallowing operations that could lead to a compromise in the state of
locked and retained archive data.
Note: Retention lock is not currently supported with DD Boost and VTL pool MTrees.

Removing Retention Lock Compliance requires a fresh installation of DD OS from a USB key.
Contact Data Domain Support for assistance in performing this operation, as it is not covered
in this course.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 14
Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 15
In this lesson, you learn the function of data sanitization and how to run a command from
the CLI to sanitize data on a Data Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 16
Data sanitization is sometimes referred to as electronic shredding.

With the data sanitization function, deleted files are overwritten using a DoD/NIST-compliant
algorithm and procedures. No complex setup or system process disruption is required.
Current, existing data is available during the sanitization process, with limited disruption to
daily operations. Sanitization is the electronic equivalent of data shredding. Normal file
deletion leaves residual data that allows recovery. Sanitization removes any trace of deleted
files, with no residual remains.

Sanitization supports organizations (typically government organizations) that:


• Are required to delete data that is no longer needed.
• Need to resolve (remove and destroy) classified message incidents. Classified
message incident (CMI) is a government term that describes an event where data of a
certain classification is inadvertently copied into another system that is not certified
for data of that classification.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 17
The system sanitize command erases content in the following locations:
• Segments of deleted files not used by other files
• Contaminated metadata
• All unused storage space in the file system
• All segments used by deleted files that cannot be globally erased, because some
segments might be used by other files

Sanitization can be run only by using the CLI.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 18
When you issue the system sanitize start command, you are prompted to consider the length
of time required to perform this task. The system advises that it can take longer than the
time needed to reclaim space holding expired data on the system (filesys clean). This can be
several hours or longer if there is a high percentage of space to be sanitized.

During sanitization, the system runs through five phases: merge, analysis, enumeration, copy,
and zero.
• Merge: Performs an index merge to flush all index data to disk.
• Analysis: Reviews all data to be sanitized. This includes all stored data.
• Enumeration: Reviews all of the files in the logical space and remembers what data is
active.
• Copy: Copies live data forward and frees the space it used to occupy.
• Zero: Writes zeroes to the disks in the system.

You can view the progress of these five phases by running the system sanitize
watch command.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 19
Related CLI commands:
# system sanitize abort
Aborts the sanitization process

# system sanitize start
Starts the sanitization process immediately

# system sanitize status
Shows the current sanitization status

# system sanitize watch
Monitors sanitization progress

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 20
Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 21
In this lesson, you learn about the features, benefits, and function of the encryption of data
at rest feature.

You also learn about the purpose of other security features, such as file system locking, and
when and how to use this feature.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 22
Data encryption protects user data if the Data Domain system is stolen, or if the physical
storage media is lost during transit, and eliminates accidental exposure of a failed drive if it is
replaced. In addition, if an intruder ever gains access to encrypted data, the data is
unreadable and unusable without the proper cryptographic keys.

Encryption of data at rest:


• Enables data on the Data Domain system to be encrypted, while being saved and
locked, before being moved to another location.
• Is also called inline data encryption.
• Protects data on a Data Domain system from unauthorized access or accidental
exposure.
• Requires an encryption software license.
• Encrypts all ingested data.
• Does not automatically encrypt data that was in the system before encryption was
enabled. Such data can be encrypted by enabling an option to encrypt existing data.

Furthermore, you can use all of the currently supported backup applications described in the
Backup Application Matrix on the Support Portal with the Encryption of Data at Rest feature.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 23
There are two available key management options:
• Starting with DD OS 5.2, an optional external encryption key management capability
has been added, the RSA Data Protection Manager (DPM) Key Manager. The
preexisting local encryption key administration method is still in place. You can
choose either method to manage the Data Domain encryption key.
• The Local Key Manager provides a single encryption key per Data Domain system.

A single internal Data Domain encryption key is available on all Data Domain systems.

The first time Encryption of Data at Rest is enabled, the Data Domain system randomly
generates an internal system encryption key. After the key is generated, the system
encryption key cannot be changed and is not accessible to a user.

The encryption key is further protected by a passphrase, which is used to encrypt the
encryption key before it is stored in multiple locations on disk. The passphrase is user-
generated and requires both an administrator and a security officer to change it.

• The RSA DPM Key Manager enables the use of multiple, rotating keys on a Data
Domain system.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 24
• The RSA DPM Key Manager consists of a centralized RSA DPM Key Manager Server
and the embedded DPM client on each Data Domain system.

• The RSA DPM Key Manager is in charge of the generation, distribution, and lifecycle
management of multiple encryption keys. Keys can be rotated on a regular basis,
depending on the policy. A maximum number of 254 keys is supported.

• If the RSA DPM Key Manager is configured and enabled, the Data Domain system
uses keys provided by the RSA DPM Key Manager Server.

Note: Only one encryption key can be active on a Data Domain system. The DPM Key
Manager provides the active key. If the same DPM Key Manager manages multiple Data
Domain systems, all will have the same active key—if they are synced, and the Data Domain
file system has been restarted.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 25
With the encryption software option licensed and enabled, all incoming data is encrypted
inline before it is written to disk. This is a software-based approach, and it requires no
additional hardware. It includes:
• Configurable 128-bit or 256-bit advanced encryption standard (AES) algorithm with
either:
 Confidentiality with cipher-block chaining (CBC) mode.
Or
 Both confidentiality and message authenticity with Galois/Counter (GCM) mode

• Encryption and decryption to and from the disk are transparent to all access protocols:
DD Boost, NFS, CIFS, NDMP tape server, and VTL (no administrative action is required
for decryption).

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 26
When data is backed up, data enters via NFS, CIFS, VTL, DD Boost, and NDMP tape server
protocols. It is then:
• Segmented
• Fingerprinted
• Deduplicated (or globally compressed)
• Grouped
• Locally compressed
• Encrypted

Note: When enabled, the encryption at rest feature encrypts all data entering the Data
Domain system. You cannot enable encryption at a more granular level.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 27
Procedures requiring authorization must be dual-authenticated by the security officer and
the user in the admin role.

For example, to set encryption, the admin enables the feature, and the security officer
enables runtime authorization.

A user in the administrator role interacts with the security officer to perform a command
that requires security officer sign off.

In a typical scenario, the admin issues the command, and the system displays a message that
security officer authorizations must be enabled. To proceed with the sign-off, the security
officer must enter his or her credentials on the same console at which the command option
was run. If the system recognizes the credentials, the procedure is authorized. If not, a
Security alert is generated. The authorization log records the details of each transaction.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 28
With encryption active in the Data Domain system, the Encryption tab within the File System
section of the Data Domain Enterprise Manager shows the current status of system
encryption of data at rest.
The status indicates Enabled, Disabled, or Not configured. In the slide, the encryption status
is Not configured.

(continued on the next slide)

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 29
To configure encryption:
1. Click Configure. You are prompted for a passphrase. The system generates an
encryption key and uses the passphrase to encrypt the key. One key is used to
encrypt all data written to the system. After encryption is enabled, the passphrase is
used by system administrators only when locking or unlocking the file system, or
when disabling encryption. The maximum passphrase length in DD OS 5.4 is 256
characters.
Caution: Unless you can reenter the correct passphrase, you cannot unlock the file system
and access the data. The data will be irretrievably lost.
2. Enter a passphrase and then click Next.
3. Choose the encryption algorithm:
 Configurable 128-bit or 256-bit Advanced Encryption Standard (AES) algorithm
with either:
 Confidentiality with Cipher Block Chaining (CBC) mode
 Both confidentiality and message authenticity with Galois/Counter (GCM)
mode
 In this configuration window, you can optionally apply encryption to data that
existed on the system before encryption was enabled.
4. Select whether you will obtain the encryption key from the Data Domain system or an
external RSA Key Manager.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 30
5. Once configured, click Next.
6. Verify the settings in the Summary dialog, and restart the file system to enable
encryption. If you do not select to restart the file system at this time, you need to
disable and re-enable the file system before encryption will begin.
7. Click OK.
8. Click Close to finish the configuration.

Related CLI commands:


# filesys disable
Disables the file system

# filesys encryption enable
Enables encryption. Enter a passphrase when prompted

# filesys encryption algorithm set <algorithm>
Sets an alternative cryptographic algorithm (optional). The default algorithm is
aes_256_cbc; other options are aes_128_cbc, aes_128_gcm, and aes_256_gcm

# filesys enable
Enables the file system
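
Putting these together, a typical enable sequence from the CLI might look like this (a sketch
using the commands above; the GCM algorithm choice is only an example):

# filesys disable
# filesys encryption enable
# filesys encryption algorithm set aes_256_gcm
# filesys enable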

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 31
Only administrative users with security officer credentials can change the encryption
passphrase.

To change the existing encryption passphrase:


1. Disable the file system by clicking the disable button on the State line of the File
System section.
The slide shows the file system state as disabled and shut down after the disable
button is clicked.
2. Click Change Passphrase.
3. Enter the security officer credentials to authorize the passphrase change.
4. Enter the current passphrase.
5. Enter the new passphrase twice.
6. Click Enable file system now if you want to reinstate services with the new
passphrase; otherwise the passphrase does not go into effect until the file system is
re-enabled.
7. Click OK to proceed with the passphrase change.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 32
Only administrative users with security officer credentials can disable encryption.
To disable encryption on a Data Domain system:
1. Click Disable on the Encryption status line of the Encryption tab.
2. Enter the security officer credentials.
3. Click Restart file system now in order to stop any further encryption of data at rest.
Note: Restarting the file system will interrupt any processes currently running on the
Data Domain system.
4. Click OK to continue.

Related CLI commands:


# filesys encryption disable
Disables encryption. You are prompted for a security officer username and password
in order to disable encryption from the command line.

# filesys disable
Disables the file system.

# filesys enable
Enables the file system. The file system must be disabled and re-enabled to effect
encryption operations.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 33
Use file system locking when an encryption-enabled Data Domain system and its external
storage devices (if any) are being transported. Without the encryption provided in file system
locking, user data could possibly be recovered by a thief with forensic tools (especially if local
compression is turned off). This action requires two-user authentication (a sysadmin and a
security officer) to confirm the lock-down action.

File system locking:


• Requires the user name and password of a security officer account to lock the file
system.
• Protects the Data Domain system from unauthorized data access.
• Is run only with the file system encryption feature enabled. File system locking
encrypts all user data, and the data cannot be decrypted without the key.
• A passphrase protects the encryption key, which is stored on disk and encrypted by
the passphrase. With the system locked, this passphrase cannot be retrieved.
• Allows only an admin who knows the set passphrase to unlock an encrypted file
system.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 34
Before you can lock the file system, the file system must be stopped, disabled, and shut
down.

To lock the file system:


1. In the passphrase area, enter the current passphrase (if one existed before) followed
by a new passphrase that locks the file system for transport. Repeat the passphrase
in the Confirm New Passphrase field.
2. Click OK to continue.
After the new passphrase is entered, the system destroys the cached copy of the
current passphrase. Therefore, anyone who does not possess the new passphrase
cannot decrypt the data.
Caution: Be sure to safeguard the passphrase. If the passphrase is lost, you will
never be able to unlock the file system and access the data. There is no backdoor
access to the file system; the data is irretrievably lost.
3. Shut down the system using the system poweroff command from the command line
interface (CLI).

Caution: Do not use the chassis power switch to power off the system. There is no
other method for shutting down the system to invoke file system locking.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 35
To unlock the file system:
1. Power on the Data Domain system.
2. Return to the Encryption view in the Data Domain Enterprise Manager and click the
Unlock File System button.
3. Enter the current lock file system passphrase. The file system re-enables itself.

Related CLI commands:


# filesys encryption lock
Locks the system by creating a new passphrase and destroying the cached copy of the
current passphrase. Before you run this command, you must run filesys disable and
enter security officer credentials.

# filesys encryption passphrase change
Changes the passphrase for system encryption keys. Before running this command,
you must run filesys disable and enter security officer credentials.

# filesys encryption show
Checks the status of the encryption feature.

# filesys encryption unlock
Prepares the encrypted file system for use after it has arrived at its destination.

Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 36
Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 37
Copyright © 2014 EMC Corporation. All rights reserved Module 9: Data Security 38
In any backup environment, it is critical to plan capacity and throughput adequately. Planning
ensures your backups complete within the time required and are securely retained for the
needed times. Data growth in backups is also a reality as business needs change. Inadequate
capacity and bandwidth to perform the backup can cause backups to lag, or fail to complete.
Unplanned growth can fill a backup device sooner than expected and choke backup
processes.

The main goal in capacity planning is to design your system with a Data Domain model and
configuration that is able to hold the required data for the required retention periods and
have plenty of space left over to avoid system full conditions.

For throughput planning, the goal is to ensure the link bandwidth is sufficient to perform
daily and weekly backups to the Data Domain system within the backup window allotted.
Good throughput planning takes into consideration network bandwidth sharing, along with
adequate backup and system housekeeping timeframes (windows).

Copyright © 2014 EMC Corporation. All rights reserved Module 10: Sizing, Capacity and Throughput Planning and Tuning 1
In this lesson, you become familiar with the testing and evaluation process that helps to
determine the capacity requirements of a Data Domain system.

• Collecting information
• Determining and calculating capacity needs

EMC Sales uses detailed software tools and formulas when working with its customers to
identify backup environment capacity and throughput needs. Such tools help systems
architects recommend systems with appropriate capacities and correct throughput to meet
those needs. This lesson discusses the most basic considerations for capacity and throughput
planning.

Copyright © 2014 EMC Corporation. All rights reserved Module 10: Sizing, Capacity and Throughput Planning and Tuning 2
Using information collected about the backup system, you calculate capacity needs by
understanding the amount of data (data size) to be backed up, the types of data, the size of a
full (complete) backup, and the expected data reduction rates (deduplication).

Data Domain system internal indexes and other product components use additional, variable
amounts of storage, depending on the type of data and the sizes of files. If you send different
data sets to otherwise identical systems, one system may, over time, have room for more or
less actual backup data than another.

Data reduction factors depend on the type of data being backed up. Some challenging
(deduplication-unfriendly) data types include:
• pre-compressed data (multimedia such as .mp3, .zip, and .jpg files)
• pre-encrypted data

Second, retention policies greatly determine the amount of deduplication that can be
realized on a Data Domain system. The longer data is retained, the greater the data reduction
that can be realized. A backup schedule in which retained data is repeatedly replaced with
new data yields very little data reduction.

Copyright © 2014 EMC Corporation. All rights reserved Module 10: Sizing, Capacity and Throughput Planning and Tuning 3
The reduction factors listed in this slide are examples of how changing retention rates can
improve the amount of data reduction over time.

The compression rates shown are approximate.

A daily full backup held only for one week on a Data Domain system may realize no more
than a compression factor of 5x, while holding weekly backups plus daily incrementals for up
to 90 days may result in 20x or higher compression.

Data reduction rates depend on a number of variables including data types, the amount of
similar data, and the length of storage. It is difficult to determine exactly what rates to expect
from any given system. The highest rates are usually achieved when many full backups are
stored.

When planning capacity, use average rates as a starting point for your calculations and
refine them after real data is available.

Copyright © 2014 EMC Corporation. All rights reserved Module 10: Sizing, Capacity and Throughput Planning and Tuning 4
Calculate the required capacity by adding up the space required in this manner:
• First Full backup plus
• Incremental backups (the number of days incrementals are run—typically 4-6) plus
• Weekly cycle (one weekly full and 4-6 incrementals) times the number of weeks data
is retained.

For example, 1 TB of data is backed up, and a conservative compression rate is estimated at
5x (which may have come from a test or is a reasonable assumption to start with). This gives
200 GB needed for the initial backup. With a 10 percent change rate in the data each day,
incremental backups are 100 GB each, and with an estimated compression on these of 10x,
the amount of space required for each incremental backup is 10 GB.

As subsequent full backups run, it is likely that the backup yields a higher data reduction rate.
25x is estimated for the data reduction rate on subsequent full backups. 1 TB of data
compresses to 40 GB.

Four daily incremental backups require 10 GB each, and one weekly backup needing 40 GB
yields a burn rate of 80 GB per week. Running the 80 GB weekly burn rate out over the full 8-
week retention period means that an estimated 640 GB is needed to store the daily
incremental backups and the weekly full backups.

Copyright © 2014 EMC Corporation. All rights reserved Module 10: Sizing, Capacity and Throughput Planning and Tuning 5
Adding this to the initial full backup gives a total of 840 GB needed. On a Data Domain
system with 1 TB of usable capacity, this means the unit operates at about 84% of capacity.
This may be okay for current needs. You might want to consider a system with a larger
capacity or that can have additional storage added, which might be a better choice to allow
for data growth.
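
To recap the arithmetic:

First full backup: 1 TB / 5x = 200 GB
Daily incremental: 100 GB / 10x = 10 GB
Weekly burn rate: (4 x 10 GB) + 40 GB = 80 GB
8-week retention: 8 x 80 GB = 640 GB
Total estimate: 200 GB + 640 GB = 840 GB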

Again, these calculations are for estimation purposes only. Before determining true capacity,
use the analysis of real data gathered from your system as a part of an EMC BRS sizing
evaluation.

Copyright © 2014 EMC Corporation. All rights reserved Module 10: Sizing, Capacity and Throughput Planning and Tuning 6
In this lesson, you will become familiar with the testing and evaluation process that helps to
determine the throughput requirements of a Data Domain system.

EMC Sales uses detailed software tools and formulas when working with customers to
identify backup environment capacity and throughput needs. Such tools help systems
architects recommend systems with appropriate capacities and correct throughput to meet
those needs. This lesson discusses the most basic considerations for capacity and throughput
planning.

While capacity is one part of the sizing calculation, it is important not to neglect the
throughput of the data during backups.

An assumption would be that the greatest backup need is to process a full 200 GB backup
within a 10-hour backup window. Incremental backups require much less time to complete,
so we can safely presume that they finish easily within the backup window.

Dividing 200 GB by 10 hours yields a raw processing requirement of at least 20 GB per hour.

Over an unfettered 1 Gb/s network with maximum bandwidth available (a theoretical 270
GB per hour of throughput), this backup would take less than one hour to complete. If the
network were sharing throughput resources during the backup window, the amount of
time required to complete the backup would increase considerably.

It is important to note the effective throughput of both the Data Domain system and the
network on which it runs. Both points in the data path determine whether the required
speeds are reliably feasible. Feasibility can be assessed by running network testing software
such as iperf.
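
To make the arithmetic concrete, here is a minimal Python sketch assuming the figures
above; the 270 GB-per-hour value is the effective rate quoted for an uncontended gigabit
link. Network feasibility itself can be measured with iperf, for example by running iperf -s on
one host and iperf -c <other-host> on the other.

backup_gb = 200                         # full backup size from the example
window_hr = 10                          # backup window in hours

print(backup_gb / window_hr)            # 20.0 GB/hr minimum required
effective_gb_per_hr = 270               # uncontended 1 Gb/s link (quoted above)
print(backup_gb / effective_gb_per_hr)  # ~0.74 hr if the link is unfettered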

This lesson applies the formulae from the previous two lessons to selecting the best Data
Domain system to fit specific capacity and throughput requirements.

The system capacity numbers of a Data Domain system assume a mix of typical enterprise
backup data (such as file systems, databases, mail, and developer files). The low and high
ends of the range are also determined by how often data is backed up.

The maximum capacity for each Data Domain model assumes the maximum number of
drives (either internal or external) supported for that model.

Maximum throughput for each Data Domain model is dependent mostly on the number and
speed capability of the network interfaces being used to transfer data. Some Data Domain
systems have more and faster processors so they can process incoming data faster.

Note: Advertised capacity and throughput ratings for Data Domain products are best-case
results, based on tests conducted under laboratory conditions. Your throughput will vary
depending on your network conditions.

The number of network streams you may expect to use depends on your hardware model.
Refer to the system guide for your specific Data Domain model to learn the maximum
supported stream counts.

Standard practice is to be conservative when calculating the capacity and throughput
required for a specific backup environment; estimate the need for greater throughput and
capacity rather than less. Apply your requirements against conservative ratings (not the
maximums) of the Data Domain systems under consideration, and allow for a minimum
20% buffer in both capacity and throughput.

• Required capacity divided by the maximum capacity of a particular model, times 100,
equals the capacity percentage.
• Required throughput divided by the maximum throughput of a particular model, times
100, equals the throughput percentage.

If the capacity or throughput percentage for a particular model does not provide at least a
20% buffer, then calculate the capacity and throughput percentages for a Data Domain
model of the next higher capacity. For example, if the capacity calculation for a DD620 yields
a capacity percentage of 91%, only a 9% buffer is available, so you should look at the DD640
next to calculate its capacity.

Sometimes one model provides adequate capacity, but does not provide enough throughput,
or vice versa. The model selection must accommodate both throughput and capacity
requirements with an appropriate buffer.

In this example, the capacity requirement of 3,248 GB fills Model A to 97% of capacity.

Model B has a capacity of 7.2 TB. The capacity percentage estimated for Model B is 45%, and
the 55% buffer is more than adequate.

In this example, 3,248 GB of capacity is needed.

The capacity specifications show that Model A, with only 3,350 GB of capacity, does not
meet this need once the 20% buffer rule is applied: it leaves only a 3% buffer.

Model A with an additional shelf offers 7,974 GB of capacity; the resulting 60% buffer is
clearly a better option.

Model B is also a viable option with 7,216 GB of capacity – a 55% buffer.

This calculation is similar to calculating the capacity buffer for selected models.

Select a model that meets throughput requirements with no more than 80% of the model’s
maximum throughput capacity.

In this example, the throughput requirement of 1,200 GB per hour would load Model A to
about 89% of its rated throughput, leaving only an 11% buffer.

A better selection is a model with higher throughput capability, such as Model B, rated at
2,252 GB per hour and offering a 47% buffer in estimated throughput.
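
The following Python sketch ties the 20% buffer rule to the figures from the last two
examples. Model A's throughput rating is not stated on these slides, so the 1,350 GB-per-hour
value below is inferred from the 89% load figure; all names and ratings here are illustrative,
not published specifications.

def buffer_pct(required, rated_maximum):
    # Headroom remaining after loading the model with the requirement.
    return 100 - (required / rated_maximum * 100)

models = {                                  # (capacity GB, throughput GB/hr)
    "Model A": (3350, 1350),
    "Model A + shelf": (7974, 1350),
    "Model B": (7216, 2252),
}

for name, (cap_gb, thr_gb_hr) in models.items():
    cap_buf = buffer_pct(3248, cap_gb)      # required capacity: 3,248 GB
    thr_buf = buffer_pct(1200, thr_gb_hr)   # required throughput: 1,200 GB/hr
    adequate = cap_buf >= 20 and thr_buf >= 20
    print(f"{name}: capacity buffer {cap_buf:.0f}%, "
          f"throughput buffer {thr_buf:.0f}%, adequate: {adequate}")

Only Model B clears the 20% buffer on both axes, which matches the summary that follows.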

In summary, Model A with an additional shelf might meet the capacity requirement; Model B
is the minimum model that would meet the throughput performance requirement.

While Model A meets the storage capacity requirement, Model B is the best choice based
upon the need for greater throughput.

Another option is to consider implementing DD Boost with Model A to raise the throughput
rating.

This lesson covers basic throughput monitoring and tuning on a Data Domain System.

There are three primary steps to throughput monitoring and tuning:
• Identifying potential bottlenecks that might reduce data transfer rates during
backups and restores.
• Displaying and understanding Data Domain system performance metrics.
• Identifying and implementing viable solutions to resolve slower-than-expected
throughput issues.

Integrating Data Domain systems into an existing backup architecture can change the
responsiveness of the backup system. Bottlenecks can appear and restrict the flow of data
being backed up.

Some possible bottlenecks are:
• Clients
– Disk issues
– Configuration
– Connectivity
• Network
– Wire speeds
– Switches and routers
– Routing protocols and firewalls
• Backup server
– Configuration
– Load
– Connectivity

• Data Domain system
– Connectivity
– Configuration
– Log level set too high

As demand shifts among system resources – such as the backup host, client, network, and
the Data Domain system itself – the source of the bottleneck can shift as well.

Eliminating bottlenecks where possible, or at least mitigating the cause of reduced
performance through system tuning, is essential to a productive backup system. Data
Domain systems collect and report performance metrics through real-time reporting and in
log files to help identify potential bottlenecks and their causes.

If you notice backups running slower than expected, it is useful to review system
performance metrics.

From the command line, use the system show performance command.

The command syntax is:
# system show performance [ <duration> {hr | min | sec} [ <interval> {hr | min | sec} ]]

For example:
# system show performance 24 hr 10 min

This shows the system performance for the last 24 hours at 10-minute intervals. One
minute is the minimum interval.

Servicing a file system request consists of three steps: receiving the request over the
network, processing the request, and sending a reply to the request.

Utilization is reported as four measures:
• ops/s
Operations per second.
• load
Load percentage (pending ops / total RPC ops * 100).
• data (MB/s in/out)
Protocol throughput: the amount of data the file system can read from and write to the
kernel socket buffer.
• wait (ms/MB in/out)
Time taken to send and receive 1 MB of data between the file system and the kernel socket
buffer.
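
As a quick illustration of the load measure, here is a one-line Python check; the numbers
are invented for the example, not output from a real system:

pending_ops = 400       # operations waiting to be serviced
total_rpc_ops = 8000    # total RPC operations in the sample
print(pending_ops / total_rpc_ops * 100)   # -> 5.0 (% load)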

An important section of the system show performance output is the CPU and disk utilization.

• CPU avg/max: The average and maximum CPU utilization; the CPU ID of the most-
loaded CPU is shown in brackets.
• Disk max: Maximum disk utilization over all disks; the disk ID of the most-loaded disk
is shown in brackets.

If CPU utilization is 80% or greater, or disk utilization is 60% or greater, for an extended
period of time, the Data Domain system is likely running at the limit of its CPU or disk
processing capability. Check that no cleaning or disk reconstruction is in progress; both are
reported in the State section of the system show performance report.
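
As a rough illustration of these thresholds, the Python sketch below flags sustained hot
spots. It assumes you have collected the CPU and disk utilization figures from successive
system show performance samples yourself; DD OS provides no such helper.

def flag_utilization(samples):
    # samples: list of (cpu_pct, disk_pct) readings over an extended period
    alerts = []
    if all(cpu >= 80 for cpu, _ in samples):
        alerts.append("CPU at or above 80% - check stream counts and cleaning")
    if all(disk >= 60 for _, disk in samples):
        alerts.append("Disk at or above 60% - check cleaning/reconstruction")
    return alerts

print(flag_utilization([(85, 40), (88, 45), (90, 42)]))
# -> ['CPU at or above 80% - check stream counts and cleaning']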

The following is a list of states and their meanings as indicated in the system show
performance output:
• C – Cleaning
• D – Disk reconstruction
• B – GDA (also known as multinode cluster [MNC] balancing)
• V – Verification (used in the deduplication process)
• M – Fingerprint merge (used in the deduplication process)
• F – Archive data movement (active to archive)
• S – Summary vector checkpoint (used in the deduplication process)
• I – Data integrity

Typically, the processes listed in the State section of the system show performance report
consume CPU that would otherwise be available to handle backup and replication activity.

In addition to watching disk utilization, you should monitor the rate at which data is being
received and processed. These throughput statistics are measured at several points in the
system to assist with analyzing performance and identifying bottlenecks.

If slow performance is happening in real time, you can also run the following command:

# system show stats interval [interval in seconds]

For example:
# system show stats interval 2

An interval of 2 produces a new line of data every 2 seconds.

The system show stats command reports CPU activity and disk read/write amounts.

In the example report shown, you can see a high and steady amount of data inbound on the
network interface, which indicates that the backup host is writing data to the Data Domain
device. We know it is backup traffic and not replication traffic because the Repl column
reports no activity.

Low disk-write rates relative to steady inbound network activity are likely because much of
the incoming data consists of segments that are duplicates of segments already stored on
disk. The Data Domain system identifies the duplicates in real time as they arrive and writes
only the new segments it detects.

If you experience system performance concerns, for example, you are exceeding your backup
window, or if throughput appears to be slower than expected, consider the following:
• Check the Streams columns of the system show performance command to make sure
that the system is not exceeding the recommended write and read stream count.
Look specifically under rd (active read streams) and wr (active write streams) to
determine the stream count. Compare this to the recommended number of streams
allowed for your system. If you are unsure about the recommended streams number,
contact Data Domain Support for assistance.
• Check that CPU utilization (1 – process) is not unusually high. If you see CPU
utilization at or above 80%, it is possible that the CPU is under-powered for the load
it is currently required to process.
• Check the State output of the system show performance command. Confirm that
there is no cleaning (C) or disk reconstruction (D) in progress.
• Check the output of the replication show performance all command. Confirm that
there is no replication in progress. If there is no replication activity, the output reports
zeros. Press Ctrl + c to stop the command. If replication is occurring during data
ingestion and causing slower-than-expected performance, you might want to
separate these two activities in your backup schedule.
• If CPU utilization (1 – process) is unusually high for an extended length of time, and
you are unable to determine the cause, contact Data Domain Support for further assistance.

• When you are identifying performance problems, it is important to note the actual
time when poor performance was observed so that you know where to look in the system
show performance output chronology.

An example of a network-related problem occurs when the client is trying to access the Data
Domain system over a 100 Mb/s network rather than a 1 Gb/s network.
• Check network settings, and ensure the switch connection to the Data Domain
system is running at 1 Gb/s and is not set to 100 Mb/s.
• If possible, consider implementing link aggregation.
• Isolate the network between the backup server and the Data Domain system. Shared
bandwidth adversely impacts optimum network throughput.
• Consider implementing DD Boost to improve overall transfer rates between backup
hosts and Data Domain systems.

