
EMC Data Domain:

Data Protection and Deduplication

Copyright 2010 EMC Corporation. All rights reserved.

Why backup?
Goals
Backups are done for restores
Operational
Disaster Recovery

Disaster recovery requires offsite backup


Operational recovery requires onsite backup
Need both onsite and offsite copies on disk
Need quick restores
Don't have time for moving physical assets

Protection of personal data & intellectual property


Why So Much Interest in Data Deduplication?
Backup & Archive processes have been
overwhelmed by information growth
Primary storage efficiency has become a
necessity to cope with massive growth
ROI drives the compelling appeal of Dedupe

Reduced Storage Capacities


Lower Infrastructure Costs
Improved SLAs
Efficient Replication for Business Continuance/DR

One of the top 10 Technology Considerations

Deduplication
59% Very important

Deploying Deduplication
24% In use
55% Evaluating / In Near Long Term plan
21% Not in Plan

Source: TheInfoPro Wave 11 Storage Study, 2008

Why Do Enterprises Still Use Tape?

Low upfront cost
Tape can store the massive amount of redundant data created by backups
Transportable for offsite DR

[Figure: primary storage on disk; backup storage on tape, 5x-10x the size of primary]

EMC Data Domain: Leadership and Innovation
Deduplication storage systems
More than 12,000 systems installed
More than 4,300 customers
More than 2,600 PB under Data Domain protection worldwide

A history of industry firsts

[Timeline, 2003-2010]
First Deduplication NAS
First Deduplication Virtual Tape Library
First Deduplication Volume Replication
Largest Deduplication Array
Fastest Backup Controller
First Deduplication Encryption
First Deduplication Directory Replication
First Deduplication Nearline Storage
Cascaded Replication
First Distributed Processing

Data Domain works with what you have

Backup

Archive
Database

VMware


De-duplication principles

Unique segments (4 KB-12 KB); segment size varies on the fly
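
The slide gives only the segment size range, not the algorithm. As a rough illustration of how variable-size segmenting can work, here is a minimal sketch using content-defined boundaries and SHA-256 fingerprints. The rolling hash, the MASK value, and the function names are assumptions made for this example; they are not Data Domain's actual implementation.

```python
import hashlib
import random

MIN_SEG = 4 * 1024     # lower bound taken from the slide (4 KB)
MAX_SEG = 12 * 1024    # upper bound taken from the slide (12 KB)
MASK = (1 << 13) - 1   # hypothetical boundary test, about one hit per 8 KB

# Hypothetical per-byte table for a simple Gear-style rolling hash.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def split_segments(data: bytes):
    """Yield variable-size segments whose boundaries depend on content."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_SEG and (h & MASK) == 0
        if at_boundary or length == MAX_SEG:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]   # final partial segment


def store(data: bytes, segment_store: dict) -> list:
    """Keep only unseen segments; return the fingerprints that rebuild data."""
    recipe = []
    for seg in split_segments(data):
        fp = hashlib.sha256(seg).hexdigest()
        segment_store.setdefault(fp, seg)   # duplicates are not stored again
        recipe.append(fp)
    return recipe
```

Storing the same data a second time adds nothing to segment_store; only the list of fingerprints is recorded again.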



Data Deduplication: Technology Overview

Store more backups in a smaller footprint

Backup data (segments):
Friday Full Backup: A B C D A E F G
Mon, Tues, Weds, Thurs Incrementals
Second Friday Full Backup: B C D E L G H
Unique segments stored: A B C D E F G H I J K L

Backup                   Logical   Estimated Reduction   Physical
Friday Full              1 TB      2-4x                  250 GB
Monday Incremental       100 GB    7-10x                 10 GB
Tuesday Incremental      100 GB    7-10x                 10 GB
Wednesday Incremental    100 GB    7-10x                 10 GB
Thursday Incremental     100 GB    7-10x                 10 GB
Second Friday Full       1 TB      50-60x                18 GB
TOTAL                    2.4 TB    7.8x                  308 GB
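
The TOTAL row can be checked from the per-backup figures. The short sketch below sums the logical and physical columns and recomputes the overall reduction; it assumes 1 TB = 1000 GB, and the names are made up for this example.

```python
# (backup, logical GB, physical GB), copied from the slide; 1 TB taken as 1000 GB.
BACKUPS = [
    ("Friday full", 1000, 250),
    ("Monday incremental", 100, 10),
    ("Tuesday incremental", 100, 10),
    ("Wednesday incremental", 100, 10),
    ("Thursday incremental", 100, 10),
    ("Second Friday full", 1000, 18),
]


def totals(backups):
    """Return (logical GB, physical GB, overall reduction factor)."""
    logical = sum(row[1] for row in backups)
    physical = sum(row[2] for row in backups)
    return logical, physical, logical / physical


logical_gb, physical_gb, reduction = totals(BACKUPS)
print(f"{logical_gb / 1000:.1f} TB logical, {physical_gb} GB physical, {reduction:.1f}x")
# prints: 2.4 TB logical, 308 GB physical, 7.8x
```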

Deduplication Dramatically Reduces Storage Capacity Requirements

Deduplication
10-30 times less data stored versus fulls + incrementals with typical retention policies

[Chart: data stored (axis 0-30) versus weeks in use (1-20), comparing deduplication storage with traditional storage]
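
To see how a figure in the 10-30x range can arise, here is a hedged projection built on the weekly backup figures from the technology overview slide. It assumes every later week looks like the second week on that slide (four 100 GB incrementals stored as 10 GB each, plus a 1 TB full stored as 18 GB) and that nothing expires. Both are simplifying assumptions, not vendor data.

```python
def stored_after(weeks: int):
    """Return (traditional GB, deduplicated GB) retained after `weeks` weekly cycles."""
    first_full_logical, first_full_physical = 1000, 250
    weekly_logical = 4 * 100 + 1000   # four incrementals plus one full
    weekly_physical = 4 * 10 + 18     # assumed to stay at the second-week figures
    traditional = first_full_logical + weeks * weekly_logical
    deduplicated = first_full_physical + weeks * weekly_physical
    return traditional, deduplicated


for weeks in (1, 10, 20):
    trad, dedup = stored_after(weeks)
    print(weeks, trad, dedup, round(trad / dedup, 1))
```

Under these assumptions the ratio starts at about 7.8x after one week and climbs to roughly 18x at 10 weeks and 21x at 20 weeks, because each new week adds 1.4 TB of logical data but only 58 GB of physical data.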

Data Domain Scale

Data Domain SISL Scalable Architecture: CPU-Centric

[Chart: throughput in GB/sec (axis marks 0.04, 1.5, 3, 5) versus addressable capacity in TB, post-RAID (physical) (axis marks 1.25, 70, >PB), from DD200 (2004) to 2011 (est.)]
Chart labels: Distributed Processing for Single-controller Systems; Multi-Controller System with Global Deduplication

DD880, 7/09: Industry's Fastest Backup Storage Controller

6-Year Improvement
Throughput: ~90x
Capacity: ~225x

Inline vs Post-Process Deduplication: Provisioning & Admin

Post Process: Deduplication After Storing
  Flow: Store, Dedupe, Replicate?, Restore (Undedupe?)
  At least 3x disk accesses to shared store
  Process contention increases with #processes
    Copy to tape: Too slow to stream tape
    Recovery: SLA predictability
    Replication: Poor time-to-DR
    Deduplication itself if interleaved with backup or restore
  More admin needed to fight these issues

Inline: Deduplication Before Storing
  Flow: Dedupe, Replicate, Restore
  Other activities unimpeded
  Predictable
  Simpler
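
A toy model can make the disk-access difference concrete. The sketch below counts bytes moved to or from the shared store for each approach. It reads "at least 3x disk accesses" as write raw, read back, then write deduplicated, which is an interpretation of the slide; it also ignores index lookups, and the function names are hypothetical.

```python
import hashlib


def inline_ingest(segments):
    """Deduplicate before storing: only new segments ever reach disk."""
    index, disk_bytes = set(), 0
    for seg in segments:
        fp = hashlib.sha256(seg).digest()
        if fp not in index:
            index.add(fp)
            disk_bytes += len(seg)   # one write per unique segment
    return disk_bytes


def post_process_ingest(segments):
    """Store first, deduplicate later: land everything, read it back, rewrite."""
    landed = list(segments)
    disk_bytes = sum(len(s) for s in landed)    # pass 1: write the raw backup
    disk_bytes += sum(len(s) for s in landed)   # pass 2: read it back to dedupe
    disk_bytes += inline_ingest(landed)         # pass 3: write unique segments
    return disk_bytes


backup = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"C" * 4096]
print(inline_ingest(backup), post_process_ingest(backup))
# prints: 12288 45056
```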

Data Integrity: Data Invulnerability Architecture

Trust but verify; hope is not a strategy

Data verification
  Generate checksum
  Deduplication, write to disk
  Verify data against checksum
  Verify the file system metadata integrity
  Verify user data integrity
  Verify stripe integrity

[Diagram: layers File System, Global Compression, Local Compression, RAID]

Self-healing file system
  Cleaning
  Expired data
  Defrag
  Verify

Other
  RAID 6
  NVRAM
  Snapshots
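
The generate-checksum, write, then verify sequence can be sketched as follows. This is only an illustration of the idea: the dict stands in for disk, zlib stands in for local compression, and the function names are assumptions rather than anything from the product.

```python
import hashlib
import zlib


def verify_segment(disk: dict, checksum: str) -> bytes:
    """Re-read a stored segment and confirm it still matches its checksum."""
    data = zlib.decompress(disk[checksum])
    if hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError(f"segment {checksum[:12]} failed verification")
    return data


def write_segment(disk: dict, data: bytes) -> str:
    """Generate a checksum, store the segment, then read it back and verify."""
    checksum = hashlib.sha256(data).hexdigest()   # computed before any transformation
    disk[checksum] = zlib.compress(data)          # stand-in for compression + write
    verify_segment(disk, checksum)                # verify immediately after the write
    return checksum
```

Calling verify_segment again later, for example during cleaning, would catch a segment that was written correctly but damaged afterwards.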

Network-Efficient Replication for True Disaster Recovery

Lowers WAN costs; improves service level agreements

Flexible replication
  One-to-many
  Many-to-one
  Bi-directional
  System-to-system
  Cascaded

95-99% cross-site bandwidth reduction

Source: Remote sites (Data Domain systems)
Destination: Data Center Hub (Data Domain DDX Array with DD880s)
Supports hundreds of remote sites

[Diagram: Data Domain systems at remote sites send backup data and archive data over the WAN to the data center hub; labels shown: DB, Home, DIR A, 15%]
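
The bandwidth saving comes from sending only segments the destination does not already hold. Here is a minimal sketch of that idea; the segment data and the 3-in-100 change rate are invented for the example, fingerprint traffic is ignored, and the function name is hypothetical.

```python
import hashlib


def replicate(segments, remote_index: set) -> tuple:
    """Send only segments the destination lacks; return (bytes sent, logical bytes)."""
    sent = logical = 0
    for seg in segments:
        logical += len(seg)
        fp = hashlib.sha256(seg).digest()
        if fp not in remote_index:
            remote_index.add(fp)
            sent += len(seg)
    return sent, logical


hub_index = set()
night1 = [bytes([i]) * 4096 for i in range(100)]
night2 = night1[:97] + [bytes([200 + i]) * 4096 for i in range(3)]  # 3 changed segments

replicate(night1, hub_index)                  # first night: everything is new
sent, logical = replicate(night2, hub_index)  # second night: mostly duplicates
print(f"{1 - sent / logical:.0%} less data crossed the WAN")
# prints: 97% less data crossed the WAN
```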

Industry's Most Scalable Inline Deduplication Systems

DD140 Remote Office Appliance
DD600 Appliance Series: DD610, DD630, DD660, DD690
DD880
New: Global Deduplication Array
DDX Array Series: Up to 16 Controllers

Software options: DD Boost, DD Virtual Tape Library, DD Replicator, Retention Lock, and DD Encryption

Model                        Speed (DD Boost)   Speed (Other)   Logical capacity   Raw capacity    Usable capacity
DD140                        490 GB/hr          450 GB/hr       17-43 TB           1.5 TB          0.86 TB
DD610                        1.3 TB/hr          675 GB/hr       75-195 TB          Up to 6 TB      Up to 3.98 TB
DD630                        2.1 TB/hr          1.1 TB/hr       165-420 TB         Up to 12 TB     Up to 8.4 TB
DD660                        2.7 TB/hr          2.0 TB/hr       0.520-1.31 PB      Up to 36 TB     Up to 26.1 TB
DD690                        3.9 TB/hr          2.7 TB/hr       0.710-1.7 PB       Up to 48 TB     Up to 35.3 TB
DD880                        8.8 TB/hr          5.4 TB/hr       2.8-7.1 PB         Up to 192 TB    Up to 142.5 TB
Global Deduplication Array   12.8 TB/hr         -               5.7-14.2 PB        Up to 384 TB    Up to 285 TB
DDX Array                    140 TB/hr          86.4 TB/hr      45.6-114 PB        Up to 3.07 PB   Up to 2.28 PB
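
The DDX Array figures appear to be the DD880 figures multiplied by the controller count. The sketch below checks that reading, taking the DD880 values as 8.8 TB/hr (DD Boost), 5.4 TB/hr (other), 192 TB raw, and 142.5 TB usable. The mapping of values to models is reconstructed from the slide, so treat it as an assumption.

```python
CONTROLLERS = 16  # "Up to 16 Controllers" on the slide

# Per-controller DD880 figures as read from the slide (assumed column mapping).
dd880 = {"boost_tb_hr": 8.8, "other_tb_hr": 5.4, "raw_tb": 192, "usable_tb": 142.5}

# Scale each figure linearly by the number of controllers.
ddx = {name: value * CONTROLLERS for name, value in dd880.items()}
print(ddx)
```

The results, about 140.8 TB/hr, 86.4 TB/hr, 3,072 TB, and 2,280 TB, line up with the slide's 140 TB/hr, 86.4 TB/hr, 3.07 PB, and 2.28 PB.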

Why Data Domain?


Less disk to resource, less to manage
CPU-centric deduplication
Inline
Green

Simple, mature, and flexible


Simple, mature appliance
Nearline tier: any fabric, any software, backup or nearline
applications

Resilience and disaster recovery


Storage of last resort
Cross-site global compression: data center or remote office
