Professional Documents
Culture Documents
June 2009
Abstract: As IT organizations are increasingly faced with more stringent SLAs, data replication has
become a common component of many companies’ data protection plans. This white paper defines six
types of replication and looks at the advantages and drawbacks of each. It also provides an overview of
factors (such as protocol and total cost of ownership) that should be taken into account when developing
a replication strategy. This white paper is intended to provide a technical overview as well as high level
considerations for IT professionals who are considering augmenting their infrastructure with a replication
strategy.
Introduction .......................................................................................................................... 1
Replication as part of data protection ......................................................................... 1
Replication overview ................................................................................................... 1
Types of replication ............................................................................................................... 1
Methods by category .................................................................................................. 1
Replication modes................................................................................................................. 3
Several options available ............................................................................................ 3
Synchronous replication ............................................................................................. 4
Asynchronous replication............................................................................................ 5
Semisynchronous replication ...................................................................................... 6
TCO ....................................................................................................................................... 7
Several components play a role ................................................................................. 7
Connectivity ........................................................................................................................... 8
Connection type .......................................................................................................... 8
Considerations for assessing service providers ......................................................... 8
Protocol ...................................................................................................................... 9
Data encapsulation ..................................................................................................... 9
Remote connection options ................................................................................................ 10
Dual data center configurations ................................................................................ 10
Multiple data center configurations ........................................................................... 10
Methods of replication ......................................................................................................... 11
Weighing the pros and cons ..................................................................................... 11
Methods of replication: Controller-based ........................................................................... 11
NetApp® Snap-based replication .............................................................................. 11
EMC SRDF replication ............................................................................................ 11
Hitachi Data Systems Universal Replicator .............................................................. 12
Methods of replication: Fabric-based ................................................................................. 12
Methods of replication: Server-based ................................................................................ 13
Overview ................................................................................................................... 13
Market leaders .......................................................................................................... 13
Methods of replication: Application-based .......................................................................... 14
Overview ................................................................................................................... 14
Oracle® application ................................................................................................... 14
DB2 application......................................................................................................... 14
SQL application ........................................................................................................ 15
Clustering impact on replication .......................................................................................... 15
Overview ................................................................................................................... 15
Veritas Cluster Server™ ......................................................................................................................................15
Oracle RAC application clustering ........................................................................... 16
IBM Cluster Systems Management™ ........................................................................................................16
Methods of replication: Logical volume managers or SAN ................................................. 17
Overview ................................................................................................................... 17
Symantec Storage Foundation ................................................................................. 17
IBM High Availability Cluster Multiprocessing (HACMP) replication using LVM ...... 17
Linux Logical Volume Manager ................................................................................ 17
Methods of replication: Tape cloning .................................................................................. 18
Symantec Veritas NetBackup and EMC NetWorker ................................................ 18
Methods of replication: Virtual tape .................................................................................... 18
Data Domain VTL Replication Technology .............................................................. 18
Summary ............................................................................................................................. 18
Datalink can play a role ............................................................................................ 20
A Detailed Look at Data Replication Options for Disaster Recovery Planning
Introduction
Replication as part of data protection
For most companies, the difference between a solid data protection plan and
Replication technol-
a weak data protection plan can equate to hours of lost productivity, system
downtime, lost data, disenchanted customers, decreased revenue…the list
ogy can be used
goes on. For this very reason, prudent organizations invest the resources
necessary to ensure their IT infrastructure is built to withstand scenarios that
to make both local
range from small blips of downtime to natural disasters. Before computers,
this perhaps meant keeping all important files in a fireproof area under lock
and remote copies.
and key. In today’s world, it often involves replication of data.
Replication overview
The definition of replication is to make an exact copy of the original.
Replication technology can be used to make both local and remote copies.
For this paper, we will focus on the creation of remote copies. Copying data
from one location to a remote location can be achieved in a number of ways.
Types of replication
Methods by category
Replication of data can be achieved with one or more of these six methods:
Replication modes
Several options available
Replication implemented at the array, network, or host-level works in one of
three modes: synchronous (in database terms most like a two-phase commit);
asynchronous (much like a middleware product that leverages a queuing
engine); or semisynchronous (a combination of synchronous and asynchro-
nous replication). With each method offering various advantages and disad-
vantages, organizations will ultimately need to look at the type of data that is
being replicated, as well as many other factors, when they select the mode of
replication.
Synchronous replication
In a synchronous replication environment, data must be written to the target
device before the write operation completes on the primary device. This
assures the highest possible level of data currency for the target device. At
When considering
any point in time, the target will have the exact same data as the source.
a synchronous rep-
Synchronous replication, however, can introduce substantial performance lication model, an
delays on the source device, particularly if the network connection between organization must
the sites is slow or unavailable. Some solutions combine synchronous and
look at the rate
asynchronous operations, switching to asynchronous replication dynamically
of data change,
when the performance slows, and then reverting to synchronous when the
communication problems are resolved. available band-
width, and distance
When considering a synchronous replication model, an organization must between sites.
look at the rate of data change, available bandwidth, and distance between
sites.
In most cases, the trade-off for the increased performance of a hardware- An asynchronous
based replication strategy comes in the requirement to lock into a proprietary
replication archi-
replication strategy between like arrays or the fabric. For example, a hard-
ware-based replication solution from EMC® will require EMC arrays at each tecture is not as
site. That said, some storage virtualization products, such as Hitachi Data consistent as a
Systems® Universal Replicator, do not necessarily lock in a proprietary rep-
synchronous envi-
lication strategy between like arrays or the fabric. Instead, they can provide
replication services to unlike arrays. ronment; however,
it can span the
Given that hardware-based replication most often leverages Fibre Channel
(unless a translation device is used), hardware connectivity requirements earth.
for replication between arrays over distance involves investment in separate
infrastructure to support replication on the WAN. This is often referred to as
channel extension.
Asynchronous replication
With asynchronous replication, the source system does not wait for a con-
firmation from the target systems before proceeding. Data is queued and
batches of changes are sent between the systems during periods of network
availability. Server-based replication tools from vendors such as Symantec™
(Veritas™), Fujitsu, EMC and Topio support asynchronous replication by
using a journaling engine as a queuing mechanism to maintain data integrity
and ensure data is written to the target in the same order it was written on the
source. Some controller- and fabric-based replication applications or appli-
ances support asynchronous replication as well.
The key factors that affect synchronous models of replication also affect
asynchronous models, but—because of the queuing engine—not quite in
the same manner. In most cases, the queuing engine is bound by the avail-
able memory to which it has access. The more memory the larger the queue;
therefore, rate of change, available bandwidth and latency do not present as
much of a concern. However, if these factors impact the queue in a manner
where the queue can not fully empty, the queue itself could become a source
of contention, which might require a broader pipe or a closer site.
Semisynchronous replication
Organizations may use literally hundreds of applications, each having differ-
ent levels of criticality, performance implications, etc. As a result, it may be
difficult to meet all the specific needs with just one method of replication,
be it synchronous or asynchronous. In these cases, it may be necessary to
combine these two replication types for semisync replication. This hybrid
approach allows an organization to use the same pipe for different types of
replication, thereby allowing for customization to meet the unique demands
of the business while at the same time capitalizing on efficiencies.
Async–This allows the target site to be one write behind the pri-
mary site, which eases latency somewhat.
Connectivity
Connection type
Once organizations have designed their replication solution and identified It’s important to
an appropriate disaster recovery site, they can then determine the connection
type between the facilities and select a bandwidth provider or service pro- note that band-
vider. width providers sell
circuits; whereas
The bandwidth or service provider responsible for maintaining the connec-
service providers
tion between the two sites must have an understanding of the organization’s
requirements for connectivity and Quality of Service (QOS). After all, a are in the business
two-hour recovery SLA will produce different requirements than a two- of meeting client
minute recovery SLA. In clearly articulating its needs, an organization can needs for connec-
achieve the optimal solution for the situation. A DR site is of no value if the
tivity.
links between the two locations are intermittent or susceptible to failure. The
SLA may go as far as to understand the physical routing of redundant links.
Redundant links offer little value if a single event can disrupt both links
simultaneously.
Data encapsulation
Burying Fibre Channel cable underground to move data over a great distance
would be extremely costly. Meanwhile, a TCP/IP pipe can be leased rather
economically. Conversion devices (such as a Brocade switch or Cisco blades
in a director) convert the Fibre Channel protocol to TCP/IP protocol and
allow organizations to take advantage of the lower cost transport method.
In determining the connection type for TCP/IP protocol, the two most com-
mon transport mechanisms are Fibre Channel Internet Protocol (FCIP) and
Internet Small Computer System Interface (iSCSI).
iSCSI allows two hosts to negotiate and then exchange SCSI commands
using IP networks. By doing this, iSCSI takes a popular high-performance
local storage bus and emulates it over wide-area networks. Unlike some SAN
protocols, iSCSI rquires no dedicated cabling; it can be run over existing
switching and IP infrastructure.
Remote Copy
Two leading technologies in this area are FalconStor IPStor Replication and
IBM® SAN Volume Controller.
Figure 4: Replication
happens at the fabric
layer, between the
server and storage.
2) When choosing a custom target path, it is important to not select any path
with data that should be saved. Replication will, in most cases, overwrite/
delete all data in the selected directory.
Market leaders
The leading technology for server-based replication in the open systems
space is Veritas Volume Replication™ from Symantec. In the mainframe
space, the market leader is Softek®.
cation features.
Oracle® application
Oracle Data Guard can provide replication services controlled by Oracle
either synchronously or asynchronously.
DB2 application
IBM® DB2 Universal Database version 8.2 offers a high availability disaster
recovery (HADR) feature. HADR is a replication that takes place at the data-
base level. The HADR requires two active systems: a primary and a standby.
All database transactions take place in the primary system. The transaction
log entries are continuously shipped to the standby machine via TCP/IP. The
standby system receives and stores log entries from the primary system and
applies the transactions to the database. If the primary fails, the standby can
take over the transactional workload and become the new primary machine.
SQL application
Microsoft® SQL Server supports three types of database replication. They are:
- Snapshot replication. This type of replication acts in the manner its name
implies. The publisher simply takes a snapshot of the entire replicated Clustering indirectly
database and shares it with the subscribers. Of course, this is a very time- affects how replica-
and resource-intensive process. For this reason, most administrators don’t tion performs from
use snapshot replication on a recurring basis for databases that change
frequently. Snapshot replication is commonly used for two scenarios: 1) the application
for databases that rarely change; and 2) to set a baseline to establish rep- viewpoint by mak-
lication between systems while future updates are propagated using trans- ing the application
actional or merge replication.
aware that data
- Transactional replication. This replication offers a more flexible solution has been repli-
for databases that change on a regular basis. With transactional replica- cated.
tion, the replication agent monitors the publisher for changes to the data-
base and transmits those changes to the subscribers. This transmission can
take place immediately or on a periodic basis.
Most cluster solutions require TCP/IP connectivity for heartbeat and locking
for control resources. If an organization wants to stretch the cluster between
locations so that all of its applications transparently fail over, the distance
requirements for synchronous replication apply. If clustering is completed
over long distances with synchronous replication between sites, global clus- Most cluster solu-
tering (whether it is OS-based clustering or a third party solution like VCS)
will not maintain a reliable and consistent connection because of latency. tions require TCP/
VCS allows companies to link multiple independent high availability clusters IP connectivity
at multiple sites into a single, highly available disaster recovery framework.
for heartbeat and
It also enables administrators to monitor, manage, and report on multiple
Veritas clusters on different platforms from a single web-based console. VCS locking for control
is supported on Solaris™, HP-UX, AIX®, Linux, VMWare®, and Windows®
operating system platforms. resources.
Oracle RAC allows an Oracle database to run any package or custom appli-
cation, unchanged across a set of clustered servers. This provides the highest
levels of availability and the most flexible scalability. If a clustered server
fails, Oracle continues running on the remaining servers. And when more
processing is required, an organization can add more processing power with-
out an outage.
With LVM2, organizations no longer have to worry about how much space
each partition will contain. LVM2 also offers enhanced resilience (e.g., new
transaction metadata format); user configurability (e.g., device name filters,
metadata backup and archive paths, and metadata archive history); new out-
put display tools; and many performance optimizations such as asynchronous
snapshots as replication services.
Summary
Primary considerations for DR
3. A key step in developing a disaster recovery plan is to identify criti- ity and disaster
cal business processes. Once the core processes have been identified,
the applications, databases, and supporting infrastructure should be recovery planning
identified and classified. Normally, replication is reserved for revenue
generating or otherwise mission critical applications because the TCO require executive
associated with replication can be costly.
4. Once an organization has identified the core applications, all applica- commitment.
tion dependencies from a data source perspective should be analyzed.
As a leading information storage architect, Datalink analyzes, designs, imple- For more information,
ments, and supports information storage infrastructures. Our specialized
capabilities and solutions span storage area networks, networked-attached contact Datalink at
storage, direct-attached storage, and IP-based storage, using industry-leading
hardware, software, and technical services. (800) 448-6314.