

PART 2

INFORMATION TECHNOLOGIES FOR DISASTER RECOVERY

This part delves into information handling and processing technologies and how they can be used to tune disaster recoverability to the needs of the enterprise. The chapters of this part follow a progression from longer recovery times (relying primarily on offline data protection and recovery) to shorter ones (relying on near-real-time replication of data and data manager and application clustering).


CHAPTER 8

Backup and Disaster Recovery

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
Andrew Tanenbaum

PROTECTING ENTERPRISE DATA


As a primary enterprise asset, online data must be protected from loss or destruction, no matter what happens. Enterprises protect their data so that:

- Operations can resume as quickly as possible after a server, application, storage device, or software failure, operational error, or site disaster
- Data can be moved to where it's needed, when it's needed by the business
- Regulatory and business policy record retention requirements can be met

The foundation of a data protection strategy, especially one that is expected to assist in disaster recovery, is backup. This chapter discusses backup techniques that protect data against loss due to failures and disasters.

The Essence of Data Protection


Data protection basically consists of making and manipulating copies of critical data objects. For example:

- Making backup and archival copies of databases and data stored in files
- Moving electronic archives from data centers to secure vaults
- Replicating data from where it is generated to where it is used
- Moving data from where it is used less to where it is used more

Underlying the seemingly simple task of copying data objects, however, are significant technical challenges:

- Designing and implementing policies that allow data to get to where it is needed when it is needed, even if failures or procedural errors occur
- Keeping track of which copies of which files are at which locations (for example, which backups are on which tapes and where those tapes are stored)
- Guaranteeing the internal consistency of collections of data objects as they are copied
- Minimizing the information service outage time during which data is unavailable to applications because it is being copied
- Determining when changes in management policy would be beneficial, for example, when backups should be more frequent, or when copies of product data or price lists should be replicated at regional offices to reduce network traffic

BACKUP: THE DATA PROTECTION FOUNDATION


Backup is central to any enterprise data protection architecture. A backup is a copy of a defined set of data, ideally as it exists at a point in time.1

The Goals of Backup


The goals of enterprise backup are:

- Enable information services to resume as quickly as possible after a failure, disaster, or application error
- Enable data to be moved quickly and easily to where it's needed
- Enable regulatory and business policy record retention requirements to be met

1. There are fuzzy file and database backup techniques that create copies of changing data with limited currency and consistency guarantees. These can be used to restore databases after certain failures; however, they have limited use as durable business records.

The Enterprise in Enterprise Backup


For a personal computer user, backup typically means making a copy of the data on the computer's hard disk drive onto a tape or CD-ROM. Personal backup media are often labeled by hand and managed by storing them in a drawer or cabinet located in the room with the computer.

In the enterprise, data protection is far more complex. Enterprise backup must be able to:

- Make copies of precisely defined collections of data, whether organized as files, databases, or the contents of logical volumes or disks
- Manage the backup media that contain these copies so that any backup copy of any data object can be quickly and reliably located when required, and so that large collections of media can be tracked accurately
- Provide mechanisms to duplicate sets of backed up data so that it can be taken off site for archival or disaster protection purposes
- Track the location of all copies of all data objects accurately

Complexity in Enterprise Backup


These functions of enterprise tape backup may seem straightforward. But implementing a backup strategy that meets these requirements in a large enterprise can be a complex undertaking. When a backup strategy is designed or updated, complexity can arise for several reasons:

- Data Organization and Classification. For a backup to be useful in recovering from a failure or disaster, it must include all the data that can be lost. In an enterprise with dozens or hundreds of information services, some of which may share data with others, defining the right sets of data objects to be backed up together can be complex.
- Tension between Resource Consumption and Information Service Availability. Backup frequency is essentially a trade-off between resources (network and I/O bandwidth, processor capacity, tape and library hardware, and application access) and the need for currency. With many information services needing data protection, finding the right balance between backup frequency and resource consumption can be difficult.
- Platforms and Data Managers. Enterprises with many information services are likely to use multiple data managers (file systems and database management systems), each with its own mechanisms for backing up data objects that it recognizes. Integrating these mechanisms into a schedule that provides a consistent backup of all required data for a service, and keeping them up to date as the service changes, is a nontrivial task.
- Technology Choices. Continuous application availability is increasingly required in the enterprise era. A variety of mechanisms exist that enable consistent backups with minimal application downtime. Choosing among these and implementing the choice can be a complex task.
- Business Constraints. Business and regulatory requirements can result in multiyear data retention requirements. Enterprises can find themselves responsible for maintaining backups and archives on tens or even hundreds of thousands of media. Procedures for managing large numbers of media can be complex in themselves.
- Geography. Business considerations may require that servers and data be in multiple widely separated locations. Maintaining a consistent set of backup procedures across multiple data centers can require extensive design or management talent.

An enterprise data backup strategy must consider all of these factors.

Backup Seems Simple . . .


Conceptually, backup is simple. A system administrator decides which data are critical, defines a backup schedule that has minimal effect on operations, and uses backup manager software to make actual backups. The backups are stored in a safe place so they can be used to recover from failures. Backups can be stored off site at a distance from the data center to provide data recoverability after a disaster. Conceptually, backup really is simple. The difficulty is in the details:

- Weight of Numbers. In large data centers, system administrators must back up data from many servers of different types. This is a lot of work to do and manage; moreover, it requires the development and maintenance of skills and experience unique to each platform.
- Reliable Execution. System administrators must ensure that scheduled backups are actually performed. In a busy data center, operational pressures can make this more difficult than it sounds, because there is no business demand for backups unless a mishap actually occurs. Busy system operators can neglect backup if more immediate and pressing demands are made on their time.
- Media-Handling Errors. As an enterprise matures, it inevitably accumulates a large collection of tapes or other backup media. Particularly when handled by humans, backup media can be abused, destroyed, lost, or overwritten.
- Pressure to Perform. When loss of online data requires restoration from a backup, the situation is always tense. When seldom-exercised procedures to get applications back on line are performed under pressure, it is easy to misread instructions, load the wrong media, or override safeguards, resulting in excessive restore times, incorrect restores, or even complete failure to recover data.

Thus, while backup is indeed fundamentally the process of making and manipulating copies of important data objects, the copying is the least challenging aspect of backup at the enterprise level. Far more important are reliable scheduling, process automation, error handling, media management, and, above all, getting the people out of the picture to the greatest extent possible.

Restore: The Reason for Enterprise Backup


The whole purpose of backup is to enable information services to recover after a failure or disaster. Thus, the proper starting point for defining an enterprise backup strategy is an assessment of recovery requirements. If, for example, an order processing application can tolerate an 8-hour outage without severe business consequences, an incremental backup strategy that minimizes backup time at the expense of restore time may be appropriate. For a Web retail application, on the other hand, where every minute of downtime means permanently lost sales, a strategy that replicates data in real time is more appropriate, even though it has a greater impact on application performance.

COMPONENTS OF ENTERPRISE BACKUP ARCHITECTURE


To understand enterprise backup technology, it is helpful to consider the major functional elements of backup. Figure 8-1 illustrates major functional components of an enterprise backup architecture.

- Backup Clients (often simply called clients). These are any computers with data to back up. The terminology can be confusing, because enterprise backup clients are typically application, database, or file servers. The term backup client is also used to denote the software component that reads data from online storage devices and sends it to a backup server.
- Backup Servers (often simply called servers). These are computer systems that copy data to backup media and maintain historical information. Some enterprise backup managers distinguish between two types of backup servers:
  - Master Backup Servers. These backup servers schedule backup and restore jobs and maintain catalogs that describe what data is stored on what media. The software component that performs these functions is often called the backup manager.
  - Media Servers. These backup servers copy data to backup media at the direction of a master backup server. Backup storage units are connected to media servers.
- Backup Storage Units. These are the tapes, magnetic disks, or optical disks controlled by a media server.2

A backup is the result of a three-way cooperation among a master backup server, a backup client, and a media server:

2. Throughout this chapter, the term tape is used generically to refer to any of the various recording media that can be used to store data off line, because tape is by far the most frequently occurring medium in computing today.

FIGURE 8-1 Functional Components of Enterprise Backup. [Figure: a backup client (application server) with a backup agent reading file system data; a master backup server with a backup scheduler and daily, weekly, and quarterly schedules; a media server with a backup engine; and a backup storage unit. The master server issues commands to start backups at scheduled times, and backup data flows from the client to the media server and onto the storage unit.]

- A master backup server initiates and monitors backup jobs according to a predefined backup schedule. The master server chooses a media server for each job based on a combination of predefined policies and current conditions.
- The client whose data is to be backed up runs the job, sending data from its online volumes to the designated media server, and a list of the files actually backed up to the master server.
- The media server chooses one or more backup storage units, selects and loads media, receives client data on the network, and writes it to the media.

Similarly, to restore data from a backup:

- A client makes a request to the master server for data from a particular backup to be restored
- The master backup server identifies the media server that controls the requested backup and directs it to perform the restore
- The media server locates and mounts the backup media containing the data to be restored (possibly with human assistance), and sends the data to the requesting backup client
- The backup client receives data from the media server and writes it to a local file system

Scaling Backup to the Enterprise


In small systems, all three backup functions typically run right in the application server. The reason for this elaborate modular backup architecture is so that each function can be moved to a specialized server as the operation grows or requirements change, without disrupting defined backup procedures. Figure 8-2 illustrates enterprise backup scaling.

The advantages of a modular backup architecture become even more apparent as an enterprise grows or when its information services are distributed. Figure 8-3 illustrates how a scalable backup architecture supports distributed enterprise information services. Figure 8-3 illustrates the two major benefits of a scalable backup architecture:

- Central Control. A master backup server maintains backup schedules and data catalogs for the entire enterprise. One point of control means that a single administrative team can manage all backup operations. Of course, the single master backup server should be a cluster so that a single computer failure doesn't make it impossible for the enterprise to restore any of its data. Moreover, for disaster recoverability, backup catalogs should be replicated across a wide area.
- Resource Scaling and Sharing. Media servers can be added wherever and whenever they are required. Tape drives, especially in combination with robotic media libraries, are expensive resources with low duty cycles. Sharing them among several application servers is extremely attractive economically.

FIGURE 8-2 A Scalable Backup Architecture. [Figure: at the outset, all backup functions (backup client, backup manager, media server) are performed by the application/database server with a directly attached tape drive; with growth, backup functions migrate to specialized servers, with a dedicated media server and tape library, as performance or other operational needs dictate.]

A distributed architecture such as Figure 8-3 illustrates minimizes administrative cost and makes optimal use of expensive hardware resources, but it comes at a cost in enterprise network traffic. There are several techniques that minimize the impact of backup on online operations, but there will inevitably be occasions on which large amounts of data must be transferred from backup clients to backup servers at inopportune times. An enterprise designing a backup architecture for distributed data centers must evaluate the network traffic impact of distributed backup, as illustrated in Figure 8-3, and decide among:

- Sharing the enterprise network between application and backup traffic
- A private backup network for host-based backup
- Using a storage area network (SAN) for backup traffic
- Local backup with media servers directly attached to application servers

FIGURE 8-3 Large Enterprise Backup Architecture. [Figure: multiple application servers acting as backup clients send backup data across the network to media servers; a master backup server running the backup manager issues backup commands to the clients and media servers.]

ENTERPRISE BACKUP POLICIES


The data required to operate an enterprise must be backed up. But because backup is a very resource-intensive operation, there is a natural desire to minimize its impact on information services. System administrators express the balance between these two conflicting objectives in backup policies. A backup policy is a set of rules that specifies:

- What data is to be backed up
- When the data should be backed up
- Where the data should be backed up

The following sections describe how enterprise backup managers typically execute the backup policies specified by system administrators.

What Data to Back Up


Deciding what data to back up requires knowledge of both enterprise business policies and computer system operations. Effective backup policies distinguish between seldom changing data and frequently changing data, and back up the former less frequently than the latter.

Data to be backed up can be specified as a list of files. For large or particularly active file systems, it is usually more appropriate to specify that the entire contents of one or more directory trees be backed up. This makes it unnecessary to reflect file additions and deletions in backup policy specifications.

Backup specifications can be even more complex. It is also sometimes useful to express backup policies using exception lists that cause named lists of files or directories to be excluded from backup data specifications.

When to Back Up

Deciding when to back up also requires both enterprise business policy and system operations knowledge. System administrators must balance between acceptable maximum backup age (which determines the worst-case number of hours of data updates that must be recreated by other means) and the impact of backup resource consumption on their information services.

If resources were not a consideration, the obvious backup policy would be to back up all online data constantly, that is, to copy each file in its entirety each time it changed. Resources are a consideration, however. Constant backup would consume overwhelming processing, I/O, and network capacity, as well as large amounts of storage and catalog space, adversely affecting both cost and online application performance. Backups are therefore usually scheduled for minimal impact on information services. For information services with predictable busy and idle periods, backups are easy to schedule. Often, however, there are no predictable idle periods. Ways must be found to minimize backup resource consumption so backup can coexist with online information services.

Where to Back Up Data



On the surface, where to back up data appears to be a simple question. The backup client is the data source. The destination is one of (possibly several) media servers. The choice of media server might differ depending on business cycle, equipment availability, or other considerations. Typically, master server software keeps track of executing backup jobs on each client, and chooses media servers dynamically based on eligibility, relative loading, and backup device availability.

With enterprise backup managers, the backup device(s) for a particular job are generally chosen dynamically by the designated media server, using policy guidelines set by the system administrator.

Backup media are managed similarly. Enterprise backup managers organize media in pools, each associated with one or more scheduled backup jobs. Media servers typically choose available media from a job's pool using an algorithm designed to equalize media usage (and therefore wear). Media managers also maintain media cleaning and retirement schedules, and keep track of media's physical location.

Backup Policies
All of the following parameters are typically bundled into an abstraction called a backup policy:

- Backup clients
- File and directory lists
- Eligible media servers, media types and pools, and device groups
- Scheduling information

Backup policies typically also include attributes, such as priority relative to other policies. Master servers manage an enterprise's backup policies, cooperating with backup clients and media servers to initiate, monitor, and log scheduled backups.
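
As a rough illustration, and not any particular product's policy format, a backup policy can be thought of as a structured record that bundles these parameters. The sketch below, in Python, uses hypothetical field names and values:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BackupPolicy:
    """Hypothetical backup policy record; field names are illustrative only."""
    name: str                                            # policy identifier
    clients: List[str]                                    # backup clients covered
    include: List[str]                                    # directory trees or file lists to back up
    exclude: List[str] = field(default_factory=list)     # exception lists
    media_servers: List[str] = field(default_factory=list)  # eligible media servers
    media_pool: str = "default"                           # pool from which media are drawn
    schedule: str = "daily-incremental"                   # named schedule
    priority: int = 0                                     # priority relative to other policies

# Example: a payroll database host with a weekly full and daily incrementals
payroll = BackupPolicy(
    name="payroll-daily",
    clients=["payroll-db-01"],
    include=["/data/payroll"],
    exclude=["/data/payroll/tmp"],
    media_servers=["media-a", "media-b"],
    media_pool="onsite-pool",
    schedule="weekly-full-daily-incremental",
    priority=10,
)
print(payroll)
```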

INCREMENTAL BACKUP
Full and Incremental Backup

In most enterprise information services, only a small fraction of online data changes between successive backups. In file-based systems, only a small percentage of the files change. Incremental backup techniques make use of this fact to minimize backup resource requirements. An incremental backup is a copy of only the files changed since the preceding backup. A backup client uses file system metadata to determine which files have changed, and copies only those. Figure 8-4 illustrates the difference between full and incremental backup.

Incremental backup augments rather than replaces full backup. An incremental backup contains files that have changed since some point in time for which a full backup exists. To restore a set of files from incremental backups, a full backup must first be restored to establish a baseline. Incremental backups are then restored in age order (oldest first), replacing changed files in the baseline. Incremental backup reduces the frequency with which time-consuming full backups must be performed.
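
As a minimal sketch of the selection step (not how any particular backup client is implemented), the following Python fragment walks a directory tree and picks the files whose modification time is newer than the previous backup:

```python
import os
import time
from pathlib import Path

def files_changed_since(root: str, last_backup_epoch: float):
    """Yield paths under root whose modification time is newer than the last backup.

    This mimics the metadata check a backup client might perform; real backup
    clients may also consult archive bits, change journals, or catalog data.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime > last_backup_epoch:
                    yield path
            except OSError:
                continue  # file vanished or is unreadable; skip it

# Example: list files changed in the last 24 hours under /data (hypothetical path)
if __name__ == "__main__":
    cutoff = time.time() - 24 * 3600
    for changed in files_changed_since("/data", cutoff):
        print(changed)
```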

The Impact of Incremental Backup


If only a small percentage of the files in a large file system have changed since the last backup, only a small percentage of the data must be backed up. Incremental backups typically complete much faster (by orders of magnitude), and consequently have significantly less impact on online information services than do full backups.

FIGURE 8-4 Full and Incremental Backup. [Figure: a full backup copies every file in the file system (Files A through J) as a baseline; an incremental backup copy contains only the files changed since the last backup (for example, Files B and H).]

Enterprise backup managers typically maintain online catalogs that list the location of each version of each backed up file. Restoring an individual file is therefore about the same whether or not incremental backups are in use: the tape containing the file must be located and mounted, and the file read and copied.

Restoring an entire file system from incremental backups after a disaster is more complex, however. The baseline full backup must first be restored, followed by restoration of all newer incremental backups in order of age. Although enterprise backup managers typically guide administrators through the correct sequence of media mounts, the full and incremental restore process may involve more human decision making and media handling than is really desirable. Figure 8-5 illustrates the restoration of an entire file system from full and incremental backups.

Installations typically schedule relatively infrequent (e.g., weekly) full backups at times when information service activity levels are low (e.g., weekends), with more frequent (e.g., daily) incremental backups. This results in lower impact than a pure full backup policy, because relatively little data is copied during the daily incremental backups. Restore times, however, are necessarily longer, and involve more media handling.

FIGURE 8-5 Restoring a File System from Full and Incremental Backups. [Figure: step 1 restores the newest full backup as the baseline for incremental restores; incremental backups are then restored oldest first and newest last, so that when all incremental backups have been restored the file system holds the most up-to-date data possible from backups.]

Different Types of Incremental Backup


There are two distinct types of incremental backup. A differential backup contains copies of all files modified since the last backup of any kind. Thus, with a policy of weekly full backups and daily differential backups, a file system restore is accomplished by restoring the newest full backup, and then each of the newer differential backups in age order. The later in the week, the more incremental restores are required, and therefore the longer a full restore would take.

A cumulative backup is a copy of all files modified since the last full backup. Restoring a file system from cumulative backups requires only the newest full backup and the newest cumulative backup. Restoring file systems is simpler and faster, but at a cost of lengthening backup times as the time since the last full backup increases (see Figure 8-6).
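
A rough sketch of the restore-chain logic these two types imply (using the chapter's terminology, with hypothetical record fields) might look like this in Python:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BackupSet:
    kind: str   # "full", "differential" (since last backup of any kind),
                # or "cumulative" (since last full backup)
    day: str    # when the backup was made, e.g. "Sun", "Mon", ...

def restore_chain(history: List[BackupSet]) -> List[BackupSet]:
    """Return the backup sets needed, oldest first, to rebuild the file system.

    Start from the newest full backup; a cumulative backup covers everything
    since that full, so older differentials are skipped; differentials made
    after the newest full or cumulative must all be applied in age order.
    """
    chain: List[BackupSet] = []
    for b in reversed(history):          # work backward from the newest backup
        if b.kind == "full":
            chain.append(b)
            break
        if any(c.kind == "cumulative" for c in chain):
            continue                     # already covered by a newer cumulative
        chain.append(b)                  # newest cumulative, or a needed differential
    return list(reversed(chain))

# Example: weekly full on Sunday, cumulative on Thursday, differentials otherwise
week = [BackupSet("full", "Sun"), BackupSet("differential", "Mon"),
        BackupSet("differential", "Tue"), BackupSet("differential", "Wed"),
        BackupSet("cumulative", "Thu"), BackupSet("differential", "Fri"),
        BackupSet("differential", "Sat")]
print([f"{b.kind}:{b.day}" for b in restore_chain(week)])
# -> ['full:Sun', 'cumulative:Thu', 'differential:Fri', 'differential:Sat']
```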

FIGURE 8-6 Restoring a File System from Differential and Cumulative Backups. [Figure: both restores start by restoring the nearest full backup; the differential path then applies every differential backup made since that full backup, while the cumulative path applies only the newest cumulative backup. The restored file systems are identical.]

Full, cumulative, and differential backups can be combined to balance the impact of backup on operations against the time required to restore a full file system or database. Table 8-1 illustrates a backup schedule in which full, differential, and cumulative backups combine to balance backup time and restore complexity. In this scenario, the largest number of backups required to do an up-to-date restore is four (for example, restores to the point in time after the Saturday differential backup was made require Sunday's full backup, Thursday's cumulative backup, and Friday's and Saturday's differential backups).

Enterprise backup managers typically allow system administrators to define automatic backup schedules similar to that illustrated in Table 8-1. With robotic tape libraries, scheduled backups can be completely automated. No system administrator or computer operator action is required once a backup policy has been defined.

TABLE 8-1 A Sample Weekly Backup Strategy

Sunday: Full backup. Data in backup copy: the full database as it stood on Sunday. Full database restore procedure: restore Sunday's backup.
Monday: Differential incremental backup. Data in backup copy: files changed since Sunday's full backup. Full restore: Sunday's backup and Monday's differential.
Tuesday: Differential incremental backup. Data in backup copy: files changed since Monday's backup. Full restore: Sunday's backup, and Monday's and Tuesday's differentials.
Wednesday: Differential incremental backup. Data in backup copy: files changed since Tuesday's backup. Full restore: Sunday's backup, and Monday's, Tuesday's, and Wednesday's differentials.
Thursday: Cumulative incremental backup. Data in backup copy: files changed since Sunday's full backup. Full restore: Sunday's backup and Thursday's cumulative backup.
Friday: Differential incremental backup. Data in backup copy: files changed since Thursday's backup. Full restore: Sunday's backup, Thursday's cumulative, and Friday's differential.
Saturday: Differential incremental backup. Data in backup copy: files changed since Friday's backup. Full restore: Sunday's backup, Thursday's cumulative, and Friday's and Saturday's differentials.

BACKING UP DATABASES

Database management systems are typically capable of producing point-in-time database backups. The technology is similar to that of file system snapshots. Database activity is halted momentarily so that backup can be initiated, and then resumes. Each application update while the backup is in progress causes a copy of the updated object's prior contents to be saved. The backup program reads these "before images." All other programs read current data object contents.

A backup made in this way represents the contents of the database at the point in time at which the backup was initiated. This technique, often called hot database backup, is well accepted and widely used. Some enterprise backup managers are integrated with database manager backup facilities so that hot database backups can be scheduled as part of an overall enterprise backup strategy. Hot database backup increases database I/O activity significantly, due both to the backup itself and to the storing of database object before images.
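
The before-image mechanism can be illustrated with a small copy-on-write sketch. This is a toy in-memory model, not any database manager's actual implementation: while a backup is in progress, the first update to an object saves its prior contents, the backup reads the saved image, and applications always see current data.

```python
class CowStore:
    """Toy copy-on-write store illustrating hot-backup before images."""

    def __init__(self, objects):
        self.objects = dict(objects)     # current contents seen by applications
        self.before_images = None        # prior contents saved during a backup

    def begin_backup(self):
        self.before_images = {}          # backup initiated at this point in time

    def write(self, key, value):
        # While a backup is in progress, save the object's prior contents once.
        if self.before_images is not None and key not in self.before_images:
            self.before_images[key] = self.objects[key]
        self.objects[key] = value        # applications always see current data

    def backup_read(self, key):
        # The backup program reads the before image if one exists; otherwise
        # the object has not changed since the backup began.
        if self.before_images is not None and key in self.before_images:
            return self.before_images[key]
        return self.objects[key]

    def end_backup(self):
        self.before_images = None

# Example: an update made during the backup does not appear in the backup copy.
store = CowStore({"row1": "alpha", "row2": "beta"})
store.begin_backup()
store.write("row1", "alpha-v2")              # application update during backup
backup_copy = {k: store.backup_read(k) for k in store.objects}
store.end_backup()
print(backup_copy)    # {'row1': 'alpha', 'row2': 'beta'}    -- as of backup start
print(store.objects)  # {'row1': 'alpha-v2', 'row2': 'beta'} -- current data
```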

FIGURE 8-7 Using File System Snapshots with Databases. [Figure: application updates to database tables cause "before images" of changed blocks to be saved in a copy-on-write snapshot, with a map tracking the changed blocks; storage for the snapshot is allocated from file system free space.]

Snapshots and Database Backup


Some enterprise backup managers can also make consistent point-in-time hot database backups with minimal overhead by copying data from file system snapshots. Each snapshot presents a point-in-time image of database data. Snapshots may either use the copy-on-write technique or they may represent full mirrored copies of the database volumes split from an online image. Figure 8-7 represents a copy-on-write snapshot.

Snapshots taken for the purpose of database backup should be initiated at instants when no database transactions are in progress and all cached data is also reflected on disks. Snapshot creation therefore begins by "quiescing" the database. When the database is quiescent, a file system snapshot is initiated (which takes a few seconds), and the database can be restarted for application use. Snapshots almost (but not quite) eliminate the need for database backup windows. Both full and incremental database backups can be made from snapshots.

Some file systems can maintain multiple snapshots, as Figure 8-8 illustrates. While each snapshot uses both storage capacity and I/O resources when data is updated, this facility gives a database administrator flexible backup choices. Moreover, some integrated backup managers can write before images of changed blocks from a snapshot back into the main database image to roll back the database to its state at the time of the snapshot. This can be useful, for example, if an application error that causes database corruption is discovered only after running for a period of time.

FIGURE 8-8 Multiple Snapshots and Database Rollback. [Figure: three copy-on-write snapshots taken at times T1, T2, and T3 each keep a map and the "before images" of blocks changed since that time; the database can be backed up as of T1, T2, or T3, or rolled back to its state at any of those points.]

Block-Level Incremental Backup


While it is extremely useful for file-based applications, incremental backup is of limited use with databases. A typical database stores data in a small number of large container files, most or all of which change frequently (albeit slightly) as the database is updated. Thus, an incremental backup that copies each changed file in its entirety is likely to include all of a database's container files, even if only a minuscule fraction of the data in them has changed since the last backup. Figure 8-9 illustrates incremental backup of a database.

FIGURE 8-9 Limitation of Incremental Backups with Databases. [Figure: an incremental backup copies an entire table space if any of the data in it has changed, so application updates to a few blocks cause whole container files to be copied.]

A copy-on-write snapshot, however, identifies exactly the blocks of the database container files that have changed since the snapshot was created. The snapshot itself contains the blocks' prior contents. The main database, however, contains data that has changed since the snapshot (at corresponding block addresses). Some enterprise backup managers can use snapshot block addresses to create a block-level incremental backup of a database, as Figure 8-10 illustrates.

FIGURE 8-10 Block-Level Incremental Backup Using "No Data" Snapshots. [Figure: the snapshot's changed block map indicates which block addresses need to be backed up; the backup manager reads the data for the backup from the main database image, not from the snapshot.]

A block-level incremental backup contains the contents of only those database blocks modified since snapshot creation. If only a small percentage of the database has been updated, the block-level incremental backup is correspondingly small. Compared to a full database backup, block-level incremental backups typically use very little time and very little storage and I/O bandwidth.

Like file system incremental backups, block-level incremental backups are relative to a baseline full backup. To restore an entire database from block-level incremental backups, a full backup must first be restored, followed by restoration of all newer block-level incremental backups in order of age.

By greatly reducing the impact of backups, block-level incremental backup encourages database administrators to schedule backups more frequently. Frequent backups not only reduce resource requirements (I/O and storage capacity), they also enable databases to be restored to points in time that are closer to the instant of failure.

As with incremental file system backups, the reduced resource requirements of block-level incremental backup come at a cost of increased restore complexity. Again, enterprise backup managers typically track block-level incremental backups and guide system administrators through the incremental restore process.
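
As a rough sketch (a toy model, not any particular backup manager's implementation), a block-level incremental backup can be driven by a changed-block map: only the block addresses marked as changed since the snapshot was created are read from the main database image and written to the backup.

```python
BLOCK_SIZE = 4096  # bytes per database block (illustrative)

def block_level_incremental(database_path: str, changed_blocks: set,
                            backup_path: str) -> None:
    """Copy only the changed blocks of a database container file to a backup file.

    changed_blocks is the set of block addresses that a copy-on-write snapshot
    reports as modified since the snapshot was created.  The backup records
    each block address alongside its contents so a restore can write the
    blocks back at the right offsets.
    """
    with open(database_path, "rb") as db, open(backup_path, "wb") as out:
        for block_no in sorted(changed_blocks):
            db.seek(block_no * BLOCK_SIZE)
            data = db.read(BLOCK_SIZE)
            out.write(block_no.to_bytes(8, "big"))   # 8-byte block address header
            out.write(data)                          # block contents

def restore_blocks(backup_path: str, database_path: str) -> None:
    """Apply a block-level incremental backup to a restored full-backup image."""
    with open(backup_path, "rb") as inp, open(database_path, "r+b") as db:
        while header := inp.read(8):
            block_no = int.from_bytes(header, "big")
            data = inp.read(BLOCK_SIZE)
            db.seek(block_no * BLOCK_SIZE)
            db.write(data)
```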

ARCHIVES

Over time, an enterprise's stock of historical data grows. Monthly, quarterly, and annual closing reports, sales, production, shipping, and service records, and other data must be retained, but generally need not be on line. Such data can be archived. Functionally, archiving is identical to backup. Designated files are copied to backup media on predefined schedules and catalogued so that they can be located later. Archiving differs from backup, however, in that once an archive job has completed, the archived files are deleted from disk storage, freeing the space they occupied for other use.

Figure 8-11 illustrates a file system in which database tables occupy one directory, and monthly roll-up and report information occupies another. The database directory is scheduled for regular backup as discussed in preceding sections. Data in the monthly roll-up directory is only of interest for a limited time, but must be retained for regulatory or policy reasons. The directory containing monthly roll-up data is therefore scheduled for regular archiving. Once the files in it have been copied to archival media, they are deleted, and the space they occupied is released, presumably for the next month's roll-up data. With robotic media libraries, archiving can be automated, requiring no human intervention unless exceptional conditions occur.

FIGURE 8-11 Archives and Backups. [Figure: an operational data directory holding database tables is backed up regularly, and the data remains on line after the backup copy is made; a month-end data directory holding a monthly roll-up table and report file is archived at month end, and its online space is reclaimed for the next month's results after the archival copy is made.]
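
A minimal sketch of the archive step (copy, verify, then release the online space), with hypothetical paths and no media management, might look like this:

```python
import hashlib
import shutil
from pathlib import Path

def archive_directory(source_dir: str, archive_dir: str) -> None:
    """Copy each file to the archive area, verify the copy, then delete the original.

    Unlike backup, archiving frees the online space once the copy is safely made.
    A real archive job would also catalog each file and write to removable media.
    """
    src, dst = Path(source_dir), Path(archive_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)                      # copy with timestamps
        if _sha256(path) != _sha256(target):            # verify before deleting
            raise IOError(f"archive copy of {path} failed verification")
        path.unlink()                                   # release the online space

def _sha256(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Example (hypothetical paths): archive last month's roll-up data
# archive_directory("/data/month_end/2002-03", "/archive/month_end/2002-03")
```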

BACKUP MANAGER PERFORMANCE TACTICS


Multiplexed Backups

In an enterprise with distributed information services such as that illustrated in Figure 8-3, several variables affect the speed with which a backup job can be accomplished:

- Client Load. An application server busy with other work may prevent the backup client from accessing data fast enough to keep the backup data path busy.
- Network Load. A network busy with application traffic may prevent a backup client from sending data fast enough to keep a media server or tape drive busy.
- Media Server Load. A media server may be too busy with other backup jobs (or other work, if it is also an application server) to keep tape drives busy.
- Tape Drive Data Transfer Rate. Tape drive performance degrades significantly if data is not written fast enough to keep the drive streaming (i.e., tape in motion and writing data). A brief gap in the data stream supplied to a tape drive can result in a much larger interruption of data flow as the drive repositions itself.

Additionally, effective media utilization can be an important backup consideration. High-capacity tape cartridges typically have two to four times as much capacity as a disk. Frequent incremental backups may result in many small backup data sets, each occupying a small fraction of a tape's capacity. Not only is media underutilization costly; unnecessarily large libraries of media increase the chance of handling errors.

To optimize tape drive performance (as well as minimize the negative impact of performance variations along the backup data path) and to promote effective media use, some enterprise backup managers can multiplex, or interleave, blocks of data from several concurrent backup jobs on a single tape. Backup data stream multiplexing can compensate for slow client data feeds, busy networks, and speed mismatches between network and tape drives. When multiple backup streams are interleaved, each stream's data blocks are tagged with a job identifier, so that they can be properly identified in the event of a restore.

With more data arriving at the backup server, tape streaming increases, improving data-center-wide backup throughput. Since data from several jobs goes onto the same tape, media utilization increases. If a single file or file system is restored from an interleaved backup, the media server filters blocks read from the backup media. Thus, restoring from multiplexed backup tapes inherently takes longer than restoring from a tape containing a single stream. Users, however, are not generally aware that backups are interleaved.
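
A toy sketch of the interleaving idea (not any backup manager's actual on-tape format): blocks from several job streams are written to one output in arrival order, each tagged with its job identifier, and a restore filters out the blocks belonging to one job.

```python
import itertools
from typing import Dict, Iterable, List, Tuple

def multiplex(streams: Dict[str, Iterable[bytes]]) -> List[Tuple[str, bytes]]:
    """Interleave blocks from several backup job streams onto one 'tape'.

    Each written block is tagged with its job identifier.  Round-robin order
    stands in for 'whichever stream has data ready' in a real media server.
    """
    tape: List[Tuple[str, bytes]] = []
    iterators = {job: iter(blocks) for job, blocks in streams.items()}
    for job in itertools.cycle(list(iterators)):
        try:
            tape.append((job, next(iterators[job])))
        except (KeyError, StopIteration):
            iterators.pop(job, None)        # this stream is exhausted
            if not iterators:
                break
    return tape

def demultiplex(tape: List[Tuple[str, bytes]], job: str) -> bytes:
    """Restore one job's data by filtering its tagged blocks from the tape."""
    return b"".join(block for tag, block in tape if tag == job)

# Example: two concurrent jobs share one tape; each can still be restored alone.
tape = multiplex({"payroll": [b"P1", b"P2", b"P3"], "weblogs": [b"W1", b"W2"]})
print(tape)                          # interleaved, tagged blocks
print(demultiplex(tape, "payroll"))  # b'P1P2P3'
```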

Parallel Backup

In systems with high-performance networks and storage volumes, large backup jobs can be speeded up by distributing the backup data across several tapes. This parallel backup can be effective, for example, when full backups of large databases are made from snapshots. Each backup job processes one file at a time. If the snapshot's files are backed up by separate jobs that execute concurrently, several streams can be active at once, using different network links if they are available. Depending on the relative speeds of client, network, server, and tape drive, it may be appropriate to direct parallel jobs to different tape drives, or to multiplex them onto one tape.
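
A minimal sketch of the parallel approach (hypothetical paths, with plain file copies standing in for tape writes): each file in the snapshot is handed to its own concurrent job, so several streams are active at once.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parallel_backup(snapshot_files, destinations, max_jobs: int = 4):
    """Back up each file as a separate concurrent job.

    snapshot_files: paths within a frozen snapshot image.
    destinations: one output directory per job stream (stand-ins for tape drives).
    """
    def copy_one(index_and_path):
        i, src = index_and_path
        dest_dir = Path(destinations[i % len(destinations)])
        dest_dir.mkdir(parents=True, exist_ok=True)
        return shutil.copy2(src, dest_dir / Path(src).name)

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(copy_one, enumerate(snapshot_files)))

# Example (hypothetical paths):
# parallel_backup(["/snap/db/t1.dbf", "/snap/db/t2.dbf", "/snap/db/t3.dbf"],
#                 destinations=["/backup/drive0", "/backup/drive1"])
```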

Flash Backup

So-called flash backup reads all of the disk blocks occupied by a file system and writes them to tape without interpretation, including blocks that are not allocated to files. Conventional backup managers open files and copy them one by one, resulting in significant file system overhead I/O. Flash backup reads disk block contents as fast as physically possible, whether they represent user data, file system metadata, or unallocated space.

To retrieve a file from a flash backup, a backup manager must rebuild the file system metadata, and then retrieve the file from potentially scattered areas on the tapes. Because data is reconstructed at restore time rather than at backup time, backups are dramatically faster, but restores take more time.

Sparsely populated file systems are usually not well suited for flash backup, since the technique copies the contents of unallocated disk space as well as data. The method works best with file systems containing many small files, since these introduce the most overhead I/O during backups.

BACKUP MANAGER FEATURES


Backup manager capabilities can have an impact on the intrusiveness and effectiveness of backup. This section lists backup manager features that should be considered when an enterprise backup manager is being chosen. While different enterprises may assign different values to different features, those decisions should be conscious rather than inadvertent.

- Effective Hardware Utilization. Tape drives are designed to spin as much as possible, and will last longer and deliver better performance when they do. If a backup manager supports both multiplexing of independent backup streams and parallelization of a single high-speed stream, backup procedures can be adjusted to accommodate the full range of application requirements, hardware and network facilities, and changing conditions.
- Hot Backups. It should be possible to back up both file systems and databases while they are being accessed by applications.
- Open Tape Format. It should be possible to write tapes that can be read and restored without special software or licenses. The delay to obtain these items increases time to recover, especially during disaster recovery, when pressure is high and distances are likely to be involved.
- Consolidated Management. In principle, it should be possible to administer backup for an entire enterprise from a single console. Enterprise backup managers should support central management of a scalable architecture similar to that described earlier in this chapter.
- Quick Disaster Recovery. Some backup managers can rebuild catalogs by scanning tape contents. Though this is a useful feature in extreme circumstances, it should be a last resort, and backup managers that require it should be avoided. Rebuilding a catalog by scanning every tape in a large library can add hours or days to recovery time.
- Hardware Support and Flexibility. It is unlikely that any large enterprise will do a wholesale replacement of its tape drives and media. Backup managers should demonstrate both ongoing support for older devices and prompt support for newer ones.
- Broad Platform Support. Most enterprises have several different types of servers that manage critical data. Backup managers should support client functionality for a broad range of computer architectures and should demonstrate prompt support for new operating system and device releases.
- Comprehensive Media Management. Keeping track of millions of files on tens of thousands of tape cartridges can be a daunting task. Keeping track of the age, location, usage, and contents of media is even more daunting. Effective media management includes tape labeling, bar code management, location tracking, robotic controls, and coordination of shared media usage.
- Copy-on-Write Snapshots. A file system occupies a contiguous range of blocks on a disk. Some of the blocks contain metadata, which describes the location and characteristics of user data in other blocks. Snapshot technology makes "before image" copies of user data and metadata as it is modified. These before image copies are logically merged with unmodified data, and the resulting snapshot can be mounted and presented to backup managers (and other applications) as though it were the original file system with its contents frozen at a single point in time. Snapshot technology makes it possible to make a backup of data that is being updated by applications.

TECHNIQUES FOR MINIMIZING BACKUP WINDOWS


Hot Backups

The optimal backup window is of zero length: one that results in no information service interruption whatsoever. This can be achieved with some databases, and can very nearly be achieved with file systems.

The biggest challenge with hot backups is data consistency. Since backup is typically a lengthy operation, it is common for applications to update data during its course. For hot backup to produce a consistent image of data, there must be insulation between the data and applications that may be updating it.

Most commercial backup managers support online, or hot, backups of file systems and databases. To achieve reliable hot backups, the backup manager must integrate with the data manager or file system whose data it is backing up. File systems, database managers, and even some applications provide interfaces that enable backup managers to effectively freeze data images so that even lengthy backups made while data is in use provide consistent point-in-time images of all important data. These hot backup technologies are very specific to the application or data manager, and backup managers must be engineered specifically to support them. Support is usually delivered through backup manager components called agents.

While hot backup can reduce windows of application outage to zero or near zero, it does inevitably have a discernible impact on application, system, and network performance. Even with hot backup, it is still desirable to minimize the elapsed time and resources consumed, for example by using the incremental or block-level incremental techniques described earlier.

Other Backup Data Reduction Techniques


In addition to incremental backup, off-host backup is another way to reduce the impact of backup on information services. In a storage network, any server can physically access any data. Some backup managers take advantage of this by running backup client software on a different server than the one that normally accesses the data. This is commonly called off-host backup. Off-host backup eliminates application performance degradation due to backup client processing load. Typically, off-host backup is combined with some form of frozen image technology (copy-on-write or split mirror) so that off-host backups are consistent point-in-time data images. Some enterprise disk array subsystems are capable of variations of off-host backup. These techniques are discussed in Chapter 10.

BACKUP BEST PRACTICES


Backup copies of critical data are the last line of defense against disaster. Even if disaster strikes a building or an entire city, enterprise data, and therefore information services, can be restored (using other computers) from properly generated and maintained backup tapes. There are several best practices an enterprise can adopt to minimize the impact of backup on information services and maximize the probability of successful restores when designing backup procedures. The following paragraphs enumerate some important backup best practices.

- Avoid using disk mirroring to replace backup. Mirroring protects against the failure of online storage devices, but is of no help if files are mistakenly deleted or become corrupted. If a file on mirrored storage is deleted or corrupted, all mirrors are identically deleted or corrupted, and the file must be recovered by restoring it from a backup copy.

- Optimize consciously. The most common use of restoration is not disaster; it's accidental deletion of or damage to individual files. Sparsely populated backup media (e.g., with incremental backups on them) are likely to lead to faster average restore times for individual files, but as noted earlier, use more media and are not optimal for restoring entire file systems after a disaster. This is a choice an enterprise must make.
- Test restoration regularly. Unreadable backups are useless. Backup tapes should be tested for readability and restorability on a sampling basis, and every tape drive should be tested regularly to ensure that the tapes it writes are readable.
- Maintain tape drives. Dirty or damaged tape heads may result in erroneous data on tape, even though writes appear to complete successfully. Manufacturer recommendations for tape drive maintenance should be followed or exceeded. If a backup program reports tape errors, heads should be cleaned immediately. If errors persist, other tapes written on the drive should be tested for readability.
- Keep tapes clean. Sometimes a single media defect, crease, fold, or smudge of dirt on a tape can prevent the entire tape from being read. Tapes should be stored in clean, environmentally controlled areas, according to manufacturers' specifications.
- Avoid exceeding tapes' useful lifetimes. It can be tempting to reduce costs by continuing to use a tape beyond its specified lifetime, until a critical restore fails. Enterprise backup managers typically keep track of tape usage and report when it's time to retire a tape.
- Refresh archived data. Not only does data on tape deteriorate over time, but drive specifications, tolerances, and media data formats change. For a variety of reasons, it is prudent to recopy data archives to new media on an infrequent, but regular, basis.

An Ounce of Prevention

A museum in California was producing an exhibit showing the development of a particular land area over a period of several decades. The exhibit curators asked NASA and other government agency contacts for satellite photos of the area, and were thrilled when told they could have the actual data in its original form. What arrived, much to the museum's chagrin, was a set of data drums, decades-old rotating storage for which no read device was available. The photos might have been a stellar addition to the exhibit, but without the ability to read the data representing them, they were effectively lost. Be aware of the longevity of media formats as well as media's magnetic properties.

Evan Marcus
VERITAS Engineer

- Protect critical data with multiple copies. For data needed to run the enterprise, the cost of purchasing, writing, transporting, and maintaining extra tapes is certain to be much lower than the cost of recreating data stored on them. For disaster recoverability, it is prudent to store at least one copy of critical data at a recovery site some distance from the primary data center.
- Design for backup resiliency. There should be no single point of failure in the backup environment for critical enterprise data. Media servers should be configured with multiple tape drives of each type. Adapters, buses, network interfaces, and infrastructures should all be redundant. Master servers should be clustered; media servers should be clustered or there should be two or more media servers eligible to receive any scheduled backup job. And most important, backup catalogs should be replicated to the disaster recovery site so that data to be restored can be identified and located after a disaster.
- Keep backup media secure. Backup tapes are among an enterprise's most valuable data resources. Because backup tapes can reconstruct information services after a failure or disaster, they can also be used by hostile parties to misappropriate enterprise information. Tapes should be physically secured and environmentally protected commensurately with their value. (There is obviously a natural tension between preventing unauthorized access and providing fast, easy access for rapid recovery.) Physical and environmental protection should be assured both for tapes kept at the data center for rapid restore and for those moved off site for disaster protection. Tapes that contain sensitive data should be deliberately erased before being recycled so that data cannot be scavenged.

An Ounce of Prevention

In speaking with information technology consultants who responded in the aftermath of the September 11, 2001 attacks, one consistent theme was heard. End users had plenty of capacity to routinely perform tape backups, but found themselves grossly underequipped to perform massive restores at their recovery sites. The bottleneck was the lack of tape drives. With more drives, restores could have been completed sooner, saving critical systems hours or days of expensive downtime.

Another common problem was tape catalogs that were not properly backed up, or that were not available at recovery sites. In many cases, nobody had ever copied or carried the catalog tapes off site. As a result, administrators spent a great deal of time scanning backup tapes to rebuild indexes.

One alternative to copying or carrying backup tape catalogs is to replicate them to recovery sites. A replicated catalog is always up to date and easy to locate. Locating and activating backup catalogs should be an explicit part of every disaster recovery plan.

Evan Marcus
VERITAS Engineer

- Encrypt backup data. If the entire network path between client and media server is not secure (e.g., if shared public network facilities are used), data being backed up can be read and even modified by unauthorized parties while it is traversing the network. If at all possible, sensitive data being backed up should be encrypted at the source (a minimal sketch appears after this list). Some enterprise backup managers include encryption capability. Encryption is compute intensive, and therefore impacts backup client CPU performance, so it is a good candidate for combination with off-host backup. Data encryption also requires a secure, reliable, long-term key storage mechanism.
- Control data restores. Most backup managers allow users to restore their own data. Using this feature requires that system administrators define careful data access security procedures so that users cannot restore files they are not authorized to access.
- Label tapes. Over time, an enterprise can accumulate tens of thousands of tapes. Without a comprehensive labeling and storage scheme, there is no hope of finding an arbitrary file backup in a reasonable amount of time. Most automated libraries have bar code capabilities, which are usually integrated with backup media manager software. These capabilities should be used under all circumstances.
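
As a minimal sketch of source-side encryption (using the third-party Python cryptography package as a stand-in for whatever a backup manager or agent would actually provide; paths and key handling here are purely illustrative), data is encrypted before it leaves the backup client:

```python
# Requires the third-party "cryptography" package: pip install cryptography
from cryptography.fernet import Fernet

def encrypt_for_backup(plaintext_path: str, encrypted_path: str, key: bytes) -> None:
    """Encrypt a file on the backup client before it is sent to a media server.

    Key management is the hard part in practice: the key must be stored
    securely and durably, or the backup is unrestorable.
    """
    f = Fernet(key)
    with open(plaintext_path, "rb") as src:
        token = f.encrypt(src.read())          # authenticated symmetric encryption
    with open(encrypted_path, "wb") as dst:
        dst.write(token)

def decrypt_from_backup(encrypted_path: str, restored_path: str, key: bytes) -> None:
    """Decrypt a restored file using the same long-term key."""
    f = Fernet(key)
    with open(encrypted_path, "rb") as src:
        data = f.decrypt(src.read())
    with open(restored_path, "wb") as dst:
        dst.write(data)

# Example (hypothetical paths):
# key = Fernet.generate_key()   # store this key securely for the long term
# encrypt_for_backup("/data/payroll.db", "/staging/payroll.db.enc", key)
# decrypt_from_backup("/staging/payroll.db.enc", "/restore/payroll.db", key)
```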

Optimizing Restore Time


System administrators must balance between backup and restore
speed. Many of the techniques that speed up backups slow restores.
Here are some techniques for speeding up restores (some of which
may increase backup times).

- Choose incremental backup schedules carefully. Of course, fewer incremental backups means more full backups, which take longer. One popular approach is to do cumulative incremental backups as needed, with full backup on a weekly basis. With this schedule, the worst-case restore is of two backup sets. This technique also minimizes the consequences of losing a backup tape (provided that the schedule requires that each incremental backup be on a different tape).
- Keep backup tapes accessible. Restoration time increases while operators search for tapes. Keep tapes for restoring data close to tape drives. For library-equipped systems, library contents should be well organized.
- Optimize disk write speed. Configure online storage to maximize write speed. Some disk arrays use cache memory to capture large chunks of data before writing it to disk. If the writes are file system based, use a file system that optimizes performance.
- Optimize the speed of the path from tape to disk. A dedicated high-speed network link will move the data fastest across a LAN. Direct connections via 40-MBps ultrawide SCSI move data more than three times as fast as 100BASE-T (at 100 megabits per second). 100-MBps FC-AL beats them both. The downside here is expense. More hardware costs money.
- Avoid restores wherever possible. Numerically, most restores are of single files or small groups of files, rather than full file system restores. Taking periodic file system snapshots may make it possible to restore deleted files without recourse to backup tapes. Snapshots do not eliminate the need for backups; backups are still required to protect against catastrophic file system loss or failure.
- Choose advanced backup software. Many enterprise backup software packages not only keep track of which files have changed between backups, but also track file deletions. Such software simply skips over deleted files when restoring. This saves both restore time and administrator time.

A FINAL WORD ON BACKUP


Backup is the last line of defense against loss of critical data in many types of disasters. Whether an unmirrored disk crashes, critical files are mistakenly deleted, or a site disaster results in total loss of a data center, backup tapes may be the only way to avoid irretrievable data loss. Sometimes, backups are necessary to restore data that has suffered hopeless corruption and can't be used because there's nothing left to use. When designing a backup strategy, it's important to consider that "last line of defense" doesn't mean lowest priority; reliable, clean, regular backups must be a first priority in any information service disaster recovery strategy.

Tapes that contain critical data must be treated as very valuable enterprise assets and managed accordingly. Backup processes must be completely trustworthy, and moreover, should be tested on a regular basis by restoring backups. The worst time to learn that tapes are defective or that backup processes don't work is when the backup is required to recover from a disaster.

Backup techniques must be able to complete backup within an acceptable window of application outage. Tapes must be reliably cataloged and physically managed so that they can be easily located when they are needed. Advanced techniques such as backing up from frozen image snapshots, block-level incremental copies, and so forth, can reduce windows of application outage to very small levels, at a cost. Information service designers must balance cost against acceptable outage windows when defining backup regimens for key enterprise data sets.
