Abstract
Users are faced with many options and tradeoffs when choosing
a backup strategy for Microsoft SQL Server databases. This
white paper maps out those choices and examines how EMC
Data Domain deduplication storage systems preserve data
integrity, meet stringent RTO/RPO requirements, and integrate
easily into a multitude of active SQL or third-party backup
environments.
February 2012
Table of Contents
Executive summary
Audience
Introduction
Data Domain Product Background
Advantages of Data Domain in a SQL Server Environment
EMC Data Domain Boost
Integration
Solution Planning
Conclusion
Appendix A: Index Fragmentation
Addressing the Challenge
List of Figures
Figure 1: Native MS-SQL Database Backup Tool
Figure 2: NetWorker MS SQL Client Properties VSS Snapshot Configuration
Figure 3: Dual MS SQL Database Backups NetWorker and Native SQL Server Back Up
Figure 4: EMC Data Domain Boost
Figure 5: Microsoft SQL Server Management Studio Databases
Figure 6: Selection Recovery Model
Figure 7: Restore Database Dialog Box
Figure 8: Restore Database Options
Figure 9: Restore the Initial Full Backup then the First Transaction Log Backup
Figure 10: NetWorker MS SQL Client Restore GUI Example
Figure 11: Native SQL Backup - Disable Compression
Figure 12: Multi-striped Database Backup Eight Stripes
Figure 13: Database Backup to a Null Device
Figure 14: Multiple Null Disk Devices
Figure 15: Nominal Database Backup Performance
Figure 16: NetWorker Management Console
Figure 17: DBCC showcontig Command Output
Executive summary
Many database administrators prefer native Microsoft SQL Server backups directly to
disk over third-party backup applications. When utilizing native SQL
Server backup, there is no reliance on the backup administrative team to perform
backups or play a role in database recovery. Additionally, there is no longer a need
for the database administrator to become proficient in deploying, configuring,
administering, or maintaining third-party backup applications.
Historically, native SQL backups have had some drawbacks for a couple of reasons:
Backup to disk did not meet the requirement of retaining an offsite copy of
database backups as part of a disaster recovery strategy. Native backup to disk
fell short of providing a viable solution for this requirement.
Data Domain systems are simple to integrate utilizing traditional backup software,
but also offer an alternative with high-speed, cost-effective backup directly to a
CIFS network share, utilizing native SQL Server backup. Users have the choice to
eliminate the need for third-party SQL Server backup application agents and their
associated operational costs and maintenance fees.
EMC NetWorker integration with EMC Data Domain Boost (DD Boost) significantly
increases performance by distributing parts of the deduplication process to
NetWorker storage nodes or application hosts, and serves as a solid foundation
for additional integration between NetWorker and Data Domain systems.
Data Domain systems benefit from the EMC Data Domain Data Invulnerability
Architecture, which provides continuous recovery verification, fault detection
and self-healing, and other resiliency features transparent to the backup application.
This white paper provides information about the use of Data Domain deduplication
storage as backup media for Microsoft SQL Server backups.
Audience
This white paper is intended as a guide for data protection architects, SQL Server
database administrative staff, backup administrators and EMC partners seeking
information about integrating Data Domain deduplication storage systems as a key
component in a comprehensive backup and recovery strategy.
Introduction
Microsoft SQL Server backup methodology falls into one of two generic categories. The
first consists of native SQL Server database backups. This backup technique creates SQL
database backups using tools and utilities native to Microsoft SQL Server and does not
rely on third-party backup application software (see Figure 1). The native database
backup tool performs a full database backup to disk through a CIFS network share. The
tool is easy to use and provides a feature set that addresses business requirements.
Benefits include the use of backup and recovery interfaces familiar to the database
administrative staff. This ability is included with Microsoft SQL Server, and there are
no additional third-party software license fees.
Figure 1: Native MS-SQL Database Backup Tool
The second backup methodology uses backup application software that integrates
with Microsoft SQL Server to perform SQL database backups based on the Virtual
Device Interface (VDI). This solution is typically packaged as a database agent
specifically for Microsoft SQL Server and a particular backup application. When VDI is
used, the backup application allows setting customized backup and recovery
parameters similar to those that can be employed when using native Microsoft SQL
tools and utilities.
EMC NetWorker backup software has the capability to utilize available snapshot
technologies designed to provide application consistency for the backup and
recovery processes. The EMC NetWorker Module for Microsoft Applications (NMM)
delivers unified, online backup and recovery utilizing the Microsoft Volume Shadow Copy
Service (VSS) for Microsoft applications including SQL Server, Exchange, SharePoint,
and Hyper-V.
Figure 3: Dual MS SQL Database Backups NetWorker and Native SQL Server Back Up
This enables users to perform native database backups in conjunction with database
backups controlled by a third-party backup application without affecting
deduplication efficiency. This includes third-party backup applications that use a SQL
agent, with or without VSS snapshots. Additionally, the use of different numbers of
stripes or different blocksize values also has a negligible impact on deduplication
ratios.
Data Domain network-efficient replication can be used to create offsite copies of SQL
backups faster and more economically than legacy tape-based strategies. Data
Domain replication makes advanced disaster recovery preparedness for SQL Server a
reality.
Schedules replication
Catalog awareness
Ease of use
With DD Boost, backup applications can control replication between multiple Data
Domain systems and provide backup administrators with a single point of
management for tracking all backups and duplicate copies. This paradigm allows
backup administrators to efficiently create DR copies of their backups over the WAN
using DD Replicator software and keep catalog consistency for easy disaster recovery.
This also provides the flexibility for administrators to manage different retention
periods for each copy of data.
With NetWorker, the Data Domain replication process is managed by standard
NetWorker cloning, ensuring that NetWorker can recognize and manage a replicated
(remote) copy of data and assign unique retention policies to it. The administrator
has the ability to schedule the cloning process to run at a time that is most
appropriate for the business.
Data protection strategies for the system databases are dependent on the database
being protected. For instance, transaction log backups are not supported for the
master database.
The master database cannot be recovered if a functional version of it does not already
exist. Recovery procedures for the master database may include re-installing
Microsoft SQL Server such that a backup of the pre-disaster master database can
then be restored.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
The model and msdb databases can contain customized data such as user-specific
templates, scheduling information, as well as backup and restore history information.
Without a data protection strategy, these items will need to be manually
reconstructed in the event of a disaster.
The tempdb database is empty when the SQL instance is shut down, and does not
require protection as it is re-created at startup.
Terminology
Entire databases, specific database files, file groups, and transaction log backups are
among the supported backup types with Microsoft SQL Server. This section defines
the terminology associated with a given backup type.
Types of Backups
Database backups
Database Backup: This is a full backup of an entire database and represents the
state of the database at the point when the backup is completed.
Differential Database Backup: This is a backup of all the files within a database,
and contains only the extents modified since the most recent full backup of each
file. Restoring a database protected with full and differential backups to the most
recent point in time includes recovering the most recent full and differential
backup.
Partial backups
Partial Backup: Partial backups provide flexibility for backing up databases that
contain some number of read-only filegroups. This is a partial backup of all data
in the primary filegroup, each read/write filegroup, and any optionally specified
read-only files or filegroups.
Differential Partial Backup: This backup contains only the extents modified since
the prior partial backup of the same set of filegroups.
File backups
File Backup: This consists of a full backup of all data in one or more files or
filegroups.
Differential File Backup: This is a backup of one or more files containing data
extents changed since the prior full backup of each file.
Copy-Only backups
Database backups usually change the database in some way, such as resetting the
differential base in the case of a full database backup or truncating the transaction
log in the case of a log backup. Copy-Only backups can be used in cases where a backup
of a database is required without changing the database.
Recovery Models
Microsoft SQL Server includes three recovery models: simple, bulk-logged, and full
(see Figure 6). The desired recovery model can be deployed based on requirements.
Functionally, each recovery model differs with regard to how backup and recovery
strategies are executed.
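As a rough illustration of how a recovery model might be selected outside the GUI, the following Python sketch builds the corresponding ALTER DATABASE statement and records which models allow transaction log backups. The helper names and the "Sales" database are hypothetical and not part of this paper; the sketch only generates statement text and does not connect to a server.

```python
# Illustrative sketch only: helper names and the "Sales" database are hypothetical.

# Recovery model keyword -> whether transaction log backups are possible under it.
RECOVERY_MODELS = {
    "SIMPLE": False,
    "BULK_LOGGED": True,
    "FULL": True,
}


def normalize_model(model: str) -> str:
    """Map spellings such as 'bulk logged' or 'bulk-logged' to the T-SQL keyword."""
    key = model.strip().upper().replace(" ", "_").replace("-", "_")
    if key not in RECOVERY_MODELS:
        raise ValueError(f"unknown recovery model: {model!r}")
    return key


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def set_recovery_model_sql(database: str, model: str) -> str:
    """Return the ALTER DATABASE statement that selects a recovery model."""
    return f"ALTER DATABASE {quote_name(database)} SET RECOVERY {normalize_model(model)};"


def supports_log_backups(model: str) -> bool:
    """Report whether the given recovery model allows transaction log backups."""
    return RECOVERY_MODELS[normalize_model(model)]


if __name__ == "__main__":
    print(set_recovery_model_sql("Sales", "full"))
    print(supports_log_backups("simple"))
```

The lookup table makes the functional difference explicit: a database in the simple recovery model cannot be protected with transaction log backups, which in turn limits the available recovery points.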
Recovery Techniques
The technique used to restore a database will vary based on the recovery model being
used as well as the backup types being performed. Figures 7-10 provide a brief look
at restoring a database that was protected using the full recovery model with full and
transaction log backups. A single full backup was performed, followed by five
transaction log backups. Figure 7 depicts the restore database dialog box and general
database restore attributes. By default the full backup and subsequent transaction
log backups are all selected. Clicking the OK button would initiate recovery to the
most recent possible point in time. Alternately, recovery to a specific point in time is
also possible.
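The restore order described above can be sketched in Python as a generator of T-SQL statements: the full backup is restored first, each transaction log backup follows in sequence, and only the last statement brings the database online. This is a minimal sketch under stated assumptions; the function name, database name, Data Domain host name (dd01), share, and file names are all hypothetical.

```python
# Illustrative sketch only: all names and paths below are hypothetical.
from typing import List, Optional


def sql_string(value: str) -> str:
    """Return a T-SQL Unicode string literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def restore_sequence_sql(
    database: str,
    full_backup: str,
    log_backups: List[str],
    stop_at: Optional[str] = None,
) -> List[str]:
    """Build RESTORE statements for one full backup plus ordered log backups.

    Every step except the last uses NORECOVERY so further backups can be
    applied; the last step uses RECOVERY. If stop_at is given, it is applied
    to the final log restore as a point-in-time target.
    """
    db = "[" + database.replace("]", "]]") + "]"
    first_state = "NORECOVERY" if log_backups else "RECOVERY"
    statements = [
        f"RESTORE DATABASE {db} FROM DISK = {sql_string(full_backup)} WITH {first_state};"
    ]
    for position, log_file in enumerate(log_backups, start=1):
        is_last = position == len(log_backups)
        options = []
        if is_last and stop_at is not None:
            options.append(f"STOPAT = {sql_string(stop_at)}")
        options.append("RECOVERY" if is_last else "NORECOVERY")
        statements.append(
            f"RESTORE LOG {db} FROM DISK = {sql_string(log_file)} WITH {', '.join(options)};"
        )
    return statements


if __name__ == "__main__":
    logs = [rf"\\dd01\sqlbackup\Sales_log{n}.trn" for n in range(1, 6)]
    for statement in restore_sequence_sql("Sales", r"\\dd01\sqlbackup\Sales_full.bak", logs):
        print(statement)
```

With five log backups, as in the scenario above, the sketch yields six statements. Passing a timestamp as stop_at illustrates recovery to a specific point in time rather than to the most recent possible point.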
Figure 9 is an example of a recovery transaction that restores the initial full backup,
followed by the first transaction log backup. The remaining transaction logs were not
included in this query for brevity.
Figure 9: Restore the Initial Full Backup then the First Transaction Log Backup
EMC NetWorker and third-party backup applications will each have a unique recovery
interface for databases. Many automate and coordinate the recovery of full and
transaction log backups similar to the way native Microsoft SQL Server tools and
utilities do.
Figure 10 is an example of the NetWorker MS SQL client restore GUI.
Figure 10: NetWorker MS SQL Client Restore GUI Example
SETTING
NO_COMPRESSION
Disabled
Disabled
Disabled
Disabled
Compression
Specific to SQL Server 2008 Enterprise and later versions, backup compression can be
enabled or disabled. The default product installation does not compress backups. A
server-level compression setting can be applied that alters the default behavior. The
COMPRESSION keyword within a BACKUP statement explicitly enables backup compression,
and the NO_COMPRESSION keyword explicitly disables it.
Figure 11: Native SQL Backup - Disable Compression
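To make the two keywords concrete, the following Python sketch builds a native BACKUP DATABASE statement with compression explicitly enabled or disabled, so the result does not depend on the server-level default. The function name, database name, and UNC path are hypothetical, and the sketch only produces statement text.

```python
# Illustrative sketch only: the function, database, host, and share names are hypothetical.


def backup_database_sql(database: str, target: str, compression: bool = False) -> str:
    """Build a native full backup statement with an explicit compression keyword.

    Stating COMPRESSION or NO_COMPRESSION explicitly overrides whatever
    server-level backup compression default is configured.
    """
    db = "[" + database.replace("]", "]]") + "]"
    disk = "N'" + target.replace("'", "''") + "'"
    keyword = "COMPRESSION" if compression else "NO_COMPRESSION"
    return f"BACKUP DATABASE {db} TO DISK = {disk} WITH {keyword};"


if __name__ == "__main__":
    # Backup written uncompressed to a CIFS share on the deduplication system.
    print(backup_database_sql("Sales", r"\\dd01\sqlbackup\Sales_full.bak"))
```

The sketch defaults to NO_COMPRESSION because that is the setting named in the Figure 11 caption; callers that want compressed backups can pass compression=True.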
Multiplexing
When the Data Domain system is integrated as a backup device with a backup
application that supports multiplexed backups, EMC recommends disabling
multiplexed backups. Multiplexing limits the ability of the Data Domain system to
deduplicate incoming data.
Multiplexing was historically a speed-matching solution in which multiple slower data
streams were combined into a single stream to take advantage of a faster tape drive.
Backups to disk obtain no advantage from multiplexing. Whether
deployed as a CIFS share, NFS mount, VTL, or OpenStorage / DD Boost disk pool, Data
Domain systems accommodate writing multiple backup streams in parallel without
multiplexing.
Encryption
Encrypted files are, by definition, unique. The encryption software that is part of the
backup application will create unique files on the fly for each backup, defeating the
deduplication capabilities of the deduplication storage system. Data Domain
Encryption software provides encryption of data at rest and is persistent in flight
during replication with Data Domain Replicator software.
Blocksize
The BLOCKSIZE keyword can be used to alter the physical block size used when writing
to backup media. By default the backup process will automatically select a block size
appropriate for the backup device. Supported sizes are 512, 1K, 2K, 4K, 8K, 16K, 32K
and 64K bytes. The default value used for disk backup is 512 bytes.
The default 512-byte size yields excellent performance with Data Domain systems.
Third-party backup applications may substitute their own default value. The fact that
this parameter can be adjusted is included as reference. The use of larger sizes may
improve or degrade performance. Users are encouraged to investigate further to
determine what value may provide optimal results in their environment.
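When experimenting with different values, it can help to validate a candidate size against the supported list before it reaches a backup job. The following Python sketch does that and returns the matching BLOCKSIZE option text. The function and constant names are hypothetical.

```python
# Illustrative sketch only: the constant and function names are hypothetical.
from typing import Optional

# The supported sizes listed above: 512 bytes doubling up to 64K (65536) bytes.
SUPPORTED_BLOCK_SIZES = tuple(512 * 2 ** power for power in range(8))


def blocksize_option(size_bytes: Optional[int] = None) -> str:
    """Return a BLOCKSIZE option for a BACKUP statement, or '' to keep the default.

    Passing None leaves the choice to SQL Server, which selects a size
    appropriate for the backup device.
    """
    if size_bytes is None:
        return ""
    if size_bytes not in SUPPORTED_BLOCK_SIZES:
        raise ValueError(
            f"unsupported BLOCKSIZE {size_bytes}; expected one of {SUPPORTED_BLOCK_SIZES}"
        )
    return f"BLOCKSIZE = {size_bytes}"


if __name__ == "__main__":
    print(SUPPORTED_BLOCK_SIZES)
    print(blocksize_option(65536))
```

Returning an empty string for None keeps the default behavior available, which matches the guidance that the 512-byte default already performs well and that larger sizes should be tested rather than assumed to help.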
Stripes
While not a keyword within the context of Microsoft SQL Server, the term stripes
correlates to the number of simultaneous backup streams to be created for a given
backup operation. In the case of disk backups with SQL Server, multi-streamed
backups are performed by specifying a number of backup disk targets with the
BACKUP command.
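A multi-striped backup of this kind can be sketched by listing one disk target per stripe in a single BACKUP statement. The Python sketch below generates such a statement; the function name, database name, host (dd01), share, and file naming pattern are hypothetical.

```python
# Illustrative sketch only: all names and the file naming pattern are hypothetical.


def striped_backup_sql(database: str, share: str, stripes: int) -> str:
    """Build one BACKUP statement that writes to `stripes` disk targets in parallel."""
    if stripes < 1:
        raise ValueError("at least one stripe (backup target) is required")
    db = "[" + database.replace("]", "]]") + "]"
    base = share.rstrip("\\")
    targets = [
        f"DISK = N'{base}\\{database}_stripe{number}.bak'"
        for number in range(1, stripes + 1)
    ]
    return f"BACKUP DATABASE {db} TO\n  " + ",\n  ".join(targets) + ";"


if __name__ == "__main__":
    # Eight stripes written to a CIFS share addressed by its UNC path.
    print(striped_backup_sql("Sales", r"\\dd01\sqlbackup", 8))
```

Each DISK target becomes a separate backup stream, so the stripe count is the tuning knob for matching the aggregate transfer rate to the destination. All stripes are needed together at restore time.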
Table 3: Mount Options
MOUNT OPTIONS                                 SETTING
When performing native database backups       UNC path
When using a third-party backup server        Dependent on backup application and server OS type
When the Data Domain system is used as disk backup media for native Microsoft
SQL Server backups, configuration is performed utilizing a CIFS share.
As a general rule, the UNC path to the share should be used instead of a mapped
drive because:
a) Scheduled backups may execute when no user is logged in to the server
b) When Sqlservr.exe is executed as a service, it has no relation to a login
session
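A scheduled job can guard against mapped-drive targets with a simple check before the backup statement is built. The following Python sketch uses the standard library's Windows path parser to distinguish a UNC path from a drive-letter path. The function name and the example paths are hypothetical.

```python
# Illustrative sketch only: the function name and example paths are hypothetical.
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True when `path` is a UNC path such as \\\\server\\share\\file.bak.

    For UNC paths, PureWindowsPath reports the drive as \\\\server\\share;
    for mapped or local drives it reports a letter such as Z:.
    """
    return PureWindowsPath(path).drive.startswith("\\\\")


if __name__ == "__main__":
    print(is_unc_path(r"\\dd01\sqlbackup\Sales_full.bak"))  # UNC path to the share
    print(is_unc_path(r"Z:\sqlbackup\Sales_full.bak"))      # mapped drive letter
```

Because PureWindowsPath performs no file system access, the check behaves the same on any operating system and can run where the job is defined rather than on the SQL server itself.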
Table 4: Miscellaneous Options
MISCELLANEOUS OPTIONS
CONFIGURATION
Yes
Replication
Yes
Commingling native and third-party backups to a Data Domain system should have only
a negligible impact on deduplication ratios because of the variable segment
processing and Stream Informed Segment Layout (SISL) architected into Data Domain
systems.
Since Data Domain Replicator software only sends unique, compressed data
segments to the remote system, it is ideal for network-efficient disaster recovery.
Table 5: Infrastructure Configuration
INFRASTRUCTURE
CONFIGURATION
IP Network
Backup Command
The recommended use of SQL stripes is as a speed-matching technology. Multiple
backup streams from a given database can be simultaneously written to a target Data
Domain system in an effort to achieve an aggregate data transfer rate that aligns with
business requirements.
Figure 12 illustrates a multi-striped database backup that uses eight stripes in an
effort to improve backup data transfer rate performance. Multiple stripes can be used
to better match data transfer rate capabilities between source and destination media.
Integration
EMC NetWorker and third-party backup applications used to protect Microsoft SQL
Server can also take advantage of Data Domain systems employed as backup media.
Data Domain systems are easily configured as varied backup media types and
protocols including VTL, CIFS share, NFS mount, or Data Domain Boost (DD Boost) for
backup applications such as EMC NetWorker.
Additionally, DD Boost enables managed replication capabilities, known as
clone-controlled replication, with EMC NetWorker.
In this scenario, backup images are replicated from one Data Domain system to
another under the direct control of NetWorker or other supported backup
applications. DD Boost monitoring, reporting, and cataloging of replicated backup
images and savesets can be used to architect a comprehensive disaster recovery
plan.
Solution Planning
Capacity and performance planning play a critical role in both successful deployment
and ongoing production usage of a Data Domain system. Detailed capacity analysis
should be performed by a knowledgeable EMC Velocity partner or an EMC technical
consultant. The analysis considers database sizes, growth rates, change rates, and
retention periods as input criteria. Performance analysis considers data points such
as the required aggregate data transfer rate for backups, connection topology
requirements to support the data transfer rate, and the Data Domain system required
to meet or exceed the required data transfer rate.
Beyond capacity and performance planning are additional considerations for Data
Domain system replication.
Additional Considerations
Replication Scope
Replicating all database backups is certainly possible. However, many users will want
to implement replication at a more granular level. Production database backups are
usually excellent replication candidates, whereas development and test database
backups are less critical. An analysis of network bandwidth and destination disk
space requirements should be performed by a knowledgeable EMC Velocity partner or
an EMC technical consultant.
Replication Topology
Backups are typically replicated to serve as a second backup copy for recovery in the
event of a disaster. When backups from a primary site are being replicated to a
secondary site, planning is relatively straightforward. Users with multiple primary
sites may decide to implement a bidirectional replication solution where database
backups from either site are replicated to the alternate site. Proper planning should
render an outline detailing which database backups are being replicated to each
location.
Tape Consolidation
Some users replicate backup images to a central location for disaster recovery
purposes while also using the solution as a vehicle that enables centralized tape
creation. The third-party backup application used to create tape-based backup copies
will dictate any additional considerations or restrictions that this solution involves. A
knowledgeable EMC Velocity partner or an EMC technical consultant will be able to
assist with this planning task.
Backup Types
The goal of backups is to satisfy recovery time objectives (RTO) and recovery point
objectives (RPO). Outlining a
strategy of full, differential, and transaction log backups is beyond the scope of this
paper. That stated, there are a few key points worth noting:
Performing full backups frequently with Data Domain deduplication storage does
not create a storage usage penalty, as redundant database segments do not
consume additional disk space. While this may appear to enable the ability to
perform full backups more frequently, the load full backups place on the SQL
server and connection topology to the Data Domain system should be taken into
consideration.
When split-mirror or snapshot backups are performed and controlled by a third-party
backup application, the Data Domain system is easily integrated as a backup storage
device. The features provided by these backup techniques (low-impact backups, instant
recovery, and so on) do not preclude the use of Data Domain technology.
IP Network Considerations
When Data Domain systems are deployed as a CIFS backup share, EMC recommends
interconnecting SQL servers and Data Domain systems using a dedicated backup
area network. When deployment is in conjunction with a backup application as a CIFS
share, NFS mount, or OpenStorage / DD Boost disk pool, EMC similarly recommends
interconnecting backup application media servers and Data Domain systems using a
dedicated backup area network.
Whenever possible, the network used for backup and recovery communications
should be segregated from other production networks. This best practice
recommendation seeks to assure that network bandwidth is available for backup and
restore jobs to meet or exceed business objectives.
Network bandwidth requirements may dictate the need for a topology that supports
data transfers in excess of 125 MB/s. All Data Domain systems support the use of
multiple GbE network interfaces, and the use of 10 GbE network interfaces.
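The 125 MB/s figure corresponds to the theoretical line rate of a single 1 GbE link (1,000 Mb/s divided by 8 bits per byte). The following Python sketch works through that arithmetic for a hypothetical backup window. The function names and example numbers are assumptions for illustration, and the estimate ignores protocol overhead and deduplication effects, so it is a starting point rather than a sizing result.

```python
# Illustrative sketch only: function names and example figures are hypothetical.
import math


def required_rate_mb_s(backup_size_gb: float, window_hours: float) -> float:
    """Aggregate transfer rate (MB/s) needed to move the backup within the window."""
    return backup_size_gb * 1000 / (window_hours * 3600)


def links_needed(rate_mb_s: float, link_gbit_s: float = 1.0) -> int:
    """Number of network links needed at theoretical line rate (no overhead)."""
    link_mb_s = link_gbit_s * 1000 / 8  # 1 Gb/s = 125 MB/s
    return math.ceil(rate_mb_s / link_mb_s)


if __name__ == "__main__":
    rate = required_rate_mb_s(backup_size_gb=2000, window_hours=4)
    print(f"required rate: {rate:.1f} MB/s")
    print(f"1 GbE links:  {links_needed(rate)}")
    print(f"10 GbE links: {links_needed(rate, link_gbit_s=10.0)}")
```

In this example, 2,000 GB in a four-hour window requires roughly 139 MB/s, which exceeds a single 1 GbE link and therefore calls for multiple GbE interfaces or a 10 GbE interface.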
A knowledgeable Data Domain system engineer will be able to assist with planning
the deployment based on user requirements and available resources.
Conclusion
A Data Domain system makes an excellent target for Microsoft SQL Server backups
because it integrates easily and seamlessly into existing SQL Server environments.
Data Domain systems allow the SQL Server administrative team to retain a greater
number of full backup images online, thereby optimizing recovery options while
occupying minimal footprint in the data center, utilizing native backup tools that are
familiar to SQL Server administrators.
The addition of a Data Domain system into the environment greatly reduces
dependence on legacy tape and provides faster time-to-DR with network-efficient
replication.
When Data Domain Boost integration with EMC NetWorker is leveraged, performance
can be greatly improved and the managed replication includes the remote backup
image in the saveset database for easy recovery.
It is for all of these reasons that more people choose to build their backup solutions
using EMC products and technology.
expected to grow considerably over time. Finally, disk file fragmentation can be
reduced by Windows file system defragmentation utilities such as the Windows Disk
Defragmenter.
Do all indexes need to be defragmented or just a subset?
EMC recommends the use of index defragmentation tools based on thresholds and
limits rather than automatically defragmenting every index on every table whether it is
required or not. The suggestion is to understand which indexes and
corresponding fragmentation levels impact performance.
These indexes should be monitored for a specific fragmentation threshold, and action
taken to defragment these indexes only when necessary. Selective index
defragmentation will have less impact on production and will assist in preserving the
ability to efficiently deduplicate database backups.
Figure 17 depicts the DBCC SHOWCONTIG command output. It includes extent scan
fragmentation data indicating that index C_CustomerI1 does not require defragmentation
at this time.
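Threshold-based selection of this kind can be sketched as a simple filter over collected fragmentation figures. In the Python sketch below, the 30 percent threshold, the fragmentation values, and every index name other than C_CustomerI1 are hypothetical; in practice the figures would come from fragmentation reports such as the one in Figure 17.

```python
# Illustrative sketch only: threshold, values, and most index names are hypothetical.
from typing import Iterable, List, NamedTuple


class IndexStats(NamedTuple):
    name: str
    fragmentation_pct: float  # e.g. extent scan fragmentation, in percent


def indexes_to_defragment(
    stats: Iterable[IndexStats], threshold_pct: float = 30.0
) -> List[str]:
    """Return only the indexes whose fragmentation meets or exceeds the threshold."""
    return [entry.name for entry in stats if entry.fragmentation_pct >= threshold_pct]


if __name__ == "__main__":
    sample = [
        IndexStats("C_CustomerI1", 2.5),  # low fragmentation: left alone
        IndexStats("O_OrderI1", 41.0),    # hypothetical index above the threshold
        IndexStats("L_LineItemI1", 12.0),
    ]
    print(indexes_to_defragment(sample))
```

Defragmenting only the indexes returned by such a filter limits how much database content is rewritten between backups, which is what helps preserve deduplication efficiency.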