
SOLUTION FOR HIGH AVAILABILITY AND DISASTER RECOVERY
Rsystems International
Scope of this Presentation
Data Availability
Data recovery
High availability
Disaster recovery
Issues Focus
MS SQL Server
Physical servers
Presentation Focus
Requirements and Considerations
This section lists the questions and issues to consider:
Identify the business motivations and regulatory requirements that drive the HA/DR
requirements, and understand how the workload is categorized from an HA/DR perspective.
Less than 2000 concurrent users
Database size = ~ 200 MB
No. of Databases = 1
We consider the recovery time objective (RTO), the recovery point objective (RPO), and the
recovery level objective (RLO) for each workload category, for both a failure within a data
center (local high availability) and a total data center failure (disaster recovery). RPO and
RTO may vary between workloads, so load testing would give a clearer view of what each
workload needs.
RTO = 4 Hrs (Max)
RPO = zero
RLO = Database Level
To design and adopt an HA/DR solution it is also important to understand the implications
of applying maintenance to both hardware and software (including Windows security
patching).
No. of Application servers = 2
No. of Database servers = 1 (1 more will be configured)
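As a rough sanity check of the figures above, the following sketch estimates whether a full restore still fits the 4-hour RTO as the database grows. Only the 200 MB size and the 4-hour RTO come from this slide; the growth rate, restore throughput, and fixed overhead are illustrative assumptions and should be replaced with measured values.

```python
RTO_HOURS = 4.0          # from the requirements above
CURRENT_SIZE_MB = 200.0  # from the requirements above


def projected_size_mb(current_mb, annual_growth, years):
    """Compound growth: size after `years` at `annual_growth` (0.5 means 50%)."""
    return current_mb * (1.0 + annual_growth) ** years


def restore_hours(size_mb, throughput_mb_per_s, overhead_hours):
    """Fixed overhead (locating files, approvals) plus raw restore time."""
    return overhead_hours + size_mb / throughput_mb_per_s / 3600.0


def fits_rto(size_mb, throughput_mb_per_s, overhead_hours, rto_hours=RTO_HOURS):
    """True when the estimated restore time is within the RTO."""
    return restore_hours(size_mb, throughput_mb_per_s, overhead_hours) <= rto_hours


if __name__ == "__main__":
    # Assumed values: 50% yearly growth, 10 MB/s restore rate, 1 hour overhead.
    for year in range(6):
        size = projected_size_mb(CURRENT_SIZE_MB, 0.5, year)
        print(year, round(size), fits_rto(size, 10.0, 1.0))
```

At this database size the fixed overhead dominates the estimate, so the time needed to locate and authorize access to backup files matters more than raw restore speed.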
Business Case for Availability
High Availability
Keep business-critical applications available
Secondary: server maintenance
Disaster Recovery
Protect against loss of data
Secondary: application upgrades, infrastructure upgrades
Database Failure Scenarios
Physical Infrastructure Failures
Storage subsystem
Disk
Controller
Network
Server
Power
Logical Data Failures
Operator errors
DBMS interruption
Drops / deletes
Application defects
DBMS defects
Data corruption
Service Recovery Strategies
Standby Mode | Failover Behavior | SQL Server Feature
Cold standby | Manual intervention required to restore offline data copy | Backup and restore
Warm standby | Data copy online and ready; manual failover required | Transaction log shipping; database mirroring
Hot standby | Automatic failover | Database mirroring; failover clustering
Data Availability Continuum
Degrees of protection for information systems:
Degree of protection | Business Risk | Solution
Data Recovery | Data loss | Redundant data
High Availability | Downtime of database service | Redundant system components
Disaster Recovery | Downtime of business operations | Redundant systems and facilities
Data Recovery
Backup Retention Policy
Location of backup files
Keep the backup location separate from the main database servers.
Duration of retention
Define the number of days backups are retained.
Protection of sensitive data
Save backups in encrypted form.
Access to backups from offsite data storage
The backup locations should be accessible to all database servers.
Data Recovery Process
Backup file sets
Full baseline, differential, and transaction logs
Retrieving backup files
Offsite storage
Tape
Network copy
Dependency on multiple people to get access to backup files

Recovery strategy depends on failure scenario
Devise recovery strategy for each scenario
Does worst-case recovery scenario fit within SLA parameters?
Recovery time; SLA
Include future data growth in recovery plan
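The way a backup file set (full baseline, differential, transaction logs) combines into a restore can be sketched as below. This is a simplified illustration, not a restore tool: it assumes each differential is based on the most recent full backup and that the log chain is unbroken. The `Backup` type and the file names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str            # "full", "diff", or "log"
    finished: datetime   # when the backup completed
    file: str            # hypothetical file name


def restore_sequence(backups, target):
    """Return the backups to restore, in order, to recover up to `target`."""
    fulls = [b for b in backups if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup taken before the target time")
    base = max(fulls, key=lambda b: b.finished)
    chain = [base]
    start = base.finished

    # At most one differential is needed: the latest one after the base.
    diffs = [b for b in backups
             if b.kind == "diff" and base.finished < b.finished <= target]
    if diffs:
        diff = max(diffs, key=lambda b: b.finished)
        chain.append(diff)
        start = diff.finished

    # Every log backup after that point, up to the one covering the target.
    logs = sorted((b for b in backups if b.kind == "log" and b.finished > start),
                  key=lambda b: b.finished)
    for log in logs:
        chain.append(log)
        if log.finished >= target:
            break  # this log backup contains the target point in time
    return chain
```

The length of the returned chain is a useful input for the worst-case recovery time: every file in it has to be retrieved and restored before the database is usable.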
High Availability
Minimize or avoid service downtime
Whether planned or unplanned
When components fail, service interruption is brief or non-existent
Automatic failover
Eliminate single points of failure (as affordable)
Redundant database
Redundancy of Components
Objective: Avoid single points of failure (where affordable)
Approach: Use redundant components for the database service
Proposed:
Database server nodes (2)
Application servers (2)
DBMS instances (3: 1 on the main database server, 2 on the secondary
database server)
Storage devices

Single points of failure still remaining:
Load balancer
Website server
Database Mirroring
Redundancy at user database level
Duplicate copy of user database
Independent storage devices
Multiple copies of instance database
Mirrored over private network channel
Mirror always redoing transactions from principal
Negligible impact on transaction throughput
Multiple mirroring modes:
High-availability: commit when logged on mirror; automatic
failover (WE WILL BE USING THIS MODE)
High-protection: commit when logged on mirror; manual failover
High-performance: commit when logged on principal
Very fast automatic failover (seconds)
Requires witness server
Mirror-aware application client connection
Application must use SNAC (SQL Native Client)
Database connection string must specify both servers
Mirror database may be available for read-only access (snapshots)
Works with standard hardware
[Diagram: node A and node B, each with its own local storage holding local system databases; one node holds the source user database and the other holds the mirror user database; plus an optional witness.]
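The requirement that the connection string name both servers can be illustrated with a small helper. This is a sketch for an ODBC connection through SQL Server Native Client, which accepts the `Failover_Partner` keyword. The driver version, host names, and database name are assumptions for illustration only.

```python
def mirrored_connection_string(principal, mirror, database, port=1433):
    """Build an ODBC connection string that names both mirroring partners.

    The driver name below is an assumption; use whichever SQL Server
    Native Client version is installed on the application servers.
    """
    parts = [
        "Driver={SQL Server Native Client 11.0}",
        f"Server=tcp:{principal},{port}",   # tcp: prefix forces TCP/IP
        f"Failover_Partner={mirror}",       # tried if the principal is unreachable
        f"Database={database}",             # the mirrored database must be named
        "Trusted_Connection=yes",
    ]
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Hypothetical host and database names.
    print(mirrored_connection_string("sqlnode-a", "sqlnode-b", "AppDb"))
```

The resulting string can be passed to any ODBC-based client library. The failover partner named here is only used for the initial connection attempt; once connected, the client learns the current partner from the server.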
Mirror Witness
With mirroring, more than one server is required to decide on failover
Witness automates failover from principal to mirror
Watches database availability
Reports observations back to principal and mirror
Runs in a separate SQL Server instance (Express Edition is OK)
Very low resource consumption
Not a single point of failure
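The "more than one server decides" rule is a quorum rule: with three participants, any two that can see each other may act. The sketch below is a simplified model of that decision, not SQL Server's actual implementation; the function and parameter names are hypothetical.

```python
def has_quorum(sees_partner, sees_witness):
    """An instance has quorum when it can reach at least one of the other two."""
    return sees_partner or sees_witness


def principal_keeps_serving(sees_mirror, sees_witness):
    """The principal serves the database only while it holds quorum."""
    return has_quorum(sees_mirror, sees_witness)


def mirror_should_take_over(sees_principal, sees_witness,
                            witness_sees_principal, synchronized):
    """Automatic failover: mirror and witness agree the principal is gone.

    The mirror must also be fully synchronized, otherwise taking over
    would lose committed transactions.
    """
    return (not sees_principal
            and sees_witness
            and not witness_sees_principal
            and synchronized)
```

In this model, losing only the witness changes nothing for the running database (`principal_keeps_serving(True, False)` is true), which is why the witness is not a single point of failure. Automatic failover is unavailable until the witness returns.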
SQL Server Failover Clustering
[Diagram: node A and node B attached to shared storage that holds the system databases, user databases, and quorum.]
Two clustered nodes
Active/Passive config
MS SQL services
Running on virtual server
Shared storage device
User databases
System databases
Quorum drive
Redundant internal components
Active/Passive Failover Clustering
Redundancy at database instance level
All databases fail over together
Shared copy of system databases
Single data copy on shared storage device
No I/O overhead reducing throughput
Storage unit is single point of failure for cluster
All database services are clustered
SQL Agent; Analysis Services; Full-Text engine; MS DTC
Automatic failover (up to minutes)
DBMS accessed over virtual IP
Database not available from inactive node for DB client connections
Storage is controlled by one cluster node at a time
Requires hardware certified by Microsoft for Microsoft Cluster Service
HA Comparison
Database Mirroring | Failover Clustering
Scope: user DB | Scope: DBMS instance
Standard hardware | Certified hardware
One SQL license (unless querying snapshots on mirror) | One SQL license (only one node can access database)
Very fast failover (seconds) | Automatic failover (up to minutes)
OS flexible (e.g. 32/64) | Enterprise OS
Independent storage | Shared storage
Independent services | Clustered services
Reporting on mirror | Standby not available
Geographic separation OK | Servers are usually co-located
Data Recovery Requirements
Requirements | Backup and Recovery | DB Mirroring High-Performance | DB Mirroring High-Protection | DB Mirroring High-Availability | Failover Clustering
Cost | Low | Medium | Medium | Medium | High
Relative complexity | Low | Medium | Medium | High | High
Data loss | Possible | Possible | None | None | None
Scope of duplication | Database | Database | Database | Database | DBMS
Failover | Downtime | Manual | Manual | Seconds | Up to minutes
Client redirect | Manual | Automatic | Automatic | Automatic | Automatic
Rolling upgrades & maint. | No | OS & DB | OS & DB | OS & DB | OS
Access data on secondary | Restore | Snapshot | Snapshot | Snapshot | No
Geographic separation | OK | OK | Latency? | Latency? | Latency?
Disaster Recovery
Minimize downtime of business operations
Redundant systems and facilities
SQL Server features:
Transaction log shipping
Database mirroring
Failover clustering
Other technologies
Storage-based (Amazon)
Issues Focus
Considerations for Developers
App services must tolerate database service interruptions
Application transactions must be handled in code (data consistency)
Exception handling for transaction retry, connection recovery
Bulk data operations must be notified in advance
Transaction volume impacts rollback time during failover
Don't bypass transaction logging
Be aware of database recovery model
Mirroring uses Failover_Partner in connection string
Use TCP/IP as client protocol
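The exception-handling point above (transaction retry and connection recovery) can be sketched as a small retry wrapper. This is a minimal illustration under stated assumptions: `TransientDbError` is a hypothetical stand-in for whatever connection-lost exception the real database driver raises, and `operation` is expected to open a fresh connection and run one complete transaction on every call.

```python
import time


class TransientDbError(Exception):
    """Hypothetical stand-in for a driver's connection-lost error."""


def run_with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run `operation`, retrying with exponential backoff on transient errors.

    `operation` must be safe to repeat: it should reconnect and redo the
    whole transaction, because work in flight during a failover is rolled back.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDbError:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults, the wrapper waits 0.5, 1, 2, and 4 seconds between the five attempts, which covers a failover that completes within a few seconds. Slower failovers need more attempts or a longer base delay.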
Considerations for Admins
The principal instance and the mirror instance should use identical hardware
Your database servers should be configured using standard hardware redundancy techniques
Your drive letters and directory structure for your SQL Server data files and SQL Server log files should be identical
You should be running the same version, edition, and build of SQL Server on both sides of the mirroring partnership
For simplicity, all members of the mirroring partnership should be in the same Windows domain
All members of the mirroring partnership need to be able to ping each other, and they should be able to communicate on port 1433 and port 5022 (the defaults)
The principal instance and the mirror instance should have Windows Instant File Initialization enabled
You should get your VLF counts and log file autogrow sizes under control for all of the user databases that you want to mirror
A mirrored database must use the Full recovery model at all times
Make sure your index maintenance is in good shape before you mirror a database
Use backup compression for your full and log backups
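The connectivity requirement above (all partners reachable on ports 1433 and 5022) can be verified before mirroring is configured. The following is a minimal sketch using only the Python standard library. The host names in the usage example are hypothetical, and the ports should be adjusted if the instances do not use the defaults.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_mirroring_ports(hosts, ports=(1433, 5022)):
    """Map every (host, port) pair to whether it accepted a connection."""
    return {(host, port): port_reachable(host, port)
            for host in hosts for port in ports}


if __name__ == "__main__":
    # Hypothetical partner names; run this from each member of the partnership.
    for (host, port), ok in check_mirroring_ports(["sqlnode-a", "sqlnode-b"]).items():
        print(host, port, "open" if ok else "blocked or closed")
```

A successful TCP connection only shows that the port is open and not blocked by a firewall; it does not confirm that the mirroring endpoint itself is configured correctly.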
Questions and Observations
The database uses dynamic ports, which makes it difficult to
configure a strict firewall for it.
WS-1, WS-2, and the database servers all run their firewalls in the
default state.
On the website front:
All forms/pages are database driven
Is the application managing the sessions?
What is the page session timeout?
Is there automatic refreshing of data on pages?
What is the database connectivity timeout on pages?
Since all pages are database driven, what is the open-state
timeout of the pages?