Module Objectives
By the end of this module, you should be able to:
Describe the high-availability solutions
Discuss how high availability increases the reliability of storage
Define the high-availability controller configuration
Describe the three modes of high-availability operation with a high-availability pair
Analyze the effect on client protocols during failover and giveback operations
Loss of Cable
Loss of a cable between the storage system and a shelf can be overcome by shelf multipathing
Shelf Multipathing
[Diagram: a storage controller cabled with two paths to the A and B ESH4 modules of a disk shelf]
Adding a second cable provides availability even if a single cable goes bad.
Loss of Shelf
Loss of a shelf can be overcome by implementing SyncMirror:
An implementation in which the aggregate's RAID groups are mirrored across two plexes
SyncMirror
SyncMirror may be configured:
On a standalone storage system or, most commonly, in a high-availability pair (discussed next)
[Diagram: a mirrored aggregate with two plexes, each holding an identical copy of /vol/vol0 and its /etc directory]
SyncMirror (Cont.)
To control which disks are in which pool:
With software disk ownership, administrators may assign disks to pools by using the disk assign command:
system> disk assign {disk_list ...} [-p pool]
disk_list is the disk IDs of the unassigned disks
pool is either 0 or 1
system> disk assign 0a.21 -p 1
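To verify the result, the disk ownership listing shows the pool for each disk. A sketch only; the exact columns and output vary by Data ONTAP release:

system> disk show -v
  DISK     OWNER     POOL    SERIAL NUMBER
  0a.21    system    Pool1   ...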
[Diagram: a storage controller cabled to two disk shelves, one assigned to Pool0 and one to Pool1]
SyncMirror (Cont.)
To implement SyncMirror:
Add the syncmirror_local license (the license is available at no cost)
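With the license in place, a mirrored aggregate can be created, or an existing aggregate can be mirrored. A minimal sketch, assuming spare disks are available in both pools; the aggregate names are hypothetical:

system> aggr create aggrA -m 4
(creates a new mirrored aggregate with two disks in each plex)
system> aggr mirror aggrB
(adds a second plex to the existing aggregate aggrB)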
SyncMirror (Cont.)
If a shelf goes bad, such as a shelf in pool0, the data is still available from pool1
Nondisruptive shelf replacement (NDSR) is now available in Data ONTAP 7.3.2 and later
[Diagram: the Pool0 shelf has failed (marked with an X); data remains available from the Pool1 plex]
If multiple disks fail in an aggregate, the data is still available by way of the alternate pool.
SyncMirror (Cont.)
Administrators may perform additional maintenance of the mirror, such as:
Splitting the mirrored aggregate
Rejoining a split aggregate
Removing a plex from a mirrored aggregate
Comparing the plexes of a mirrored aggregate
NOTE: For more information about SyncMirror, please see the High Availability Web-based courses
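These maintenance tasks map to aggr subcommands in Data ONTAP 7-mode. A sketch using a hypothetical mirrored aggregate aggrA:

system> aggr split aggrA/plex0 aggrNew
(splits the mirror; plex0 becomes the independent aggregate aggrNew)
system> aggr mirror aggrA -v aggrNew
(rejoins the split aggregate aggrNew to aggrA as its second plex)
system> aggr destroy aggrA/plex1
(removes one plex from the mirrored aggregate)
system> aggr verify start aggrA
(compares the plexes of the mirrored aggregate)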
Loss of Controller
Loss of a controller may be overcome by configuring a high-availability pair
[Diagram: a high-availability pair — two storage controllers connected to each other, each cabled to its own and to its partner's disk shelves]
Each controller is:
Connected to its own disk shelves
Connected to the other controller's disk shelves
The storage controllers are connected to each other
If a storage controller fails, the surviving partner serves the data of the failed controller
2009 NetApp. All rights reserved.
High-Availability Features
[Diagram: high-availability pair hardware layout with redundant power and shelf connections]
Partner Communication
In a high-availability (HA) controller configuration, partners communicate through the interconnect with a heartbeat.
System state is written to disk in a mailbox. Data not yet committed to disk is written to both the local and the partner nonvolatile RAM (NVRAM).
Configuring High-Availability
License the high-availability service called cf:
system> license add xxxxxx
Reboot:
system> reboot
Settings for:
date, rdate
NDMP (on or off)
route
Time zone
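Taken together, a minimal configuration sequence on each node might look like the following. The license code is a placeholder, and the cf status output is indicative only:

system> license add XXXXXXX
system> reboot
system> cf enable
system> cf status
Cluster enabled, system2 is up.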
Normal Operation
[Diagram: normal operation — each controller in the HA pair serves its own disk shelves]
Takeover Operation
system> cf takeover
[Diagram: takeover — system has taken over system2 and serves both controllers' disk shelves]
The surviving partner has two identities, with each identity able to access only the appropriate volumes and networks.
You can access the failed node by using console commands.
Takeover Events
Takeover occurs on the following events:
A node undergoes a software or system failure that leads to a panic
A node undergoes a system failure (for example, a loss of power) and cannot reboot
There is a mismatch between the disks that one node believes it owns and the disks that the other node believes it owns
One or more network interfaces that are configured to support failover become unavailable
A node cannot send heartbeat messages to its partner, and no other mechanism is available
A node is halted with the halt command
A takeover is manually initiated
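A manual takeover is issued from the node that will survive. A brief sketch; the -f option forces the takeover even when Data ONTAP would otherwise refuse it, so use it with care:

system> cf takeover
system> cf takeover -f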
The partner Command
To access the failed storage controller:
system(takeover)> partner
system2/system>
The prompt shows the failed controller and the takeover controller. Enter partner again to return:
system2/system> partner
system(takeover)>
Giveback Operation
system> cf giveback
[Diagram: giveback — each controller again serves its own disk shelves]
The cf giveback command terminates the emulated node
The failed node resumes normal operation
The high-availability configuration resumes normal operation
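A typical giveback sequence from the console might look like this; the messages are indicative only and vary by release:

system(takeover)> cf status
system has taken over system2.
system(takeover)> cf giveback
system> cf status
Cluster enabled, system2 is up.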
The storage system must be removed from System Manager and then re-added after HA is configured
Reboot:
system> reboot system2> reboot
Check status:
system> cf status
Add one of the storage systems to System Manager and the partner is automatically identified
HA Configuration Problems
Do the same task for the other partner; remember to enable the interface
To perform a giveback
Giveback complete
Best Practices
Test failover and giveback operations before placing high-availability controllers into production
Monitor:
Network performance
Performance of disks and storage shelves
CPU utilization of each controller, to ensure that it does not exceed 50%
Enable AutoSupport
NOTE: For more information about high availability, please see the High Availability Web-based course
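CPU utilization can be spot-checked from the console with the sysstat command. A sketch taking samples at one-second intervals:

system> sysstat 1
(press Ctrl-C to stop sampling; the CPU column should stay below 50% on each controller)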
Loss of Building
Loss of an entire building can be overcome by implementing stretch MetroCluster
Stretch MetroCluster
[Diagram: stretch MetroCluster — one controller in Building 1, its partner in Building 2, each with mirrored disk shelves]
Stretch MetroCluster extends high availability to distances of up to 300 m. See the High Availability Web-based course for more information.
Loss of Site
Loss of a site can be overcome by implementing fabric-attached MetroCluster
Fabric-Attached MetroCluster
[Diagram: fabric-attached MetroCluster — controllers at Site 1 and Site 2 connected through Fibre Channel switches over an ISL trunk]
Fabric-attached MetroCluster extends high availability to distances of up to 100 km. See the High Availability Web-based course for more information.
Loss of Region
Loss of a region can be overcome by implementing SnapMirror
SnapMirror
[Diagram: SnapMirror — a source system in Region 1 replicating to a destination system in Region 2]
SnapMirror allows mirroring of volumes or qtrees. See the NetApp Protection Software Administration ILT course for more information.
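A volume SnapMirror relationship between the two regions can be sketched as follows. The system and volume names are hypothetical; the destination volume must be restricted before initialization, and the commands are run on the destination system:

region2> vol restrict vol1_mirror
region2> snapmirror initialize -S region1:vol1 region2:vol1_mirror
region2> snapmirror status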
Possibilities Abound
These high-availability techniques do not have to be used in isolation; often they are combined.
[Diagram: combined configuration — a high-availability pair (system and partner) with multipath cabling to mirrored disk shelves]
Module Summary
In this module, you should have learned to:
Describe the high-availability solutions
Discuss how high availability increases the reliability of storage
Define the high-availability controller configuration
Describe the three modes of high-availability operation with a high-availability pair
Analyze the effect on client protocols during failover and giveback operations
Exercise
Module 13: High Availability
Estimated Time: 30 minutes