NETAPP UNIVERSITY

Introduction to MultiPath High Availability

Student Guide
Course ID: INT-ILT-CDOTHA
Content Version: 1.0
ATTENTION
The information contained in this course is intended only for training. This course contains information and activities that,
while beneficial for the purposes of training in a closed, non-production environment, can result in downtime or other
severe consequences in a production environment. This course material is not a technical reference and should not,
under any circumstances, be used in production environments. To obtain reference materials, refer to the NetApp product
documentation that is located at http://now.netapp.com/.

COPYRIGHT
© 2014 NetApp, Inc. All rights reserved. Printed in the U.S.A. Specifications subject to change without notice.
No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or
mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written
permission of NetApp, Inc.

U.S. GOVERNMENT RIGHTS


Commercial Computer Software. Government users are subject to the NetApp, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.

TRADEMARK INFORMATION
NetApp, the NetApp logo, Go Further, Faster, ASUP, AutoSupport, Campaign Express, Customer Fitness, CyberSnap,
Data ONTAP, DataFort, FilerView, Fitness, Flash Accel, Flash Cache, Flash Pool, FlashRay, FlexCache, FlexClone,
FlexPod, FlexScale, FlexShare, FlexVol, GetSuccessful, LockVault, Manage ONTAP, Mars, MetroCluster, MultiStore,
OnCommand, ONTAP, ONTAPI, RAID DP, SANtricity, SecureShare, Simplicity, Simulate ONTAP, Snap Creator,
SnapCopy, SnapDrive, SnapIntegrator, SnapLock, SnapManager, SnapMirror, SnapMover, SnapProtect, SnapRestore,
Snapshot, SnapValidator, SnapVault, StorageGRID, Tech OnTap, and WAFL are trademarks or registered trademarks of
NetApp, Inc. in the United States and/or other countries.
Other product and service names might be trademarks of NetApp or other companies. A current list of NetApp trademarks
is available on the Web at http://www.netapp.com/us/legal/netapptmlist.aspx.

2 Introduction to Multi-Path High Availability

© 2015 NetApp, Inc. This material is intended only for training. Reproduction is not authorized.
Contents
INTRODUCTION TO MULTIPATH HIGH AVAILABILITY
STUDENT GUIDE
HIGH AVAILABILITY: A REQUIREMENT FOR A PRODUCTION ENVIRONMENT
HIGH AVAILABILITY CONFIGURATION DETAILS
TROUBLESHOOTING HIGH AVAILABILITY

Introduction to High-Availability in Clustered Data ONTAP

1 © 2015 NetApp, Inc. All rights reserved. NetApp Proprietary – Limited Use Only

Course Overview

In this course, you learn about the purpose, history, and configuration details of the vital high-availability (HA) function of the Data ONTAP® operating system. You also learn the tools and techniques for troubleshooting high availability. The learning is reinforced through labs.

Course Objectives

After this course, you should be able to:

• Identify the functions, value, and purpose of high availability in the Data ONTAP operating system
• Describe high-availability (HA) configurations, including hardware components, software components, and client protocol response
• Define the conditions that cause HA takeover
• Verify that conditions are safe for HA giveback
• Use AutoSupport, Hardware Assist, and Config Advisor to troubleshoot HA issues

Module 1
High Availability: A Requirement for a Production Environment

Module Objectives

After this module, you should be able to:

• Explain the NetApp high-availability (HA) concept
• Describe controller failover (CFO)
• Diagram a high-availability configuration of a NetApp node

What Is High Availability?

• High availability is the quest to maximize data availability.
• There are two ways to maximize data availability:
  • Purchase reliable components
  • Eliminate single points of failure (SPOFs)

What Is a Single Point of Failure?

A single point of failure is one component that, when it breaks, disrupts the serving of data.
Single points of failure exist for two reasons:
• No redundancy is built into the system.
• A component has failed, removing built-in redundancy. It is important to always replace problematic components when they are detected, to avoid creating SPOFs.
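
The two reasons above can be expressed as a small check. The following is a minimal sketch in Python, not a NetApp tool; the `RedundancyGroup` type and the component names are hypothetical and exist only for illustration. A group counts as a single point of failure when exactly one healthy member remains, either because only one was ever installed or because its partners have failed.

```python
from dataclasses import dataclass


@dataclass
class RedundancyGroup:
    """Hypothetical model: components that can stand in for each other."""
    name: str
    installed: int   # how many components were built into the system
    failed: int = 0  # how many of them are currently broken

    @property
    def healthy(self) -> int:
        return self.installed - self.failed


def find_spofs(groups):
    """Return (name, reason) for every group with exactly one healthy member."""
    spofs = []
    for group in groups:
        if group.healthy != 1:
            continue
        if group.installed == 1:
            reason = "no redundancy built in"
        else:
            reason = "failed component removed redundancy"
        spofs.append((group.name, reason))
    return spofs


# Illustrative values only.
example_groups = [
    RedundancyGroup("head", installed=1),
    RedundancyGroup("head power supply", installed=2),
    RedundancyGroup("shelf power supply", installed=4, failed=3),
]
spof_report = find_spofs(example_groups)
```

In this made-up inventory, the head is reported because it never had a partner, and the shelf power supplies are reported because failures have left only one working unit.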

Identify SPOFs

• Single head
• Single cable from head to shelf
• Single port that is being used on head
• Single module that is being used on shelf
• Single shelf chassis

Identify Redundancies

• Two power supplies on the head
• Four power supplies on the shelf
• RAID-DP technology for disk redundancy

Dual-Path Redundancy

Adding one cable eliminates three single points of failure. Now there is more redundancy:
• Two ports on the head
• Two cables from head to shelf
• Two shelf modules
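
One way to see why a single cable removes three SPOFs is to model each head-to-shelf path as a set of components and intersect the sets: anything that appears in every path is still a single point of failure. This is a rough sketch in Python; the component labels are made up for illustration.

```python
def shared_components(paths):
    """Components present in every path; losing any one of them cuts all paths."""
    return set.intersection(*(set(path) for path in paths))


# Hypothetical labels for the components along each head-to-shelf path.
path_a = ["head", "port A", "cable A", "module A", "shelf chassis"]
path_b = ["head", "port B", "cable B", "module B", "shelf chassis"]

single_path_spofs = shared_components([path_a])
dual_path_spofs = shared_components([path_a, path_b])
eliminated = single_path_spofs - dual_path_spofs
```

With these labels, `eliminated` holds the port, cable, and module of the original path, and `dual_path_spofs` holds the head and the shelf chassis. The data itself is not part of this path model.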

SPOFs of a Dual-Path Configuration

There are still many single points of failure:
• The head
• The shelf
• The data

High Availability

To eliminate the head as an SPOF, a second head can be added to create an HA pair.

This shelf is a DS4243 shelf, attached to a FAS3220.

Multipath High Availability

Combining a dual-path configuration and high availability results in multipath high availability. Now the only SPOFs are the shelf chassis and the data.

Eliminating the shelf chassis and the data as single points of failure requires SyncMirror software, which is discussed in other courses.

Multipath High-Availability Configuration
Completing the Configuration

• To complete the multipath high-availability (MPHA) configuration, these two items are needed:
  • A license for high availability:
    system license add
  • An interconnect path:
    • For two-in-a-box pairs, the path is on the chassis backplane.
    • For standalone systems, there is an external cable.

Final Cabling

[Figure: final MPHA cabling, showing the interconnect path (x2), the primary and redundant primary connections, and the standby and redundant standby connections.]

MPHA Restrictions

• Both heads in an HA pair must be the same model.
• Both heads in an HA pair must run the same major release of the Data ONTAP operating system. If one head runs Data ONTAP 8.3.1, the other head can run 8.3.2 but not 8.4.
• The user load cannot exceed the load that a single head can support.
For each system model's limits, such as total storage, see the Hardware Universe.
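
The release rule can be sketched as a comparison of the first two fields of the version string, which matches the example above (8.3.1 and 8.3.2 are compatible, 8.3.1 and 8.4 are not). This Python example is an illustration only. It assumes plain dotted version strings and treats "major release" as the first two fields; the function names are hypothetical.

```python
def release_family(version: str) -> tuple:
    """Return the (major, minor) fields of a dotted version such as '8.3.1'."""
    fields = version.split(".")
    if len(fields) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return int(fields[0]), int(fields[1])


def same_release_family(version_a: str, version_b: str) -> bool:
    """True when both heads run releases from the same family (assumed rule)."""
    return release_family(version_a) == release_family(version_b)
```

For example, `same_release_family("8.3.1", "8.3.2")` is true, and `same_release_family("8.3.1", "8.4")` is false.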

MPHA Configurations

• Symmetrical: Each node has equal storage.
• Asymmetrical: One node has more storage than the other.
• Active-passive: One node has only a root aggregate (passive), and the other node has the remaining storage (active).
• Shared stacks: Disks are assigned to nodes individually, rather than by stack. (This usually results in an asymmetrical configuration.)

Module Summary

Now that you have completed this module, you should be able to:
• Explain the NetApp high-availability (HA) concept
• Describe controller failover (CFO)
• Diagram a high-availability configuration of a NetApp node

Module 2
High Availability Configuration Details

Module Objectives

After this module, you should be able to:

• Describe takeover and giveback operations
• Describe the effect of takeovers and givebacks on a system
• Verify that takeovers and givebacks are successful
• Resolve common issues with takeovers and givebacks

How Takeover Works

• The two heads communicate via the high-availability (HA) interconnect.
• Status information is stored on the mailbox disks.
• Writes are sent to the NVRAM of both heads.
• If a takeover is needed, the up head does the following:
  • Takes over the serving of network traffic
  • Takes over ownership of the data disks
  • Has all data online for users within 180 seconds

To monitor the takeover, use this command:
storage failover show-takeover

The HA Interconnect

The HA interconnect is the link between the two heads (an external cable or the chassis backplane) that they use to communicate about status. The information that is exchanged includes:
• Whether the head is up, down, waiting for giveback, or in transit between states
• The status of disks, modules, and shelves
• A copy of all writes, which is sent to the partner’s NVRAM
• The time

Mailbox Disks

• Information that is sent over the HA interconnect is saved to two mailbox disks on each head.
• The space that is used is less than 5 MB, and it is outside the WAFL (Write Anywhere File Layout) file system.
• If a disk that holds mailbox information fails, a new mailbox disk is immediately chosen.
• If the partner cannot access the mailbox disks, takeover is not possible.
• If the mailbox disks are not synchronized with each other, an error message about an “unsynchronized log” is sent.

Writes to NVRAM

• In the WAFL course, you learned that writes go to memory, with a copy sent to NVRAM.
• In an HA pair, the write goes to the memory of the head that received the write request and to the NVRAM of both heads.
• If a head goes down before the write is complete, the partner can complete the write.
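
The mirroring described above can be pictured with a toy model. This Python sketch is an illustration under heavy assumptions, not a description of how Data ONTAP is implemented: the `Head` class, its attributes, and its method names are all invented here. Each write is logged for the local head and mirrored to the partner, so that the partner can replay any writes that were not yet committed when the head went down.

```python
class Head:
    """Toy model of one head in an HA pair (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.partner = None
        self.local_log = []    # this head's own uncommitted writes
        self.partner_log = []  # mirror of the partner's uncommitted writes
        self.committed = []    # writes that have reached disk

    def write(self, data):
        # Log the write locally and mirror it to the partner's NVRAM.
        self.local_log.append(data)
        self.partner.partner_log.append(data)

    def consistency_point(self):
        # Commit logged writes, then clear both copies of the log.
        self.committed.extend(self.local_log)
        self.local_log.clear()
        self.partner.partner_log.clear()

    def take_over(self):
        # Replay the partner's uncommitted writes on its behalf.
        replayed = list(self.partner_log)
        self.partner.committed.extend(replayed)
        self.partner_log.clear()
        return replayed


def make_pair():
    """Create two heads and link them as partners."""
    a, b = Head("head-a"), Head("head-b")
    a.partner, b.partner = b, a
    return a, b
```

In this model, a write that head A accepts after its last consistency point is still present in head B's mirror, so `take_over()` on head B can complete it.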

Sharing NVRAM

• To accommodate the partner, the NVRAM is split into two halves, one for each head.
• For example, the FAS3220 has 1.6 GB of NVMEM, so 800 MB is used for each head.
• Of each head's 800 MB, 400 MB can be written before NVMEM is full and a consistency point is needed.
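
The arithmetic in this example can be written out as follows. The Python sketch assumes decimal units (1.6 GB = 1,600 MB), as the figures above imply, and assumes that the 400 MB figure is half of each head's share; the function name is hypothetical.

```python
def nvmem_budget_mb(total_mb: int) -> dict:
    """Split NVMEM between the two heads, then halve each head's share.

    Assumption: the amount that can be written before a consistency point
    is half of the per-head share, which matches the FAS3220 figures.
    """
    per_head = total_mb // 2
    before_consistency_point = per_head // 2
    return {
        "total_mb": total_mb,
        "per_head_mb": per_head,
        "before_consistency_point_mb": before_consistency_point,
    }


fas3220 = nvmem_budget_mb(1600)
```

For the FAS3220 values, `fas3220["per_head_mb"]` is 800 and `fas3220["before_consistency_point_mb"]` is 400.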

Networking During Takeover

• The first step in a takeover is network migration.
• The logical interfaces (LIFs) can be configured to migrate to the HA partner or to any node in the cluster.
• To omit LIF migration during manual takeovers, add this parameter to the takeover command:
  -skip-lif-migration true
• LIF migration is frequently omitted during maintenance events to prevent user access.

Storage During Takeover

• The non-root aggregates are taken offline, ownership is moved to the up head, and the aggregates are put back online.
• This situation is disruptive to:
  • CIFS users
  • NFSv4 users
• This situation might cause a delay, but it is not disruptive to:
  • SMB 3.0 users with continuous availability
  • NFSv3 users
  • SAN users
• The root aggregate is the final element that is moved.
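
The protocol behavior listed above can be captured in a small lookup table, for example to warn about affected clients before a planned takeover. This is a Python sketch; the dictionary keys and the function name are illustrative choices, and the classifications simply restate the list above.

```python
# Restates the list above: True means takeover is disruptive for that client type.
TAKEOVER_IS_DISRUPTIVE = {
    "CIFS": True,
    "NFSv4": True,
    "SMB 3.0 with continuous availability": False,
    "NFSv3": False,
    "SAN": False,
}


def disrupted_clients(protocols_in_use):
    """Return the client types from protocols_in_use that a takeover disrupts."""
    unknown = [p for p in protocols_in_use if p not in TAKEOVER_IS_DISRUPTIVE]
    if unknown:
        raise KeyError(f"no takeover behavior recorded for: {unknown}")
    return [p for p in protocols_in_use if TAKEOVER_IS_DISRUPTIVE[p]]
```

Raising an error for an unlisted protocol is a deliberate choice in this sketch: guessing would hide the fact that the list above does not cover it.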

When Does a Takeover Happen?

• When it is user-initiated:
  storage failover takeover
• When there is a user-initiated halt or reboot and the -inhibit-takeover flag is not set to true
• After a software panic
• After a disruptive hardware failure, such as a power loss
• When the HA interconnect is online, but no updates are received from the partner

When Does Takeover Not Happen?

• When there is a panic reboot
• When there are network issues
• After a nondisruptive hardware failure, such as a single-disk failure
If a module or cable fails, there is no takeover unless the redundant path is also offline.
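
The conditions on this slide and the previous one can be summarized as a decision function. This Python sketch only restates the two lists; the event names and the function are hypothetical labels, not Data ONTAP identifiers.

```python
from enum import Enum


class Event(Enum):
    USER_TAKEOVER = "user-initiated takeover"
    USER_HALT_OR_REBOOT = "user-initiated halt or reboot"
    SOFTWARE_PANIC = "software panic"
    DISRUPTIVE_HARDWARE_FAILURE = "disruptive hardware failure"
    PARTNER_SILENT = "interconnect online, no updates from partner"
    PANIC_REBOOT = "panic reboot"
    NETWORK_ISSUE = "network issue"
    SINGLE_DISK_FAILURE = "single-disk failure"
    PATH_FAILURE = "module or cable failure"


ALWAYS_TAKEOVER = {
    Event.USER_TAKEOVER,
    Event.SOFTWARE_PANIC,
    Event.DISRUPTIVE_HARDWARE_FAILURE,
    Event.PARTNER_SILENT,
}


def takeover_expected(event, inhibit_takeover=False, redundant_path_online=True):
    """Return True when the lists above say this event leads to a takeover."""
    if event in ALWAYS_TAKEOVER:
        return True
    if event is Event.USER_HALT_OR_REBOOT:
        return not inhibit_takeover
    if event is Event.PATH_FAILURE:
        # A takeover follows only when the redundant path is offline too.
        return not redundant_path_online
    # Panic reboot, network issues, and single-disk failures: no takeover.
    return False
```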

When Does a Takeover Fail?

• The most common reason for takeovers to fail is that the data on the down head is inaccessible or offline.
• Takeover cannot happen if both heads are offline (for example, during a data center power outage).
To find other reasons for the failure of a takeover, check the logs on the up head:
event log show -node * -event to*

Giveback

• Giveback works in the reverse order of takeover:
  • First the root aggregate migrates home.
  • Then the data aggregates migrate.
  • Finally, the network LIFs migrate.
• This operation is disruptive to the same protocols as takeover.
• To monitor giveback, use this command:
  storage failover show-giveback
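
Because giveback reverses takeover, the two sequences can be derived from one list. This is a small Python sketch; the step labels paraphrase this module and are not Data ONTAP identifiers.

```python
# Takeover order as described in this module: network first, root aggregate last.
TAKEOVER_STEPS = [
    "network LIFs",
    "data aggregates",
    "root aggregate",
]

# Giveback walks the same steps in reverse order.
GIVEBACK_STEPS = list(reversed(TAKEOVER_STEPS))
```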

When Does Giveback Happen?

• If the up head detects that the partner status is “waiting for giveback” and it has been at least 600 seconds since takeover, a giveback is attempted.
• If the giveback fails, it is attempted again in two hours. If this giveback also fails, no more attempts are made.
• If the giveback is successful, but the node experiences an additional system panic within 24 hours of the first panic, there is not another attempt at a giveback.
To start a manual giveback:
storage failover giveback
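
The automatic giveback rules above can be sketched as a small decision function. This Python example is a simplified restatement under stated assumptions: the function name, parameters, and return strings are hypothetical, all times are in seconds, and the behavior of a real system depends on its storage failover settings.

```python
GIVEBACK_DELAY_S = 600          # minimum time since takeover
RETRY_DELAY_S = 2 * 60 * 60     # wait before the second attempt
PANIC_WINDOW_S = 24 * 60 * 60   # repeat-panic window
MAX_ATTEMPTS = 2


def auto_giveback_decision(partner_status, seconds_since_takeover,
                           failed_attempts=0, seconds_since_last_attempt=None,
                           seconds_since_previous_panic=None):
    """Return 'attempt', 'wait', or 'manual' for the up head's next action.

    seconds_since_previous_panic is the gap between this panic and an
    earlier one on the same node, or None if there was no earlier panic.
    """
    if (seconds_since_previous_panic is not None
            and seconds_since_previous_panic < PANIC_WINDOW_S):
        return "manual"  # repeat panic within 24 hours: no automatic giveback
    if failed_attempts >= MAX_ATTEMPTS:
        return "manual"  # both automatic attempts failed
    if partner_status != "waiting for giveback":
        return "wait"
    if seconds_since_takeover < GIVEBACK_DELAY_S:
        return "wait"
    if failed_attempts == 1 and (seconds_since_last_attempt is None
                                 or seconds_since_last_attempt < RETRY_DELAY_S):
        return "wait"
    return "attempt"
```

A result of `"manual"` means that the sketch expects no further automatic attempts, so an administrator would have to start the giveback.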

Why Does Giveback Fail?

• The most common reason for giveback to fail is that the partner status is not “waiting for giveback.”
• The second most common reason is a failed disk.
To ignore failed disks, use this parameter:
-auto-giveback-override-vetoes

Module Summary

Now that you have completed this module, you should be able to:
• Describe takeover and giveback operations
• Describe the effect of takeovers and givebacks on a system
• Verify that takeovers and givebacks are successful
• Resolve common issues with takeovers and givebacks

Module 3
Troubleshooting High Availability

Module Objectives

After this module, you should be able to:

• Use Config Advisor to detect and correct errors
• Send AutoSupport notifications or core files from systems in takeover
• Isolate the causes of “unsynchronized log” messages in a high-availability configuration, and correct the issues
• Isolate and correct issues with multipath high availability

Config Advisor

• Config Advisor is a tool that NetApp supports.
• It checks the health and configuration of NetApp systems.
• You download it from support.netapp.com and run it locally. Because it is local, it can also be used at secure sites.
• The Config Advisor UI provides a list of errors.

Config Advisor
Displaying Multipath

• The results include a summary of the storage cabling.
• Errors and faults are also shown.

AutoSupport Notifications During Takeover

• When the system is in takeover, AutoSupport notifications can be sent from either head:
  system node autosupport invoke -node [node] -type [type] -message [message]
• AutoSupport notifications that are sent while one head is offline do not include information about the offline head, but they include information about storage, access, and other online functionality.
• To see AutoSupport notifications that have already been sent:
  system health autosupport trigger history show

Core Files

• When a software or hardware component is interrupted and causes a system panic, the system memory contents are dumped as core files.
• Information about the state of the system, the contents of memory and NVRAM, and the status of each CPU is stored in a file. The file is first saved to disk outside the file system.
• After the storage is online again, the core file is rewritten to the file system area. This rewriting happens with or without takeover.
• The core files can be sent to NetApp for analysis, if necessary.

Sending Core Files

• Core files are in /mroot/etc/crash.
• They can be uploaded with the commands system script or autosupport:
  system script upload -filename /mroot/etc/crash/core.101178384.2014-08-28.07_18_45.nz -destination ftp://ftp.netapp.com/to-ntap/2004123456_core.nz
  - or -
  autosupport invoke-coredump -core-filename core.101178384.2014-08-28.07_18_45.nz -case-number 2004123456
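
The upload destination in the example follows a visible pattern: the case number, then `_core.nz`, under `ftp://ftp.netapp.com/to-ntap/`. The Python sketch below builds the two command strings from that pattern. It only formats text, using the commands exactly as they appear above; the helper names are hypothetical, and the ten-digit case-number check is an assumption based on the example.

```python
CRASH_DIR = "/mroot/etc/crash"
UPLOAD_BASE = "ftp://ftp.netapp.com/to-ntap"


def _check_case_number(case_number: str) -> None:
    # Assumption: case numbers are ten digits, as in the example above.
    if not (case_number.isdigit() and len(case_number) == 10):
        raise ValueError(f"unexpected case number: {case_number!r}")


def script_upload_command(core_name: str, case_number: str) -> str:
    """Format the 'system script upload' command for one core file."""
    _check_case_number(case_number)
    return (f"system script upload -filename {CRASH_DIR}/{core_name} "
            f"-destination {UPLOAD_BASE}/{case_number}_core.nz")


def coredump_command(core_name: str, case_number: str) -> str:
    """Format the 'autosupport invoke-coredump' command for one core file."""
    _check_case_number(case_number)
    return (f"autosupport invoke-coredump -core-filename {core_name} "
            f"-case-number {case_number}")
```

Generating the strings this way keeps the case number consistent between the file name on the FTP site and the AutoSupport request, which is easy to get wrong when typing the commands by hand.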

Troubleshooting
“Unsynchronized Log” Message

• It is normal to receive an “unsynchronized log” message during a takeover or giveback.
• If the message is received at any other time, takeover might be prevented.
• Verify that all mailbox disks are online and serving data, by using:
  • Config Advisor
  • OnCommand System Manager
  • The CLI
• After you resolve the storage issue, the logs should synchronize within a few minutes.

Troubleshooting
Multipath High Availability

• For new installations and changed configurations, verify that the cabling is correct.
• If the hardware has not been reconfigured recently, check the components in the path, such as the cable and the module.
• It is unlikely for a single SAS disk or solid-state drive (SSD) to bring a path offline.
• It is unlikely, but not unknown, for an FC-AL disk to bring a path offline.

OnCommand System Manager
System Alerts

[Figure: OnCommand System Manager system alert, with callouts for the possible effect, the description of the cause, and the corrective actions.]

Module Summary

Now that you have completed this module, you should be able to:
• Use Config Advisor to detect and correct errors
• Send AutoSupport notifications or core files from systems in takeover
• Isolate the causes of “unsynchronized log” messages in a high-availability configuration, and correct the issues
• Isolate and correct issues with multipath high availability

THANK YOU
