
vSphere 5.x support with NetApp MetroCluster


http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2031038

Purpose
This article provides information about deploying a vSphere Metro Storage Cluster (vMSC) across two
datacenters or sites using the NetApp MetroCluster solution with vSphere 5.0, 5.1, or 5.5. For ESXi 5.0, 5.1,
or 5.5, the article applies to FC, iSCSI, and NFS implementations of both Stretch and Fabric MetroCluster.
Resolution

What is vMSC?
vSphere Metro Storage Cluster (vMSC) is a new certified configuration for NetApp MetroCluster storage
architectures. A vMSC configuration is designed to maintain data availability beyond a single physical or
logical site. A storage device configured for vMSC is supported after successful vMSC certification. All
supported storage devices are listed in the VMware Storage Compatibility Guide.

What is a NetApp MetroCluster?


NetApp MetroCluster is a synchronous replication solution between two NetApp controllers that provides
storage high availability and disaster recovery in a campus or metropolitan area. A MetroCluster (MC)
configuration consists of two NetApp controllers clustered together, residing either in the same data
center or in two different physical locations. MC handles any single failure in the storage configuration,
and certain multiple failures, without disrupting data availability, and it provides single-command
recovery in case of a complete site disaster.

What is MetroCluster TieBreaker?


The MetroCluster TieBreaker (MCTB) solution is a plug-in that runs in the background as a Windows service
or Unix daemon on an OnCommand Unified Manager (OC UM) host. The OC UM host can be a physical
machine or a virtual machine. MCTB provides automated failover for a MetroCluster solution in scenarios
where the controllers' built-in automatic failover is not possible, such as an entire site failure.
MCTB continuously monitors the MetroCluster controllers and the corresponding network gateways from
an OnCommand server at a third location. When MCTB detects conditions that require a Cluster Failover
on Disaster (CFOD), it issues the necessary commands to initiate the CFOD. Log messages and
OnCommand events are generated as needed to keep the operator informed about the state of the
MetroCluster and MCTB.
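
To make the decision logic concrete, here is a minimal sketch (not the NetApp TieBreaker itself) of a third-site monitor written in Python: it treats a site as failed, and therefore as a CFOD candidate, only when both that site's controller and its network gateway stop responding. The addresses, the ping-based reachability test (Linux ping syntax), and the polling interval are assumptions made for the example.

```python
import subprocess
import time

# Hypothetical management addresses used for this example only.
SITES = {
    "Site 1": {"controller": "10.0.1.10", "gateway": "10.0.1.1"},
    "Site 2": {"controller": "10.0.2.10", "gateway": "10.0.2.1"},
}


def reachable(address, count=3, timeout_s=2):
    """Return True if the address answers at least one ICMP echo (Linux ping syntax)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def site_down(site):
    """Treat a site as failed only when its controller AND its gateway are unreachable."""
    return not reachable(site["controller"]) and not reachable(site["gateway"])


while True:
    for name, site in SITES.items():
        if site_down(site):
            # The real TieBreaker would issue the CFOD commands against the
            # surviving controller here; this sketch only reports the condition.
            print(f"CFOD condition detected for {name}: controller and gateway unreachable")
    time.sleep(30)  # polling interval chosen arbitrarily for the example
```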

Configuration Requirements
These requirements must be satisfied to support this configuration:

- For distances under 500 m, a Stretch MetroCluster configuration can be used. For distances over 500 m and up to 160 km, on systems running Data ONTAP 8.1.1, a Fabric MetroCluster configuration can be used.
- The maximum round-trip latency between the two sites must be less than 10 ms for Ethernet networks and less than 3 ms for SyncMirror replication (a rough latency check is sketched after this list).
- The storage network must provide a minimum of 1 Gbps of throughput between the two sites for ISL connectivity.
- ESXi hosts in the vMSC configuration should be configured with at least two different IP networks: one for storage, and the other for management and virtual machine traffic. The storage network handles NFS and iSCSI traffic between the ESXi hosts and the NetApp controllers. The second network (VM network) supports virtual machine traffic as well as management functions for the ESXi hosts. End users can choose to configure additional networks for other functionality such as vMotion and Fault Tolerance; VMware recommends this as a best practice, but it is not a strict requirement for a vMSC configuration.
- FC switches are used in vMSC configurations where datastores are accessed via the FC protocol, and ESX management traffic is carried on an IP network. End users can choose to configure additional networks for other functionality such as vMotion and Fault Tolerance; this is recommended as a best practice but is not a strict requirement for a vMSC configuration.
- For NFS/iSCSI configurations, a minimum of two uplinks per controller must be used, and an interface group (ifgroup) should be created from the two uplinks in a multimode configuration.
- The VMware datastores and NFS volumes configured for the ESX servers must be provisioned on mirrored aggregates.
- vCenter Server must be able to connect to the ESX servers at both sites.
- An HA cluster must not exceed 32 hosts.
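
As a rough illustration of validating the latency requirement above, the following Python sketch measures TCP connect times to hypothetical controller addresses at both sites and compares the median against the 10 ms round-trip limit. It is only an approximation; a real deployment would verify latency and throughput with the network team's own tooling.

```python
import socket
import statistics
import time

# Hypothetical controller management addresses; any TCP port the target listens on will do.
TARGETS = {"Site 1 controller": "10.0.1.10", "Site 2 controller": "10.0.2.10"}
PORT = 443
SAMPLES = 10
MAX_RTT_MS = 10.0  # Ethernet round-trip requirement from the list above


def tcp_rtt_ms(host, port, timeout=2.0):
    """Time one TCP connection setup in milliseconds (a rough stand-in for round-trip latency)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


for name, host in TARGETS.items():
    rtts = [tcp_rtt_ms(host, PORT) for _ in range(SAMPLES)]
    median = statistics.median(rtts)
    status = "OK" if median <= MAX_RTT_MS else "EXCEEDS 10 ms LIMIT"
    print(f"{name}: median RTT {median:.2f} ms ({status})")
```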

Notes:

- A MetroCluster TieBreaker machine should be deployed at a third site and must be able to access the storage controllers at Site 1 and Site 2 to initiate a CFOD in case of an entire site failure.
- vMSC certification testing was conducted on vSphere 5.0 with NetApp Data ONTAP 8.1 operating in 7-Mode. For ESXi 5.5, vMSC certification testing was successfully completed on vSphere 5.5 with NetApp Data ONTAP 8.2 operating in 7-Mode.
- For more information on NetApp MetroCluster design and implementation, see the NetApp Technical Report, Best Practices for MetroCluster Design and Implementation. For information about NetApp in a vSphere environment, see NetApp Storage Best Practices for VMware vSphere.

Solution Overview
The NetApp Unified Storage Architecture offers an agile and scalable storage platform. All NetApp
storage systems use the Data ONTAP operating system to provide SAN (FC, iSCSI) and NFS storage.
MetroCluster leverages NetApp HA cluster failover (CFO) functionality to automatically protect against controller failures.
Additionally, MetroCluster layers local SyncMirror, cluster failover on disaster (CFOD), hardware
redundancy, and geographical separation to achieve extreme levels of availability. Local SyncMirror
synchronously mirrors data across the two halves of the MetroCluster configuration by writing data to
two plexes: the local plex (on the local shelf), which actively serves data, and the remote plex (on the remote
shelf), which normally does not serve data. On a local shelf failure, the remote shelf seamlessly takes over
data-serving operations. No data loss occurs because of the synchronous mirroring. Hardware redundancy is in
place for all MetroCluster components: controllers, storage, cables, switches (Fabric MetroCluster),
and adapters are all redundant.
A VMware HA/DRS cluster is created across the two sites using ESXi 5.x hosts and managed by vCenter
Server 5.x. The vSphere Management, vMotion, and virtual machine networks are connected using a
redundant network between the two sites. It is assumed that the vCenter Server managing the HA/DRS
cluster can connect to the ESXi hosts at both sites.
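
That last assumption can be spot-checked through the vSphere API. The sketch below uses pyVmomi with hypothetical vCenter credentials and simply lists every host in each cluster together with the connection state vCenter reports for it; it is an illustration, not part of the certified configuration.

```python
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection details used for this example only.
si = SmartConnect(
    host="vcenter.example.com",
    user="administrator@vsphere.local",
    pwd="password",
    sslContext=ssl._create_unverified_context(),
)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True
    )
    for cluster in view.view:
        print(f"Cluster: {cluster.name}")
        for host in cluster.host:
            # connectionState is 'connected', 'disconnected', or 'notResponding'.
            print(f"  {host.name}: {host.runtime.connectionState}")
finally:
    Disconnect(si)
```
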
Based on the distance considerations, NetApp MetroCluster can be deployed in two different
configurations:

- Stretch MetroCluster
- Fabric MetroCluster

Stretch MetroCluster
This is a Stretch MetroCluster configuration:

Fabric MetroCluster
This is a Fabric MetroCluster configuration:

Note: These illustrations are simplified representations and do not show the redundant front-end
components, such as Ethernet and Fibre Channel switches.
The vMSC configuration used in this certification program was configured with Uniform Host Access
mode. In this configuration, the ESX hosts from a single site are configured to access storage from both
sites.
In cases where RDMs are configured for virtual machines residing on NFS volumes, a separate LUN must
be configured to hold the RDM mapping files. Ensure you present this LUN to all the ESX hosts.
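
Uniform host access can be spot-checked in a similar way: if every ESXi host in the stretched cluster is presented the storage from both sites, including any LUN that holds RDM mapping files, all hosts should report the same datastore set. The Python sketch below assumes an existing pyVmomi session (si, as in the earlier example) and flags any host that is missing datastores seen elsewhere in the cluster.

```python
from pyVmomi import vim

# Assumes an existing pyVmomi session 'si', as in the earlier sketch.
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True
)

for cluster in view.view:
    per_host = {h.name: {ds.name for ds in h.datastore} for h in cluster.host}
    all_datastores = set.union(*per_host.values()) if per_host else set()
    print(f"Cluster: {cluster.name}")
    for host_name, datastores in sorted(per_host.items()):
        missing = all_datastores - datastores
        if missing:
            print(f"  {host_name} is missing: {sorted(missing)}")
        else:
            print(f"  {host_name} sees all {len(all_datastores)} datastores")
```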

vMSC test scenarios


These are the vMSC test scenarios, with the observed NetApp controller behavior and VMware HA behavior for each:

Scenario: Controller single path failure
NetApp Controllers Behavior: Controller path failover occurs; all LUNs and volumes remain connected. For FC datastores, path failover is triggered from the host and the next available path to the same controller becomes active. All ESXi iSCSI/NFS sessions remain active in multimode configurations of two or more network interfaces.
VMware HA Behavior: No impact.

Scenario: ESXi single storage path failure
NetApp Controllers Behavior: No impact on LUN and volume availability. The ESXi storage path fails over to the alternative path, and all sessions remain active.
VMware HA Behavior: No impact.

Scenario: Site 1 controller failure
NetApp Controllers Behavior: LUN availability remains unaffected. FC datastores fail over to the alternate available path of the surviving controller. ESXi iSCSI sessions affected by the node failure fail over to the surviving controller. ESXi NFS volumes continue to be accessible through the surviving node.
VMware HA Behavior: No impact.

Scenario: Site 2 controller failure
NetApp Controllers Behavior: LUN availability remains unaffected. FC datastores fail over to the alternate available path of the surviving controller. ESXi iSCSI sessions affected by the node failure fail over to the surviving controller. ESXi NFS volumes continue to be accessible through the surviving node.
VMware HA Behavior: No impact.

Scenario: MCTB VM failure
NetApp Controllers Behavior: No impact on LUN and volume availability. All sessions remain active.
VMware HA Behavior: No impact.

Scenario: MCTB VM single link failure
NetApp Controllers Behavior: No impact. Controllers continue to function normally.
VMware HA Behavior: No impact.

Scenario: Complete Site 1 failure, including ESXi and controller
NetApp Controllers Behavior: LUN and volume availability remains unaffected. FC datastores fail over to the alternate available path of the surviving controller. iSCSI sessions to surviving ESXi nodes remain active. After the failed controller comes back online and giveback is initiated, all affected aggregates resync automatically.
VMware HA Behavior: Virtual machines on the failed Site 1 ESXi nodes fail. HA restarts the failed virtual machines on ESXi hosts at Site 2.

Scenario: Complete Site 2 failure, including ESXi and controller
NetApp Controllers Behavior: LUN and volume availability remains unaffected. FC datastores fail over to the alternate available path of the surviving controller. iSCSI sessions to surviving ESXi nodes remain active. After the failed controller comes back online and giveback is initiated, all affected aggregates resync automatically.
VMware HA Behavior: Virtual machines on the failed Site 2 ESXi nodes fail. HA restarts the failed virtual machines on ESXi hosts at Site 1.

Scenario: Single ESXi failure (shutdown)
NetApp Controllers Behavior: No impact. Controllers continue to function normally.
VMware HA Behavior: Virtual machines on the failed ESXi node fail. HA restarts the failed virtual machines on surviving ESXi hosts.

Scenario: Multiple ESXi host management network failure
NetApp Controllers Behavior: No impact. Controllers continue to function normally.
VMware HA Behavior: A new master is selected within the network partition. Virtual machines remain running; there is no need to restart them.

Scenario: Site 1 and Site 2 simultaneous failure (shutdown) and restoration
NetApp Controllers Behavior: Controllers boot up and resync. All LUNs and volumes become available, all iSCSI sessions and FC paths to the ESXi hosts are re-established, and virtual machines restart successfully. As a best practice, power on the NetApp controllers first and allow the LUNs/volumes to become available before powering on the ESXi hosts.
VMware HA Behavior: No impact.

Scenario: ESXi management network all ISL links failure
NetApp Controllers Behavior: No impact to the controllers. LUNs and volumes remain available.
VMware HA Behavior: If the HA host isolation response is set to Leave Powered On, virtual machines at each site continue to run because the storage heartbeat is still active. Partitioned hosts at the site that does not have a Fault Domain Manager elect a new master.

Scenario: All storage ISL links failure
NetApp Controllers Behavior: No impact to the controllers. LUNs and volumes remain available. When the ISL links come back online, the aggregates resync.
VMware HA Behavior: No impact.

Scenario: System Manager (management server) failure
NetApp Controllers Behavior: No impact. Controllers continue to function normally. The NetApp controllers can be managed using the command line.
VMware HA Behavior: No impact.

Scenario: vCenter Server failure
NetApp Controllers Behavior: No impact. Controllers continue to function normally.
VMware HA Behavior: No impact on HA; however, DRS rules cannot be applied.
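
Several of the HA behaviors above depend on the cluster's HA settings, in particular the host isolation response. The hedged pyVmomi sketch below assumes an existing session (si, as in the earlier examples) and only prints the current values; verify against the vSphere API documentation which isolation response string corresponds to Leave Powered On in the release you are running.

```python
from pyVmomi import vim

# Assumes an existing pyVmomi session 'si', as in the earlier sketches.
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True
)

for cluster in view.view:
    das = cluster.configurationEx.dasConfig
    defaults = das.defaultVmSettings
    print(f"Cluster: {cluster.name}")
    print(f"  HA enabled:         {das.enabled}")
    print(f"  Host monitoring:    {das.hostMonitoring}")
    # isolationResponse is an API enum string; check which value your vSphere
    # release uses for the 'Leave Powered On' setting before relying on it.
    print(f"  Isolation response: {defaults.isolationResponse if defaults else 'not set'}")
```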
