Purpose
This article provides information about deploying a vSphere Metro Storage Cluster (vMSC) across two
datacenters or sites using the NetApp MetroCluster solution with vSphere 5.0, 5.1, or 5.5. For ESXi 5.0,
5.1, or 5.5, the article applies to FC, iSCSI, and NFS implementations of Stretch and Fabric MetroCluster.
Resolution
What is vMSC?
vSphere Metro Storage Cluster (vMSC) is a new certified configuration for NetApp MetroCluster storage
architectures. vMSC configuration is designed to maintain data availability beyond a single physical or
logical site. A storage device configured in the vMSC configuration is supported after successful vMSC
certification. All supported storage devices are listed on the VMware Storage Compatibility Guide.
Configuration Requirements
These requirements must be satisfied to support this configuration:
For distances under 500 m, a Stretch MetroCluster configuration can be used. For distances
over 500 m but under 160 km, a Fabric MetroCluster configuration can be used on systems
running ONTAP version 8.1.1.
The maximum round-trip latency between the two sites must be less than 10 ms for Ethernet
networks and less than 3 ms for SyncMirror replication.
The storage network must provide a minimum of 1 Gbps throughput between the two sites for
ISL connectivity.
ESXi hosts in the vMSC configuration should be configured with at least two different IP
networks, one for storage and the other for management and virtual machine traffic. The
Storage network handles NFS and iSCSI traffic between ESXi hosts and NetApp Controllers. The
second network (VM Network) supports virtual machine traffic as well as management functions
for the ESXi hosts. End users can choose to configure additional networks for other functionality
such as vMotion/Fault Tolerance. VMware recommends this as a best practice, but it is not a
strict requirement for a vMSC configuration.
FC switches are used for vMSC configurations where datastores are accessed via the FC protocol,
and ESX management traffic is carried on an IP network. End users can choose to configure
additional networks for other functionality such as vMotion/Fault Tolerance. This is
recommended as a best practice but is not a strict requirement for a vMSC configuration.
For NFS/iSCSI configurations, a minimum of two uplinks for the controllers must be used. An
interface group (ifgroup) should be created using the two uplinks in multimode configurations.
The VMware datastores and NFS volumes configured for the ESX servers are provisioned on
mirrored aggregates.
vCenter Server must be able to connect to ESX servers at both sites.
An HA cluster must not contain more than 32 hosts.
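The numeric limits in the requirements above can be sketched as a small pre-deployment check. This is only an illustration under stated assumptions: the `SiteLink` fields and both function names are hypothetical and are not part of any VMware or NetApp tool, and the handling of a distance of exactly 500 m is an assumption because the requirement does not cover that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteLink:
    """Hypothetical description of the link between the two sites."""
    distance_km: float
    ethernet_rtt_ms: float      # round-trip latency on the Ethernet network
    syncmirror_rtt_ms: float    # round-trip latency for SyncMirror replication
    isl_gbps: float             # storage network throughput for ISL connectivity
    ha_cluster_hosts: int       # number of hosts in the HA cluster


def metrocluster_type(distance_km: float) -> str:
    """Pick the MetroCluster configuration from the site distance."""
    if distance_km < 0.5:
        return "stretch"
    # Assumption: exactly 500 m is treated as Fabric; the source only
    # says "under 500 m" and "over 500 m but under 160 km".
    if distance_km < 160:
        return "fabric"
    return "unsupported"


def check_requirements(link: SiteLink) -> List[str]:
    """Return the list of violated limits (empty if all are met)."""
    problems = []
    if metrocluster_type(link.distance_km) == "unsupported":
        problems.append("distance must be under 160 km")
    if not link.ethernet_rtt_ms < 10:
        problems.append("Ethernet round-trip latency must be less than 10 ms")
    if not link.syncmirror_rtt_ms < 3:
        problems.append("SyncMirror round-trip latency must be less than 3 ms")
    if link.isl_gbps < 1:
        problems.append("ISL throughput must be at least 1 Gbps")
    if link.ha_cluster_hosts > 32:
        problems.append("HA cluster must not exceed 32 hosts")
    return problems


link = SiteLink(distance_km=40, ethernet_rtt_ms=4, syncmirror_rtt_ms=2,
                isl_gbps=1, ha_cluster_hosts=16)
print(metrocluster_type(link.distance_km), check_requirements(link))
```

The check covers only the limits that are stated as numbers. The network layout, interface group, and mirrored aggregate requirements still need to be verified on the hosts and controllers themselves.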
Notes:
A MetroCluster TieBreaker machine should be deployed at a third site. It must be able to
access the storage controllers at Site 1 and Site 2 so that it can initiate a cluster failover on
disaster (CFOD) if an entire site fails.
vMSC certification testing was conducted on vSphere 5.0 and NetApp Data ONTAP version 8.1
operating in 7-Mode. For ESXi 5.5, vMSC certification testing was successfully completed on
vSphere 5.5 and NetApp Data ONTAP version 8.2 operating in 7-Mode.
For more information on NetApp MetroCluster Design and Implementation, see the NetApp
Technical Report, Best Practices for MetroCluster Design and Implementation. For information
about NetApp in a vSphere environment, see NetApp Storage Best Practices for VMware
vSphere.
Solution Overview
The NetApp Unified Storage Architecture offers an agile and scalable storage platform. All NetApp
storage systems use the Data ONTAP operating system to provide SAN (FC, iSCSI) and NFS.
MetroCluster leverages NetApp HA controller failover (CFO) functionality to automatically protect against controller failures.
Additionally, MetroCluster layers local SyncMirror, cluster failover on disaster (CFOD), hardware
redundancy, and geographical separation to achieve extreme levels of availability. Local SyncMirror
synchronously mirrors data across the two halves of the MetroCluster configuration by writing data to
two plexes: the local plex (on the local shelf) actively serving data and the remote plex (on the remote
shelf) normally not serving data. On local shelf failure, the remote shelf seamlessly takes over data-
serving operations. No data loss occurs because of synchronous mirroring. Hardware redundancy is put
in place for all MetroCluster components. Controllers, storage, cables, switches (fabric MetroCluster),
and adapters are all redundant.
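The synchronous mirroring described above can be illustrated with a toy in-memory model. This is a minimal sketch, not NetApp's implementation: the `MirroredAggregate` class and its methods are hypothetical names, plexes are modeled as plain dictionaries, and resynchronization after a shelf returns is not modeled.

```python
class MirroredAggregate:
    """Toy model of an aggregate mirrored across a local and a remote plex."""

    def __init__(self):
        self.plexes = {"local": {}, "remote": {}}
        self.online = {"local": True, "remote": True}

    def write(self, block, data):
        # Synchronous mirroring: the write is acknowledged only after
        # every online plex holds the data.
        targets = [name for name, up in self.online.items() if up]
        if not targets:
            raise RuntimeError("no plex available")
        for name in targets:
            self.plexes[name][block] = data
        return True

    def read(self, block):
        # The local plex normally serves data; the remote plex serves
        # data only when the local shelf has failed.
        for name in ("local", "remote"):
            if self.online[name]:
                return self.plexes[name][block]
        raise RuntimeError("no plex available")

    def fail(self, name):
        # Simulate the failure of one shelf.
        self.online[name] = False


agg = MirroredAggregate()
agg.write(7, b"vm-data")
agg.fail("local")        # local shelf failure
print(agg.read(7))       # served from the remote plex
```

Because each write reaches both plexes before it is acknowledged, the read after the simulated local shelf failure returns the same data, which is the "no data loss" property the paragraph describes.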
A VMware HA/DRS cluster is created across the two sites using ESXi 5.x hosts and managed by vCenter
Server 5.x. The vSphere Management, vMotion, and virtual machine networks are connected using a
redundant network between the two sites. It is assumed that the vCenter Server managing the HA/DRS
cluster can connect to the ESXi hosts at both sites.
Based on the distance considerations, NetApp MetroCluster can be deployed in two different
configurations:
Stretch MetroCluster
Fabric MetroCluster
Stretch MetroCluster
This is a Stretch MetroCluster configuration:
Fabric MetroCluster
This is a Fabric MetroCluster configuration:
Note: These illustrations are simplified representations and do not indicate the redundant front-end
components, such as Ethernet and Fibre Channel switches.
The vMSC configuration used in this certification program was configured with Uniform Host Access
mode. In this configuration, the ESX hosts from a single site are configured to access storage from both
sites.
In cases where RDMs are configured for virtual machines residing on NFS volumes, a separate LUN must
be configured to hold the RDM mapping files. Ensure you present this LUN to all the ESX hosts.
VMware HA Behavior
Controller single path failure
No impact
No impact
No impact
No impact
MCTB VM single link failure
No impact
Complete Site 1 failure, including ESXi and controller
LUN and volume availability remains unaffected. FC datastores fail over to the alternate available path of a surviving controller.
Virtual machines on failed Site 1 ESXi nodes fail. HA restarts failed virtual machines on ESXi hosts on
No impact
aggregates resync.
System Manager - Management Server failure
No impact. Controllers continue to function normally. NetApp controllers can be managed using Command Line.
No impact
vCenter Server failure