
Module 8

Implementing failover clustering


Module Overview

• Planning a failover cluster
• Creating and configuring a new failover cluster
• Maintaining a failover cluster
• Troubleshooting a failover cluster
• Implementing site high availability with stretch
clustering
Lesson 1: Planning a failover cluster

• Preparing to implement failover clustering
• Failover-cluster storage
• Hardware requirements for a failover-cluster
implementation
• Network requirements for a failover-cluster
implementation
• Demonstration: Verify a network adapter's RSS and
RDMA compatibility on an SMB Server
• Infrastructure and software requirements for a failover
cluster
• Security considerations
• Quorum in Windows Server 2016
• Planning for migrating and upgrading failover clusters
Preparing to implement failover clustering

Features of failover clustering include:
• High availability
• Stateful application
• IP-based protocols
Preparing to implement failover clustering

Consider the following guidelines when planning
node capacity in a failover cluster:
• Distribute the highly available applications from a
failed node across the remaining nodes
• Ensure that each node has sufficient capacity
• Use hardware with similar capacity for all nodes in
a cluster
Failover-cluster storage

• Failover clusters require shared storage to provide
consistent data to a virtual server after failover
• Shared storage options include:
• SAS
• iSCSI
• Fibre Channel
• Shared .vhdx
• Scale-Out File Server
• You can also implement clustered
storage spaces to achieve high
availability at the storage level
Hardware requirements for a failover-cluster
implementation

The hardware requirements for a failover-cluster
implementation include:
• You must use server hardware that is certified for
Windows Server
• Server nodes should all have the same
configuration and contain the same or similar
components
• All servers must pass the tests in the Validate a
Configuration Wizard
Network requirements for a failover-cluster
implementation

The network requirements for a failover-cluster
implementation include:
• Your server should connect to multiple networks
to ensure communication redundancy, or it should
connect to a single network with redundant
hardware, to remove single points of failure
• You should ensure that network adapters are
identical and that they have the same IP protocol
versions, speed, duplex, and flow-control
capabilities
• Your network adapters should be compatible with
RSS and RDMA
Demonstration: Verify a network adapter's RSS
and RDMA compatibility on an SMB Server

In this demonstration, you will learn how to verify a
network adapter’s RSS and RDMA compatibility on
an SMB Server
Infrastructure and software requirements for a
failover cluster

• The infrastructure requirements for a failover-cluster
implementation include:
• Active Directory domain controllers should run
Windows Server 2008 or newer
• Domain functional level and forest functional level
should be Windows Server 2008 or higher
• The application must support Windows Server 2016
high availability
• The software best practices for a failover-cluster
implementation require that:
• All nodes have the same edition of Windows Server
2016, and the same service pack and updates
Security considerations

• Security considerations for failover clustering include that
you must:
• Provide a method for authentication and authorization
• Ensure that unauthorized users do not have physical access to failover
cluster nodes
• Ensure that you use antimalware software
• Ensure that your intra-cluster communication authenticates with
Kerberos version 5
• If you use an Active Directory-detached cluster:
• AD DS objects for network names are not created
• The cluster network name is registered in DNS, so you do not need to
create new objects in AD DS
• We do not recommend this for any scenario that requires Kerberos
authentication
• You must run Windows Server 2012 R2 or newer on all cluster nodes
Security considerations

Windows Server 2016 introduces several cluster
types, and which one you use depends on your
domain-membership scenario:
• Single-domain clusters
• Workgroup clusters
• Multi-domain clusters
• Workgroup and domain clusters
Quorum in Windows Server 2016
Quorum mode | What has the vote? | When is quorum maintained?
Node majority | Only nodes in the cluster have a vote | When more than half of the nodes are online
Node and disk majority | The nodes in the cluster and a disk witness have a vote | When more than half of the votes are online
Node and file share majority | The nodes in the cluster and a file share witness have a vote | When more than half of the votes are online
No majority: disk only | Only the quorum-shared disk has a vote | When the shared disk is online
Dynamic quorum | Votes are dynamically assigned to always be odd | When more than half of the currently assigned votes are online
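The static quorum modes in the table reduce to a vote count. The Python sketch below restates those rules as a function. It is a simplified model for study purposes, not how the Cluster service is implemented, and the function name and mode strings are made up for this example.

```python
def quorum_maintained(mode, nodes_total, nodes_online, witness_online=False):
    """Apply the table's rule for one static quorum mode."""
    if mode == "node majority":
        # Only nodes vote, so more than half of the nodes must be online.
        return nodes_online > nodes_total / 2
    if mode in ("node and disk majority", "node and file share majority"):
        total_votes = nodes_total + 1  # every node plus the witness
        online_votes = nodes_online + (1 if witness_online else 0)
        return online_votes > total_votes / 2
    if mode == "disk only":
        # Only the shared disk votes.
        return bool(witness_online)
    raise ValueError(f"unknown quorum mode: {mode}")


# A four-node cluster that has lost two nodes.
print(quorum_maintained("node majority", 4, 2))
print(quorum_maintained("node and disk majority", 4, 2, witness_online=True))
```

The two calls show why a witness matters for an even number of nodes: two of four node votes is not a majority, but two node votes plus the witness is three of five.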
Quorum in Windows Server 2016

• Dynamic quorum:
• Disk witness
• File share witness
• Azure Cloud Witness
• We recommend that you use dynamic quorum,
which is the default configuration
• You should use all other forms of quorum in
specific use cases only
Planning for migrating and upgrading failover clusters

The upgrade steps for each node in the cluster
include:
• Pause the cluster node and drain all cluster resources
• Migrate cluster resources to another node in the cluster
• Replace the cluster node operating system with Windows
Server 2016 and add the node back to the cluster
After you upgrade all nodes to Windows Server 2016:
• Run the Update-ClusterFunctionalLevel cmdlet
Lesson 2: Creating and configuring a new
failover cluster
• The Validation Wizard and the cluster support-policy
requirements
• The process for creating a failover cluster
• Demonstration: Creating a failover cluster
• Demonstration: Reviewing the Validation Wizard
• Configuring roles
• Demonstration: Creating a general file-server failover cluster
• Managing failover clusters
• Configuring cluster properties
• Configuring failover and failback
• Configuring storage
• Configuring networking
• Configuring quorum options
• Demonstration: Configuring the quorum
The Validation Wizard and the cluster support-policy
requirements

• The Validation Wizard performs multiple types of tests,
such as:
• Cluster
• Inventory
• Network
• Storage
• System

• You can perform validation from the Validate a
Configuration Wizard or with the Test-Cluster
Windows PowerShell cmdlet
The process for creating a failover cluster

1. Install the failover clustering feature
2. Verify the configuration, and create a cluster
3. Install the role on all cluster nodes by using
Server Manager
4. Create a clustered application by using the
Failover Cluster Manager snap-in
5. Configure the application
6. Test failover
Demonstration: Creating a failover cluster

In this demonstration, you will learn how to install the
Failover Clustering feature
Demonstration: Reviewing the Validation Wizard

In this demonstration, you will learn how to validate
and configure a failover cluster
Configuring roles

• Configuring a cluster role includes:
• Choosing a clustering role
• Installing the role
• Verifying the status (Running) on all cluster nodes
• You can configure a cluster role by using:
• The Failover Cluster Manager console
• Windows PowerShell cmdlets such as Add-ClusterFileServerRole
Demonstration: Creating a general file-server
failover cluster

In this demonstration, you will learn how to cluster
a file server role
Managing failover clusters

The most common management tasks
include:
• Managing nodes
• Managing networks
• Managing permissions
• Configuring cluster-quorum settings
• Migrating services and applications to a cluster
• Configuring new services and applications
• Removing the cluster
Configuring cluster properties

The three aspects of managing cluster nodes
include:
• Adding nodes after you create a cluster
• Pausing nodes, which prevents resources from
running on that node
• Evicting nodes from a cluster, which removes the
node from the cluster configuration
Configuration tasks are available in:
• The Actions pane of the Failover Cluster
Manager console
• Windows PowerShell
Configuring failover and failback

• During failover, the clustered instance and all
associated resources move from one node to
another
• Failover occurs when:
• The node that hosts the instance becomes inactive for
some reason
• One of the resources within the instance fails
• An administrator performs a failover

• The Cluster service can fail back after the offline
node becomes active again
• Failover can be planned or unplanned
Configuring storage

Storage configuration tasks in Failover Clustering
include:
• Adding storage spaces
• Adding a disk to available storage and to the CSV
• Taking a disk offline
• Bringing the disk back online
Configuring networking

Network | Description
Public network | Clients use this network to connect to the clustered service
Private network | Nodes use this network to communicate with each other
Public-and-private network | Required to communicate with external storage systems

• One network can support both client and node
communications
• Multiple network adapters are recommended for
enhanced performance and redundancy
• iSCSI storage should have a dedicated network
Configuring quorum options

Quorum configuration options available in the
Configure Cluster Quorum Wizard (and Windows
PowerShell) include:
• Use typical settings
• Add or change the quorum witness
• Advanced quorum configuration and witness selection
Dynamic quorum and quorum-configuration
considerations
• Dynamic quorum management:
• Failover cluster dynamically manages the vote assignment to nodes
• Allows for a cluster to run on the last surviving cluster node
• Cannot survive a simultaneous failure of a majority of voting nodes
• If you explicitly remove a vote from a node, the cluster cannot
dynamically add or remove that vote.
• Quorum configuration considerations include:
• Validating the quorum configuration by using the Validate a
Configuration Wizard, or the Test-Cluster Windows PowerShell
cmdlet.
• Changing the quorum configuration only in specific scenarios:
• Adding or evicting nodes
• A node or witness has failed and cannot be recovered quickly
• Recovering a cluster in a multisite disaster recovery scenario.
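The difference between surviving down to the last node and failing on a simultaneous majority loss can be illustrated with a small model. The Python sketch below is a deliberately simplified assumption: it keeps the vote count odd after every membership change, treats every failed node as a vote holder, and ignores timing. It does not reproduce the actual Cluster service logic.

```python
def dynamic_votes(nodes_online, has_witness):
    """Votes after dynamic adjustment; the total is kept odd."""
    votes = nodes_online + (1 if has_witness else 0)
    return votes if votes % 2 == 1 else votes - 1


def survives_failure(nodes_online, failed, has_witness):
    """True if the survivors still hold more than half of the current votes.

    Worst case assumed: every failed node held a vote.
    """
    votes = dynamic_votes(nodes_online, has_witness)
    return votes - failed > votes / 2


def survives_sequential(nodes_online, failures, has_witness):
    """Lose one node at a time, recalculating votes after each loss."""
    for _ in range(failures):
        if not survives_failure(nodes_online, 1, has_witness):
            return False
        nodes_online -= 1
    return True


# Five nodes with a witness: four losses in a row versus three at once.
print(survives_sequential(5, 4, has_witness=True))
print(survives_failure(5, 3, has_witness=True))
```

Because votes are recalculated after each individual loss, the modeled cluster can shrink to one node. Losing three of five nodes at the same moment leaves the survivors without a majority of the votes that were current when the failure happened.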
Demonstration: Configuring the quorum

In this demonstration, you will learn how to
configure a quorum
Lab A: Implementing failover clustering

• Exercise 1: Creating a failover cluster
• Exercise 2: Verifying quorum settings and adding a
node
Logon Information
Virtual machines: 20740A-LON-DC1
20740A-LON-SVR1
20740A-LON-SVR2
20740A-LON-SVR3
20740A-LON-SVR5
20740A-LON-CL1
User name: Adatum\Administrator
Password: Pa$$w0rd
Estimated Time: 45 minutes
Lab Scenario

A. Datum Corporation is looking to ensure that its
critical services, such as file services, have better
uptime and availability. You decide to implement a
failover cluster with file services to provide better
uptime and availability.
Lab Review

• What information do you need for planning a
failover-cluster implementation?
• After running the Validate a Configuration Wizard,
how can you resolve a single point of failure in
network communication?
• In which situations might it be important to
enable failback of a clustered application during a
specific time?
Lesson 3: Maintaining a failover cluster

• Monitoring failover clusters
• Backing up and restoring failover-cluster
configuration
• Maintaining failover clusters
• Managing cluster-network heartbeat traffic
• What is cluster-aware updating?
• Demonstration: Configuring CAU
Monitoring failover clusters

Tools you can use to monitor clusters include:
• Event Viewer
• Tracerpt.exe
• MHTML-formatted cluster configuration reports
• Performance and Reliability Monitor snap-in
Backing up and restoring failover-cluster configuration

• When backing up failover clusters, remember that:
• Windows Server Backup is a Windows Server 2016 feature
• Non-Microsoft tools are available to perform backups and restores
• You must perform system-state backups
• A nonauthoritative restore completely restores a single
node in the cluster
• An authoritative restore restores the entire cluster
configuration to a point in time
Maintaining failover clusters

Failover cluster troubleshooting techniques include:
• Using the Validate a Configuration Wizard
• Reviewing events in logs (cluster, hardware, storage)
• Defining a process for troubleshooting failover clusters
• Reviewing storage configuration
• Checking for group and resource failures
Managing cluster-network heartbeat traffic

• Types of network monitoring:
• Aggressive
• Relaxed

• Network-monitoring parameter settings:
• Delay
• Threshold

• Windows PowerShell cmdlet examples:
Get-Cluster | fl *subnet*
(Get-Cluster).SameSubnetThreshold=10
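The Delay and Threshold settings combine into the time a node can stay silent before the cluster acts. As a rough sketch, assuming Delay is the interval between heartbeats in milliseconds and Threshold is the number of consecutive missed heartbeats that is tolerated, the arithmetic is shown below in Python. The sample values are illustrative inputs, not a statement of the defaults on any particular version.

```python
def seconds_until_node_considered_down(delay_ms, threshold):
    """Heartbeat interval multiplied by the tolerated number of misses."""
    return delay_ms * threshold / 1000


# Illustrative settings: a more aggressive and a more relaxed profile.
aggressive = seconds_until_node_considered_down(delay_ms=1000, threshold=5)
relaxed = seconds_until_node_considered_down(delay_ms=2000, threshold=20)

print(f"aggressive: {aggressive} s, relaxed: {relaxed} s")
```

Aggressive monitoring detects a failed node sooner but is more likely to trigger failover on a brief network interruption. Relaxed monitoring tolerates transient problems at the cost of slower detection.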
What is cluster-aware updating?

• Automated feature in Windows Server 2016
• Updates nodes in a cluster with minimal or no
downtime
• Benefits:
• Updating is automatic
• Can be scheduled
• No downtime
How CAU works

CAU works in two modes:
• Remote updating mode:
• Configure a separate computer as an orchestrator
• Install the failover-clustering administrative tools
• Ensure that the orchestrator computer is not a cluster
member
• Self-updating mode:
• Configure the CAU clustered role as a workload
• Ensure that there is no dedicated orchestrator computer
• Remember that the cluster updates itself
Demonstration: Configuring CAU

In this demonstration, you will learn how to
configure CAU
Lesson 4: Troubleshooting a failover cluster

• Communication issues
• Repairing the cluster name object in AD DS
• Starting a cluster with no quorum
• Demonstration: Reviewing the Cluster.Log file
• Monitoring performance with failover clustering
• Using Event Viewer with failover clustering
• Windows PowerShell troubleshooting cmdlets
Communication issues

• The following might cause communication issues
in failover clustering:
• Network latency
• Network failures
• Network-adapter driver issues
• Firewall rules
• Security software

• You can use the Get-ClusterLog cmdlet to generate
the Cluster.log file for troubleshooting. The file is
saved in C:\Windows\Cluster\Reports
Repairing the cluster name object in AD DS

• The CNO repair process:
• Use the Repair Active Directory Object option in Failover
Cluster Manager
• You must have Reset Password permissions on the CNO
computer object
• The VCO repair process:
• Use the AD Recycle Bin feature to recover deleted
computer objects, and use the Repair function as the last
recovery action
• The CNO will reset the password and self-heal
automatically
• The CNO must have Create Computer Objects
permissions on the VCO’s OU
Starting a cluster with no quorum

• Cluster nodes must retain quorum for the cluster to
work
• If quorum is lost, try to reestablish the quorum
• If you cannot reestablish quorum during an extended
period, start the cluster in the ForceQuorum mode
• After you start the cluster in ForceQuorum mode,
other nodes can rejoin the cluster
• Once quorum is reestablished, the cluster mode
changes from ForceQuorum to normal automatically
• When joining nodes to a cluster that is running in
ForceQuorum mode, start the other nodes with the
setting that prevents quorum
Demonstration: Reviewing the Cluster.Log file

In this demonstration, you will learn how to review
the Cluster.log file
Monitoring performance with failover clustering

Some of the failover clustering performance
counters include:
• Cluster Network Messages
• Cluster Network Reconnections
• Global Update Manager
• Database
• Resource Control
• API
• Cluster Shared Volumes
Using Event Viewer with failover clustering

Events that are displayed in Event Viewer and require you to
troubleshoot clusters include:
• Cluster resource in clustered service or application failed
• Cluster network interface for cluster node on network
failed
• File share witness resource failed to arbitrate for the file
share
• Cluster node was removed from the active failover cluster
membership
• The Cluster service failed to bring clustered service or
application completely online or offline
• Cluster network name resource failed registration of one or
more associated DNS name(s)
• Cluster network name resource cannot be brought online
Windows PowerShell troubleshooting cmdlets

Common cmdlets for troubleshooting failover
clustering include:
• Get-Cluster
• Get-ClusterAccess
• Get-ClusterDiagnostics
• Get-ClusterGroup
• Get-ClusterLog
• Get-ClusterNetwork
• Get-ClusterResourceDependencyReport
• Get-ClusterVMMonitoredItem
• Test-Cluster
• Test-ClusterResourceFailure
Lesson 5: Implementing site high availability with
stretch clustering

• What is a stretch cluster?
• Prerequisites for implementing a stretch cluster
• Synchronous and asynchronous replication
• Overview of the Storage Replica feature
• Demonstration: Implementing server-to-server
storage replica
• Selecting a quorum mode for a stretch cluster
• Configuring a stretch cluster
• Challenges for deploying a stretch cluster
• Multisite failover and failback considerations
What is a stretch cluster?

A stretch cluster is a cluster that has been extended so that
different nodes in the same cluster reside in separate
physical locations

[Figure: a stretch cluster spanning Site A and Site B, each site with a SAN]
Prerequisites for implementing a stretch cluster

To implement a stretch failover cluster, you must
ensure the following:
• Plan for additional hardware to support enough nodes
on each site
• Ensure that the same operating systems and service
packs are installed on each node
• Include at least one low-latency and reliable network
connection between sites
• Configure a storage replication mechanism
• Configure storage infrastructure services on each site
Synchronous and asynchronous replication
• In synchronous replication, the host receives a write complete response
from the primary storage after the data is written successfully to both
storage locations
• In asynchronous replication, the host receives a write complete
response from the primary storage after the data is written
successfully on the primary storage
[Figure: replication between primary storage and secondary storage across Site A and Site B; labels: Write request, Data, Replication, Write complete]
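The two acknowledgement rules can be modeled in a few lines. The Python sketch below is a toy illustration with hypothetical class and method names. It does not represent Storage Replica or any storage product. It only shows when the host receives its write-complete response relative to the copy on secondary storage.

```python
class ReplicatedVolume:
    """Toy model of a primary volume replicating to a secondary volume."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self.pending = []  # written on primary, not yet replicated

    def write(self, block):
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: copy to secondary before acknowledging the host.
            self.secondary.append(block)
        else:
            # Asynchronous: acknowledge now, replicate later.
            self.pending.append(block)
        return "write complete"

    def replicate_pending(self):
        """Ship queued blocks to secondary storage."""
        self.secondary.extend(self.pending)
        self.pending.clear()


sync_volume = ReplicatedVolume(synchronous=True)
async_volume = ReplicatedVolume(synchronous=False)
sync_volume.write("block-1")
async_volume.write("block-1")

print(sync_volume.secondary, async_volume.secondary, async_volume.pending)
```

Anything still in `pending` when the primary site is lost is data that the host believes is written but that the secondary site never received. That is the trade-off asynchronous replication makes in exchange for lower write latency over long distances.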
Overview of the Storage Replica feature

• Use for disaster recovery or preparedness
• Configure via Failover Cluster Manager or
Windows PowerShell
• The three replication scenarios are:
• Stretch cluster
• Server-to-server
• Cluster-to-cluster

• Replicates synchronously or asynchronously
• Requires Windows Server 2016 Datacenter Edition
• Requires GPT-initialized disks
Storage Replica

• Synchronous replication

• Asynchronous replication
Storage Replica

Hyper-V stretch cluster supports synchronous replication only


Storage Replica

Server-to-server supports both synchronous and
asynchronous replication
Storage Replica

Cluster-to-cluster supports synchronous replication only


Demonstration: Implementing server-to-server
storage replica

In this demonstration, you will learn how to
configure Storage Replica
Selecting a quorum mode for a stretch cluster

• File-share witness:
• Requires three or more datacenter locations
• Is available in Windows Server 2012 R2 and
Windows Server 2016
• Azure Cloud Witness:
• Requires two datacenter locations
• Requires Internet connection for all nodes
• Is available in Windows Server 2016 only

• No witness:
• Is not recommended
• Manual failover (disaster-recovery site)
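The witness criteria above can be read as a small decision rule. The Python sketch below encodes only what this slide states. The function name and parameters are hypothetical, and a real design would weigh further factors that are not modeled here.

```python
def suggest_witness(datacenters, all_nodes_have_internet, windows_server_2016):
    """Suggest a witness type for a stretch cluster from the slide's criteria."""
    if datacenters < 2:
        raise ValueError("a stretch cluster spans at least two datacenters")
    if datacenters >= 3:
        # A third location can host the file share away from both sites.
        return "file-share witness"
    if all_nodes_have_internet and windows_server_2016:
        return "Azure Cloud Witness"
    return "no witness (not recommended; manual failover)"


print(suggest_witness(3, all_nodes_have_internet=False, windows_server_2016=False))
print(suggest_witness(2, all_nodes_have_internet=True, windows_server_2016=True))
```

With only two datacenters, no Internet access, or an older operating system, the rule falls through to the no-witness case, which the slide lists as not recommended.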
Configuring a stretch cluster

Site-aware failover-cluster services provide:
• Failover affinity
• Cross-site heartbeating
• Preferred site configuration
Challenges for deploying a stretch cluster

When deploying stretch clusters:
• Ensure that the business requirements are met
• Use storage replication between sites:
• Hardware vendor (Windows Server 2012 R2 or earlier)
• Storage Replica (Windows Server 2016)
• Choose the correct quorum witness to properly
maintain functionality in the event of failures
• Choose the correct storage-replication solution to meet
the needs for Storage Replica
Multisite failover and failback considerations

When implementing stretch clusters in disaster
recovery scenarios, consider the following:
• Failover time
• Services for failover
• Quorum maintenance
• Storage connection
• Published services and name resolution
• Client connectivity
• Failback procedure
Lab B: Managing a failover cluster
• Exercise 1: Evicting a node and verifying quorum settings
• Exercise 2: Changing the quorum from disk witness to
file-share witness, and defining node voting
• Exercise 3: Verifying high availability
Logon Information
Virtual machines: 20740A-LON-DC1
20740A-LON-SVR1
20740A-LON-SVR2
20740A-LON-SVR3
20740A-LON-SVR5
20740A-LON-CL1
User name: Adatum\Administrator
Password: Pa$$w0rd
Estimated Time: 45 min
Lab Scenario

A. Datum Corporation recently implemented
failover clustering for better uptime and
availability. The implementation is new and your
boss has asked you to go through some
failover-cluster management tasks so that you are
prepared to manage it moving forward.
Lab Review

• Why would you evict a cluster node from a failover
cluster?
• Do you perform failure-scenario testing for your
highly available applications based on Windows
Server failover clustering?
Module Review and Takeaways

• Review Questions
• Real-world Issues and Scenarios
• Tools
• Best Practices
• Common Issues and Troubleshooting Tips
