
Node-to-node connectivity problems

What problem are you having?

• I cannot complete a cluster on the first node.

• When the resources fail over and the nodes do not detect each other, there is no connectivity between the nodes or with the cluster storage device.

• Quorum resource does not start.

• Quorum resource fails.

• Quorum log becomes corrupted.

• Additional node cannot join the cluster.

• Nodes cannot connect to the cluster drives.

• The cluster quorum disk (containing the quorum resource) becomes disconnected from all nodes in a cluster and you are later unable to add the nodes back to the cluster.

I cannot complete a cluster on the first node.

Cause: Windows Server 2003, Enterprise Edition or Windows Server 2003, Datacenter Edition is incorrectly
installed.

Solution: Make sure that Windows Server 2003, Enterprise Edition or Windows Server 2003, Datacenter Edition is
correctly installed.

Cause: Your hardware is not listed in the Cluster category on the Microsoft Windows Catalog.

Solution: Make sure that your hardware is listed on the Microsoft Windows Catalog. See the Windows Catalog at
Support resources. Search for "Cluster".

If any of the hardware you are using for your cluster is not on this list, consider replacing those components with
hardware that is listed.

Cause: You have chosen individual components from the Windows Catalog instead of systems.

Solution: Do not choose individual components of your cluster from the Microsoft Windows Catalog. Microsoft
supports only systems chosen from this list.

Cause: Your primary Internet Protocol (IP) address is invalid.

Solution: If the node uses DHCP to obtain noncluster IP addresses, use Ipconfig.exe to verify that you have a
valid primary IP address for all network adapters. If the subnet mask listed for an address is 0.0.0.0, your
primary address is invalid.
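As a sketch, the Ipconfig.exe check might look like the following (the adapter name and output are illustrative; the telltale sign is the 0.0.0.0 subnet mask):

```
C:\> ipconfig /all

Ethernet adapter Public:
        IP Address. . . . . . . . . . . . : 0.0.0.0
        Subnet Mask . . . . . . . . . . . : 0.0.0.0
```

A valid configuration shows a nonzero IP address and subnet mask for each adapter.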

When the resources fail over and the nodes do not detect each other, there is no connectivity between
the nodes or with the cluster storage device.

Cause: The Remote Procedure Call (RPC) service is not running.

Solution: On each node, use Services in Control Panel to confirm that the RPC service is running.

Cause: The nodes do not have RPC connectivity.

Solution: Verify that the nodes have RPC connectivity.

You can determine this by using a network analyzer (such as Network Monitor), or you can use RPCPing (available
on the Microsoft Exchange Server CD).
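As a sketch, the Exchange version of RPCPing is split into a server-side listener (rpings.exe) and a client (rpingc.exe); exact file names and options depend on the version shipped on your Exchange CD:

```
REM On the first node, start the RPC ping listener:
C:\> rpings

REM On the second node, run the client (rpingc.exe is a graphical
REM tool; enter the first node's name or IP address as the target):
C:\> rpingc
```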

Quorum resource does not start.

Cause: The resource is not physically connected to the server.

Solution: Make sure that the resource is physically connected to the server.

Quorum resource fails.

Cause: The disk on the shared bus holding the quorum resource has failed.

Solution: If the disk on the shared bus holding the quorum resource fails and cannot be brought online, the
Cluster service cannot start. To correct this situation, you must use the fixquorum option to start the Cluster
service on a single node, and then use Cluster Administrator to configure the Cluster service to use a different disk
on the shared bus for the quorum resource.

When fixquorum is specified, the Cluster service starts without a quorum resource, and does not bring the
quorum disk online. A node cannot join a cluster when the Cluster service is running with the fixquorum option.

For instructions on how to change the disk that the Cluster service uses for the quorum resource, see Use a
different disk for the quorum resource.
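As a sketch, assuming the Cluster service runs under the service name clussvc, the recovery sequence might look like this:

```
REM Start the Cluster service on one node without a quorum resource:
C:\> net start clussvc /fixquorum

REM In Cluster Administrator, assign a different disk on the shared
REM bus as the quorum resource, then restart the service normally:
C:\> net stop clussvc
C:\> net start clussvc
```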

Quorum log becomes corrupted.

Cause: This may occur for a variety of reasons.

Solution: If the quorum log is corrupted, the Cluster service attempts to correct the problem by resetting the log
file. In this case, the Cluster service writes the following message in the system log:

The log file [name] was found to be corrupt. An attempt will be made to reset it.

If the quorum log cannot be reset, the Cluster service cannot start.

If the Cluster service fails to detect that the quorum log is corrupted, the Cluster service may fail to start. In this
case, there may be an "ERROR_CLUSTERLOG_CORRUPT" message in the system log.

To correct this, you must use the noquorumlogging option when starting the Cluster service to temporarily run
the Cluster service without quorum logging. You can then correct the disk corruption and delete the quorum log, as
necessary. When noquorumlogging is specified, the Cluster service brings the quorum disk online, but disables
quorum logging. You can then run Chkdsk on your quorum disk to detect and correct disk corruption.
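As a sketch (the Q: drive letter and the clussvc service name are illustrative), the sequence might look like this:

```
REM Start the Cluster service with quorum logging disabled:
C:\> net start clussvc /noquorumlogging

REM Detect and correct corruption on the quorum disk:
C:\> chkdsk Q: /f

REM Restart the service with quorum logging re-enabled:
C:\> net stop clussvc
C:\> net start clussvc
```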

For instructions on how to recover from a corrupted quorum log or quorum disk, see Recover from a corrupted
quorum log or quorum disk.

Additional node cannot join the cluster.

Cause: The cluster configuration on the node may not have been completely removed if the node was previously
evicted.

Solution: At a command prompt, type cluster [cluster name] node [node name] /forcecleanup.

When an additional node fails to join a cluster, improper name resolution is often the cause. The problem may exist
because of invalid data in the WINS cache. You may also have the wrong binding on the WINS or DNS Services for
the additional node.
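To rule out stale name-resolution data, checks along these lines may help (the node and cluster names are illustrative):

```
REM Purge and reload the NetBIOS (WINS) remote name cache:
C:\> nbtstat -R

REM Query the first node's NetBIOS name table:
C:\> nbtstat -a node1

REM Confirm DNS resolution of the cluster and node names:
C:\> nslookup mycluster
C:\> nslookup node1
```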

If WINS or DNS is functioning correctly on all nodes:

Cause: You may not be using the proper cluster name, node name, or IP address.

Solution: Confirm that you are using the proper cluster name, node name, or IP address.

When joining a cluster, you can specify the cluster name, the computer name of the first node, or the IP address of
either the cluster or the first node.

Cause: The Cluster Name resource may not have started.

Solution: Confirm that the Cluster Name resource started.

Use Cluster Administrator on the first node to ensure that the Cluster Name resource is running.

Cause: The Cluster service may not be running on the first node.

Solution: Confirm that the Cluster service is running on the first node and that all resources within the Cluster
Group are online before installing a second node.

The Cluster service may not have yet started when you attempted to join the cluster.

Cause: Network connectivity may not exist between the nodes.

Solution: Confirm that network connectivity exists between the nodes.

Make sure TCP/IP is properly configured on all nodes.
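A basic connectivity check from one node to the other might look like this (the host names are illustrative):

```
REM Verify the local TCP/IP configuration:
C:\> ipconfig /all

REM Ping the other node over both the public and private networks:
C:\> ping node2-public
C:\> ping node2-private
```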

Cause: You may not have IP connectivity to the cluster address.

Solution: Confirm that you have IP connectivity to the cluster address and that the IP address is assigned to the
correct network.

If you cannot ping the IP address of the cluster, run Cluster Administrator on the first node and ensure the cluster
IP Address resource is running. Also, use Cluster Administrator to ensure that the cluster has a valid IP address
and subnet mask (click Cluster Group, right-click Cluster IP Address, and click Properties), and that the IP
address does not conflict with an IP address that is already in use on the network. If the address is not valid,
change it, take the Cluster IP Address resource offline, and then bring it online again. If the IP address is not
assigned to the correct network, use Cluster Administrator to correct the problem.
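The offline/online cycle can also be performed with the command-line cluster tool; a sketch, with illustrative cluster and resource names:

```
C:\> cluster mycluster resource "Cluster IP Address" /offline
C:\> cluster mycluster resource "Cluster IP Address" /online
```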

If your cluster nodes use DHCP to obtain noncluster IP addresses, use Ipconfig.exe to verify that you have a valid
primary IP address for the networks in question. If the second IP address listed (the subnet mask) is 0.0.0.0, your
primary address is invalid.

Cause: The network role may have changed from Use for all communications.

Solution: Confirm that the network role has not been changed from Use for all communications.

A network role is initialized to Use for all communications but can be changed by the administrator. After
verifying that you have IP connectivity to the cluster address and that the IP address is assigned to the correct
network, use Cluster Administrator to confirm that at least one of the networks connected between the nodes is
enabled to Use for all communications. To verify the role in Cluster Administrator, open the Networks folder,
right-click Network, and then click Properties. Use for all communications enables the network to have both
node-to-node communication and client-to-cluster communication.

For more information on networks, see Server cluster networks. For more information on network roles,
see Configuring cluster network hardware. To use Cluster Administrator to reset the network's role to Use for all
communications, see Enable a network for cluster use.

Nodes cannot connect to the cluster drives.

Cause: The same drive letters may not have been assigned to the cluster drives on all nodes.

Solution: Confirm that the cluster drives are assigned the same drive letters on all nodes.

To do so, run Disk Management on each node and make sure that identical drive letters are assigned to all cluster
drives.

Where?

• Computer Management/Storage/Disk Management

Cause: The node may not be physically connected to the cluster drive.

Solution: Confirm that the node is physically connected to the cluster drive.

If it is not, shut down all nodes and the cluster drive. Connect the nodes to the shared storage bus. Then, start the
cluster drive and start the first node. After the Cluster service starts on the first node, start the additional nodes,
and attempt to connect to the cluster drive.

Cause: If you are using a shared SCSI bus, the SCSI devices may not have unique IDs.

Solution: Verify that each SCSI device has a unique ID. SCSI controller IDs are preset to seven. Reset one SCSI
controller ID to six.

Cause: The controllers on the shared storage bus may not be correctly configured.

Solution: Confirm that the controllers on the shared storage bus are correctly configured (with all cards
configured to transfer data at the same rate).

Cause: The devices and controllers may not match.

Solution: Confirm that your devices and controllers match.

For example, do not use a wide connection controller on one node and a narrow connection controller on another
node. It is also recommended that all fibre channel controllers be homogeneous, so do not use different brands of
controllers together.

The cluster quorum disk (containing the quorum resource) becomes disconnected from all nodes in a
cluster and you are later unable to add the nodes back to the cluster.

Cause: The cluster configuration on the nodes may not have been completely removed.

Solution #1: Use the cluster.exe node [node name] /forcecleanup command to evict the nodes from the cluster. For more
information, see Evict a node from the cluster.

Solution #2: Use the Cluster service fixquorum start up parameter to start the Cluster service. Only one node at
a time can be started with this command. You cannot join any other nodes to the node started using this
command. For more information, see Recover from a corrupted quorum log or quorum disk.

For information about how to obtain product support, see Technical support options.
