
How to Install Windows 2000 Cluster Services: Selecting and Preparing the Hardware

Selecting the correct hardware, and preparing it correctly, is half the battle when clustering Windows 2000. As a rule of thumb, if you have done a good job with the hardware, you should not have any difficulty installing Windows 2000 Cluster Services. In fact, it should be a non-event. But if you don't have the correct hardware installed and configured correctly, you could spend hours, if not days, troubleshooting why you can't install Cluster Services. The focus of this section is to highlight the most important considerations when selecting and configuring hardware for a SQL Server cluster. Because there are so many possible hardware configurations for Windows 2000 clustering, no single article can cover every potential issue. But for the most part, the topics discussed here cover 90% or more of the "big" issues when it comes to Windows 2000 clustering hardware.

Hardware Requirements for Clustering

Each node in the cluster must have this required hardware:

o A physical disk to hold Windows 2000 Advanced Server or Windows 2000 Datacenter Server. This disk must not be connected to the storage bus used to connect to the shared array. Ideally, this drive should be connected to its own SCSI card, and should be mirrored for the best fault tolerance.
o A SCSI or Fibre Channel adapter used to connect to the shared disk array. Other than the shared disk array, no other devices should be attached to this adapter.
o Two PCI network adapter cards: one for the public network and one for the private network.
o RAM: requirements will vary, depending on the needs of SQL Server.
o CPU: requirements will vary, depending on the needs of SQL Server.

Each of these will be discussed in more detail later in this article.

Active/Active vs. Active/Passive Configuration

Before we go any further, you need to know whether you will be running your SQL Server cluster in an Active/Active or an Active/Passive configuration. The answer to this question affects how you need to size the servers belonging to the cluster. (Note: This article focuses on the two-node cluster design, although most of the information is also applicable to a four-node cluster design.)

An Active/Active configuration means that you will be running SQL Server on both nodes of the cluster. So if one node fails over to the other node, the surviving node will be running two instances of SQL Server instead of one. An Active/Active configuration has two major implications for your cluster hardware. First, each node in the cluster should ideally have enough capacity to run both instances of SQL Server. While you may not need to size each node at fully twice its normal capacity in order to handle the doubled load when a failover occurs, each node does need to be large enough to handle both loads well enough that user productivity doesn't suffer unnecessarily when a failover does occur. Second, an Active/Active configuration requires that the shared disk array have at least two separate disks available, one for each of the active nodes, in addition to a shared disk for use as the cluster quorum drive.

In an Active/Passive configuration, SQL Server runs on only a single node of the cluster. So if the primary node (the node currently running SQL Server) fails, the other node takes over. Assuming both servers have the same hardware, there should be no performance issues after a failover. Because there is only one instance of SQL Server running, each server only needs to be as big as necessary to handle a single SQL Server's needs.
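To make the Active/Active sizing rule concrete, here is a minimal sketch in Python. All of the instance loads and node capacity figures are hypothetical placeholders, not measurements from any real system; the point is only the arithmetic of checking whether one node can absorb both instances after a failover:

```python
# Hypothetical per-instance loads and node capacity (illustrative
# numbers only; substitute measurements from your own servers).
instances = {
    "SQL1": {"cpu_pct": 35, "ram_gb": 3.0},
    "SQL2": {"cpu_pct": 40, "ram_gb": 2.5},
}
node_capacity = {"cpu_pct": 100, "ram_gb": 8.0}

# After a failover, one node runs every instance at once, so it must
# hold the sum of both loads with headroom left for Windows 2000 and
# the Cluster Service.
total_cpu = sum(i["cpu_pct"] for i in instances.values())
total_ram = sum(i["ram_gb"] for i in instances.values())
os_headroom_gb = 0.25  # rough allowance for the OS and clustering

fits = (total_cpu <= node_capacity["cpu_pct"]
        and total_ram + os_headroom_gb <= node_capacity["ram_gb"])
print(f"Combined load: {total_cpu}% CPU, {total_ram}GB RAM -> "
      f"{'fits' if fits else 'does NOT fit'} on one node")
```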

Sizing the Cluster's Servers

For the most part, you size the servers in a cluster as you would size a SQL Server that is not in a cluster. The number of CPUs, the amount of RAM, and the size and type of the disk arrays you need depend on how big the database is, and how active it is. While a cluster does incur some overhead not found in a non-clustered SQL Server, this overhead is minimal. Memory is the one resource that clustering uses the most of, so when you size RAM for your servers, be sure you not only include enough for SQL Server and its databases, but also include some extra for the needs of Windows 2000 and the Cluster Service. In most cases, 128-256MB of RAM above and beyond what SQL Server needs is enough. As has already been mentioned above, an Active/Active SQL Server configuration requires that each server in the cluster be oversized in order to ensure that it has enough resources to simultaneously run two instances of SQL Server under production conditions. If the nodes in your cluster will be using AWE memory (more than 4GB of RAM in each server), it is critical that both nodes have the identical amount of RAM, configured identically. If not, failover will most likely fail.
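As a quick worked example of that RAM allowance (the SQL Server figure and installed RAM values are hypothetical placeholders; the 128-256MB overhead is from the text above):

```python
# RAM budget for one node: SQL Server's needs plus the 128-256MB
# allowance for Windows 2000 and the Cluster Service mentioned above.
sql_server_ram_mb = 2048   # hypothetical SQL Server requirement
cluster_overhead_mb = 256  # top of the 128-256MB range

print(f"Provision at least {sql_server_ram_mb + cluster_overhead_mb}MB per node")

# With AWE memory (more than 4GB), both nodes must match exactly,
# or failover will most likely fail.
installed_ram_mb = {"node1": 6144, "node2": 6144}  # hypothetical values
assert len(set(installed_ram_mb.values())) == 1, "nodes must have identical RAM"
```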

The Shared Array

The shared array is one of the most critical components of a Windows 2000 cluster. It is the disk storage that will be shared between the two nodes in the cluster. It holds the important quorum drive (used by the Windows 2000 Cluster Service), along with the one or more drives or drive arrays that will hold the shared databases. The shared array is connected to the two nodes in the cluster via a shared bus, either SCSI or Fibre Channel.

In an Active/Active configuration, you will need not only a quorum drive, but at least two shared drives. One of these shared drives will be used by the first active node, and the second shared drive will be used by the second active node. Two nodes cannot control a single shared disk array at the same time, and since an Active/Active configuration runs two separate instances of SQL Server, each instance must have exclusive access to its own shared disk or disk array. On the other hand, any single node in a cluster can access more than one disk array at the same time, as long as that node is the exclusive owner of those disk arrays. In an Active/Passive configuration, only a quorum drive and one shared disk array are required as a minimum configuration. If necessary, more than one shared disk array can be owned by the active node of a cluster.

Most shared disk arrays take the form of a self-contained disk subsystem or a Storage Area Network (SAN), connected to the nodes in the cluster with either SCSI or Fibre Channel adapters and cables. Based on my experience, this is the area where most of the hardware configuration problems occur. There are many different types of SCSI and Fibre Channel equipment, and they all must be configured differently. You will want to take special care when following Microsoft's and the vendor's instructions on how to configure the shared array components for your cluster. This is especially true for SCSI connections, which require that the appropriate SCSI IDs be set and terminations made. Most shared disk arrays allow you to configure RAID arrays for the best fault tolerance. For best performance, select a RAID 10 configuration. If this is not available or affordable, then select RAID 5.
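To see why RAID 10 costs more per usable gigabyte than RAID 5, here is a small Python sketch comparing usable capacity for the same set of disks (the drive count and size are invented for illustration):

```python
# Usable capacity of the two RAID levels recommended above, assuming
# identical disks (drive count and size are invented for illustration).
def raid10_usable(disks, disk_gb):
    # Mirrored pairs: half the raw capacity survives as usable space.
    return (disks // 2) * disk_gb

def raid5_usable(disks, disk_gb):
    # One disk's worth of capacity is consumed by parity.
    return (disks - 1) * disk_gb

disks, disk_gb = 6, 36  # e.g., six 36GB drives in the shared array
print(f"RAID 10: {raid10_usable(disks, disk_gb)}GB usable")  # 108GB
print(f"RAID 5:  {raid5_usable(disks, disk_gb)}GB usable")   # 180GB
```

RAID 10 gives up half the raw capacity to mirroring, which is the price of its better write performance and fault tolerance; RAID 5 is the cheaper-per-gigabyte fallback.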

The Network Connection

In order for the nodes in a cluster to communicate, they must talk with each other over the network. There are two different ways to enable this communication. First, they can share the same network connection (the public network) as the users who access the cluster. Second, they can communicate over a private network (where the two nodes share a separate network). Nodes in a cluster need to send and receive what is called a heartbeat signal, among other communications. This signal is used by each node to determine if the other node is still available. While this heartbeat signal can be configured to go over the public network, it is better to run it over a private network. A private network is simply two network cards (one in each node of the cluster) that are connected via a crossover network cable, or connected to each other by a hub. Personally, I prefer a direct connection using a crossover cable because it eliminates another potential point of failure: the hub.
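The heartbeat itself is just a small, periodic network message. The following Python sketch is purely conceptual (it is not the Cluster Service's actual protocol, and the addresses and port are invented examples), but it illustrates the send-and-listen pattern each node follows over the private network:

```python
# Conceptual heartbeat sketch (NOT the actual Cluster Service
# protocol). Each node periodically sends a small UDP datagram over
# the private network and declares its partner failed if nothing
# arrives within a timeout. Addresses and port are invented examples.
import socket
import time

MY_ADDR = ("10.0.0.1", 3343)       # this node's private interface
PARTNER_ADDR = ("10.0.0.2", 3343)  # the other node's private interface
TIMEOUT_S = 5.0

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(MY_ADDR)
sock.settimeout(1.0)

last_seen = time.time()
while True:
    sock.sendto(b"heartbeat", PARTNER_ADDR)  # announce "I'm alive"
    try:
        data, _ = sock.recvfrom(64)          # listen for the partner
        if data == b"heartbeat":
            last_seen = time.time()
    except socket.timeout:
        pass
    if time.time() - last_seen > TIMEOUT_S:
        print("Partner missed its heartbeat; treat it as failed")
    time.sleep(1)
```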

Using a private network produces four benefits. First, it moves the heartbeat traffic off the public network and onto the private network, which helps to reduce congestion on the public network. Second, it helps to boost redundancy by eliminating a single point of failure. Third, it helps to boost network security. Last of all, if you don't use a private network, Microsoft will not support your cluster (it is a requirement of the HCL). For your public network connection, you may want to consider a 1GB or 100MB network card, while a 100MB or 10MB network card is adequate for the private network.

Using Identical Hardware

Although it is not absolutely required, the nodes in your cluster should ideally have identical hardware, drivers, and configuration. In fact, the cards in each server should ideally be placed in the same numbered slot. For example, if you put a 100MB public network card in slot 2 of node 1, then you should also put the same 100MB network card in slot 2 of node 2. Using identical hardware, drivers, and configuration produces many benefits, including:

o Fewer problems and less troubleshooting. This is the main reason.
o Ensuring that both nodes have identical capacity, to better handle production loads if a failover occurs.
o Should you have major hardware problems and the worst happens, and you can't get parts fast enough, you can always cannibalize parts from one server to get the other one working.

Use the Correct, and Latest, Hardware Drivers

Another big gotcha when configuring hardware is using an outdated, buggy, or simply wrong driver for the node's hardware. Many types of hardware come in many different variations, each often requiring its own driver version. It can often be very difficult to determine which driver you need, where to find it, and even how to properly install it. I have seen cases where parts and drivers have been mislabeled, which can really cause headaches. So before you even begin to build your servers, take the time to research all of the drivers that you need, locate them, ensure that they are the latest versions, and verify that they support Windows 2000 clustering. Just because a particular driver works after the server is built does not mean it will continue to work once the Cluster Service has been installed.

If you don't have all the correct drivers installed correctly, you could end up spending endless hours troubleshooting your cluster. And as with hardware, be sure both nodes in the cluster use the same drivers, configured identically.

Hardware Compatibility List

None of the advice given above will do you any good if the hardware you are using in your cluster is not certified by Microsoft as an approved and tested cluster system. So what does this mean? Microsoft certifies hardware in two different ways. First, individual parts can be certified as Windows 2000 Cluster compatible. But that is not enough. Just because each of the components in your cluster has been certified individually does not mean your cluster hardware as a whole has been certified. Second, Microsoft certifies cluster hardware as total systems, and this is the certification you must check before you order the parts for your cluster.

To find out which "total systems" Microsoft certifies for Windows 2000 clustering, go to this URL: www.microsoft.com/hcl/. From here, there are two options on the screen: "Search for the following" and "In the following types." Under "Search for the following," select "All products" (the default), and for "In the following types," select "cluster" (not the default). Next, click on "Search now." The results return all of the approved hardware systems for Windows 2000 clustering. As you can see, there are a lot. But the system you want may not be there. If this is the case, then your system will not be supported by Microsoft if you run into problems. Because of this, and to reduce the chance of potential problems, you should only use approved hardware systems for your cluster.

So what exactly is a "hardware system?" The best way to find the answer is to look at an example from the cluster systems approved by Microsoft for Windows 2000. For example, when you look up the system "Compaq ProLiant Cluster HA F100/F200 (DL380)" on this web page, note that it is approved for Windows 2000 clustering. When you click on this link, a new window appears providing this information:

Compaq ProLiant Cluster HA F100/F200 (DL380)
Server 1: Compaq ProLiant DL380
Server 2: Compaq ProLiant DL380
SCSI/RAID Controller - Compaq StorageWorks Fibre Channel Host Adapter/P and Compaq StorageWorks 64-bit/66-MHz Fibre Channel Host Adapter
Shared SCSI/RAID Storage - Compaq StorageWorks RAID Array 4000/4100

This information comprises the "hardware system." It includes the servers acting as the nodes in the cluster, along with the tested and approved shared arrays to be used with the server nodes. In other words, a cluster's servers and its shared array must be approved as a system. On the other hand, other devices in your servers, such as network cards or graphics cards, are not considered part of the total hardware system. Instead, they must also be on the HCL for Windows 2000.

Where To Next?

As you can see, selecting hardware for your Windows 2000 cluster is not an easy task. It involves much time and research to ensure you have all the right hardware, properly installed and connected. Assuming you have done this part, the next part, installing and configuring Windows 2000, should be relatively easy. Once the hardware is properly selected and assembled, the next step is to install Windows 2000. For the most part, installing Windows 2000 on servers that will be clustered is the same as any Windows 2000 installation. This section focuses on what is slightly different between installing Windows 2000 for a stand-alone server and installing Windows 2000 for a server to be clustered.

Things to Keep In Mind When Installing Windows 2000 for Clustering

Before you begin installing Windows 2000, be sure that you have any special drivers available for your hardware. Most likely, Windows 2000 will not have all the special drivers you need for the shared disk array, and it may not have the drivers you need for your graphics card, network cards, etc.

Before you begin, ensure that the shared disk array has been configured appropriately. At a minimum, there must be two logical drives available on the shared disk array. One will be used for shared disk space, and the other will be used as the Quorum drive for the Windows 2000 Cluster Service. The Quorum drive must be at least 50MB in size, although I generally make them much larger, as much as 1GB (disk space is cheap). Once Windows 2000 is installed, you can format the logical drives.

Before you begin, you will need to determine the following information:

o Name to be assigned to node 1 of the cluster.
o Public IP address and subnet mask for node 1 of the cluster.
o Private IP address and subnet mask for node 1 of the cluster.
o Default gateway for node 1's public network card (the same as node 2's).
o Name to be assigned to node 2 of the cluster.
o Public IP address and subnet mask for node 2 of the cluster.
o Private IP address and subnet mask for node 2 of the cluster.
o Default gateway for node 2's public network card (the same as node 1's).
o The IP addresses of the DNS and WINS servers to be used by the public network cards in the two nodes of the cluster.
o The password you will give to the local administrator account on both nodes of the cluster.
o The name of the domain you will be joining. You will either need to be a domain administrator, or have a domain administrator add the servers to the domain.

Install Windows 2000 on only one node of the cluster at a time. First, perform all of the installation steps for one node. Once that node is done, turn it off, then perform the installation on the second node. Be sure that you never have both nodes turned on at the same time before the Cluster Service is installed. Only once the Cluster Service is installed on the second node can both nodes of the cluster be turned on at the same time.

All the nodes in a cluster must be either member servers or domain controllers; you cannot make one node a member server and the other a domain controller. For best performance, both servers in the cluster should be member servers.

Install Windows 2000 and updates in this order:

o Windows 2000 Advanced Server
o The latest service pack
o The latest version of IE and the latest IE updates

Don't install any Windows 2000 components that you will not be using, such as IIS, etc.

After Windows 2000 is installed, to make your life easier, copy the /I386 folder from the installation CD to a local drive (generally the drive where Windows 2000 is installed). Also copy the service pack installation files to a folder on the same local drive. When you install the Cluster Service later, you will need access to both of these.

When configuring the disks on the shared array from within Windows 2000, they must be configured as "basic" disks, not "dynamic" disks. Format all logical drives using NTFS. While not required, you may want to assign the drive letter "Q:" to the Quorum drive in order to prevent any later confusion about its purpose.

Install the public network card using standard default settings. But when you install the private network card, keep the following in mind (a worksheet capturing these values appears at the end of this section):

o Don't add the IP addresses for DNS or WINS to the private network card's configuration.
o Don't add a default gateway address for the private network card.
o Be sure that "Disable NetBIOS over TCP/IP" is selected on the WINS tab of the Advanced TCP/IP Settings property box.
o If any of the network cards support more than one speed, such as 10/100MB network cards, you must ensure that you have manually configured them for the appropriate speed (such as 10 or 100) and for the correct duplex (half or full). Do not use the default "automatic" sensing options, as these options may not work and will cause you much grief. (I learned this the hard way.)

For convenience's sake, name the public network connection "Public Network" and the private network connection "Private Network." This will make it easier to identify which network is which when later installing the Cluster Service.
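One way to avoid missing an item from the checklist above is to record everything in one place before you start. Here is that worksheet expressed as a Python data structure; every name, address, and domain below is a placeholder, not a recommendation:

```python
# Pre-installation worksheet as a data structure. All values are
# placeholders; fill in your own before you begin.
cluster_plan = {
    "node1": {
        "name": "CLUSTERNODE1",
        "public": {"ip": "192.168.1.11", "mask": "255.255.255.0",
                   "gateway": "192.168.1.1"},  # same gateway as node 2
        "private": {"ip": "10.0.0.1", "mask": "255.0.0.0",
                    "gateway": None,            # no gateway on private NIC
                    "dns": None, "wins": None,  # no DNS/WINS on private NIC
                    "netbios_over_tcpip": False},
    },
    "node2": {
        "name": "CLUSTERNODE2",
        "public": {"ip": "192.168.1.12", "mask": "255.255.255.0",
                   "gateway": "192.168.1.1"},
        "private": {"ip": "10.0.0.2", "mask": "255.0.0.0",
                    "gateway": None, "dns": None, "wins": None,
                    "netbios_over_tcpip": False},
    },
    "dns_servers": ["192.168.1.2"],   # used by the public NICs only
    "wins_servers": ["192.168.1.3"],
    "domain": "EXAMPLE",
}
```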

Once Windows 2000 is installed on both nodes of the cluster, you are ready to create the necessary service accounts and set the necessary security in Windows 2000 in preparation for installing Cluster Services and SQL Server clustering.

Establishing User Accounts and Security Necessary for Clustering

Setting up the appropriate user accounts and establishing the necessary security is one of the easiest steps when creating a cluster. It is also one of the easiest to forget. What I like to do in this step is not only set up the domain service account for the Cluster Service, but also set up all of the security I need for my SQL Server cluster at one time. This way, I won't forget later on. Don't forget that you will need to set up security on both nodes of your cluster. Ensure that security is set up identically on both. And as was mentioned in the previous section, be sure only one server node is turned on at a time when configuring security. If both nodes are turned on before the Cluster Service is installed, you could corrupt the shared array.

DBA Security

The first step I always take when setting up security on clustered servers is to set up the security to allow DBAs to be local administrators on the clustered nodes. Since our company already has a global domain group for this purpose, all I have to do is add this domain group to the local administrators group on each node in the cluster.

SQL Server Service Security

Although this step is not required now (I could wait until I install SQL Server), I like to do it now so that I don't forget later. As you know, the mssqlserver and sqlserveragent services need accounts to run under. In the case of a cluster, it is mandatory that the account used for this purpose be a domain account. At our company, we have a single domain account we use for this purpose, so all I have to do is add this domain account to the local administrators group on each node in the cluster. Be sure you don't use different domain accounts for each node, as this would cause you many heartaches later on.

Cluster Service Security

The Cluster Service, which will be installed in the next step, also requires a domain account to run under. As part of the Cluster Service setup procedure, you must specify the name of this account and its password. If you do not currently have a Cluster Service domain account, you must set one up in your domain, then add it to the local administrators group on each node of the cluster.

Creating Service Accounts

Just in case you are not familiar with creating service accounts, such as the one necessary for the Cluster Service, keep the following in mind when creating one:

o The account must be a domain account, not a local account.
o Be sure that the option "User must change password at next logon" is not selected.
o Be sure the option "Password never expires" is selected.
o Ensure that the account can log on 24 hours a day.
o Select a hard-to-break password.

Once you have security properly set up, you are now ready to proceed to the next step, which is to install the Cluster Service itself.

Installing the Windows 2000 Clustering Service


This portion of the article shows you, step by step, how to install the Windows 2000 Cluster Service. Not every potential option will be discussed, only the most common. Before you begin, you must have two pieces of information necessary for the installation of Cluster Services. They are:

o The virtual name you will be assigning to the cluster. This is the name that will be used by clients to attach to the cluster.
o The virtual IP address you will be assigning to the cluster. This is the IP address that will be used by clients to attach to the cluster.

The above information should be determined before you begin to install the Cluster Service. Note: If you are using a hub to connect the nodes of the private network, then you don't have to take any special steps other than to ensure that the hub is operational and that the private network card on the primary node can talk to it. If you are using a cross-over cable to connect the primary node to the secondary node's private network card, then you must turn on the secondary node, but not let it boot fully, in order to successfully install the Cluster Service. Start the node, and when Windows 2000 starts, press the F8 key to go to the Windows 2000 Advanced Options Menu. Stay at this screen until you are ready to install the Cluster Service on the secondary node. If you don't do this, the private network card in the primary server won't be available and you will be unable to install the Cluster Service on it.

Installing the Cluster Service on Node 1 of the Cluster

1. Start the "Add/Remove Programs" option in Control Panel, and then select "Add or Remove Windows Components." The following screen appears:

2. From this screen, check the box to the left of "Cluster Service" and then click on the "Next" button. The following screen appears:

3. How you respond to this screen depends on how you want to configure Terminal Services on this server. In this example, we don't want to make any changes to the current settings, so we just select "Next". The following screen appears:

4. This screen tells you that Windows 2000 is now installing the Cluster Service files. But before the copy completes, this screen may appear:

5. This screen asks you to insert the CD with the Windows 2000 Service Pack files on it. You may either put the CD in the CD drive on the server and click "OK," or, if the Service Pack 2 files are stored locally, as in this example, just click "OK."

6. If the install program can't find the Service Pack 2 files in the CD drive, it will prompt you (as above) for the location of these files. Use the "Browse" button to find the files on the current server, or on a remote server, then click "OK." Next, the following screen may appear:

7. This screen asks you to insert the CD with the Windows 2000 Advanced Server files on it. You may either put the CD in the CD drive on the server and click "OK," or, if the Windows 2000 Advanced Server files are stored locally, as in this example, just click "OK."

8. If the install program can't find the Windows 2000 Advanced Server files in the CD drive, it will prompt you (as above) for the location of these files. Use the "Browse" button to find the files on the current server, or on a remote server, then click "OK." After a few moments, all of the necessary files should have been copied and this screen should appear:

9. Now that all of the files for the Cluster Service have been copied to your server, it is time to begin configuring the Cluster Service using the Cluster Service Configuration Wizard. Click "Next" to begin the wizard, and this screen appears:

10. In this screen, Microsoft wants you to confirm that you understand that Microsoft does not support hardware configurations not listed in the Cluster category of the Microsoft Hardware Compatibility List. Click on "I Understand," then "Next," to continue. This screen appears:

11. Now you have to specify whether the server you are installing the Cluster Service on is the first node in the cluster (in this case, it is) or "The second or next node in the cluster." Since this is the first node, be sure the "The first node in the cluster" option is selected and click "Next." Then this screen appears:

12. Now you must specify the virtual name of the cluster you are creating. Enter a name and click "Next." In our case, the cluster name is "clusternode1." Then this screen appears:

13. When the Cluster Service is installed, it is installed as a service on your server. Because of this, you must assign a domain account to the service, which the Cluster Service uses to log into the operating system. This account must belong to the local administrators group of the server. The account we use in this example is "clusterservice." You must enter the name of this domain account, along with its password and domain name. Click "Next" to continue, and this screen appears:

14. This screen is used to specify which drives on the shared array will be managed by the cluster. By default, all drives on the shared array are listed under "Managed disks" (see above). If this is what you want, as it is in our case, then you need only click "Next" to continue. But should you not want all of the drives on the shared array to be managed by this cluster, you must highlight the disk(s) and click "Remove" to move them to the "Unmanaged disks" window, and then click "Next." After clicking "Next," this window appears:

15. Now you must tell the Cluster Service which disk on the shared array will be used as the Quorum drive. This is the drive used by the Cluster Service to store its checkpoint and log files, which it uses to communicate between nodes of the cluster. Select the appropriate drive from the drop-down list under "Disks." Note that we have made our life easy by previously giving the Q: drive the name "Quorum" using Disk Administrator. Of course, any drive letter and name could be used. Once you are done, click "Next," and the following screen appears:

16. This screen is informational only. It recommends that, besides the public network used by clients to access the cluster, you also have a private network that can be used by the nodes in the cluster to communicate. This is the approach we will take here. After reading the message, click "Next," and the following screen appears.

17. The Cluster Configuration Wizard is now asking you to specify which network card to use for the private network in the cluster. Note that the "Network name" above is "Private Network." It is named this because when we originally configured the private network card, we gave this connection the name "Private Network." We did this to make this step in the Cluster Service configuration process easier. Be sure that the checkbox next to "Enable this network for cluster use" is selected; otherwise the cluster will not use this network card. Under the option "This network performs the following role in the cluster," the radio button next to "Internal cluster communications only (private network)" is selected. This means that the internal network can be used only by the Cluster Service, which is the only option that makes sense here. Note that this screen will appear for every network card in your server. In most cases, such as this one, there are only two network cards in the server, so this screen will appear only twice, as shown next. Once all the options are set, click "Next," and this screen appears:

18. This is similar to the previous screen, but now we are configuring the public network. Under "Network name," the name "Public Network" appears because that is the name we gave this network card when we installed Windows 2000. "Enable this network for cluster use" must be selected. Under the option "This network performs the following role in the cluster," the radio button next to "All communications (mixed network)" is selected. Although you could choose "Client access only (public network)," we chose "All communications (mixed network)" instead because this provides additional redundancy should the private network fail. Since this option is selected, should the private network fail, the Cluster Service will still be able to communicate over the public network if need be. Once all the options are set, click "Next," and this screen appears:

19. Because we specified that the public network was mixed, the Cluster Configuration Wizard needs to know which network should be used as the primary network for internal communications, and which should be used as the backup should the primary network fail. The primary network must appear first in this list, and the backup network second. If they are not in the correct order, you can move them with the "Up" and "Down" buttons. Now click "Next," and the following screen appears:

20. Now you must tell the Cluster Service Configuration Wizard the IP address to be used as the virtual cluster IP address, which is the address used by clients to connect to the cluster. You must also specify the appropriate subnet mask. Next to "Network," select the name of the public network, which in our case is "Public Network," since that is the network used by clients to access the cluster. When you are done, click "Next," and this screen appears:

21. The primary node has now been configured and you are ready to finish this task. Click on "Finish." After a few more seconds of activity on the screen, the Cluster Service will start automatically, testing to see if it can start. If it can, then this screen appears:

After you click "OK" and wait a few more seconds, this screen appears:

22. Now click "Finish," and you are done installing the Cluster Service on the primary node. To see if the Cluster Service is running successfully, start Cluster Administrator to see if the cluster resources are online. You can also check the Event Viewer for any potential problems that arose during the installation.

Installing the Cluster Service on Node 2 of the Cluster


Once the primary node of the cluster has had the Cluster Service installed on it, you can install the Cluster Service on the secondary node. Before you start, be sure that the primary node is up and that the Cluster Service is running successfully on it.

1. On node 2, start the "Add/Remove Programs" option in Control Panel, and then select "Add or Remove Windows Components." The following screen appears:

2. From this screen, check the box to the left of "Cluster Service" and then click on the "Next" button. The following screen appears:

3. How you respond to this screen depends on how you want to configure Terminal Services on this server. In our example, we don't want to make any changes to the current settings, so we just select "Next". The following screen appears:

4. This screen tells you that Windows 2000 is now installing the Cluster Service files. But before the copy completes, this screen may appear:

5. This screen asks you to insert the CD with the Windows 2000 Service Pack files on it. You may either put the CD in the CD drive on the server and click "OK," or, if the Service Pack 2 files are stored locally, as in this example, just click "OK."

6. If the install program can't find the Service Pack 2 files in the CD drive, it will prompt you (as above) for the location of these files. Use the "Browse" button to find the files on the current server, or on a remote server, then click "OK." Next, the following screen may appear:

7. This screen asks you to insert the CD with the Windows 2000 Advanced Server files on it. You may either put the CD in the CD drive on the server and click "OK," or, if the Windows 2000 Advanced Server files are stored locally, as in this example, just click "OK."

8. If the install program can't find the Windows 2000 Advanced Server files in the CD drive, it will prompt you (as above) for the location of these files. Use the "Browse" button to find the files on the current server, or on a remote server, then click "OK."

After a few moments, all of the necessary files should be copied and this screen should now appear:

9. Now that all of the files for the Cluster Service have been copied to your server, it is time to begin configuring the Cluster Service using the Cluster Service Configuration Wizard. Click "Next" to begin the wizard, and this screen appears:

10. In this screen, Microsoft wants you to confirm that you understand that Microsoft does not support hardware configurations not listed in the Cluster category of the Microsoft Hardware Compatibility List. Click on "I Understand," then "Next," to continue. This screen then appears:

11. Because we are now adding the second node to the cluster, select the option "The second or next node in the cluster" in order to join the cluster we have already created. Then click "Next," and this screen appears:

12. To add this node to the current cluster, you must enter the name of the cluster, the name of the domain account used for the Cluster Service, and its password. Once you have done this, click "Next," and this screen appears:

13. You must now reenter the password of the domain account you are using as the Cluster Service account. Click "Next," and this screen appears:

14. That's all there is to adding a node to an existing cluster. Click "Finish." After a few seconds, the Cluster Service starts and this screen appears:

15. Click "OK," and this screen appears.

16. Click "Finish," and you are done. To verify that the Cluster Service is working correctly on this secondary node, start Cluster Administrator and check that you can see both nodes of the cluster.

Testing and Verifying the Windows 2000 Clustering Service


The worst is now over. If you have successfully gotten this far, most likely the Windows 2000 Cluster Service is ready to be used. But since I trust a computer installation only as far as I can throw it (which is not very far), it is a good idea to test the Cluster Service installation to see if it is really working as it should. This section takes a look at several tests you can perform to ensure that the Cluster Service is really doing its job. In most cases, if there is any problem with your cluster, these tests will find it. What the problems are, and how to fix them, is beyond the scope of this article. If your cluster passes all of these tests with flying colors, then the odds are very strong that your cluster will not have any future problems (although this can never be guaranteed).

Test Number 1: Moving Groups

The first test we will perform is very simple: move the current resources (including the Cluster Group and any Disk Groups) that were created when the Cluster Service was installed from the active cluster node to the inactive cluster node. In your cluster, the two nodes can be divided into an active node (the one in control of the cluster's resources) and an inactive node (the one not in control of any cluster resources). If you are a clustering expert, then you know I have oversimplified this explanation, but it is good enough for this testing. After the Cluster Service has been installed on both nodes of the cluster, one of the nodes will be in control of all the default cluster groups (the active node) and the other node will not have any cluster groups assigned to it (the inactive node). The resources found on the active node, by default, include what is called the "Cluster Group" and the "Disk Group." (There may be one or more Disk Groups, depending on how your shared disk array has been configured. In this example, we will assume there is only one Disk Group.) The Cluster Group generally contains these cluster resources:

o Cluster IP Address (the virtual IP address of the cluster).
o Cluster Name (the virtual name of the cluster, used by clients to access the cluster).
o Disk Q: (the quorum disk, which may or may not be labeled Q:).

The Disk Group generally contains a single disk resource, a drive letter that refers to a logical drive on the shared disk array. If you have more than one logical drive as part of the shared disk array, then there will be a separate Disk Group for each logical drive available. Now that we have all that out of the way, let's begin our first test to see if the cluster is functioning properly. Our goal in this test is to see if we can manually move both default cluster groups from the active node in the cluster to the inactive node, and then reverse our steps so that the cluster groups return to their original location on the active node. Here's how:

1. Start Cluster Administrator.
2. In the Explorer pane at the left side of Cluster Administrator, open the "Groups" folder. Inside it you should see the Cluster Group and the Disk Group.
3. Click on "Cluster Group" to highlight it. In the right pane of the screen, you will see the cluster resources that make up this group. Note the "Owner" of the resources. This is the name of the active node.
4. Each of the groups must be moved to the other node, one at a time. First, right-click on "Cluster Group," then select "Move Group." As soon as you do this, you will see the "State" change from "Online" to "Offline pending" to "Offline" to "Online pending" to "Online." This will happen very quickly. Also note that the "Owner" changes from the active node to the inactive node.
5. Now do the same for the "Disk Group."
6. Assuming there are no problems, both groups will have moved to the inactive node, which, in effect, has now become the active node. Once both groups have been moved, look in the Event Viewer to see if any error messages were generated. If everything worked correctly, there should be none.
7. Now move both groups back to the original node by repeating steps four through six above.

This is a very basic test, but it helps to determine if the cluster is working as it should. The following tests are slightly more comprehensive, helping you to root out any other potential problems.
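If you would rather script this test than click through Cluster Administrator, Windows 2000 Advanced Server also ships with a cluster.exe command-line tool that can move groups. The sketch below drives it from Python; the group and node names are examples from this article's configuration, and the /moveto syntax is my recollection, so verify it with "cluster group /?" on your own build:

```python
# Hedged sketch: move the default groups to the other node and back
# using cluster.exe. Group and node names are examples; verify the
# /moveto flag with "cluster group /?" on your own server.
import subprocess

GROUPS = ["Cluster Group", "Disk Group"]

def move_group(group, node):
    # Assumed syntax: cluster group "<name>" /moveto:<node>
    subprocess.run(["cluster", "group", group, f"/moveto:{node}"],
                   check=True)

for g in GROUPS:
    move_group(g, "NODE2")  # move everything to the inactive node
for g in GROUPS:
    move_group(g, "NODE1")  # then move it all back again
```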

Test Number 2: Initiate Failure

This test is very similar to the one above, except we are pretending to fail over the nodes. In effect, we are manually simulating a failover. Here's how to perform this test:

1. Start Cluster Administrator.
2. In the Explorer pane at the left side of Cluster Administrator, open the "Groups" folder. Inside it you should see the Cluster Group and the Disk Group.
3. Click on "Cluster Group" to highlight it. In the right pane of the screen, you will see the cluster resources that make up this group. Note the "Owner" of the resources. This is the name of the active node.
4. Now, right-click on the "Cluster IP Address" resource in the right pane of the window, then select "Initiate Failure." What this does is tell the Cluster Service that the virtual IP address has failed.
5. After you select this option, you will notice some activity under "State," but fairly quickly the resource returns to an "Online" status, and the "Owner" has not changed. It appears as if no failover has occurred. And you are correct: no failover has occurred. Believe it or not, this is normal and to be expected. This is because the Cluster Service will try to restart a failed resource up to three times before it actually fails over (this number can be changed; a small sketch of this policy follows this test). So to actually initiate a failover, you must redo step four above a total of four times before an actual failover occurs. When failover occurs, you will also notice that all of the resources in the "Cluster Group" fail over with it.

6. Now if you click on the "Disk Group," you will notice that your disk resource did not fail over. This is also normal. A failover only forces dependent resources to fail over as a group, and the "Cluster Group" we failed over earlier is not dependent on the "Disk Group," so the Disk Group did not fail over. To fail over the disk group, right-click on the disk resource in the right pane of the window and select "Initiate Failure." You will have to do this a total of four times in order to fail over the disk resource to the other node.
7. Now that you are done, reverse your steps and fail the "Cluster Group" and the "Disk Group" back over to the original node.

Like the previous test, check the Event Viewer logs to see if any error messages occurred. If everything worked as expected, you are ready for the next test.
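The retry behavior seen in step 5 can be summarized in a few lines. This is only a conceptual model of the policy described above, not Cluster Service code; the threshold of three local restarts is the default and is configurable:

```python
# Conceptual model of the restart-before-failover policy: the Cluster
# Service retries a failed resource locally up to a threshold before
# failing the whole group over to the other node.
RESTART_THRESHOLD = 3

def handle_resource_failure(failure_count):
    if failure_count <= RESTART_THRESHOLD:
        return f"restart resource locally (attempt {failure_count})"
    return "threshold exceeded: fail the group over to the other node"

# Four "Initiate Failure" calls are needed before a failover occurs:
for n in range(1, 5):
    print(n, "->", handle_resource_failure(n))
```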

Test Number 3: Turn Off Each Node

While the first two tests were performed from Cluster Administrator, the next three tests are more real-world. In this test, you will first need to ensure that all of the default groups are located on one of the two nodes. Then you will physically turn off (flip the switch on) the active node (the first node). If you are watching the cluster groups from Cluster Administrator on the inactive node after turning off the first node, you should see a failover occur, and the resources should automatically fail over to the second node. Check the Event Log for any potential error messages after this occurs.

Once you have checked for any potential problems, turn on the node that was turned off earlier (node 1) and wait until it fully boots. You will note that turning on the node that was turned off does not cause the cluster to fail back. The cluster resources will remain on the second node until you force them to return to the first node. Now turn off the node with the active groups (the second node), repeating what you did earlier with the first node. As before, you can use Cluster Administrator from node 1 to watch the groups fail over to the first node. Check the Event Log for any potential error messages. Once the groups fail over to the first node, turn the second node back on, and wait until it boots up fully. This is a very good test to see if failover will work in the real world. If no problems arose from this test, then you are ready for the next one.
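While running this test and the two that follow, it is also useful to watch the cluster from a client's point of view. This minimal Python sketch pings the cluster's virtual IP address (a placeholder below) once a second, using the standard Windows ping.exe flags -n (echo count) and -w (timeout in milliseconds), so you can see how long the address is actually unreachable during each failover. Run it from a client machine, not from a cluster node:

```python
# Client-side probe: ping the cluster's virtual IP once a second so
# you can watch how long it is unreachable during a failover.
import subprocess
import time

VIRTUAL_IP = "192.168.1.50"  # replace with your cluster's virtual IP

while True:
    # "ping -n 1" sends one echo request; "-w 1000" waits at most 1000ms.
    result = subprocess.run(["ping", "-n", "1", "-w", "1000", VIRTUAL_IP],
                            capture_output=True)
    status = "UP" if result.returncode == 0 else "DOWN"
    print(time.strftime("%H:%M:%S"), status)
    time.sleep(1)
```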

Test Number 4: Break Network Connectivity

This test is similar in concept to the above test. What we want to do is force a failover, but instead of simulating a computer failure, we will be simulating a network-related error. From the node that has the default resource groups (the first node), remove the network cable from the public network card. This will simulate a failure of the first node, and should initiate a failover to the second node. If you are watching the cluster groups from Cluster Administrator on the second node, you should see a failover occur, and the resources should automatically fail over. Check the Event Log for any potential error messages. Once you have checked for any potential problems, plug the network cable back into the first node, and then remove the network cable from the public network card on the second node. As before, you can use Cluster Administrator to watch the groups fail over to the first node. Check the Event Log for any potential error messages. Once you are done, plug the network cable back into the public network card on the second node. If no problems arose from this test, then you are ready for the next.

Test Number 5: Break Shared Array Connectivity

This test is designed to help uncover potential issues with the shared disk array. I have seen clusters pass all four of the above tests, but fail this one when the shared disk array was not configured 100% correctly. This test simulates what would happen if the controller card, or the cable connecting a node to the shared disk array, fails. From the node that has the default resource groups (the first node), remove the cable from the card used to connect to the shared array. This will simulate a failure of the first node, and should initiate a failover to the second node. If you are watching the cluster groups from Cluster Administrator on the second node, you should see a failover occur, and the resources should automatically fail over. Check the Event Log for any potential error messages. Once you have checked for any potential problems, plug the cable back into the first node, and then remove the cable from the card used to connect to the shared array on the second node. As before, you can use Cluster Administrator to watch the groups fail over to the first node. Check the Event Log for any potential error messages. Once you are done, plug the cable back into the appropriate card.
