This is the first in a series of articles that will demonstrate how to create a Linux cluster using the Red Hat/CentOS 5.5 distribution. When we created our first cluster at the office, we searched the internet for Red Hat cluster setup information and, to my surprise, found very few articles or user experiences on the topic. So I hope this series of articles will benefit the community of users trying, or wanting, to create their own cluster.
The Setup
Our cluster will contain three servers: two active and one passive. In our example, the first server will run an HTTP service and the second server will be an FTP server. The third server will be used as a fail-over server: if the first or second server has a network, SAN, or hardware problem, the service it is running will move to the third server.

Although not required, having a passive server offers some advantages. First, it ensures that if one server has a problem, the passive server will be able to handle the load of either of your servers. Without the passive server, we would need to make sure that either server could handle the combined load of both services on its own. A clustered environment also offers advantages when the time comes to do hardware upgrades on a server. Say we need to add memory to the first server: we can move the HTTP service to the passive node, add memory to the first server, and then move the service back to its original node when ready. Another advantage of having a passive server is that you can update the OS of your nodes one by one without affecting the services (even if a reboot is necessary).

Our nodes will be named bilbo, which will host the HTTP service; gollum, which will host the FTP service; and gandalf, which will be our passive node. As you can see in the image below, each server uses three network cards. The first NIC (eth0) is used to offer the HTTP and FTP services to the users and is the host's main network card. The second network card (eth1) is used by the cluster software for its heartbeat. In a corporate environment, this network should be isolated from the rest of your network: there will be a lot of broadcast traffic on it, and response time is important. The ILO network will be used by the cluster software to remotely power a server in the cluster on or off when a problem arises.
We should define all the IP addresses that our cluster needs to function properly in /etc/hosts. Of course, they should also be defined in your DNS, but if your DNS does not respond, the /etc/hosts file ensures that the cluster will continue to work properly. We will use the domain name maison.ca throughout these articles.

127.0.0.1       localhost.localdomain localhost

# Host Real IP
192.168.1.103   gandalf.maison.ca     gandalf
192.168.1.104   gollum.maison.ca      gollum
192.168.1.111   bilbo.maison.ca       bilbo

# Service Virtual IP
192.168.1.204   ftp.maison.ca         ftp
192.168.1.211   www.maison.ca         www

# HeartBeat IP Address
10.10.10.103    hbgandalf.maison.ca   hbgandalf
10.10.10.104    hbgollum.maison.ca    hbgollum
10.10.10.111    hbbilbo.maison.ca     hbbilbo

# HP ILO IP Address
192.168.1.3     ilo_gandalf.maison.ca ilo_gandalf
192.168.1.4     ilo_gollum.maison.ca  ilo_gollum
192.168.1.11    ilo_bilbo.maison.ca   ilo_bilbo
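Before going further, it is worth confirming on each node that these names actually resolve; a quick check (the hostnames are the ones from our example, adjust to your own):

```shell
# Verify that each cluster name resolves (from /etc/hosts or DNS).
for h in gandalf gollum bilbo hbgandalf hbgollum hbbilbo; do
    getent hosts "$h" || echo "WARNING: $h does not resolve"
done
```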
Since the nodes will share volumes on the SAN, LVM must use cluster-wide locking. The lvmconf command switches /etc/lvm/lvm.conf to cluster locking (locking_type 3):

# grep -i locking_type /etc/lvm/lvm.conf
locking_type = 0
# lvmconf --enable-cluster
# grep -i locking_type /etc/lvm/lvm.conf
locking_type = 3
Next, use system-config-securitylevel to disable SELinux and disable the firewall. Note that disabling SELinux requires a reboot to take effect.

# system-config-securitylevel
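If you prefer to skip the GUI, the same result can be obtained from the command line; a sketch using the standard RHEL/CentOS 5 tools (adjust to your own security policy):

```shell
# Disable SELinux permanently (takes effect after the next reboot)
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

# Stop the firewall and keep it off across reboots
service iptables stop
service ip6tables stop
chkconfig iptables off
chkconfig ip6tables off
```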
Now we can start the cluster services manually, or simply reboot the server. Start Cluster Services.
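On RHEL/CentOS 5, the Cluster Suite components are ordinary init scripts; a typical manual start sequence looks like the following (service names are the stock ones; clvmd is only needed if you enabled clustered LVM as shown above):

```shell
# Start the cluster infrastructure, then the resource manager
service cman start        # cluster membership and fencing
service clvmd start       # clustered LVM (if locking_type = 3)
service rgmanager start   # service/resource manager

# Make them start automatically at boot on every node
chkconfig cman on
chkconfig clvmd on
chkconfig rgmanager on
```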
New cluster config warning The first time you run the cluster configuration GUI, a warning message may be displayed. It simply informs us that the cluster configuration file /etc/cluster/cluster.conf was not found. Click on the Create New Configuration button and let's move on.
General cluster settings: Next we need to enter the name of our cluster; we have chosen to name it our_cluster. The name of a cluster cannot be changed. The only way to change the name of a cluster is to create a new one, so choose it wisely. We use the recommended lock manager, DLM (Distributed Lock Manager); GULM is deprecated.
The multicast address we will use for the heartbeat is 239.1.1.1. Usage of the quorum disk is outside the scope of this article, but basically you define a small disk on the SAN that is shared among the nodes in the cluster, and node status is regularly written to that disk. If a node hasn't updated its status for a period of time, it is considered down and the cluster will then fence that node. If you are interested in using the quorum disk, there is an article here that describes how to set it up.
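For reference, the multicast choice made in the GUI ends up in /etc/cluster/cluster.conf as a multicast element under cman; a minimal sketch (cluster name and address from this example) looks like:

```
<cluster name="our_cluster" config_version="1">
  <cman>
    <multicast addr="239.1.1.1"/>
  </cman>
</cluster>
```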
Cluster Properties
Now let's check some of the default settings given to our cluster. Click on Cluster on the left-hand side of the screen and then click on Edit Cluster Properties. The Configuration Version value defaults to 1 and is automatically incremented each time you modify your cluster configuration. The Post-Join Delay parameter is the number of seconds the fence daemon (fenced) waits before fencing a node after the node joins the fence domain. The Post-Join Delay default value is 3; a typical setting is between 20 and 30 seconds, but it can vary according to cluster and network performance. The Post-Fail Delay parameter is the number of seconds the fence daemon (fenced) waits before fencing a node (a member of the fence domain) after the node has failed. The Post-Fail Delay default value is 0; its value may be varied to suit cluster and network performance. CHANGE THE POST FAIL DELAY to 30 seconds.
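After this change, cluster.conf carries the new value on the fence_daemon element; it should look roughly like this (attribute names are the standard ones, values from our example):

```
<fence_daemon post_join_delay="3" post_fail_delay="30"/>
```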
Enter the host name used for the heartbeat; for our first node it will be hbbilbo.maison.ca. Remember that this name MUST be defined in your hosts file and in your DNS (if you have one).
Quorum is a voting algorithm used by the cluster manager. We say a cluster has quorum if a majority of nodes are alive, communicating, and agree on the active cluster members. So in a thirteen-node cluster, quorum is only reached if seven or more nodes are communicating; if the seventh node dies, the cluster loses quorum and can no longer function. For our cluster we will leave the Quorum Votes at the default value of 1. If we had a two-node cluster, we would need to make a special exception to the quorum rules: there is a special two_node setting in /etc/cluster/cluster.conf that looks like this:

<cman expected_votes="1" two_node="1"/>

Repeat the operation for every node you want to include in the cluster.
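The majority rule above is easy to compute: with the default one vote per node, a cluster of N nodes needs floor(N/2) + 1 votes to keep quorum. A quick sketch:

```shell
#!/bin/sh
# Votes needed for quorum, assuming one vote per node:
# majority = floor(nodes / 2) + 1
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

quorum_votes 13   # thirteen-node cluster -> 7
quorum_votes 3    # our three-node cluster -> 2
```

This is also why the two_node exception exists: with two nodes, the formula demands both nodes be up, so a single failure would otherwise take the whole cluster down.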