High availability is:
- The masking or elimination of both planned and unplanned downtime.
- The elimination of single points of failure (SPOFs).
- Fault resilience, but not fault tolerance.
The failure of any component of the solution, be it hardware, software or system management, will not cause the application and its data to be inaccessible to the user community. High availability solutions can fail; fault tolerant solutions should not fail. The goal of a high availability solution should be to approach continuous availability, i.e., no downtime. We must not only implement a high availability solution, but also reduce our planned downtime through disciplined and documented change management.
The standalone system may offer limited availability benefits:
- Journalled Filesystem
- Dynamic CPU Deallocation
- Service Processor
- Redundant Power
- Redundant Cooling
- ECC Memory
- Hot Swap Adapters
- Dynamic Kernel
Single points of failure:
o Operating System
o Network
o Network Adapter
o Node
o Disk
o Application
o Site Failure

The enhanced system may offer increased availability benefits:
- Redundant Data Paths
- Data Mirroring
- Hot Swap Storage
- Redundant Power for Storage Arrays
- Redundant Cooling for Storage Arrays
- Hot Spare Storage
Single points of failure:
o Operating System
o Application
o Network
o Network Adapter
o Node
o Site Failure

Clustering technologies offer High Availability:
- Redundant Servers
- Redundant Networks
- Redundant Network Adapters
- Heartbeat Monitoring
- Failure Detection
- Failure Diagnosis
- Automated Fallover
- Automated Reintegration
Single points of failure:
o Site Failure
o Application
Benefits of High Availability Solutions:
- Standard Components (no specialized hardware)
- Can be built from existing hardware (no need to invest in new kit)
- Works with just about any application
- Works with a wide range of disk and network types
- No specialized operating system or microcode
- Excellent availability at low cost
HACMP is largely independent of the disk type, network and application chosen.

High Availability solutions require the following:
o Thorough design and detailed planning
o Selection of appropriate hardware
o Disciplined system administration practices
o Documented operational procedures
o Comprehensive testing
A High Availability solution based upon HACMP provides automated failure detection, diagnosis, recovery and reintegration. The highly available solution will include: the AIX Operating System, HACMP for AIX, customized enhancements, Cluster Proven applications and, of course, a plan for design and testing.
Hardware Prerequisites:
All pSeries systems will work with HACMP, in any combination of nodes within a cluster; however, a minimum of 4 free adapter slots is recommended (2 for network adapters and 2 for disk adapters). Any other adapters (e.g. graphics adapters) will occupy additional slots. The internal Ethernet adapter should not be included in the calculations. Even with 4 free adapter slots, there will still be a single point of failure, as the cluster will only be able to accommodate a single TCP/IP local area network between the nodes.
HACMP Features:
1) Availability using:
   - Cluster concept
   - Redundancy at component level (standby adapters)
   - AIX: LVM (JFS, disk mirroring), SRC, Error Notify
2) Event (fault) Detection:
   - network adapter, network, or node
   - Automatically triggered or customized
3) Event Recovery:
   - adapter swap, fallover or notification of network down
4) C-SPOC: Tools for global changes across all nodes
   - create AIX users, passwords, LVM components (VG, LV, JFS)
5) DARE (Dynamic Automatic Reconfiguration Event):
   - Make HACMP changes without stopping the application
6) Monitoring using:
   - HACMP commands, HAview, or HAtivoli, pager support
HACMP is Not the Right Solution If...
1) You cannot suffer any downtime:
   - Fault tolerance is required
   - 7 x 24 operation is required
   - Life critical systems
2) Your environment is insecure:
   - Users have access to the root password
   - Network security has not been implemented
3) Your environment is unstable:
   - Change management is not respected
   - You do not have trained administrators
   - Environment is prone to 'user fiddle factor'
HACMP will never be an out-of-the-box solution to availability. A certain degree of skill will always be required.
Other Failures:
1) Disk Drive failure: LVM mirroring, RAID.
2) Other hardware failures: No direct HACMP support. HACMP for AIX provides a SMIT interface to the AIX Error Notification Facility: trap on specific errors and execute a command in response.
3) Application failures.
4) HACMP failure: Promoted to node failure.
5) Power failure: Avoid common power supplies across replicated devices / use a UPS.
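The Error Notification Facility in item 2 works by adding a stanza to the errnotify ODM object class. A minimal sketch, assuming a hypothetical method script /usr/local/bin/notify_admin and object name pgr_notify (on a real node you would load the stanza with `odmadd /tmp/notify.add`):

```shell
# Sketch: build an errnotify stanza that runs a command on a
# permanent hardware error. The object name and method script
# are hypothetical; load with: odmadd /tmp/notify.add
cat > /tmp/notify.add <<'EOF'
errnotify:
        en_name = "pgr_notify"
        en_persistenceflg = 1
        en_class = "H"
        en_type = "PERM"
        en_method = "/usr/local/bin/notify_admin $1"
EOF
grep -c en_ /tmp/notify.add   # 5 stanza fields
```

HACMP's SMIT interface fills in this stanza for you; the method receives the error log sequence number as its first argument.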
Cluster Resources:
- Applications
- Disk Drives
- Volume Groups
- File Systems
- NFS File Systems
- IP Addresses
Disk Crash
1) Data replicated through LVM mirroring.
2) Data replicated on RAID.
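The LVM side of option 1 can be sketched as a dry-run script. The LV and disk names (datalv, hdisk3) are placeholders; run() only echoes, so this is safe to try off-cluster:

```shell
# Dry-run sketch of adding an LVM mirror copy for a shared LV.
# datalv and hdisk3 are hypothetical names; swap run() for
# eval "$@" to execute for real on an AIX node.
CMDS=""
run() { CMDS="$CMDS $*;"; echo "+ $*"; }

run mklvcopy datalv 2 hdisk3   # second physical copy on another disk
run syncvg -l datalv           # synchronise the new (stale) copy
```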
Network Fails:
1) HACMP provides notification and runs any user-defined scripts.
   - HACMP detects a fault.
   - The standard event script does not contain any actions.
   - Network takeover is only possible with customizing.
   - Behavior of the application depends on the infrastructure.
Machine Fails:
1) Workload (resources) moved to the surviving node.
2) TCP/IP address moved from the failed to the surviving node.
3) Users log in again, using the same host name.
What you lose:
1) Work in progress.
2) Any data not yet written to disk.
3) All process state.
When a cluster is configured to use IPAT , an additional network adapter must be defined. This is known as a Boot Adapter. When a failed node recovers, it cannot boot on the Service IP address if this has been acquired by another node in the cluster. For this reason, the failed node needs to boot on a unique IP address which is not used elsewhere in the cluster. This ensures that there is no IP address duplication during reintegration.
Configuring IPAT:
IPAT is only required on rotating resource groups and is optional on cascading. It is not supported on concurrent resource groups.
On all nodes, prepare security and name resolution:
1. Add an entry for the boot IP label into /etc/hosts on each node.
2. Add the boot IP label to /.rhosts on each node.
3. Use FTP or rdist to keep these files in sync and minimise human error.
On the node that will have its service IP address taken over:
4. Change the IP address held in the ODM to the boot IP address by using smit chinet. This causes cfgmgr to bring the adapter up on the boot address at system startup.
On any node, update the cluster configuration:
5. Add the boot adapter definition to the cluster topology, for the node that will have its service IP address taken over.
6. Synchronise the topology (you will get a warning message).
7. Add the service IP label of the node to be taken over to a resource group.
8. Take a snapshot of your modified topology and update your cluster planning worksheets.
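For steps 1 and 2, the files might look like this; node1's addresses are taken from the exercise later in this document, so treat them as examples only:

```
# /etc/hosts (kept identical on every node)
193.9.200.225   node1_boot
193.9.200.226   node1_svc
193.9.201.1     node1_stby

# /.rhosts (one HACMP IP label per line)
node1_boot
node1_svc
node1_stby
```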
Some adapter types are very specific about the first two digits of an LAA (locally administered address), Token-Ring and FDDI in particular: the first byte must start with 42 for Token-Ring, and with 4, 5, 6 or 7 for FDDI. Always check the documentation provided with the adapter and the HACMP manuals. Token-Ring adapters will not release the LAA if AIX crashes, so AIX must be set to reboot automatically after a system crash (smit environment).

Install the HACMP software from the HACMP CD:
- cluster.adt
- cluster.base
- cluster.cspoc
- cluster.man
- cluster.taskguides
- cluster.vsm

Daemons:
- clstrmgr
- clsmuxpd (works with SNMP)
- cllockd (for concurrent access)
- clinfo (required for IPAT with hardware MAC address takeover)
HACMP Daemons:
clstrmgr and clsmuxpd daemons are mandatory; the other two are optional.
1) Cluster Manager (clstrmgr):
   - Runs on all cluster nodes.
   - Tracks cluster topology.
   - Tracks network status.
   - Externalizes failure events.
The Cluster Manager has four functional pieces: the Cluster Controller (CC), the Event Manager (EM), the NIM Interface Layer (NIL) and the Network Interface Modules (NIMs).
CC, EM and NIL are all part of the clstrmgr executable; the NIMs are separate executables, one for each network. At startup the Cluster Controller reads its information out of the ODM; the NIL controls the hardware through the Network Interface Modules; the Event Manager handles the event scripts and communicates with clsmuxpd and cllockd.

The Cluster Controller performs a number of coordinating functions:
- Retrieves cluster configuration information from the HACMP ODM object classes at startup and during a refresh or DARE operation.
- Establishes the ordering of cluster neighbours for the purpose of sending keep-alive packets.
- Tracks changes to the cluster topology.
- Receives information about cluster status changes from the NIMs via the NIL.
- Queues events in response to status changes in the cluster.
- Handles node isolation and partitioned clusters.

The NIL provides a common interface between the Cluster Controller and one or more NIMs. This allows NIMs to be developed for new adapter hardware without rewriting the Cluster Manager. The NIL:
- Tells the NIMs the appropriate keep-alive and failure detection rates for each network type, as defined in the ODM.
- Starts the appropriate NIMs for the network types that have been defined in the HACMP classes of the ODM.
- Gives the NIMs a list of the IP addresses or /dev files to send keep-alives to.
- Restarts the NIMs if they hang or exit.
The NIMs are the contact point between HACMP and the network interfaces. They send and receive keep-alive and message information, detect network-related failures, and are provided for each supported network type, including a generic one.

The Event Manager performs the following functions:
- Starts the appropriate event scripts in response to status changes in the cluster.
- Sets the required environment variables.
- Communicates with clsmuxpd and cllockd when required.
- Starts the config_too_long event if any event does not exit 0 within 6 minutes.
The Event Manager causes event scripts to execute. Primary events (such as node_up, node_up_complete, node_down, node_down_complete, etc.) are called directly by the Cluster Manager. Sub-events (such as node_up_local, node_up_remote, node_down_remote, node_down_local, etc.) are called by primary events.

2) Cluster SNMP Agent (clsmuxpd):
   - Receives information from the Cluster Manager.
   - Maintains the HACMP enterprise-specific MIB.
   - Provides information to SNMP.
3) Cluster Lock Manager (cllockd):
   - Cluster-wide advisory level locking.
   - CLM locking API. Unix locking API.
   - Only for processes running on the cluster.
4) Cluster Information Services (clinfo):
   - Optional on both cluster nodes and cluster clients.
   - Provides cluster status information to clients.
   - Clinfo API allows for cluster-aware applications.

Keep-alive packets:
- Transmitted over all interfaces known to HACMP
- Direct I/O (non-IP networks) or UDP packets
- Three adjustable transmission rates (fast, normal, slow)
- If a failure rate is exceeded, an event is triggered
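As a sketch of the kind of notification hook the Event Manager can drive (the function name log_event and the log location are invented for illustration and are not part of HACMP; event customization would call something like this around an event):

```shell
# Hypothetical notification hook: HACMP event customization can
# run a script around an event; this one just appends to a log.
LOG=/tmp/hacmp_events.log
: > "$LOG"                     # start with an empty log

log_event() {
    ev=$1; shift
    echo "event=$ev args=$*" >> "$LOG"
}

log_event node_down node1 graceful
log_event node_up node1
```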
Dynamic Reconfiguration:
HACMP provides a facility that allows changes to cluster topology and resources to be made while the cluster is active. This facility is known as DARE or, to give it its full name, "Dynamic Automatic Reconfiguration Event". It requires 3 copies of the HACMP ODM:
- Default Configuration Directory (DCD), updated by SMIT/command line: /etc/objrepos
- Staging Configuration Directory (SCD), used during reconfiguration: /usr/sbin/cluster/etc/objrepos/staging
- Active Configuration Directory (ACD), from which clstrmgr reads the cluster configuration: /usr/sbin/cluster/etc/objrepos/active
DARE allows changes to be made to most cluster topology and nearly all resource group components without the need to stop HACMP, take the application offline or reboot a node. All changes must be synchronised in order to take effect.
   logform /dev/<lv name>
7) Create a logical volume:
   mklv -t jfs -y <lv name> <vg name> <size>
8) Create the file system:
   crfs -v jfs -d /dev/<lv name> -m /<mount point>
9) Vary off the VG:
   varyoffvg <vg name>
10) Import the VG on the other node with the same major number:
   importvg -V 44 -y <vg name> <pv name>
11) Turn off the auto-varyon feature:
   chvg -a n <vg name>
12) Vary off the VG:
   varyoffvg <vg name>
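The steps above can be assembled into one dry-run script. The earlier steps were cut off by a page break in the original, so the mkvg/jfslog lines here are reconstructions and the names (sharedvg, loglv, sharedlv, hdisk2) are hypothetical; only major number 44 comes from the text. run() only echoes:

```shell
# Dry-run sketch of shared VG creation for HACMP. VG/LV/disk
# names are hypothetical; replace run() with eval "$@" on a node.
CMDS=""
run() { CMDS="$CMDS $*;"; echo "+ $*"; }

# Assumed earlier steps (cut off in the original):
run mkvg -y sharedvg -V 44 hdisk2          # VG with a fixed major number
run mklv -t jfslog -y loglv sharedvg 1     # JFS log LV
run logform /dev/loglv                     # format the jfslog
# Steps 7-12 as listed above:
run mklv -t jfs -y sharedlv sharedvg 10
run crfs -v jfs -d /dev/sharedlv -m /shared
run varyoffvg sharedvg
# On the other node:
run importvg -V 44 -y sharedvg hdisk2
run chvg -a n sharedvg                     # disable auto-varyon
run varyoffvg sharedvg
```

The matching major number (-V 44 on both mkvg and importvg) is what lets every node address the VG identically, which NFS exporting of its file systems depends on.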
Step 6 - Configure the application start and stop scripts (Application servers)
Step 7 - Define the cluster resources and resource groups
Step 8 - Synchronize the cluster resources
Step 9 - Test the cluster
   - Verification performed automatically
   - File systems, IP addresses, exports and NFS mounts
   - Including application tests
Difference between HACMP and HACMP/ES:
- HACMP/ES uses RSCT (Reliable Scalable Cluster Technology)
- User-defined events based on RSCT
- Application Monitoring
- Recovery from resource group acquisition failure
- Dynamic node policy
- Selective Fallover
- Plugins: in 4.4.1, plugins are provided to help configure the following services: DNS, DHCP, Print services. The plugins add the server application and an application monitor to an existing resource group.
- Process Monitoring via provided plugin scripts
- Users must create their own scripts/programs for Custom Monitoring
HACMP Commands
rdist -b -f /etc/disfile1         Distribute the files in disfile1, in binary mode, to all nodes listed in disfile1

Sample entry for disfile1:
HOSTS = ( root@node1 root@node3 )
FILES = ( /etc/passwd /etc/security/passwd )
${FILES} -> ${HOSTS}

clstart -m -s -b -i -l            Start cluster daemons (m: clstrmgr, s: clsmuxpd, b: broadcast message, i: clinfo, l: cllockd)
clstop -f -N                      Force shutdown cluster immediately without releasing resources
clstop -g -N                      Graceful shutdown immediately with no takeover
clstop -gr -N                     Graceful shutdown immediately with takeover
cldare -t                         Sync the cluster topology
cldare -t -f                      Mock sync of the topology
cldare -r                         Sync the cluster resources
cldare -r -f                      Mock sync of the resources
clverify                          Cluster verification utility
cllscf                            List cluster topology information
cllsclstr                         List the name and security level of the cluster
cllsnode                          List info about the cluster nodes
cllsnode -i node1                 List info about node1
cllsdisk -g shrg                  List the PVID of the shared hard disk for resource group shrg
cllsnw                            List all cluster networks
cllsnw -n ether1                  List the details of network ether1
cllsif                            List the details by network adapter
cllsif -n node1_service           List the details of network adapter node1_service
cllsvg                            List the shared VGs which can be accessed by all nodes
cllsvg -g sh1                     List the shared VGs in resource group sh1
cllslv                            List the shared LVs
cllslv -g sh1                     List the shared LVs in resource group sh1
cllsdisk -g sh1                   List the PVID of disks in resource group sh1
cllsfs                            List the shared file systems
cllsfs -g sh1                     List the shared file systems in resource group sh1
cllsnim                           Show info about all network modules
cllsnim -n ether                  Show info about the ether network module
cllsparam -n node1                List the runtime parameters for node node1
cllsserv                          List all the application servers
claddclstr -i 3 -n dcm            Add a cluster definition with name dcm and id 3
claddnode                         Add an adapter
claddnim                          Add a network interface module
claddgrp -g sh1 -r cascading -n n1 n2          Create resource group sh1 with nodes n1, n2 in cascade
claddserv -s ser1 -b /usr/start -e /usr/stop   Create an application server ser1 with start script /usr/start and stop script /usr/stop
clchclstr -i 2 -n dcmds           Change the cluster definition name to dcmds and id to 2
clchclstr -s enhanced             Change the cluster security to enhanced
clchnode                          Change the adapter parameters
clchgrp                           Change the resource group name or node relationship
clchparam                         Change the runtime parameters (like verbose logging)
clchserv                          Change the name of an application server or its start/stop scripts
clrmclstr                         Remove the cluster definition
clrmgrp -g sh1                    Delete the resource group sh1 and related resources
clrmnim ether                     Remove the network interface module ether
clrmnode -n node1                 Remove the node node1
clrmnode -a node1_svc             Remove the adapter named node1_svc
clrmres -g sh1                    Remove all resources from resource group sh1
clrmserv app1                     Remove the application server app1
clrmserv ALL                      Remove all application servers
clgetactivenodes -n node1         List the nodes with active cluster manager processes, asking the cluster manager on node1
clgetaddr node1                   Return a pingable address from node node1
clgetgrp -g sh1                   List the info about resource group sh1
clgetgrp -g sh1 -f nodes          List the participating nodes in resource group sh1
clgetif                           List interface name/interface device name/netmask associated with a specified IP label or IP address of a specific node
clgetip sh1                       Get the IP label associated with the resource group
clgetnet 193.9.200.2 255.255.255.0   List the network for IP 193.9.200.2, netmask 255.255.255.0
clgetvg -l nodelv                 List the VG of LV nodelv
cllistlogs                        List the logs
clnodename -a node5               Add node5 to the cluster
clnodename -o node5 -n node3      Change the cluster node name node5 to node3
clshowres                         List resources defined for all resource groups
clfindres                         Find the resource group within a cluster
xclconfig                         X utility for cluster configuration
xhacmpm                           X utility for HACMP management
xclstat                           X utility for cluster status
HACMP Configuration Exercise:
Scenario: Connecting machines in a Cascading Resource Group, so as to operate for IP address takeover, NFS availability and Application takeover.
1. Create a .rhosts file on all nodes which are going to be part of HACMP. The file should be in the root directory and contain the names of the boot, standby and service adapters.
# cat .rhosts
cws
node3
node1
# End of generated entries by updauthfiles script
node1_boot
node1_svc
node1_stby
node3_boot
node3_svc
node3_stby
2. # smit hacmp
A.) Define a cluster
HACMP for AIX
Move cursor to desired item and press Enter.
  Cluster Configuration
  Cluster Services
  Cluster System Management
  Cluster Recovery Aids
  RAS Support
Cluster Configuration
Move cursor to desired item and press Enter.
  Cluster Topology
  Cluster Security
  Cluster Resources
  Cluster Snapshots
  Cluster Verification
  Cluster Custom Modification
  Restore System Default Configuration from Active Configuration
Cluster Topology
Move cursor to desired item and press Enter.
  Configure Cluster
  Configure Nodes
  Configure Adapters
  Configure Network Modules
  Show Cluster Topology
  Synchronize Cluster Topology
Configure Cluster
Move cursor to desired item and press Enter.
  Add a Cluster Definition
  Change / Show Cluster Definition
  Remove Cluster Definition
Add a Cluster Definition
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                    [Entry Fields]
* Cluster ID                        [10]
* Cluster Name                      [dcm]
B.) Add Participating Nodes
# smit hacmp
- Cluster Configuration - Cluster Topology - Configure Nodes - Add Cluster Nodes
Make entries for the participating nodes. The names need not be related to the /etc/hosts file; they can be any name.
Add Cluster Nodes
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                    [Entry Fields]
* Node Names                        [node1 node3]
C.) Create entries for all IP addresses
First confirm that the system has booted through the boot IP given. Check using:
# lsattr -El en0
Make entries for all adapters: node1_boot, node1_svc, node1_stby, node3_boot, node3_svc and node3_stby.
# smit hacmp
- Cluster Configuration - Cluster Topology - Configure Adapters - Add an Adapter

Add an Adapter
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                    [Entry Fields]
* Adapter IP Label                  [node1_boot]
* Network Type                      [ether]
* Network Name                      [ether1]
* Network Attribute                 public
* Adapter Function                  boot
  Adapter Identifier                [193.9.200.225]
  Adapter Hardware Address          []
  Node Name                         [node1]
D.) Check the Cluster Topology
# smit hacmp
- Cluster Configuration - Cluster Topology - Show Cluster Topology
Check the cluster topology.
E.) Synchronize Cluster Topology
# smit hacmp
- Cluster Configuration - Cluster Topology - Synchronize Cluster Topology
The topology is copied to all participating nodes.
F.) Create a Resource Group
# smit hacmp
- Cluster Configuration - Cluster Resources - Define Resource Group - Add a Resource Group

Add a Resource Group
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                    [Entry Fields]
* Resource Group Name               [rg1]
* Node Relationship                 cascading
* Participating Node Names          [node1 node3]
Give the Resource Group Name, Node Relationship and Participating Node Names.
G.) Define Resources for a Resource Group
# smit hacmp
- Cluster Configuration - Cluster Resources - Change/Show Resources for a RG

Select a Resource Group
Move cursor to desired item and press Enter.
  rg1
Configure a Resource Group
Type or select values in entry fields. Press Enter AFTER making all desired changes.
[TOP]                                        [Entry Fields]
  Resource Group Name                        rg1
  Node Relationship                          cascading
  Participating Node Names                   node1 node3
  Service IP Label                           [node1_svc]
  Filesystems                                []
  Filesystems Consistency Check              fsck
  Filesystems Recovery Method                sequential
  Filesystems to Export                      []
  Filesystems to NFS Mount                   []
  Volume Groups                              []
  Concurrent Volume Groups                   []
  Raw Disk PVIDs                             []
  AIX Connections Services                   []
  AIX Fast Connect Services                  []
  Application Servers                        []
  Highly Available Communication Links       []
  Miscellaneous Data                         []
  Inactive Takeover Activated                false
  9333 Disk Fencing Activated                false
  SSA Disk Fencing Activated                 false
  Filesystems mounted before IP configured   false
[BOTTOM]
An entry is made for node1_svc in rg1 so that it can be taken over in case of adapter failure or network failure. Similarly, create another resource group rg2 for node3_svc. Entries for NFS and Application Servers, if required, also have to be made in the above screen.
H.) Copy the resource information to all participating nodes
# smit hacmp
- Cluster Configuration - Cluster Resources - Synchronize Cluster Resources
The resource configuration gets copied to all participating nodes.
I.) Start HACMP on all participating nodes
# smit hacmp
- Cluster Services - Start Cluster Services
This is started on each individual machine. We can use C-SPOC (Cluster Single Point Of Control) for all machines; however, it has not been enabled on the SP due to security reasons. For C-SPOC:
# smit hacmp
- Cluster System Management - HACMP for AIX Cluster Services - Start Cluster Services (it takes time)
J.) Check that cluster services are started
# lssrc -g cluster
Now the system has been configured for high availability of IP. It can be tested by stopping the HACMP services on one of the nodes. For this, follow these steps:
a.) On both nodes run the following command and check that the service IP is being used on en1; en2 should be using the standby IP.
# netstat -in
Name  Mtu    Network    Address           Ipkts  Ierrs  Opkts  Oerrs  Coll
lo0   16896  link#1                       210    0      229    0      0
lo0   16896  127        127.0.0.1         210    0      229    0      0
lo0   16896  ::1        ::1               210    0      229    0      0
en0   1500   link#2     0.60.94.e9.56.e3  40117  0      38081  0      0
en0   1500   192.9.200  192.9.200.2       40117  0      38081  0      0
en1   1500   link#3     0.6.29.ac.ca.66   63612  0      1136   0      0
en1   1500   193.9.200  193.9.200.226     63612  0      1136   0      0
en2   1500   link#4     0.6.29.ac.f2.f6   0      0      3      3      0
en2   1500   193.9.201  193.9.201.1       0      0      3      3      0
The above display is obtained on node1.
b.) Stop the cluster services on node1.
# smit hacmp
- Cluster Services - Stop Cluster Services
Stop Cluster Services
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                           [Entry Fields]
* Stop now, on system restart or both      now
  BROADCAST cluster shutdown?              true
* Shutdown mode                            graceful with takeover
  (graceful, graceful with takeover, forced)
c.) Now check on node3 that the node1_svc IP has shifted to its standby adapter (en2), using the netstat command.

Adding serial links obtained through SSA:
Check the device addresses on each node using:
# lsdev -C | grep tmssa
# smit hacmp
- Cluster Configuration - Cluster Topology - Configure Adapters - Add an Adapter

Add an Adapter
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                    [Entry Fields]
* Adapter IP Label                  [node1_tmssa]
* Network Type                      [tmssa]
* Network Name                      [ssa1]
* Network Attribute                 serial
* Adapter Function                  service
  Adapter Identifier                [/dev/tmssa3]
  Adapter Hardware Address          []
  Node Name                         [node1]
# smit hacmp
- Cluster Configuration - Cluster Resources - Define Application Servers

Define Application Servers
Move cursor to desired item and press Enter.
  Add an Application Server
  Change / Show an Application Server
  Remove an Application Server
Add an Application Server
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                    [Entry Fields]
* Server Name                       [tsm]
* Start Script                      [/usr/bin/tsmstart]
* Stop Script                       [/usr/bin/tsmstop]
The same scripts have to be copied to all participating nodes. The entry for the Application Server has to be made in the Resource Group (Step 2.G).
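Copying the start/stop scripts to all participating nodes could be scripted like this; the node names and script paths come from this exercise, and run() only echoes the rcp commands rather than executing them:

```shell
# Dry-run sketch: push the application server scripts to every
# participating node with rcp. Replace run() with eval "$@" to
# actually copy.
CMDS=""
run() { CMDS="$CMDS $*;"; echo "+ $*"; }

for node in node1 node3; do
    for script in /usr/bin/tsmstart /usr/bin/tsmstop; do
        run rcp "$script" "${node}:${script}"
    done
done
```

rcp works here because the /.rhosts entries created at the start of the exercise already grant the required trust between nodes.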
Sample tsmstart script:

    SERVICEIP=1
    while [ $SERVICEIP -ne 0 ]
    do
        x=`netstat -i | grep -c node1_svc`
        if [ $x -eq 1 ]
        then
            SERVICEIP=0
            echo "Exiting with SERVICEIP"
        else
            echo "Executing IP Take over"
            sleep 2
        fi
    done
    sleep 15
    /usr/tivoli/tsm/server/bin/rc.adsmserv

Sample tsmstop script:

    cd /usr/tivoli/tsm/client/ba/bin
    dsmadmc -id=admin -password=support halt
    sleep 15

Cluster snapshot creation:
# smit hacmp - Cluster Snapshots - Add a Cluster Snapshot
We are required to provide a Cluster Snapshot Name, a Custom-Defined Snapshot Method and a Cluster Snapshot Description. The snapshot is created in the directory /usr/sbin/cluster/snapshots. Two files are created: <snapshotname>.odm and <snapshotname>.info.

Testing the non-IP serial communication (SSA):
Node1: # cat < /dev/tmssa3.tm
Node3: # cat <filename> > /dev/tmssa2.im

Scenario: Connecting the nodes in a Rotating Resource Group.
In rotating mode, a resource does not belong to any node. Therefore, when creating a resource like an IP address, a node name is not required to be given.
The configuration of the resource group is exactly the same as was done for cascading mode. However, while adding the adapter for node1_svc, do not provide the Node Name. The other adapters (node1_boot, node1_stby, node3_boot and node3_stby) are added as in Step 2.C.
# smit hacmp
- Cluster Configuration - Cluster Topology - Configure Adapters - Add an Adapter
Add an Adapter
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                    [Entry Fields]
* Adapter IP Label                  [node1_svc]
* Network Type                      [ether]
* Network Name                      [ether1]
* Network Attribute                 public
* Adapter Function                  service
  Adapter Identifier                [193.9.200.226]
  Adapter Hardware Address          []
  Node Name                         [ ]
After adding the adapter definitions, synchronize the topology. Then create a resource group rg3 using the node relationship "rotating". After making entries for the resource group, synchronize the RG. Start HACMP on each node. The service IP is allocated on the boot adapter of node1. To test IP takeover, stop the HACMP services on node1 (Step 2.J b). The service IP label moves to the boot adapter of node3.
Note: It has been observed that the serial link tmssa should be configured in the rotating RG. The test for IP takeover goes through successfully only when tmssa is configured.
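The netstat-based service-IP checks used in both takeover tests above lend themselves to a small helper. Here it is exercised against a captured netstat -in sample so the logic can be verified without a live cluster; the function name has_label is invented for illustration:

```shell
# has_label: count occurrences of an IP label or address in
# netstat -in style output. Function name is illustrative only.
has_label() {
    # $1 = captured netstat output, $2 = label/address to find
    echo "$1" | grep -c "$2" || true   # || true: a zero count is not an error
}

SAMPLE="en1 1500 193.9.200 193.9.200.226 63612 0 1136 0 0
en2 1500 193.9.201 193.9.201.1 0 0 3 3 0"

has_label "$SAMPLE" 193.9.200.226   # prints 1: service IP present
has_label "$SAMPLE" 10.0.0.1        # prints 0: not configured here
```

On a live node the first argument would simply be `"$(netstat -in)"`.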