for HA Administrators
Session ID: 41CO
Michael Herrera
PowerHA SystemMirror (HACMP) for AIX
ATS Certified IT Specialist
mherrera@us.ibm.com
Agenda
Management:
Start & Stop of cluster services
Moving Resources
Saving off Configuration
Maintenance
Configuration Optimization
Hostname Changes
Naming requirements in V7.x
Auto start (or not) of cluster services
Dynamic Node Priority
Application Monitoring
DLPAR Integration
Resource Group Dependencies
Also useful:
# lssrc -ls clstrmgrES | grep fix
cluster fix level is "3"
Attention:
Be aware that HA 7.1.1 SP2 and SP3 do not get reported back properly. The halevel command
probes with the wrong option, and since the server.rte fileset is not updated it will not catch the
updates to the cluster.cspoc.rte filesets.
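Since halevel can misreport these service packs, extracting the fix level directly from the cluster manager is a reasonable cross-check. A minimal sketch — the heredoc text stands in for live `lssrc -ls clstrmgrES` output, and the fix level value is illustrative:

```shell
#!/bin/ksh
# Sample of 'lssrc -ls clstrmgrES' output; on a live node, pipe the
# real command instead of this sample text (illustrative values).
sample_output='Current state: ST_STABLE
cluster fix level is "3"'

# Extract the quoted fix level number
fix_level=$(printf '%s\n' "$sample_output" | grep 'fix level' | sed 's/.*"\([0-9]*\)".*/\1/')
echo "fix level: $fix_level"
```

Pair this with `lslpp -l "cluster.*"` to see the actual fileset levels on each node.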
3
Upgrade Considerations
There are two main areas that you need to consider: OS & HA software
Change Controls: what is your ability to apply and test the updates?
Consider things like Interim Fixes locking down the system
Will they need to be reapplied?
Will they need to be rebuilt?
Operating System:
Should you do AIX first or HA code?
Should you combine the upgrades?
New OS requirements for HA
What is your back-out plan?
Alternate disk install
Mksysb
Common Question: Can the cluster run with the nodes running different levels?
5
Lessons Learned:
Do not upgrade the cluster filesets while resources are unmanaged on all nodes
This would recycle the clstrmgrES daemon and the cluster would lose its internal state
Application monitors are not suspended when you UNMANAGE the resources
If you manually stop the application and forget about the monitors, existing application
monitors could auto-restart it or initiate a takeover, depending on your configuration
Application Start scripts will get invoked again on restart of cluster services
Be aware of what happens when your start script is invoked while the application is already
running, or comment out the scripts prior to restarting cluster services
Leave the Manage Resources attribute set to Automatic
Otherwise it will continue to show the RG as UNMANAGED until you do an RG move ONLINE
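One way to catch resource groups left in the UNMANAGED state before restarting cluster services is to scan the group states reported by clRGinfo. A hedged sketch — the sample text below stands in for real clRGinfo output, and the group and node names are made up:

```shell
#!/bin/ksh
# Sample clRGinfo-style state listing (illustrative names/states)
sample='rg_app1  ONLINE     nodeA
rg_app1  OFFLINE    nodeB
rg_db1   UNMANAGED  nodeA'

# List any groups stuck in UNMANAGED state
unmanaged=$(printf '%s\n' "$sample" | awk '$2 == "UNMANAGED" {print $1}' | sort -u)
if [ -n "$unmanaged" ]; then
  echo "WARNING: unmanaged groups: $unmanaged"
fi
```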
Application monitors will continue to run. Depending on the implementation, it might be
wise to suspend monitors prior to this operation.
Location Dependencies
Most of this is old news, but the use of dependencies can affect where and how resources
get acquired. More importantly, it can affect the steps required to move resource groups, so
more familiarity with the configuration is required.
Be aware of the clcomd changes for version 7 clusters
The clutils.log file should show the results of the nightly check
Custom Verification Methods may be defined to run during the Verify / Sync operations
Note: Automatic verify & sync on node start up does not include any custom verification methods
#!/bin/ksh
echo "Currently Loaded Interim Fixes:"
clcmd emgr -P
echo "Please Ensure that they are consistent between the nodes!"
[Diagram: cluster snapshot structure — each snapshot pairs a cluster report with a .info file.
The report commands cllsnode, cllscf, and cllsif read the HACMPcluster, HACMPnode, and
HACMPadapter ODM object classes, and their output (with HTML tags) is captured in the
snapshot .info file.]
[Diagram: hostname handling — each node keeps its own inet0 hostname; only the service IP
should be swapping between nodes. The lscluster output reports the node UUID as well. The
resource group contains the service IP, the volume group and its filesystems, and an
application controller whose start.sh sets the new hostname and whose stop.sh unsets it.]
* This restriction is currently under evaluation by the CAA development team and may
be lifted in a future update
Versions 6.1 and earlier allowed Standard VGs or Enhanced Concurrent VGs
Version 7.X requires the use of ECM volume groups
Your Answers:
Standard VGs would require an openx call against each physical volume
Processing could take several seconds to minutes depending on the number of LUNs
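Before migrating, you can check whether a volume group is already enhanced concurrent capable. A sketch — the sample text stands in for `lsvg <vg>` output on AIX, and the VG name is illustrative:

```shell
#!/bin/ksh
# Sample fragment of 'lsvg appvg' output (illustrative)
sample='VOLUME GROUP:  appvg
Concurrent:    Enhanced-Capable'

if printf '%s\n' "$sample" | grep -q 'Enhanced-Capable'; then
  msg="appvg is enhanced concurrent capable"
else
  # chvg -C converts a VG to enhanced concurrent mode
  msg="convert with: chvg -C appvg"
fi
echo "$msg"
```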
Further Options
1 RG vs. Multiple RGs
Selective Fallover behavior (VG / IP)
RG Processing
Parallel vs. Sequential
RG Dependencies
Parent / Child, Location
Start After / Stop After
Best Practice:
Always try to keep it simple, but stay current with new features and take advantage
of existing functionality to avoid added manual customization.
Scenario:
10 Filesystems in volume group & only 1 defined in RG
HA processing will only mount the one FS
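A quick way to spot this mismatch is to compare the filesystems that live in the volume group with those defined in the resource group. A sketch with illustrative names — on a live node the lists would come from `lsvg -l` and the RG definition:

```shell
#!/bin/ksh
# Filesystems in the VG (e.g. from: lsvg -l appvg) - illustrative
vg_filesystems='/app /app/logs /app/data'
# Filesystems defined in the RG - illustrative
rg_filesystems='/app'

missing=""
for fs in $vg_filesystems; do
  case " $rg_filesystems " in
    *" $fs "*) : ;;   # defined in the RG, will be mounted
    *) missing="$missing $fs"
       echo "NOT in RG (HA will not mount it): $fs" ;;
  esac
done
```

Note that leaving the filesystem field of the RG blank mounts all filesystems in the VG; the problem arises only when a subset is listed explicitly.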
Not invoked:
get_disk_vg_fs
node_down_local
node_down_remote
node_down_local_complete
node_down_remote_complete
node_up_local
node_up_remote
node_up_local_complete
node_up_remote_complete
release_vg_fs
Notes:
Recycle cluster services after updating UDE events
Scripts must exist on all cluster nodes: (Path, permissions)
Logic in recovery program can be configured to send notification, append more space, etc.
Can specify multiple values in Selection String field
Actions logged in clstrmgr.debug and hacmp.out files
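The predicate / rearm pair behaves like a threshold with hysteresis, and the recovery program can re-check it before acting. A sketch of that logic using the 95% / 70% thresholds from the predicate example — the usage value is hard-coded for illustration; a live script would parse `df` output:

```shell
#!/bin/ksh
# Illustrative re-check of the UDE predicate inside a recovery script;
# pct_used is hard-coded here, a live script would parse 'df' output.
pct_used=96
threshold=95   # predicate:  PercentTotUsed > 95
rearm=70       # rearm:      PercentTotUsed < 70

if [ "$pct_used" -gt "$threshold" ]; then
  action="notify"     # e.g. mail root, or append space with chfs
elif [ "$pct_used" -lt "$rearm" ]; then
  action="rearm"      # the event can fire again
else
  action="none"       # between thresholds: wait
fi
echo "action=$action"
```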
# odmget HACMPude
HACMPude:
name = "Herrera_UDE_event"
state = 0
recovery_prog_path = "/usr/local/hascripts/Herrera_UDE
recovery_type = 2
recovery_level = 0
res_var_name = "IBM.FileSystem"
instance_vector = "Name = \"/\""
predicate = "PercentTotUsed > 95"
rearm_predicate = "PercentTotUsed < 70"
Configuration_Files
/etc/hosts
/etc/services
/etc/snmpd.conf
/etc/snmpdv3.conf
/etc/rc.net
/etc/inetd.conf
/usr/es/sbin/cluster/netmon.cf
/usr/es/sbin/cluster/etc/clhosts
/usr/es/sbin/cluster/etc/rhosts
/usr/es/sbin/cluster/etc/clinfo.rc
SystemMirror_Files
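Beyond letting file collections propagate these files, a quick manual consistency check is to compare checksums across nodes. A sketch — the sample text stands in for `clcmd cksum /etc/hosts` output, with made-up node names and checksum values:

```shell
#!/bin/ksh
# Sample clcmd-style cksum output from two nodes (illustrative)
sample='NODE nodeA
1995101161 1869 /etc/hosts
NODE nodeB
1995101161 1869 /etc/hosts'

# Count how many distinct checksums exist for the file
distinct=$(printf '%s\n' "$sample" | awk '/\/etc\/hosts/ {print $1}' | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
  echo "/etc/hosts consistent across nodes"
else
  echo "/etc/hosts DIFFERS between nodes"
fi
```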
[Diagram: Node A and Node B each keep copies of the /usr/local/hascripts/app* start and stop
scripts. Updates made to the script logic on one node (RED) must be propagated so that both
nodes carry the same logic (BLUE).]
Can select Local (files) or LDAP
Select nodes by Resource Group; no selection means all nodes
Users will be propagated to all applicable cluster nodes
The password command can be altered to ensure consistency across all nodes
Optional list of users whose passwords will be propagated to all cluster nodes
The passwd command is aliased to clpasswd
Functionality available since HACMP 5.2 (Fall 2004)
Sample Email:
From: root 10/23/2012 Subject: HACMP
Node mhoracle1: Event acquire_takeover_addr occurred at Tue Oct 23 16:29:36 2012, object =
Attention:
Sendmail must be working and accessible through the firewall for notifications to be delivered
There is a push to leverage IBM Systems Director, which will guide you through the
step-by-step configuration of the cluster.
Attributes stored in HACMPcluster object class
CAA
Startup Monitor: only invoked on application startup; confirms the startup of the application
New Application Startup Mode in HA 7.1.1
[Diagram: application monitoring — a Process Monitor (60 sec interval) checks the process
table, while a Custom Monitor (60 sec interval) invokes the custom logic. The application
controller (start.sh / stop.sh) manages the application with its volume group and filesystems;
a Startup Monitor runs at application startup and a Long-Running Monitor runs for the life of
the application.]
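The exit code is what drives a custom monitor: cluster services invoke the method at each interval and treat a non-zero exit as an application failure. A hedged sketch — the process name app_server is an illustrative assumption, not from the original:

```shell
#!/bin/ksh
# Sketch of a custom long-running monitor: exit 0 = healthy,
# non-zero = failed (triggers restart/fallover per monitor settings).
# 'app_server' is an illustrative process name.
monitor() {
  ps -eo comm 2>/dev/null | grep -qx "app_server"
}

if monitor; then
  state="healthy"
else
  state="failed"
fi
echo "monitor result: $state"
# A real monitor script would 'exit 0' or 'exit 1' here instead.
```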
Information stored in HA ODM object classes
Multiple HMC IPs may be defined, separated by a space
Food for Thought: How many DLPAR operations can be handled at once?
[SMIT screen: entry fields for Application_svr1 showing default DLPAR resource values
(0.00 / 0) and no/no options]
Add an Application Controller
Add a new Resource Group
Summary
There are some notable differences between V7 and HA 6.1 and earlier
Pay careful attention to where some of the options are available
A summary chart of new features is appended to the presentation
SG24-8030
Summary Chart
New Functionality & Changes
[Flattened table: the chart maps each new feature to the release that introduced it (7.1.0,
7.1.1, or 7.1.2) — New CAA Infrastructure, DR Capabilities, a new command line interface
(clcmd, the clmgr utility, lscluster), IBM Systems Director Management, and Federated
Security — along with release dates (8/19/2011, 9/30/2011, 11/18/2011, 12/16/2011).]
Questions?
Additional Resources
PowerHA SystemMirror 7.1.1 Update SG24-8030
http://www.redbooks.ibm.com/redpieces/abstracts/sg248030.html?Open
http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247845.html?Open
RedGuide: High Availability and Disaster Recovery Planning: Next-Generation Solutions for Multiserver IBM Power Systems Environments
http://www.redbooks.ibm.com/abstracts/redp4669.html?Open
http://www-03.ibm.com/systems/power/software/availability/aix/index.html
http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/High+Availability