Workload Fallover
(Diagram: clients connect over a WAN to a production node whose workload can fall over to a standby node; serial networks connect adjoining nodes. An availability spectrum runs from stand-alone through enhanced and high availability cluster to fault tolerant, and Continuous Availability = Continuous Operations + High Availability. Chart values: 85.0%, 14.0%, 1.0%.)
Stand-alone availability features:
  Journaled Filesystem
  Dynamic CPU Deallocation
  Service Processor
  Redundant Power
  Redundant Cooling
  ECC Memory
  Hot Swap Adapters
  Dynamic Kernel
Enhanced availability adds:
  Disk Mirroring
  Redundant Data Paths
  Data Mirroring
  Hot Swap Storage
  Redundant Power and Cooling for Storage Arrays
  Hot Spare Storage
  Dual Disk Adapters
High availability cluster adds:
  Redundant Nodes (operating system)
  Redundant Network Adapters
  Redundant Networks
  Application Monitoring
  Site Failure protection (SAN distance)
Solutions
Stand-alone: Journaled Filesystem, Dynamic CPU Deallocation, Service Processor, Redundant Power, Redundant Cooling
  Downtime: couple of days.  Data availability: as good as your last full backup.
Enhanced: Redundant Data Paths, Data Mirroring, Hot Swap Storage, Redundant Power and Cooling for Storage Arrays
  Downtime: couple of hours.  Data availability: last transaction.
High Availability: Redundant Servers, Redundant Networks, Redundant Network Adapters, Heartbeat Monitoring
  Downtime: depends, but typically 3 minutes.  Data availability: last transaction.
Fault Tolerant: Lock Step CPUs, Hardened Operating System, Redundant Memory, Continuous Restart
  Downtime: in theory, none!  Data availability: no loss of data.
(Diagram: cluster sites in Toronto and London.)
*The HACMP XD feature of HACMP contains IBM's HAGEO product and PPRC support.
© Copyright IBM Corporation 2004
Why Might I Need High Availability?
60% of all large companies now operate round the clock (7x24)
Losses on failure:
  $330,000 US per hour (industry average)
  Peak losses: $130,000 US per minute (telephone network)
  Loss of revenue (chart: loss of revenue in $M)
  Loss of customer loyalty
  Loss of customer confidence
And, if there is no disaster recovery:
  50% of affected companies will never reopen
  90% of affected companies are out of business in less than two years
Benefits of High-Availability Solutions
High-availability solutions offer the following benefits:
High availability + Continuous operation = Continuous availability
(Diagram: contributing components — data, networking, hardware, environment, and software.)
A Philosophical View of High Availability
The goal of an HA cluster is to make a service highly available.
Users aren't interested in highly available hardware.
Users aren't even interested in highly available software.
Users are interested in the availability of services.
Therefore, use the hardware and the software to make the services
highly available.
Cluster design decisions should be judged on the basis of whether
or not they:
Contribute to availability (for example, eliminate a SPOF)
Detract from availability (for example, gratuitous complexity)
Since it is impractical, if not impossible, to truly eliminate all SPOFs,
be prepared to use risk-analysis techniques to determine which
SPOFs can be tolerated and which must be eliminated.
(Diagram: a two-node cluster whose nodes are connected by IP networks and a non-IP communication device; a resource group containing an application falls over from one node to the other.)
Clusters based upon HACMP 5.2 can contain between 2 and 32 nodes.
A resource group contains resources such as an application server (start and stop scripts), a service IP address, volume groups, and file systems.
Resource Group: a node list plus policies for startup, fallover, and fallback.
Solution Components
IBM pSeries systems from entry deskside and rack models through the
midrange to the high end work with HACMP, for example: the p520, p550,
p570, p610 (Models 6E1, 6C1, and B80), p620 (Models 6F0 and 6F1), p630
(Models 6E4 and 6C4), p640, p650, p655, p660, p670, p680, and p690.
All pSeries systems work with HACMP in any combination of nodes within a
cluster. However, a minimum of four free adapter slots is recommended.
Supported Storage Environments
(Diagram: supported shared storage attachments — twin-tailed SCSI with a maximum bus length of 25m, SSA loops, and FAStT or ESS subsystems attached over a Fibre Channel SAN.)
(Diagram: client and server traffic flow over Token Ring and FDDI networks, with a non-IP recovery path.)
(Diagram: HACMP facilities — C-SPOC, event scripts, DARE, the Cluster Manager (clstrmgr), clsmuxpd with its SNMP MIB, clverify, the configuration assistant, clstat, planning worksheets, and application monitoring.)
Event processing for an application server: customized pre-event scripts, HACMP core events, the application start and stop scripts, and customized post-event scripts.
Node A: service IP label "database" 192.168.9.3, boot "nodeaboot" 192.168.9.4, standby "nodeastand" 192.168.254.3 (netmask 255.255.255.0 throughout)
Node B: service IP label "webserv" 192.168.9.5, boot "nodebboot" 192.168.9.6, standby "nodebstand" 192.168.254.3 (netmask 255.255.255.0 throughout)
Public Network
Eliminate Single Points of Failure (SPOFs).
Always include a non-IP network.
(Diagram: each node boots from a mirrored (RAID1) 9.1GB rootvg; the shared volume groups httpvg and dbvg reside on external storage.)
Be methodical. Execute the test plan prior to putting the cluster into production.
Resource group httprg contains Volume Group httpvg; resource group databaserg contains Volume Group dbvg.
HACMP can also monitor applications, processor load and available disk
capacity.
How the cluster responds to a failure depends on what has failed, what the
resource group's fallover policy is, and whether there are any resource group
dependencies.
The cluster's configuration is determined by the application's requirements.
Typically another equivalent component takes over the duties of the failed
component (for example, another node takes over from a failed node).
How the cluster responds to the recovery of a failed component depends on
what has recovered, what the resource group's fallback policy is, and what
resource group dependencies there are.
The cluster's configuration is determined by the application's requirements.
The cluster administrator may need to indicate/confirm that the fixed component
is approved for use.
Primary Node With a Standby Node
Start policy = "Home node"
Fallover policy = "Fallover to next priority node"
Fallback policy = "Fallback to higher priority node"
(Diagram: resource group A starts on its home node, Halifax; when Halifax fails, the group falls over to Vancouver; when Halifax returns, the group falls back to it.)
Multiple layers of backup nodes are possible; the fallover policy determines which node takes over.
For example: primary -> secondary -> tertiary -> quaternary -> quinary -> senary -> septenary -> octonary -> nonary -> denary ...

With a "Never fallback" policy:
Start policy = "Home node"
Fallover policy = "Fallover to next priority node"
Fallback policy = "Never fallback"
(Diagram: when Halifax rejoins the cluster, the resource group stays on Vancouver.)
Downtime is minimized by avoiding fallbacks.
Multiple resource groups tend to gather together on the node which has been
up the longest.
Two-Node Mutual Takeover Scenario
Start policy = "Home node"
Fallover policy = "Fallover to next priority node"
Fallback policy = "Fallback to higher priority node"
(Diagram: resource group A is home on Halifax and resource group B is home on Vancouver; each node is the fallover target for the other. When Halifax rejoins the cluster, Vancouver releases resource group A; when Vancouver rejoins, Halifax releases resource group B.)
(Diagram: the HACMP product family. HACMP for AIX and HACMP/ES — now combined — provide fault resilience, 32-way scalability, and concurrent clusters through the Concurrent Resource Manager. GeoRM provides remote mirroring; HAGEO provided global availability and has been replaced by HACMP XD, which includes PPRC support. HACWS provides high availability for the SP Control Workstation (CWS).)
(Diagram: each node holds volume group definitions in its ODM. A shared volume group — httpvg or dbvg — is varied on, with varyonvg, only by the node currently hosting its resource group, and varied off, with varyoffvg, before another node takes it over. The hdisks, hdisk0 through hdisk9, are visible to both nodes.)
With enhanced concurrent volume groups, fast disk takeover uses active and passive varyon states:
1. A decision is made to move httpvg from the right node to the left.
2. The right node releases its active varyon of httpvg, dropping to a passive varyon.
3. The left node obtains an active varyon of httpvg.
Active varyon state and passive varyon state are concepts which don't apply to failed nodes.
2. True or False?
Using RSCT-based shared disk protection results in slower fallovers.
3. True or False?
Ghost disks must be checked for and eliminated immediately after every cluster fallover or
fallback.
4. True or False?
The fast disk takeover facility is a risk-free performance improvement in HACMP 5.1.
(Diagram: a twin-tailed SCSI bus, maximum length 25m, with two host systems, terminators, a controller, and four SCSI disk modules at addresses 1 through 6.)
(Diagram: an SSA loop through a 7133 disk drawer, disks 1 through 16, connected to two SSA adapters via ports A1/A2 and B1/B2.)
http://www.ibm.com/redbooks
(Diagram: cluster nodes, each with volume group definitions in its ODM, attached to a Fibre Channel RAID storage server.)
2. True or False?
SSA disk subsystems can support RAID5 (cache-enabled) with HACMP.
3. True or False?
Compatibility must be checked when using different SSA adapters in the same loop.
4. True or False?
SSA can be configured for no single point of failure.
5. True or False?
hdisk numbers must map to the same PVIDs across an entire HACMP cluster.
2. True or False?
SSA disk subsystems can support RAID5 (cache-enabled) with HACMP (although certain
limitations apply).
3. True or False?
Compatibility must be checked when using different SSA adapters in the same loop.
4. True or False?
SSA can be configured for no single point of failure.
5. True or False?
hdisk numbers must map to the same PVIDs across an entire HACMP cluster.
Physical and Logical Partitions
(Diagram: a physical volume, hdisk1, is identified by its PVID and belongs to a volume group; the LVM maps logical partitions to physical partitions.)
This example shows an application writing to a filesystem which has its LVs
mirrored in a volume group physically residing on separate hdisks.
(Diagram: the application writes to /filesystem; the LVM maps each logical partition of the mirrored logical volume to physical partitions on separate disks in the volume group.)
1. Create the shared volume group. Name the VG something meaningful, like shared_vg1.
Note: HACMP supports both JFS and the newer JFS2 filesystems. JFS2 filesystems are not supported for NFS export
from an HACMP cluster unless an external log is used. It is probably best to always use an external JFS2 log
logical volume, as one never knows which filesystems will need to be NFS exported someday.
# mklvcopy sharedlv 3
# syncvg -l sharedlv
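Following the note above about external JFS2 logs, the log and filesystem could be created with commands along these lines — a sketch only; the log LV name, size, and mount point are assumptions for illustration:

```shell
# Create a dedicated jfs2log logical volume inside the shared VG so the
# log fails over together with the data
mklv -t jfs2log -y sharedloglv shared_vg1 1
logform /dev/sharedloglv
# Create the JFS2 filesystem on sharedlv, referencing the external log
crfs -v jfs2 -d sharedlv -m /sharedfs -a logname=/dev/sharedloglv
```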
(Diagram: creating the shared VG. On the first node: mkvg, chvg, mklv for the log LV, logform, mklv for the data LV, crfs; then unmount the filesystems and varyoffvg. On the second node: cfgmgr, importvg, chvg, varyoffvg. Both nodes then hold matching VGDA and ODM definitions for sharedvg.)
Quorum: varyonvg requires 100% of the VGDAs when quorum is disabled, and more than 50% of the VGDAs when quorum is enabled (or if MISSINGPV_VARYON=TRUE); once varied on, the VG stays online while more than 50% of its VGDAs remain available.
3. True or False?
Quorum should always be disabled on shared volume groups.
4. True or False?
Filesystem and logical volume attributes cannot be changed while the cluster is operational.
5. True or False?
An enhanced concurrent volume group is required for the heartbeat over disk feature.
(Diagram: nodes bondar and hudson exchange heartbeats over their IP networks and over a non-IP network.)
A resource group contains resources such as a service IP label, file systems, NFS exports, NFS mounts, volume groups, and an application server.
(Diagram: the service IP address, 192.168.25.12, moves with its resource group from node to node.)
3. True or False?
Heartbeat packets must be acknowledged or a failure is assumed to have occurred.
4. True or False?
Clusters are required to include a non-IP network.
5. True or False?
Each NIC on each physical IP network on each node is required to have an IP address on a
different logical subnet.
*HACMP responds to the loss of quorum but the loss is detected by the Logical Volume
Manager.
HACMP Concepts and Configuration Rules
After completing this topic, you should be able to:
List the networking technologies supported by HACMP
Describe the purpose of public and private HACMP networks
Describe the topology components and their naming rules
Define key networking related HACMP terms
Describe the basic HACMP network configuration rules
IP and non-IP networks:
(Diagram: IP networks connect the network interface cards of the cluster nodes; non-IP networks connect communication devices — for example, an rs232 link between the nodes' serial ports.)
IP label
# netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lo0 16896 link#1 5338 0 5345 0 0
lo0 16896 127 loopback 5338 0 5345 0 0
lo0 16896 ::1 5338 0 5345 0 0
tr0 1500 link#2 0.4.ac.49.35.58 76884 0 61951 0 0
tr0 1500 192.168.1 vancouver_boot1 76884 0 61951 0 0
tr1 1492 link#3 0.4.ac.48.22.f4 476 0 451 13 0
tr1 1492 192.168.2 vancouver_boot2 476 0 451 13 0
4. True or False?
There are no exceptions to the rule that each NIC on each physical network on each node must
have an IP address in a different subnet.
* Refer to the earlier discussion of heartbeating and failure diagnosis for an explanation of why this is so.
IPAT via IP Aliasing in Operation
When the resource group comes up on a node, HACMP aliases the
service IP label onto one of the node's available (that is, currently
functional) interfaces (odm).
192.168.5.1 (alias)
* See earlier discussion of heartbeating and failure diagnosis for explanation of why
IPAT via IP Aliasing After a Node Fails
If the resource group's node fails, HACMP moves the resource group
to a new node and aliases the service IP label onto one of the new
node's available (that is, currently functional) non-service (odm)
communication interfaces.
192.168.5.1 (alias)
IPAT via IP Aliasing Summary
Configure each node's communication interfaces with IP addresses
(each on a different subnet)
Assign service IP labels to resource groups as appropriate
There is no limit on the number of resource groups with service IP
labels
There is no limit on the number of service IP labels per resource
group
HACMP assigns service IP labels to communication interfaces
(NICs) using IP aliases as appropriate
IPAT via IP aliasing requires that hardware address takeover is not
configured
IPAT via IP aliasing requires gratuitous arp support
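Conceptually, acquiring and releasing a service IP label via aliasing is equivalent to adding and removing an AIX IP alias — a sketch only; the interface name is an assumption, and HACMP manages this itself, so never run these by hand on a live cluster:

```shell
# Acquire: alias the service IP label onto a functional interface
ifconfig en0 alias 192.168.5.1 netmask 255.255.255.0
# Release: remove the alias when the resource group moves away
ifconfig en0 delete 192.168.5.1
```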
IPAT via IP Replacement in Operation
When the resource group comes up on a node, HACMP replaces an
interface (odm) IP label with the service IP label
It replaces the interface IP label on the same subnet if the resource
group is on its startup node or if the distribution fallover policy is used.
It replaces an interface IP label on a different subnet otherwise
NIC A NIC B
192.168.11.1 (odm) 192.168.10.7 (service) 192.168.10.2 (odm) 192.168.11.2 (odm)
(Diagram: IPAT changes the init sequence. Without HACMP, /etc/rc mounts all filesystems and /etc/rc.net -boot runs cfgif, then /etc/rc.tcpip and /etc/rc.nfs start their daemons and run exportfs. With IPAT, /usr/sbin/cluster/etc/harc.net runs at boot; when HACMP starts, clstrmgr runs the node_up event — node_up_local: get_disk_vg_fs, acquire_service_addr — and then telinit -a starts the /etc/rc.tcpip and /etc/rc.nfs daemons and runs exportfs.)
An IPAT via IP Aliasing Convention
Here's one possible IP label numbering convention for IPAT via IP
aliasing networks:
The IP address is of the form AA.BB.CC.DD
AA.BB is assigned by the network administrator
CC indicates which interface or service IP label on each node:
  15, 16 indicate non-service/interface IP labels
  5 was chosen for service labels
  etc. (as required)
DD indicates which node:
  29 indicates an IP address on bondar
  31 indicates an IP address on hudson
For example:
  bondar-if1 192.168.15.29   hudson-if1 192.168.15.31
  bondar-if2 192.168.16.29   hudson-if2 192.168.16.31
  xweb 192.168.5.92          yweb 192.168.5.70
Be flexible. For example, this convention uses DD=29 for bondar
and DD=31 for hudson because the network administrator assigned
bondar-if1 to be 192.168.15.29 and hudson-if1 to be 192.168.15.31.
Fortunately, the network administrator could be convinced to use .29
and .31 for the other bondar and hudson interface IP addresses.
2. True or False?
All networking technologies supported by HACMP support IPAT via IP aliasing.
3. True or False?
All networking technologies supported by HACMP support IPAT via IP replacement.
4. If the left node has NICs with the IP addresses 192.168.20.1 and 192.168.21.1
and the right hand node has NICs with the IP addresses 192.168.20.2 and
192.168.21.2 then which of the following are valid service IP addresses if IPAT via
IP aliasing is being used (select all that apply)?
a. (192.168.20.3 and 192.168.20.4) OR (192.168.21.3 and 192.168.21.4)
b. 192.168.20.3 and 192.168.20.4 and 192.168.21.3 and 192.168.21.4
c. 192.168.22.3 and 192.168.22.4
d. 192.168.23.3 and 192.168.24.3
5. If the left node has NICs with the IP addresses 192.168.20.1 and 192.168.21.1
and the right hand node has NICs with the IP addresses 192.168.20.2 and
192.168.21.2 then which of the following are valid service IP addresses if IPAT via
IP replacement is being used (select all that apply)?
a. (192.168.20.3 and 192.168.20.4) OR (192.168.21.3 and 192.168.21.4)
b. 192.168.20.3, 192.168.20.4, 192.168.21.3 and 192.168.21.4
c. 192.168.22.3 and 192.168.22.4
d. 192.168.23.3 and 192.168.24.3
Checkpoint Answers
1. True or False?
A single cluster can use both IPAT via IP aliasing and IPAT via IP replacement.
2. True or False?
All networking technologies supported by HACMP support IPAT via IP aliasing.
3. True or False?
All networking technologies supported by HACMP support IPAT via IP replacement.
4. If the left node has NICs with the IP addresses 192.168.20.1 and 192.168.21.1
and the right hand node has NICs with the IP addresses 192.168.20.2 and
192.168.21.2 then which of the following are valid service IP addresses if IPAT via
IP aliasing is being used (select all that apply)?
a. (192.168.20.3 and 192.168.20.4) OR (192.168.21.3 and 192.168.21.4)
b. 192.168.20.3 and 192.168.20.4 and 192.168.21.3 and 192.168.21.4
c. 192.168.22.3 and 192.168.22.4
d. 192.168.23.3 and 192.168.24.3
5. If the left node has NICs with the IP addresses 192.168.20.1 and 192.168.21.1
and the right hand node has NICs with the IP addresses 192.168.20.2 and
192.168.21.2 then which of the following are valid service IP addresses if IPAT via
IP replacement is being used (select all that apply)?
a. (192.168.20.3 and 192.168.20.4) OR (192.168.21.3 and 192.168.21.4)
b. 192.168.20.3, 192.168.20.4, 192.168.21.3 and 192.168.21.4
c. 192.168.22.3 and 192.168.22.4
d. 192.168.23.3 and 192.168.24.3
The Impact of IPAT on Clients
After completing this topic, you should be able to:
Explain how user systems are affected by IPAT related operations
Describe what the ARP cache issue is
Explain how gratuitous ARP usually deals with the ARP cache issue
Explain three ways to deal with the ARP cache issue if gratuitous
ARP does not provide a satisfactory resolution to the ARP cache
issue:
Configure clinfo on the client systems
Configure clinfo within the cluster
Configure Hardware Address Takeover within the cluster
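When gratuitous ARP does not resolve things, it can also help to inspect and correct an ARP cache by hand on the affected client or router — a sketch of what clinfo.rc automates; the label xweb is the example used below, and the commands need root:

```shell
arp -a | grep xweb     # show the cached hardware address for xweb
arp -d xweb            # delete the stale entry
ping -c 1 xweb         # the next packet forces a fresh ARP request
```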
Local or Remote Client?
If the client is remotely connected through a router, it is the
router's ARP cache which must be corrected.
(Diagram: before and after fallover. The remote client at 192.168.8.3 holds only an ARP entry for its router, 192.168.8.1, so it is unaffected. The router's ARP cache maps xweb (192.168.5.1) to hardware address 00:04:ac:62:72:49 and the client to 00:04:ac:27:18:09; after xweb moves to the other node's NIC, the router's entry for xweb is stale until it is corrected.)
(Diagram: clstrmgr reports cluster state to clsmuxpd, which makes it available through snmpd; clinfo polls clsmuxpd via SNMP and runs clinfo.rc when the cluster state changes.)
# Example:
#
# PING_CLIENT_LIST="host_a host_b 1.1.1.3"
#
PING_CLIENT_LIST=""
TOTAL_CLIENT_LIST="${PING_CLIENT_LIST}"
if [[ -s /etc/cluster/ping_client_list ]] ; then
    #
    # The file "/etc/cluster/ping_client_list" should contain only a line
    # setting the variable "PING_CLIENT_LIST" in the form given
    # in the example above. This allows the client list to be
    # kept in a file that is not altered when maintenance is
    # applied to clinfo.rc.
    #
    . /etc/cluster/ping_client_list
    TOTAL_CLIENT_LIST="${TOTAL_CLIENT_LIST} ${PING_CLIENT_LIST}"
fi
#
# WARNING!!! For this shell script to work properly, ALL entries in
# the TOTAL_CLIENT_LIST must resolve properly to IP addresses or hostnames
# (must be found in /etc/hosts, DNS, or NIS). This is crucial.
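A matching /etc/cluster/ping_client_list would then contain a single assignment in the format shown in clinfo.rc's own example (the host names and address here are placeholders):

```shell
PING_CLIENT_LIST="host_a host_b 1.1.1.3"
```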
(Diagram: Hardware Address Takeover. The service IP label xweb, 192.168.5.1, is configured with the locally administered hardware address 40:04:ac:62:72:49. Before HACMP starts, the nodes' interfaces use their burned-in addresses — for example, bondarstandby 192.168.9.1 and hudsonstandby 192.168.9.2, netmask 255.255.255.0, on tr1. After HACMP is started, when xweb moves from Bondar to Hudson the hardware address is "moved" along with it, so ARP caches never become stale.)
2. True or False?
All client systems are potentially directly affected by the ARP cache issue.
3. True or False?
clinfo must not be run both on the cluster nodes and on the client systems.
4. Use the LAA generation technique described earlier to generate an LAA for
each of the following GAA addresses (all but one of these are taken from
real ethernet cards):
00.20.ed.76.fb.15 ____________________
0.4.ac.17.19.64 ____________________
0.6.29.ac.46.8 ____________________
12.7.1.71.1.6 ____________________
2. True or False?
All client systems are potentially directly affected by the ARP cache issue.
3. True or False?
clinfo must not be run both on the cluster nodes and on the client systems.
4. Use the LAA generation technique described earlier to generate an LAA for
each of the following GAA addresses (all but the last one of these are
taken from real ethernet cards):
00.20.ed.76.fb.15 40.20.ed.76.fb.15
0.4.ac.17.19.64 40.04.ac.17.19.64
0.6.29.ac.46.8 40.06.29.ac.46.08
12.7.1.71.1.6 52.07.01.71.01.06
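The answers above apply a simple rule: set the "locally administered" bit (0x40) in the first octet of the GAA and zero-pad every octet to two hex digits. A shell sketch of the same computation, using one of the sample GAAs from the exercise:

```shell
# Derive a Locally Administered Address (LAA) from a burned-in GAA by
# setting the 0x40 bit in the first octet and zero-padding each octet.
gaa="0.4.ac.17.19.64"
set -- $(echo "$gaa" | tr '.' ' ')
laa=$(printf '%02x' $(( 0x$1 | 0x40 )))   # first octet: OR in 0x40
shift
for octet in "$@"; do
    laa="$laa.$(printf '%02x' $(( 0x$octet )))"
done
echo "$laa"    # 40.04.ac.17.19.64
```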
http://www-1.ibm.com/servers/eserver/support/pseries/fixes/
2. True or False?
HACMP 5.2 is compatible with any version of AIX 5.x.
3. True or False?
Each cluster node must be rebooted after the HACMP software is installed.
4. True or False?
You should take careful notes while you install and configure HACMP so that you know what to
test when you are done.
*There is some dispute about whether the correct answer is b or e, although a disconcerting
number of clusters are implemented in the order a, b, c, d, e (how can you possibly order the
hardware if you do not yet know what you are going to build?) or even just a, c, d (cluster
implementers who skip step b rarely have time for long naps).
Application Layer
Contains the highly available applications that
use HACMP services
HACMP Layer
Provides highly available services to
applications
AIX Layer
Provides operating system services
(Diagram: HACMP 5.2 structure on AIX. The RSCT Resource Monitoring and Control subsystem, RMC (ctrmc), collects data from resource monitors — an AIX process monitor, a database resource monitor, and a switch resource monitor. RSCT Topology Services exchanges processor and LAN heartbeat messages and passes membership information to RSCT Group Services, which coordinates the Cluster Manager. The HA Recovery Driver runs recovery programs, which in turn run the HACMP event scripts.)
(Example: a cluster of nodes 25.8.60.2 through 25.8.60.6. The ODM defines a network named SPswitch of type HPS, with css0 adapters 192.168.13.1, 192.168.13.2, and 192.168.13.3 on nodes 1 through 3.)
Node startup sequence: start Group Services, start event management services, start the Cluster Manager, join Group Services, and run the node_up recovery program.
Application Monitoring
(Diagram: run_clappmond starts clappmond for each monitored application; monitors can be startup monitors or long-running monitors, and one or many applications can be monitored.)
[Entry Fields]
* Select an Application [] +
* Begin analysis on YEAR (1970-2038) [] #
* MONTH (01-12) [] #
* DAY (1-31) [] #
* Begin analysis at HOUR (00-23) [] #
* MINUTES (00-59) [] #
* SECONDS (00-59) [] #
* End analysis on YEAR (1970-2038) [2003] #
* MONTH (01-12) [12] #
* DAY (1-31) [31] #
* End analysis at HOUR (00-23) [20] #
* MINUTES (00-59) [54] #
* SECONDS (00-59) [02] #
2. Which of the following are true about HACMP 5.2 (select all that apply):
a. RMC used in HACMP only at release 5.2
b. Cluster Lock Manager enhanced
c. Cluster Information Daemon removed
d. Enhanced Concurrent Volume Group supported in nonconcurrent mode
e. Clcomd provides message authentication in HACMP only in release 5.2
3. True or False?
Migrating from older versions of HACMP is a routine task which generally requires little planning.
4. True or False?
The cluster communication daemon (clcomd) eliminates the need for /.rhosts files.
(Diagram: recovery programs tie cluster events to event scripts. A rules.hacmprd entry contains, among other fields, the event name — for example TE_JOIN_NODE — a state value of 0, the recovery program path /usr/sbin/cluster/events/node_up.rp, the resource type "file", further numeric fields (2, 0), and a comment noting that field 6, the resource variable, is only used for event manager events.)
(Flowchart: RMC (ctrmc) feeds Group Services/ES and Topology Services/ES, which drive recovery through an HACMP event command. If the command returns RC=0, processing continues; if RC>0, the event is retried while a retry counter remains, and when the counter is exhausted the node halts ("Boom!"). A notify command can be run around event processing.)
A Node Starts the Cluster
(Diagram: the Event Manager starts each event script and collects its return code.)
1) node_up -> node_up_local: acquire_service_addr, acquire_takeover_addr, get_disk_vg_fs
2) node_up_complete -> node_up_local_complete -> start_server (runs the application start script)
Another Node Joins the Cluster
(Diagram: the event managers on the running node and the joining node exchange messages. node_up runs on both nodes — node_up_remote on the running node calls stop_server, and node_up_local on the joining node calls acquire_service_address — followed by node_up_complete on both nodes.)
A Node Leaves the Cluster
(Diagram: node_down runs on both nodes. On the stopping node, node_down_local runs stop_server — the application stop script — release_takeover_addr, and release_vg_fs; then node_down_complete runs node_down_local_complete and the cluster manager exits. On the surviving node, node_down_remote runs acquire_service_addr; then node_down_complete runs node_down_remote_complete and start_server.)
3. When a node joins an existing cluster, what is the correct sequence for
these events?
a. node_up on new node, node_up on existing node, node_up_complete on new node,
node_up_complete on existing node
b. node_up on existing node, node_up on new node, node_up_complete on new node,
node_up_complete on existing node
c. node_up on new node, node_up on existing node, node_up_complete on existing node,
node_up_complete on new node
d. node_up on existing node, node_up on new node, node_up_complete on existing node,
node_up_complete on new node
4. True or False?
Checkpoint questions are boring.
(Diagram: resource group dependencies — a parent RG at the top, RGs in the middle that are both child and parent, and a child RG at the bottom.)
2. True or False?
HACMP 5.2 does not support choosing Cascading as a resource group type.
4. True or False?
Resource groups support IPAT via IP replacement in HACMP 5.2.
(Diagram: nodes bondar and hudson.)
These network interfaces are all connected to the same physical network.
The subnet mask is 255.255.255.0 on all networks/NICs.
An enhanced concurrent mode volume group "ecmvg" has been created to
support the xweb application and will be used for a disk (non-IP) heartbeat
network.
/mydir/xweb_start
/mydir/xweb_stop
System Management
TCP/IP
NFS
HACMP for AIX
[Entry Fields]
* Communication Path to Takeover Node [hudson_if1] +
* Application Server Name [xwebserver]
* Application Server Start Script [/mydir/xweb_start]
* Application Server Stop Script [/mydir/xweb_stop]
* Service IP Label [xweb] +
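The start and stop scripts named in this panel must exist on every node, be executable, and handle the application completely. A hypothetical skeleton for /mydir/xweb_start and /mydir/xweb_stop — the application command and process name are assumptions:

```shell
#!/bin/ksh
# /mydir/xweb_start -- must start the application and exit 0 on success
/opt/xweb/bin/xwebd &        # hypothetical application daemon
exit 0

#!/bin/ksh
# /mydir/xweb_stop -- must fully stop the application before HACMP
# releases the shared volume group and service IP label
kill $(ps -e -o pid,comm | awk '$2 == "xwebd" {print $1}')
exit 0
```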
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [bondar,hudson] +
BROADCAST message at startup? true +
Startup Cluster Information Daemon? true +
Reacquire resources after forced down ? false +
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [bondar] +
BROADCAST cluster shutdown? true +
* Shutdown mode graceful +
+--------------------------------------------------------------------------+
¦ Shutdown mode ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ graceful ¦
¦ takeover ¦
¦ forced ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ F8=Image F10=Exit Enter=Do ¦
F5¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* Resource Group Name [adventure]
* Participating Node Names (Default Node Priority) [hudson bondar] +
Extended Configuration
[Entry Fields]
* IP Label/Address [] +
* Network Name [] +
+--------------------------------------------------------------------------+
¦ IP Label/Address ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ (none) ((none)) ¦
¦ bondar (192.168.5.29) ¦
¦ hudson (192.168.5.31) ¦
¦ yweb (192.168.5.70) ¦
¦ xweb (192.168.5.92) ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ F8=Image F10=Exit Enter=Do ¦
F5¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* IP Label/Address [yweb] +
* Network Name [] +
+--------------------------------------------------------------------------+
¦ Network Name ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ net_ether_01 (192.168.15.0/24 192.168.16.0/24) ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ F8=Image F10=Exit Enter=Do ¦
F5¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* IP Label/Address [yweb] +
* Network Name [net_ether_01] +
[Entry Fields]
* Server Name [ywebserver]
* Start Script [/usr/local/scripts/startyweb]
* Stop Script [/usr/local/scripts/stopyweb]
+--------------------------------------------------------------------------+
¦ Select a Resource Group ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ xwebserver_group ¦
¦ adventure ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
Custom Resource Group Name adventure
Participating Node Names (Default Node Priority) hudson bondar
Extended Configuration
+--------------------------------------------------------------------------+
¦ Select a category ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ Add Discovered Communication Interface and Devices ¦
¦ Add Predefined Communication Interfaces and Devices ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
Don't risk a potentially catastrophic partitioned cluster by using cheap rs232 cables!
+--------------------------------------------------------------------------+
¦ Select a category ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ # Discovery last performed: (Feb 12 18:20) ¦
¦ Communication Interfaces ¦
¦ Communication Devices ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
¦ Select a Node ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ bondar ¦
¦ hudson ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* Node Name bondar
* Network Name [net_ether_01] +
* Node IP Label/Address [bondar-per] +
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Automatically correct errors found during [No] +
verification?
Don't forget to verify that you actually implemented what was planned by executing your test plan.
[Entry Fields]
* Cluster Snapshot Name [] /
Custom Defined Snapshot Methods [] +
Save Cluster Log Files in snapshot No +
* Cluster Snapshot Description []
(figure: cluster diagram -- nodes bondar and hudson with resource groups D and A)
2. In which of the top level HACMP menu choices is the menu for starting and
stopping cluster nodes?
a. Initialization and Standard Configuration
b. Extended Configuration
c. System Management (C-SPOC)
d. Problem Determination Tools
3. An orderly shutdown of AIX while HACMP is running is equivalent to which
of the following:
a. Graceful shutdown of HACMP followed by an orderly shutdown of AIX.
b. Takeover shutdown of HACMP followed by an orderly shutdown of AIX.
c. Forced shutdown of HACMP followed by an orderly shutdown of AIX.
d. None of the above.
4. True or False?
It is possible to configure HACMP faster by having someone help you on the other node.
5. True or False?
You must specify exactly which filesystems you want mounted when you put resources into a
resource group.
2. In which of the top level HACMP menu choices is the menu for starting and
stopping cluster nodes?
a. Initialization and Standard Configuration
b. Extended Configuration
c. System Management (C-SPOC)
d. Problem Determination Tools
3. An orderly shutdown of AIX while HACMP is running is equivalent to which
of the following:
a. Graceful shutdown of HACMP followed by an orderly shutdown of AIX.
b. Takeover shutdown of HACMP followed by an orderly shutdown of AIX.
c. Forced shutdown of HACMP followed by an orderly shutdown of AIX.
d. None of the above.
4. True or False?**
It is possible to configure HACMP faster by having someone help you on the other node.
5. True or False?
You must specify exactly which filesystems you want mounted when you put resources into a
resource group.
*This was False in previous releases, as it was not possible to configure the recommended non-IP network using the standard
path. However, the Two-Node Configuration Assistant can.
**Whoever synchronizes first causes their changes to take effect; the other person's changes made prior to that first
synchronization are thrown away.
© Copyright IBM Corporation 2004
Break Time!
(figure: nodes bondar and hudson with resource groups D, A, and B)
[Entry Fields]
* Resource Group Name [ballerina]
* Participating Node Names (Default Node Priority) [bondar hudson]
+
Does the order in which the node names are specified matter?
© Copyright IBM Corporation 2004
Adding a Third Service IP Label (1 of 2)
The extended configuration path screen for adding a service IP label provides
more options. We choose those which mimic the standard path.
Configure HACMP Service IP Labels/Addresses
+--------------------------------------------------------------------------+
¦ Select a Service IP Label/Address type ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ Configurable on Multiple Nodes ¦
¦ Bound to a Single Node ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* IP Label/Address [zweb] +
* Network Name net_ether_01
Alternate HW Address to accompany IP Label/Address []
[Entry Fields]
* Server Name [zwebserver]
* Start Script [/usr/local/scripts/startzweb]
* Stop Script [/usr/local/scripts/stopzweb]
Tape Resources [] +
Raw Disk PVIDs [] +
Miscellaneous Data []
[BOTTOM]
[Entry Fields]
* Verify, Synchronize or Both [Both] +
Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
Don't forget to verify that you actually implemented what was planned by
executing your test plan.
© Copyright IBM Corporation 2004
Expanding the Cluster
The Users "find" money in the budget and decide to "invest" it
to improve the availability of the adventure and discovery
applications.
Nobody seems to be too worried about the ballerina application.
(figure: resource groups D and A now span three nodes; B remains on two)
[Entry Fields]
* Cluster Name [xwebserver_cluster]
New Nodes (via selected communication paths) [jones-if1] +
Currently Configured Node(s) bondar hudson
COMMAND STATUS
[TOP]
Communication path jones-if1 discovered a new node. Hostname is jones.
Adding it to the configuration with Nodename jones.
Retrieving data from available cluster nodes. This could take a few minutes....
[Entry Fields]
* Node Name [jones]
Communication Path to Node [jones_if1] +
+--------------------------------------------------------------------------+
¦ Select Point-to-Point Pair of Discovered Communication Devices to Add ¦
¦ ¦
¦ Move cursor to desired item and press F7. Use arrow keys to scroll. ¦
¦ ONE OR MORE items can be selected. ¦
¦ Press Enter AFTER making all selections. ¦
¦ ¦
¦ # Node Device Device Path Pvid ¦
¦ bondar tty0 /dev/tty0 ¦
¦ hudson tty0 /dev/tty0 ¦
¦ jones tty0 /dev/tty0 ¦
¦ bondar tty1 /dev/tty1 ¦
¦ hudson tty1 /dev/tty1 ¦
¦ > jones tty1 /dev/tty1 ¦
¦ > bondar tty2 /dev/tty2 ¦
¦ hudson tty2 /dev/tty2 ¦
¦ jones tty2 /dev/tty2 ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F7=Select F8=Image F10=Exit ¦
F1¦ Enter=Do /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
¦ Select Point-to-Point Pair of Discovered Communication Devices to Add ¦
¦ ¦
¦ Move cursor to desired item and press F7. Use arrow keys to scroll. ¦
¦ ONE OR MORE items can be selected. ¦
¦ Press Enter AFTER making all selections. ¦
¦ ¦
¦ # Node Device Device Path Pvid ¦
¦ bondar tty0 /dev/tty0 ¦
¦ hudson tty0 /dev/tty0 ¦
¦ jones tty0 /dev/tty0 ¦
¦ bondar tty1 /dev/tty1 ¦
¦ hudson tty1 /dev/tty1 ¦
¦ jones tty1 /dev/tty1 ¦
¦ bondar tty2 /dev/tty2 ¦
¦ > hudson tty2 /dev/tty2 ¦
¦ > jones tty2 /dev/tty2 ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F7=Select F8=Image F10=Exit ¦
F1¦ Enter=Do /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* Verify, Synchronize or Both [Both] +
Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [jones] +
BROADCAST message at startup? true +
Startup Cluster Lock Services? false +
Startup Cluster Information Daemon? false +
Reacquire resources after forced down ? false +
[Entry Fields]
Resource Group Name adventure
New Resource Group Name []
Participating Node Names (Default Node Priority) [hudson bondar jones] +
[Entry Fields]
* Verify, Synchronize or Both [Both] +
Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
(figure: cluster diagram with the departing element marked X)
© Copyright IBM Corporation 2004
Removing a Cluster Node
Using any cluster node, remove the departing node from all resource groups
(ensure that each resource group is left with at least two nodes) and
synchronize your changes.
Stop HACMP on the departing node.
Using one of the other cluster nodes which is not being removed:
Remove the departing node from the cluster's topology (using the
Remove a Node from the HACMP Cluster smit screen in the extended
configuration path) and synchronize your change.
Once the synchronization is completed successfully, the departing node is
no longer a member of the cluster.
Remove the departed node's IP addresses from
/usr/es/sbin/cluster/etc/rhosts on the remaining nodes (prevents the
departed node from interfering with HACMP on the remaining nodes).
Physically disconnect the (correct) rs232 cables.
Disconnect the departing node from the shared storage subsystem (strongly
recommended as it makes it impossible for the departed
node to screw up the cluster's shared storage).
Run through your (updated) test plan.
© Copyright IBM Corporation 2004
Removing an Application
The zwebserver application has been causing problems and a
decision has been made to move it out of the cluster.
(figure: nodes bondar and hudson with resource groups D, A, and B)
+--------------------------------------------------------------------------+
¦ Select a Resource Group ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ adventure ¦
¦ ballerina ¦
¦ discovery ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
¦ ARE YOU SURE? ¦
¦ ¦
¦ Continuing may delete information you may want ¦
¦ to keep. This is your last chance to stop ¦
¦ before continuing. ¦
¦ Press Enter to continue. ¦
¦ Press Cancel to return to the application. ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ F8=Image F10=Exit Enter=Do ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* Verify, Synchronize or Both [Both] +
Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
(figure: nodes bondar and hudson with resource groups D and A)
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [bondar,hudson] +
BROADCAST cluster shutdown? true +
* Shutdown mode graceful +
+--------------------------------------------------------------------------+
¦ Select Service IP Label(s)/Address(es) to Remove ¦
¦ ¦
¦ Move cursor to desired item and press F7. ¦
¦ ONE OR MORE items can be selected. ¦
¦ Press Enter AFTER making all selections. ¦
¦ ¦
¦ xweb ¦
¦ yweb ¦
¦ zweb ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F7=Select F8=Image F10=Exit ¦
F1¦ Enter=Do /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* Network Name net_ether_01
New Network Name []
* Network Type [ether] +
* Netmask [255.255.255.0] +
* Enable IP Address Takeover via IP Aliases [No] +
IP Address Offset for Heartbeating over IP Aliases []
[Entry Fields]
* IP Label/Address [xweb] +
* Network Name net_ether_01
Alternate HW Address to accompany IP Label/Address [4004ac171964]
Don't forget to specify the second LAA for the second service IP label.
© Copyright IBM Corporation 2004
Synchronize Your Changes
Synchronize the changes and run through the test plan.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
(figure: nodes bondar and hudson with resource groups D and A)
[Entry Fields]
SSA Node Number [1] +#
Use the "smit ssaa" fastpath to get to AIX's SSA Adapters menu.
© Copyright IBM Corporation 2004
Configuring the tmssa Devices
This is a three-step process for a two-node cluster as each
node needs tmssa devices which refer to the other node:
1. run cfgmgr on one of the nodes (bondar).
bondar is now ready to respond to tmssa queries.
2. run cfgmgr on the other node (hudson).
hudson is now ready to respond to tmssa queries.
hudson also knows that bondar supports tmssa and has created the
tmssa devices (/dev/tmssa1.im and /dev/tmssa1.tm) which refer to
bondar.
3. run cfgmgr again on the first node (bondar).
bondar now also knows that hudson supports tmssa and has created
the tmssa devices (/dev/tmssa2.im and /dev/tmssa2.tm) which refer to
hudson.
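The result can be confirmed on each node with lsdev (a sketch; the device numbers follow the SSA node numbers assigned earlier):

```shell
# Re-run the configuration manager, then list the target-mode SSA devices:
cfgmgr
lsdev -C | grep tmssa
```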
+--------------------------------------------------------------------------+
¦ Select a category ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ Add Discovered Communication Interface and Devices ¦
¦ Add Predefined Communication Interfaces and Devices ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
¦ Select a category ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ # Discovery last performed: (Feb 12 18:20) ¦
¦ Communication Interfaces ¦
¦ Communication Devices ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
* Verify, Synchronize or Both [Both] +
Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
2. Which of the following are not supported by HACMP 5.1? (select all that
apply)
a. Cascading resource group with IPAT via IP aliasing.
b. Custom resource group with IPAT via IP replacement.
c. HWAT in a resource group which uses IPAT via IP aliasing.
d. HWAT in a custom resource group.
e. More than three custom resource groups in a two node cluster.
3. Which of the following sequences of steps implement HWAT in a cluster
currently using custom resource groups?
a. Delete custom RGs, define cascading RGs, place resources in new RGs, disable IPAT
via IP aliasing on network, delete old service IP labels, define new service IP labels,
synchronize
b. Delete custom RGs, define cascading RGs, place resources in new RGs, delete old
service IP labels, disable IPAT via IP aliasing on network, define new service IP labels,
synchronize
c. Delete custom RGs, disable IPAT via IP aliasing on network, delete old service IP labels,
define new service IP labels, define cascading RGs, place resources in new RGs,
synchronize
d. Delete custom RGs, delete old service IP labels, disable IPAT via IP aliasing on network,
define new service IP labels, define cascading RGs, place resources in new RGs,
synchronize
2. Which of the following are not supported by HACMP 5.1? (select all that
apply)
a. Cascading resource group with IPAT via IP aliasing.
b. Custom resource group with IPAT via IP replacement.
c. HWAT in a resource group which uses IPAT via IP aliasing.
d. HWAT in a custom resource group.
e. More than three custom resource groups in a two node cluster.
3. Which of the following sequences of steps implement HWAT in a cluster
currently using custom resource groups? *
a. Delete custom RGs, define cascading RGs, place resources in new RGs, disable IPAT
via IP aliasing on network, delete old service IP labels, define new service IP labels,
synchronize
b. Delete custom RGs, define cascading RGs, place resources in new RGs, delete old
service IP labels, disable IPAT via IP aliasing on network, define new service IP labels,
synchronize
c. Delete custom RGs, disable IPAT via IP aliasing on network, delete old service IP labels,
define new service IP labels, define cascading RGs, place resources in new RGs,
synchronize
d. Delete custom RGs, delete old service IP labels, disable IPAT via IP aliasing on network,
define new service IP labels, define cascading RGs, place resources in new RGs,
synchronize
*Old service IP labels must be deleted before disabling IPAT via IP aliasing and new service IP labels
must exist before they can be placed into the resource groups.
© Copyright IBM Corporation 2004
Unit Summary
Having completed this unit, you should be able to:
Configure HACMP 5.2
Use Standard and Extended Configuration paths
Two-Node Cluster Configuration Assistant
Configure HACMP Topology to include:
IP-based networks enabled for address takeover via both alias and
replacement
Non-IP networks (rs232, tmssa, diskhb)
Hardware Address Takeover
Configure HACMP Resources:
Create resource groups using startup, fallover, and fallback policies
Add and remove resource groups and nodes on an existing cluster
Take a snapshot
Remove a cluster
Start and stop the cluster on one or more cluster nodes
(figure: Continuous Availability = Continuous Operations + High Availability)
(figure: C-SPOC -- an initiating node runs the operation on the target nodes)
[Entry Fields]
Select nodes by Resource Group [] +
*** No selection means all nodes! ***
[Entry Fields]
Select nodes by Resource Group
*** No selection means all nodes! ***
[Entry Fields]
Node Name(s) to which disk is attached bondar,hudson +
Device type disk +
Disk Type hdisk
Disk interface ssar
Description SSA Logical Disk Driv>
Parent ssar
* CONNECTION address [] +
Location Label []
ASSIGN physical volume identifier yes +
RESERVE disk on open yes +
Queue depth [] +
Maximum Coalesce [] +
[Entry Fields]
Node Names bondar,hudson
PVID 00055207bbf6edab 0000>
VOLUME GROUP name [bernhardvg]
Physical partition SIZE in megabytes 64 +
Volume group MAJOR NUMBER [207] #
Enable Cross-Site LVM Mirroring Verification false + (new in HACMP 5.2)
Warning :
Changing the volume group major number may result
in the command being unable to execute
successfully on a node that does not have the
major number currently available. Please check
for a commonly available major number on all nodes
before changing this setting.
Extended Configuration
The volume group must be online somewhere and listed in a resource group or it does
not appear in the pop-up list.
© Copyright IBM Corporation 2004
Creating a Shared File System (2 of 2)
Then create the filesystem on the now "previously defined" logical volume.
[Entry Fields]
Node Names bondar,hudson
LOGICAL VOLUME name norbertfs
* MOUNT POINT [/norbert]
PERMISSIONS read/write +
Mount OPTIONS [] +
Start Disk Accounting? no +
Fragment Size (bytes) 4096 +
Number of bytes per inode 4096 +
Allocation Group Size (MBytes) 8 +
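Behind this C-SPOC screen the underlying AIX command is roughly the following (a sketch; C-SPOC also propagates the definition to the other node):

```shell
# Create a JFS filesystem on the previously defined logical volume norbertfs.
#   -A no : do not mount automatically at boot -- HACMP mounts it
crfs -v jfs -d norbertfs -m /norbert -A no -p rw
```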
VGDA = ODM
# importvg -V123 -L sharedvg hdisk3    (refresh the node's ODM from the VGDA)
# chvg -an sharedvg                    (disable automatic varyon at boot)
# varyoffvg sharedvg
+--------------------------------------------------------------------------+
¦ File System Name ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ # Resource Group File System ¦
¦ adventure /norbert ¦
¦ discovery /ron ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
Resource Group Name discovery
File system name /ron
NEW mount point [/ron]
SIZE of file system [4000000]
Mount GROUP []
Mount AUTOMATICALLY at system restart? no +
PERMISSIONS read/write +
Mount OPTIONS [] +
Start Disk Accounting? no +
Fragment Size (bytes) 4096
Number of bytes per inode 4096
Compression algorithm no
*This foil describes how priority override locations work for nonconcurrent resource
groups. See the HACMP 5.2 Administration and Troubleshooting Guide
(SC23-4862-03) for information on how priority override locations work for concurrent
access resource groups.
[Entry Fields]
Resource Group to be Moved adventure
Destination Node hudson
Persist Across Cluster Reboot? false +
[Entry Fields]
Resource Group to Bring Offline adventure
Node On Which to Bring Resource Group Offline bondar
Persist Across Cluster Reboot? false +
[Entry Fields]
Resource Group to Bring Online adventure
Destination Node bondar
Persist Across Cluster Reboot? false +
2. True or False?
C-SPOC reduces the need for a change management process.
4. True or False?
It does not matter which node in the cluster is used to initiate a C-SPOC operation.
5. Which log file provides detailed output on HACMP event script execution?
a. /tmp/clstrmgr.debug
b. /tmp/hacmp.out
c. /var/adm/cluster.log
2. True or False?
C-SPOC reduces the need for a change management process.
4. True or False?
It does not matter which node in the cluster is used to initiate a C-SPOC operation.
5. Which log file provides detailed output on HACMP event script execution?
a. /tmp/clstrmgr.debug
b. /tmp/hacmp.out
c. /var/adm/cluster.log
Topology Changes
Adding or removing cluster nodes
Adding or removing networks
Adding or removing communication
interfaces or devices
Swapping a communication interface's IP
address
Resource Changes
All resources can be changed
Topology Changes
Change the name of the cluster
Change the cluster ID*
Change the name of a cluster node
Change a communication interface attribute
Changing whether or not a network uses
IPAT via IP aliasing or via IP replacement
Change the name of a network module*
Add a network interface module*
Removing a network interface module*
Resource Changes
Change the name of a resource group
Change the name of an application server
Change the node relationship
DARE cannot run unless all cluster nodes are at the same HACMP level
© Copyright IBM Corporation 2004
So How Does DARE Work?
DARE uses the three separate copies of the ODM in order to
allow changes to be propagated to all nodes whilst the cluster is
active.
1. change topology or resources in smit
2. synchronize topology or resources in smit
3. snapshot taken of the current ACD
4. cluster manager reads ACD and refreshes
5. SCD is deleted
(figure: during DARE, the SCD exists alongside the HACMP ODMs on each node)
HACMP Verification
View Current State
HACMP Log Viewing and Management
Recover From HACMP Script Failure
Restore HACMP Configuration Database from Active Configuration
Release Locks Set By Dynamic Reconfiguration
Clear SSA Disk Fence Registers
HACMP Cluster Test Tool
HACMP Trace Facility
HACMP Event Emulation
HACMP Error Notification
[Entry Fields]
Cluster Snapshot Name jami
Cluster Snapshot Description Cuz -- he did the lab>
Un/Configure Cluster Resources? [Yes] +
Force apply if verify fails? [No] +
(figure: a failure -- BANG! -- during DARE leaves the SCD in place, blocking further configuration changes until it is removed)
HACMP Verification
View Current State
HACMP Log Viewing and Management
Recover From HACMP Script Failure
Restore HACMP Configuration Database from Active Configuration
Release Locks Set By Dynamic Reconfiguration
Clear SSA Disk Fence Registers
HACMP Cluster Test Tool
HACMP Trace Facility
HACMP Event Emulation
HACMP Error Notification
2. Which operations can DARE not perform (select all that apply)?
a. Changing the name of the cluster.
b. Removing a node from the cluster.
c. Changing a resource in a resource group.
d. Change whether a network uses IPAT via IP aliasing or via IP replacement.
3. True or False?
It is possible to roll back from a successful DARE operation using an automatically
generated snapshot.
4. True or False?
Running a DARE operation requires three separate copies of the HACMP ODM.
5. True or False?
Cluster snapshots can be applied while the cluster is running.
2. Which operations can DARE not perform (select all that apply)?
a. Changing the name of the cluster.
b. Removing a node from the cluster.
c. Changing a resource in a resource group.
d. Change whether a network uses IPAT via IP aliasing or via IP replacement.
3. True or False?
It is possible to roll back from a successful DARE operation using an automatically
generated snapshot.
4. True or False?
Running a DARE operation requires three separate copies of the HACMP ODM.
5. True or False?
Cluster snapshots can be applied while the cluster is running.
NFS Client
NFS mount
NFS Server
read-write
NFS mount
read-only
JFS mount
read-only
NFS mount
NFS Client and Server
shared_vg
© Copyright IBM Corporation 2004
NFS Background Processes
NFS uses TCP/IP and a number of background processes to allow
clients to access disk resources on a remote server.
Configuration files are used on the client and server to specify export
and mount options.
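Outside of HACMP, a plain AIX export and client mount look like this (illustrative names taken from the figures that follow):

```shell
# Server: export /fsa (mknfsexp records it in /etc/exports and exports it).
mknfsexp -d /fsa -t rw
# Client: mount the export.
mount aservice:/fsa /a
```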
NFS Client
NFS Server
n x nfsd and mountd
n x biod
client system
# mount aservice:/fsa /a
The A resource group specifies:
aservice as a service IP label resource
/fsa as a filesystem resource
/fsa as an NFS filesystem to export
The client system sees /fsa as /a.
A /fsa
# mount /fsa
Bondar Hudson
© Copyright IBM Corporation 2004
NFS Fallover with HACMP
In this scenario, the resource group moves to the surviving node in the cluster,
which exports /fsa. Clients see NFS server not responding during fallover.
client system
/fsa A
# mount /fsa
Bondar Hudson
© Copyright IBM Corporation 2004
Configuring NFS for High Availability
[MORE...10]
/a /fsa /a
aservice
/a /fsa /a
client system
# mount aservice:/fsa /a
aservice
export /fsa
A /fsa
# mount /fsa
# mount aservice:/fsa /a # mount aservice:/fsa /a
Bondar Hudson
© Copyright IBM Corporation 2004
Choosing the Network for Cross-mounts
In a cluster with multiple IP networks, it may be useful to specify
which network should be used by HACMP for cross-mounts.
This is usually done as a performance enhancement.
net_ether_02
aGservice aservice
export /fsa
A /fsa
# mount /fsa
# mount aservice:/fsa /a # mount aservice:/fsa /a
Bondar Hudson
© Copyright IBM Corporation 2004
Configuring HACMP for Cross-mounting
[MORE...10]
/a;/fsa
# mount aservice:/fsa /a
What HACMP does
(on each node in the resource group)
The command 'lvlstmajor' will list the available major numbers for each node in the cluster.
For example:
# lvlstmajor
43,45...99,101...
The VG major number may be set at the time of creating the VG using smit mkvg or by
using the -V flag on the importvg command. For example:
# importvg -V100 -y shared_vg_a hdisk2
C-SPOC will "suggest" a VG major number which is unique across the nodes
when it is used to create a shared volume group.
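To find a commonly available major number by hand, compare the nodes' lvlstmajor output (a sketch assuming rsh access between the nodes):

```shell
# List the free device major numbers on each node; pick one free on both.
for node in bondar hudson; do
    echo "== $node =="
    rsh $node lvlstmajor
done
```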
4. True or False?
HACMP's NFS exporting feature only supports clusters of two nodes.
5. True or False?
IPAT is required in resource groups which export NFS filesystems.
4. True or False?**
HACMP's NFS exporting feature only supports resource groups with two nodes.
5. True or False?
IPAT is required in resource groups which export NFS filesystems.
*/usr/es/sbin/cluster/exports must be used to specify NFS export options if the default of
"read-write to the world" is not acceptable.
**Resource groups larger than two nodes which export NFS filesystems do not provide full NFS
functionality (for example, NFS file locks are not preserved across a fallover).
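As the first footnote notes, /usr/es/sbin/cluster/exports overrides the default export options; it uses the same format as /etc/exports. An illustrative entry (the options shown are assumptions):

```shell
# /usr/es/sbin/cluster/exports
/fsa -ro,root=bondar:hudson
```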
© Copyright IBM Corporation 2004
Unit Summary
Having completed this unit, you should be able to:
Explain the concepts of Network File System (NFS)
Configure HACMP to support NFS
Understand why Volume Group major numbers must be unique
when using NFS with HACMP
Outline the NFS configuration parameters for HACMP
(figure: event processing flow -- clcallev runs the HACMP event, whose definition is read from the HACMPevent ODM class; a notify command runs before and after the event, pre- and post-event scripts wrap it, and if the event exits with a non-zero RC the recovery command is run and the event retried while the recovery counter is greater than 0)
[Entry Fields]
* Cluster Event Name [stop_printq]
* Cluster Event Description [stop the print queues]
* Cluster Event Script Filename [/usr/local/cluster/events/stop_printq]
[Entry Fields]
Notify Command []
Pre-event Command [] +
Post-event Command [stop_printq] +
Recovery Command []
* Recovery Counter [0] #
(figure: recovery flow -- while the event returns a non-zero RC and the recovery counter is greater than 0, the recovery command runs and the event is retried)
[Entry Fields]
Notify Command []
Pre-event Command [] +
Post-event Command [] +
Recovery Command [/usr/local/bin/recover]
* Recovery Counter [3] #
#!/bin/ksh
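A recovery command such as /usr/local/bin/recover above is an ordinary ksh script; if it exits 0, the failed event is retried, up to Recovery Counter times. A sketch (the cleanup action is hypothetical):

```shell
#!/bin/ksh
# /usr/local/bin/recover -- sketch of a recovery command.
# Remove a (hypothetical) stale lock that made the event script fail,
# then exit 0 so that HACMP retries the event.
rm -f /tmp/xweb.lock
exit 0
```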
[Entry Fields]
HIGH water mark for pending write I/Os per file [33] +#
LOW water mark for pending write I/Os per file [24] +#
[Entry Fields]
syncd frequency (in seconds) [10] #
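On AIX, the syncd frequency is set where syncd is started in /sbin/rc.boot; lowering the interval from the default 60 seconds to 10 flushes dirty pages more often (a sketch -- the exact line may differ by AIX level, so check before editing):

```shell
# In /sbin/rc.boot, change the syncd interval from the default 60 seconds:
nohup /usr/sbin/syncd 10 > /dev/null 2>&1 &
```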
COMMAND STATUS
[TOP]
bondar:
bondar: HACMP Resource Error Notify Method
bondar:
bondar: hdisk0 /usr/es/sbin/cluster/diag/cl_failover
bondar: scsi0 /usr/es/sbin/cluster/diag/cl_failover
bondar: hdisk11 /usr/es/sbin/cluster/diag/cl_logerror
bondar: hdisk5 /usr/es/sbin/cluster/diag/cl_logerror
bondar: hdisk9 /usr/es/sbin/cluster/diag/cl_logerror
bondar: hdisk7 /usr/es/sbin/cluster/diag/cl_logerror
bondar: ssa0 /usr/es/sbin/cluster/diag/cl_logerror
hudson:
hudson: HACMP Resource Error Notify Method
[MORE...9]
[Entry Fields]
* Notification Object Name []
* Persist across system restart? No +
Process ID for use by Notify Method [] +#
Select Error Class None +
Select Error Type None +
Match Alertable errors? None +
Select Error Label [] +
Resource Name [All] +
Resource Class [All] +
Resource Type [All] +
* Notify Method []
Mo+--------------------------------------------------------------------------+
¦ Error Label to Emulate ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ [TOP] ¦
¦ SSA_DISK_ERR3 SSA_DISK_DET_ER ¦
¦ LVM_SA_QUORCLOSE bernhardvg ¦
¦ LVM_SA_QUORCLOSE xwebvg ¦
¦ LVM_SA_QUORCLOSE rootvg ¦
¦ SERVICE_EVENT diagela_SE ¦
¦ FCP_ARRAY_ERR6 fcparray_err ¦
¦ DISK_ARRAY_ERR2 ha_hdisk0_0 ¦
¦ DISK_ARRAY_ERR3 ha_hdisk0_1 ¦
¦ DISK_ARRAY_ERR5 ha_hdisk0_2 ¦
¦ DISK_ERR2 ha_hdisk0_3 ¦
¦ [MORE...39] ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
[Entry Fields]
Error Label Name LVM_SA_QUORCLOSE
Notification Object Name xwebvg
Notify Method /usr/es/sbin/cluster/>
Description
QUORUM LOST, VOLUME GROUP CLOSING
Probable Causes
PHYSICAL VOLUME UNAVAILABLE
Detail Data
MAJOR/MINOR DEVICE NUMBER
00C9 0000
QUORUM COUNT
0
ACTIVE COUNT
0
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
2. Which of the following runs if an HACMP event script fails? (select all that
apply)
a. Pre-event scripts.
b. Post-event scripts.
c. Error notification methods.
d. Recovery commands.
e. Notify methods.
3. What are the recommended values for I/O pacing high and low water
marks?
a. 33,48
b. 48,33
c. 33,24
d. 24,33
4. True or False? *
All clusters must be tuned for high availability.
5. True or False?
Writing error notification methods is a normal part of configuring a cluster.
*The HACMP documentation recommends that you tune the I/O pacing and syncd parameters.
You may experience "difficulties" getting support until you do this.
© Copyright IBM Corporation 2004
Unit Summary
Having completed this unit, you should be able to:
Understand the requirements for application server start and stop
scripts
Perform basic cluster customizations
Change HACMP tuning parameters
Monitor other devices outside the control of HACMP
[Slide graphic: Halifax/Vancouver cluster, 85.0% availability, failed components marked X]
Test Your Cluster before Going Live!
Careful testing of your production cluster before going live reduces
the risk of problems later.
An example test plan might include:
Node Fallover
Network Adapter Swap
IP Network Failure
SSA Adapter Failure
Disk Failure
Clstrmgr Killed
Serial Network Failure
SCSI Adapter for rootvg Failure
Application Failure
Partitioned Cluster
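Each item in the plan maps to a concrete fault you inject during the test window. The helper below only prints example commands (the device and subsystem names are assumptions); it is a planning aid, not something to run blindly on a production node.

```shell
# Print (never run directly) an example fault-injection command for a
# named test from the plan. en0 and clstrmgrES are example names.
fault_cmd() {
    case "$1" in
        node_fallover)    echo "halt -q" ;;
        ip_network_fail)  echo "ifconfig en0 down" ;;
        clstrmgr_killed)  echo "kill -9 <pid of clstrmgrES>" ;;
        app_fail)         echo "kill -9 <pid of the application>" ;;
        *)                echo "no example for $1" ;;
    esac
}
```

Record the expected cluster reaction next to each test before you run it, so the observed behavior can be checked against the plan.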
[Entry Fields]
* Automatic cluster configuration verification Enabled +
Node name Default +
* HOUR (00 - 23) [00] +#
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Automatically correct errors found during [No] +
verification?
* Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
Failed/Joined Node
[Slide graphic: Halifax/Vancouver cluster, node rejoining]
Emulating a Network Down Event
[Entry Fields]
* Network Name [net_ether_01] +
Node Name [] +
Node Up Event
Node Down Event
Network Up Event
Network Down Event
Fail Standby Event
Join Standby Event
Swap Adapter Event
[Entry Fields]
* Node Name [hudson] +
* Node Down Mode graceful +
# lssrc -g cluster
Subsystem Group PID Status
clstrmgrES cluster 21032 active
clsmuxpdES cluster 17196 active
clinfoES cluster 21676 active
Cluster components:
clstrmgr - mandatory
clsmuxpd - mandatory
clinfo - optional
#
# lssrc -g topsvcs
Subsystem Group PID Status
topsvcs topsvcs 12230 active
# lssrc -g grpsvcs
Subsystem Group PID Status
grpsvcs grpsvcs 11736 active
grpglsm grpsvcs 12742 active
# lssrc -g emsvcs
Subsystem Group PID Status
emsvcs emsvcs 12934 active
emaixos emsvcs 13184 active
# lssrc -s clcomdES
Subsystem Group PID Status
clcomdES clcomdES 13420 active
# lssrc -s ctrmc
Subsystem Group PID Status
ctrmc rsct 2954 active
#
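A quick health check can be scripted around this lssrc output. The function below parses the "Subsystem Group PID Status" format shown above and exits non-zero if the named subsystem is not active. This is a sketch; which subsystems to check follows the mandatory components listed above.

```shell
# Exit 0 if the named subsystem appears as "active" in lssrc output on stdin.
check_active() {
    awk -v s="$1" '$1 == s && $NF == "active" { found = 1 } END { exit !found }'
}

# Example use on a cluster node:
# for s in clstrmgrES clsmuxpdES; do
#     lssrc -g cluster | check_active "$s" || echo "$s is NOT active"
# done
```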
1. Isolate the cause of excessive I/O traffic and fix it, and if that does not
work...
2. Turn on I/O pacing, and if that does not work...
3. Increase the frequency of syncd, and if that does not work...
4. Reduce the failure detection rate for the slowest network, and if that
does not work...
5. Buy a bigger machine
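Steps 2 and 3 translate into a couple of AIX commands. The sketch below only prints the I/O pacing command for review (a dry run); the 33/24 high/low water marks are the values recommended in the HACMP documentation, and the syncd interval is set in /sbin/rc.boot.

```shell
# Dry run: print the I/O pacing command instead of executing it.
io_pacing_cmd() {
    hi=${1:-33}; lo=${2:-24}    # recommended high/low water marks
    echo "chdev -l sys0 -a maxpout=$hi -a minpout=$lo"
}

io_pacing_cmd
# Increasing the frequency of syncd means lowering its interval in
# /sbin/rc.boot, for example from 60 seconds down to 10.
```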
It means that an event script has failed, has hung, or is taking too long.
HACMP stops processing events until you resolve the issue.
[Entry Fields]
Max. Event-only Duration (in seconds) [180] #
Max. Resource Group Processing Time (in seconds) [180] #
HACMP Verification
View Current State
HACMP Log Viewing and Management
Recover From HACMP Script Failure
Restore HACMP Configuration Database from Active Configuration
Release Locks Set By Dynamic Reconfiguration
Clear SSA Disk Fence Registers
HACMP Trace Facility
+--------------------------------------------------------------------------+
¦ Select a Node ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ bondar ¦
¦ hudson ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
¦ F8=Image F10=Exit Enter=Do ¦
F1¦ /=Find n=Find Next ¦
F9+--------------------------------------------------------------------------+
2. True or False?
Event emulation can emulate all cluster events.
3. If the cluster manager process should die, what will happen to the cluster
node?
a. It continues running but without HACMP to monitor and protect it.
b. It continues running AIX but any resource groups will fallover.
c. Nobody knows because this has never happened before.
d. The System Resource Controller sends an e-mail to root and issues a "halt -q".
e. The System Resource Controller sends an e-mail to root and issues a "shutdown -F".
4. True or False?
A non-IP network is strongly recommended. Failure to include a non-IP network can cause the
cluster to fail or malfunction in rather ugly ways.
5. (bonus question) my favorite graphic in the lower right hand corner of a foil
was: ____________________________________
*The correct answer is almost certainly "cluster administrator error" although "poor/inadequate
cluster design" would be a very close second.
Unit Summary
Having completed this unit, you should be able to:
Understand why HACMP can fail
Identify configuration and administration errors
Understand why the Dead Man's Switch invokes
Know when the System Resource Controller will kill a node
Isolate and recover from failed event scripts
Correctly escalate a problem to IBM support
[Entry Fields]
* File Name <hacmp/log/cluster.haw] /
Cluster Notes []
2. True or False?
Each requirement in the requirements document needs a design element to explain how it will
be satisfied and a documented test to show how it is verified.
3. True or False?
Each test should test one and only one design feature.
5. True or False?*
Proper cluster design documentation is a waste of time because nobody keeps it up-to-date.
6. True or False?
The aspect of cluster design which generally receives the most attention is understanding and
then documenting how the application operates within the cluster.
*Even if it is not kept up-to-date, proper cluster design documentation will be very useful in ensuring that
the cluster is at least initially configured correctly. Failure to keep the cluster documentation up-to-date will
probably eventually result in "accidental" outages.
Unit Summary
Having completed this unit, you should be able to:
Explain the importance of cluster planning
Describe the key cluster planning deliverables
Requirements document
Design document
Test plan
Documented operational procedures
Explain how the requirements, design and test plan documents
should be linked together
Use the export to planning worksheets feature of HACMP 5.2
cover
HACMP Systems
Administration I: Planning and
Implementation
(Course Code AU54)
Trademarks
IBM® is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United
States, or other countries, or both:
AFS AIX AIX 5L
Cross-Site DB2 DB2 Universal Database
DFS Enterprise Storage Server HACMP
NetView POWERparallel pSeries
Redbooks Requisite RS/6000
SP Tivoli TME
TME 10 Versatile Storage Server WebSphere
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Other company, product and service names may be trademarks or service marks of others.
The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
© Copyright International Business Machines Corporation 1998, 2004. All rights reserved.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is subject to restrictions
set forth in GSA ADP Schedule Contract with IBM Corp.
V3.1.0.1
Instructor Exercises Guide
Contents
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Exercise Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Introduction
The scenario that the exercises are based on is a company that is
amalgamating its computer sites into a single location. It intends to
consolidate the computer sites from two cities into one situated
roughly in the middle of the original two. The case study has been
designed around five randomly chosen countries. These country and city
configurations have been tested in our environment, but you may choose
your own. On to the scenario.
Required Materials
Your imagination.
Paper or a section of a white board.
Exercise Instructions
Preface
For this example we use the Canada cluster. The original configuration was one computer
located in Halifax and one in Calgary. The systems have been named by their city
designation to keep them straight. The corporate Web server resides on Halifax. Currently
the systems are running on internal disks, on hardware too small for the task. As part of the
consolidation new systems are used. These new systems are to be configured in such a
manner as to provide as close to 7x24x365 access to the Web server as possible with
pSeries technology. Corporate marketing is about to launch a major initiative to promote a
new product solely available on the Web. Corporate management has insisted that this
project be successful, and that the new computer center in Regina resolve all of the
reliability issues that have thus far caused great corporate embarrassment. All eyes are focused
on this project.
A project briefing has been called by the senior executive to get an overview of how the
funds for the equipment are applied.
Your task is to prepare for that meeting to present a solution.
Exercise Steps
__ 1. Draw each of the computer systems as described.
__ 2. Add the applications to the nodes.
__ 3. Add a network connection to each system for access to the outside world.
__ 4. Evaluate the lack of high availability in the initial drawing of the two separate
systems.
__ 5. Combine the services of the existing networks resulting in a single network.
__ 6. Add new SSA disks to your drawing, showing cable connections.
__ 7. Make the disks highly available, RAID/mirror, redundant disks.
__ 8. Define the resources as described in the text.
__ 9. Define the characteristics of the resources.
__ 10. Indicate how the resources fail and recover.
__ 11. Make the diagram simple to understand.
END OF LAB
Introduction
There may be differences in the documentation and the real machines
in the classroom environment. The CPUs, network type, and type of
disk units have been selected to provide a consistent experience but a
variety of equipment may be used. Please ask if you have any
questions.
Note: Throughout this lab the terms shared volume group, shared file
system, node and client refer to components of your HACMP cluster.
Wherever the convention <name> appears, substitute the appropriate
value. The examples reference a generic cluster's names for these
components; some names in your cluster may differ from those shown
in the notes.
Below is a picture of the generic cluster for this lab. The
communications path may be Ethernet, Token-Ring, FDDI, or any
other network supported by HACMP. There must also be a non-IP
serial network -- either RS232, target mode SSA, or heartbeat over
disk. The minimum requirement is that there are at least four shared
disks (SCSI, Fibre Channel, or SSA) connected to a shared bus so that
two volume groups may be created and passed between nodes. If
adequate disks can be provided for the purposes of mirroring and
[Figure: generic cluster diagram -- each node has a 4.8 GB rootvg]
AU545.0
Exercise Instructions
Part 1: Examine the Cluster Environment and Complete the Cluster
Component Worksheets with Storage Information
Using the cluster component worksheets (located at the end of this exercise), record the
information as listed in the following steps.
__ 1. Write down your team number here: ____. In these lab exercises you must replace
the symbol # with your team number unless otherwise noted.
__ 2. Log in as root on both of your cluster nodes. The root password will be provided by
your instructor.
__ 3. Identify and record in the cluster components worksheet the device names and
location codes of the disk adapters.
__ 4. Identify and record in the cluster components worksheet the device names and
location codes of the external disks (hdisks and pdisk). Note: The external disks
may not have PVIDs on them at this time.
__ 5. Identify and record in the cluster components worksheet the device names and
location codes of the internal disks.
__ 6. The storage needs to be divided into two volume groups. The size of the volume groups
is not important. In a real environment, disks should be mirrored and quorum issues
addressed; here the emphasis is on the operation of HACMP, not on how the storage is
organized. You should have four disks, so feel free to set up a mirror on one of the
volume groups. Different methods of configuring the disks are used throughout the
exercises. Decide on the organization, but only create the volume groups when
directed to.
__ 7. Identify and update the cluster planning worksheets with the names of 2 shared
volume groups. Use the following names or choose your own.
» shared_vg_a
» shared_vg_b
__ 8. Identify and update the cluster component worksheets with the LVM component names
needed to have a shared file system in each of the two volume groups. Select names for the
logical volumes, jfs logs and filesystems. Use the following names or choose your
own.
» data lv’s shared_jfslv_a, shared_jfslv_b
» jfslog lv’s shared_jfslog_a, shared_jfslog_b
» file systems shared_fs_a, shared_fs_b
__ 9. Now add just the storage information to the generic cluster diagram of your
cluster. This diagram can be found in Appendix A (there are two blank ones after the
filled in one. One is for in class and the other is to take home). On the other hand
you may want to just compare the information on your component worksheets to the
filled in worksheet at the beginning of Appendix A.
• Only fill in what you know -- the LVM information-- at the bottom of the diagram.
GO NOW TO EXERCISE 3. You return to Part 2 after the lecture for the unit on network
planning.
» toronto#-per 192.168.#3.2
» appA#-svc 192.168.#3.10
» appB#-svc 192.168.#3.11
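With a team number substituted for # (team 1 in this hypothetical fragment), the corresponding /etc/hosts entries would look like:

```
192.168.13.2    toronto1-per
192.168.13.10   appA1-svc
192.168.13.11   appB1-svc
```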
__ 13. The IP network name is generated by HACMP.
__ 14. Identify and update the cluster components worksheet with the name for your cluster
(any string without spaces, up to 32 characters) using the following or choose your
own.
» cluster name is canada#
__ 15. Identify and update the cluster components worksheet with the device names and
location codes of the serial ports.
__ 16. The non-IP network name is generated by HACMP.
__ 17. At this point, most of the names for the various cluster components should
have been selected and populated on the cluster component worksheets. It is
important to have a clear picture of the various names of these components as you
progress through the exercises.
__ 18. Now add the networking information to the generic cluster diagram of your cluster.
This diagram can be found in Appendix A (there are two blank ones after the filled in
one. One is for in class and the other is to take home). On the other hand you may
want to just compare the information on your component worksheets to the filled in
worksheet at the beginning of Appendix A.
• Only fill in what you know -- cluster name, node names (halifax#, toronto#), and
IP information at the top.
***Internal Disk *** 16 Bit LVD SCSI Disk Drive hdiskX 10-80-00-4,0
Internal Disk 1
Internal Disk 2
Internal Disk 3
***Internal Disk *** 16 Bit LVD SCSI Disk Drive hdiskX 10-80-00-4,0
Internal Disk 1
Internal Disk 2
Internal Disk 3
Table 3: Shared Components Worksheet
Component Description Value
Shared vg 1 ---------------------N/A-----------------
Shared jfs log 1 --------------------N/A------------------
Shared jfs lv 1 --------------------N/A------------------
Shared filesystem 1 --------------------N/A------------------
-mount point --------------------N/A------------------
Shared vg 2 --------------------N/A------------------
Shared jfs log 2 --------------------N/A------------------
Shared jfs lv 2 --------------------N/A------------------
Shared filesystem 2 --------------------N/A------------------
-mount point --------------------N/A------------------
REPLACEMENT node2:
Service Label/address
Hardware Address ---------------------N/A-----------------
Introduction
The next phase in our scenario is to provide the storage for the highly
available application. We require a filesystem to store the Web pages
on that can be accessed by each machine when that machine is the
active node.
To support the passing of a filesystem between nodes there must be a
volume group, a logical volume for the data, and a logical volume for
the jfs log.
There are several methods to accomplish this task. Two are explored
during the exercises: first, a manual creation to emphasize the
necessary steps in the process; second, in a later exercise, an
automated cluster-aware method using C-SPOC.
Required Materials
• Cluster Planning Worksheets and cluster diagram from the
previous exercise.
• Shared disk storage connected to both nodes.
Exercise Instructions
Configure Volume Group
__ 1. With your cluster planning sheets available, begin the configuration.
__ 2. Log in to both nodes as root.
__ 3. Verify that both nodes have the same number of disks.
__ 4. Identify the internal and shared disks from the cluster worksheet. These disks might
or might not have PVIDs on them.
If they match between the two systems, then you can skip to step 10.
__ 5. On both systems delete only the external hdisks.
__ 6. On one system add all of the PVIDs back in.
__ 7. On the other system update the PVIDs.
__ 8. Verify the PVIDs were updated.
__ 9. The hdisks and PVIDs should match on both systems.
__ 10. Find a VG major number not used on either node __________.
__ 11. Go to your halifax# node. Create an Enhanced Concurrent Volume Group called
shared_vg_a. This will be the volume group for appA#’s shared data.
__ 12. Vary on the volume group and create a jfslog logical volume with a name of
shared_jfslog_a. The type is to be jfslog. Only one lp is required.
__ 13. Format the jfslog logical volume.
__ 14. Create a logical volume for data called shared_jfslv_a.
__ 15. Create a filesystem called shared_fs_a using the Add a Journaled File System on a
previously defined logical volume. The mount point should be /shared_fs_a and the
filesystem should not be automatically activated on system restart.
__ 16. Verify the filesystem can be mounted manually.
__ 17. Check that the correct log file is active. If you have a loglv00, then you probably
did not format the jfslog before you created the jfs.
__ 18. Unmount the filesystem.
__ 19. Vary off the volume group.
__ 20. On your toronto# node, import the volume group using the major number, hdisk, and
volume group information. The VG name must be the same as on the system where it
was created.
__ 21. Set the autovaryon flag to “off” for the volume group.
__ 22. Mount the filesystem on the second node and verify it functions.
__ 23. Check the correct log file is active.
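The command-line equivalents of steps 10-21 can be sketched as follows. This is a dry run: each command is printed rather than executed, so the sequence can be reviewed first. The major number (55) and disk name (hdisk2) are assumptions; use the values from your worksheet.

```shell
run() { echo "$@"; }    # dry run: print each command; drop the echo to execute

# On halifax#:
run mkvg -n -C -V 55 -y shared_vg_a hdisk2   # enhanced concurrent VG, no autovaryon
run varyonvg shared_vg_a
run mklv -t jfslog -y shared_jfslog_a shared_vg_a 1
run logform /dev/shared_jfslog_a             # format the jfslog before creating the jfs
run mklv -t jfs -y shared_jfslv_a shared_vg_a 10
run crfs -v jfs -d shared_jfslv_a -m /shared_fs_a -A no
run mount /shared_fs_a                       # verify, then:
run umount /shared_fs_a
run varyoffvg shared_vg_a

# On toronto#:
run importvg -V 55 -y shared_vg_a hdisk2
run chvg -a n shared_vg_a                    # autovaryon off
```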
END OF LAB
Introduction
This section establishes the communication networks required for
implementing HACMP. Networking is an important component of
HACMP, so all related aspects are configured and tested. The
information used in this exercise is derived from the previous exercise.
© Copyright IBM Corp. 1998, 2004 Exercise 4. Network Setup and Test 4-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
[Figure: generic cluster diagram -- each node has a 4.8 GB rootvg]
Required Materials
• Cluster Planning Worksheets and cluster diagram from exercise 2.
Using tty
__ 14. On both nodes check the device configuration of the unused tty device. If the tty
device does not exist, create it. If it does exist, ensure that a getty is not spawned or,
better still, delete it and redefine it.
__ 15. Test the non IP communications:
i. On one node execute stty < /dev/tty# where # is your tty number.
ii. The screen appears to hang. This is normal.
iii. On the other node execute “stty </dev/tty#” where # is your tty number.
iv. If the communications line is good, both nodes return their tty settings.
Using SSA
__ 16. If using target-mode SSA for your non-IP network, check that the prerequisites are
in place: a unique node number must be set and the device driver must be installed. If
not, correct this.
__ 17. Test the non IP communication using SSA.
END OF LAB
© Copyright IBM Corp. 1998, 2004 Exercise 5. HACMP Software Installation 5-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Exercise Instructions
Preface
• This exercise is composed of two parts, system capacity checks and software
installation.
- cluster.es.clvm
- cluster.es.cspoc
- cluster.license
- cluster.man.en_US.es
- cluster.msg.en_US.cspoc (lower case en)
- cluster.msg.en_US.es
__ 9. If the HACMP packages pass the prerequisite check, set preview to no and install
the HACMP filesets. If there is a prerequisite failure, notify your instructor.
__ 10. Install HACMP maintenance. Check the /usr/sys/inst.images directory for an HA
updates directory (in many classes it will be the subdirectory ./ha52/ptf1). If you
have questions, ask the instructor.
__ 11. Reboot the nodes.
__ 12. Verify the SMIT menus. Check to see if the HACMP screens are available.
__ 13. (Optional) It would be a good idea to set up your /.profile to include the paths to
commonly used HACMP commands so that you don't have to keep entering full path
names in the later lab exercises.
__ 14. (Very Optional) If the nodes have a tape subsystem attached, now would be a good
time for a mksysb backup.
__ 15. Ensure Part 2 is also performed for your other node toronto#.
END OF LAB
Exercise Review/Wrapup
This is a good place to stop for a backup.
Introduction
Our scenario has a Web server to be made highly available. We are
required to test the availability traits of the Web server. This exercise
creates a client to test from.
Required Materials
HACMP planning sheets.
AIX bonus pack
Client machine
Exercise Instructions
Preface
• All exercises of this chapter depend on the availability of specific equipment in your
classroom.
• Replace the symbol # with your team number.
The next three steps prepare you to use clinfoES from the client machine after HACMP is
started in the next exercise.
__ 12. Copy the clstat.cgi script from /usr/es/sbin/cluster to the /var/docsearch/cgi-bin
directory.
__ 13. Verify that the file /var/docsearch/cgi-bin/clstat.cgi is world-executable (755 or
rwxr-xr-x).
__ 14. Test access to clstat.cgi using the URL
http://localhost:49213/cgi-bin/clstat.cgi <-- you should get a window with the
message “Could not initialize clinfo connection”.
__ 15. Put the cluster nodes' IP addresses (that is, halifax#-per and toronto#-per) into the
/usr/es/sbin/cluster/etc/clhosts file. Make sure you can ping these addresses.
__ 16. Reboot and do the ping tests to verify that this client machine functions as expected.
END OF LAB
Exercise Review/Wrapup
The client is now set up and ready to go, with communication and name
resolution checked.
Introduction
The scenario is expanding: you now create a custom resource group.
This is the beginning of making an application highly available.
Required Materials
Cluster planning worksheets.
Exercise Instructions
Remember this?
__ 24. There is another option on the clstat command, the -r # option. This option sets
the refresh rate of the display. For the lab environment, "-r 10" may be a more
appropriate value. Restart clstat with the -r 10 option.
__ 25. Now start Netscape and make sure that the URL to clstat.cgi is working properly.
• The URL is http://localhost:49213/cgi-bin/clstat.cgi
• You should now see a window with cluster information displayed. Be patient if
this window shows that the cluster is unstable.
• Take a moment to familiarize yourself with what you are looking at. Click on the
resource group name app#
• You will use this session to monitor the failover testing that comes next (or you
can run clstat on one of your cluster nodes)
__ 26. Now go to your administrative node (halifax#) and stop cluster services gracefully.
Watch what happens in the clstat browser (be patient -- it may take 2 minutes).
__ 27. Now start HACMP and clinfo on BOTH nodes
__ 28. Use the lsvg command to see that the shared vg is varied on in passive mode on the
other node (toronto#).
__ 35. (Optional) - You may wish to swap the service address (and/or) persistent address
back by using C-SPOC.
__ 36. Using the console rather than a telnet session (because you will lose it), monitor the
hacmp.out file on the halifax#-if1x (left) node and disconnect both network cables at
the same time.
__ 37. There should be a network down event executed after a short period of time. What
happens to the resource group on the halifax (left) node, and why?
__ 38. Check the /tmp/hacmp.out file on the toronto# node, it should also have detected a
network failure.
__ 39. Restore both network connections for the halifax# node. What event do you
observe?
__ 40. Where is the resource group at this time? Verify that the IP labels, volume groups,
file systems, and application are available on that node.
__ 41. You are now going to move resources back from one node to the other. On the
halifax# node monitor the log. On the toronto# node execute smit clstop and stop
the cluster services with the mode of takeover. Leave the default value for the other
fields.
__ 42. The clstat.cgi display should change from green to yellow (substate unstable,
toronto# leaving), and the state of the toronto# node and its interfaces should change
to red (down).
__ 43. All of the components in the resource group should move over to the halifax# node.
Verify the IP labels, volume groups, and file systems on the halifax# node.
__ 44. On the toronto# node restart HACMP. Observe the /tmp/hacmp.out file on the
halifax# node and, of course, the clstat session. The resource group stays put.
END OF LAB
Introduction
The intention is not to become Web server programmers but simply to
add an existing application to the HACMP environment. This
demonstrates one way to make an existing application highly available
with HACMP.
Required Materials
A running cluster
The AIX 5L Expansion pack
Exercise Instructions
Preface
• As part of this exercise, C-SPOC and DARE are used to enable the addition of
filesystems, applications and resource changes to the cluster while it is running. If all
things function as designed, no system reboots or HACMP restarts are required.
__ 7. On the other node (toronto#), repeat the previous step. Once installed, delete all of
the information in the directory /usr/HTTPServer/htdocs (only on this node!).
__ 8. Go back to the halifax# node. In the directory /usr/HTTPServer/conf/, edit httpd.conf
and change the “ServerName” variable to be the same as the service IP label
(appA#-svc).
Note: The hostname must be resolvable, that is, host hostname should return a
good answer. If the hostname is not resolvable, add the hostname to the 127.0.0.1
address as an alias. If in doubt, ask the instructor. Remember to do this on both
nodes; otherwise takeover will not succeed.
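For example, if the hostname were halifax1 (a hypothetical name), the loopback alias entry in /etc/hosts would look like:

```
127.0.0.1   loopback localhost halifax1
```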
__ 9. Use ftp to put a copy of the /usr/HTTPServer/conf/httpd.conf file on the toronto#
node.
__ 12. While the synchronization takes place, monitor the HACMP logs until you see the
message start server http_server. Check that the Apache server started OK.
__ 13. From the client, start a new window in Netscape and connect to the URL
http://appA#-svc. The Web screen Welcome to the IBM HTTP Server window should
pop up.
__ 14. Perform a failover test by halting the Halifax# node in your favorite manner (for
example, “halt -q” or “echo bye > /dev/kmem”).
__ 15. Wait for takeover to complete and verify what happens to the Web server. Use the
page reload button on your Web browser to see if the Web server is really there.
__ 16. Bring up the Halifax# node again and start HACMP.
__ 17. What has happened to the Resource Group, and why?
END OF LAB
Optional Exercises
For the Web-enabled Candidates
__ 1. Change the Web server pages on the shared disk to prove the location of the data
elements.
END OF LAB
Introduction
In the scenario there are two resource groups to be made highly
available. The addition of the second resource group is done with the
C-SPOC commands with the cluster running.
Exercise Instructions
Preface
• Add the shared_res_grp_b resource group components according to the scenarios.
This will require a second filesystem.
__ 17. Add a shared file system on the previously created logical volume, called
/shared_fs_b. Using the F3 key, traverse back to Configure Volume Groups, Logical
Volumes and Filesystems. Select Shared File Systems.
__ 18. The filesystem should be available on node toronto# in a few minutes.
The following was observed during additional testing and may or may not be
repeatable: a message on the smit panel said that shared_fs_b is not a known file
system, and a failed response was posted. However, /etc/filesystems did contain the
entry, and a manual mount from the toronto# node worked. After that, the resource
group could be moved from one node to the other and back using the system
management (C-SPOC) menu.
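Before retrying a manual mount, it is worth confirming that the stanza really is in /etc/filesystems. A minimal sketch against a sample stanza (the stanza layout below is assumed, not captured from the lab nodes):

```shell
# Sample /etc/filesystems stanza in the AIX format (assumed layout)
cat > /tmp/filesystems.sample <<'EOF'
/shared_fs_b:
        dev             = /dev/shared_jfslv_b
        vfs             = jfs
        log             = /dev/shared_jfslog_b
        mount           = false
EOF

# On a real node, grep /etc/filesystems instead of the sample file
if grep -q '^/shared_fs_b:' /tmp/filesystems.sample; then
    echo "stanza present"
fi
```

If the stanza is present but smit still reports an unknown file system, the manual mount test described above is the next step.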
END OF LAB
Exercise Review/Wrapup
The first part of the exercise looked at using C-SPOC to add a new resource to the
cluster.
Introduction
To enhance the scenario create two additional resource groups to be
made highly available. The addition of these resource groups and their
behavior modification is done with the C-SPOC commands with the
cluster running.
© Copyright IBM Corp. 1998, 2004 Exercise 10. HACMP Extended Features 10-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide
Exercise Instructions
Preface
• Add additional service aliases, and create an additional custom resource group for each
node.
• Modify the default start and fallback policies of the new resource groups to examine the
resource behavior during cluster startup and reintegration event processing.
• Create
__ 10. In order to mimic the old rotating resource group behavior, we need to change the
distribution policy to network. This is done using the smit extended runtime menu
Configure Distribution Policy for Resource Groups. The cluster must be stopped on
both nodes first.
__ 11. Synchronize the cluster. Using the F3 key, traverse back to the ‘Extended
Configuration’ smit screen.
Let’s have a look at configuring a settling timer which allows you to modify the behavior
of the Fallback To Higher Priority Node In The List fallback policy so that there are not
two online operations if you bring up the secondary node first.
__ 23. Verify that the appC_group comes online on halifax# (without first being online on
toronto#). As you can see the purpose of the settling timer is to prevent the
resources from being immediately acquired by the first active node.
__ 24. OPTIONAL -- repeat this part but wait for settling time to expire after starting the
cluster on toronto#. Verify that appC_group comes online on toronto#. Stop the
cluster manager on both nodes, wait 2 minutes, start the cluster manager on both
nodes.
END OF LAB
© Copyright IBM Corp. 1998, 2004 Exercise 11. IPAT via Replacement and HWAT 11-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide
Exercise Instructions
Preface
• The first part shows how to remove a cluster
• The second part of this lab looks at setting up IPAT via replacement and using the
standard configuration path to build a cluster.
• The third part of this lab looks at gratuitous ARP.
• The fourth part of this exercise adds hardware address concepts. HWAT, or MAC
address takeover, would be used in situations where gratuitous ARP may not be
supported, such as older hardware or non-standard operating systems.
__ 28. Bring the appR_group online using the C-SPOC menu. If, on the client, there is no
arp cache entry for the appR-repl service address, then ping the appR-repl service
address.
__ 29. Verify that the alternate hardware address is now configured on the interface for the
appR#-repl service address.
__ 30. Fail the halifax# node in your favorite manner.
__ 31. Check that the halifax service address is on the toronto# node and observe the
hardware address associated with that service address.
END OF LAB
© Copyright IBM Corp. 1998, 2004 Exercise 12. Network File System (NFS) 12-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide
Exercise Instructions
Preface
• All exercises of this chapter depend on the availability of specific equipment in your
classroom.
__ 13. The output of the lsnfsexp command on the nodes shows that only the cluster
nodes can use user root. To change this, we create an override file named
/usr/es/sbin/cluster/etc/exports. HACMP uses this file to update the /etc/xtabs file
used by NFS.
__ 14. On the running node, use the lsnfsexp command to copy the current /etc/xtabs file to
the HACMP file and then modify the HACMP file using the following commands:
- lsnfsexp > /usr/es/sbin/cluster/etc/exports
- Edit /usr/es/sbin/cluster/etc/exports and add the client to the list of hosts
- Save the file
- ftp the file to the other node
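The edit in the step above just extends the -root= host list. A sketch of the same change against a sample export line (the node and client names here are hypothetical stand-ins for your lab's names):

```shell
# Current export line as lsnfsexp might report it (sample; real input would
# come from `lsnfsexp > /usr/es/sbin/cluster/etc/exports`)
echo '/shared_fs_a -root=halifax1:toronto1' > /tmp/exports.sample

# Append the client to the end of the root= host list, as the manual edit does
sed 's/$/:client1/' /tmp/exports.sample > /tmp/exports.new
cat /tmp/exports.new
```

On the nodes, the edited file is /usr/es/sbin/cluster/etc/exports, and it must be copied to the other node as the step describes.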
__ 15. Restart the failed node.
__ 16. From the client, try to create a file in the NFS directory of the node you have just
restarted.
END OF LAB
Exercise Review/Wrapup
This exercise looked at various methods of implementing NFS in an HACMP cluster.
© Copyright IBM Corp. 1998, 2004 Exercise 13. Error Notification 13-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide
Exercise Instructions
Preface
• This exercise looks at Automatic Error Notification. Before you configure Automatic
Error Notification, you must have a valid HACMP configuration. Using the SMIT options,
you can use the following methods:
- Configure Automatic Error Notification
- List Automatic Error Notification
- Remove Automatic Error Notification.
• Remember that Error Notification is a function of AIX - HACMP just gives you the smit
screens that make it easier to enter error notification methods.
END OF LAB
[Cluster planning diagram worksheet: two nodes, each with a 4.8 GB rootvg; a client
machine for the user community (hostname, if1, and service alias to be filled in); for
each node, a table of if1, if2, and persistent IP labels, IP addresses, and hardware
addresses; network name and netmask fields for each network.]
HACMP Systems
Administration I: Planning and
Implementation
(Course Code AU54)
Trademarks
IBM® is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United
States, or other countries, or both:
AFS AIX AIX 5L
Cross-Site DB2 DB2 Universal Database
DFS Enterprise Storage Server HACMP
NetView POWERparallel pSeries
Redbooks Requisite RS/6000
SP Tivoli TME
TME 10 Versatile Storage Server WebSphere
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Other company, product and service names may be trademarks or service marks of others.
The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
© Copyright International Business Machines Corporation 1998, 2004. All rights reserved.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is subject to restrictions
set forth in GSA ADP Schedule Contract with IBM Corp.
V3.1.0.1
Instructor Exercises Guide with Hints
Introduction
The scenario that the exercises are based on is a company that is
amalgamating its computer sites into a single location: it intends to
consolidate the computer sites of two cities into one situated roughly
midway between the original two. The case study has been designed
around five randomly chosen countries; these country and city
configurations have been tested in our environment, but we offer the
choice to use your own. On to the scenario.
Required Materials
Your imagination.
Paper or a section of a white board.
Exercise Steps
__ 1. Draw each of the computer systems as described.
__ 2. Add the applications to the nodes.
__ 3. Add a network connection to each system for access to the outside world.
__ 4. Evaluate the lack of high availability of the initial drawing of the two separate
systems.
__ 5. Combine the services of the existing networks resulting in a single network.
__ 6. Add new SSA disks to your drawing, showing cable connections.
__ 7. Make the disks highly available, RAID/mirror, redundant disks.
__ 8. Define the resources as described in the text.
__ 9. Define the characteristics of the resources.
__ 10. Indicate how the resources fail and recover.
__ 11. Make the diagram simple to understand.
END OF LAB
Introduction
There may be differences in the documentation and the real machines
in the classroom environment. The CPUs, network type, and type of
disk units have been selected to provide a consistent experience but a
variety of equipment may be used. Please ask if you have any
questions.
Note: Throughout this lab the terms shared volume group, shared file
system, node and client refer to components of your HACMP cluster.
The convention <name> is to be substituted with the appropriate
value. The example references a generic cluster’s naming of these
components. Some names in your cluster may differ from those
indicated in the notes.
Below is a picture of the generic cluster for this lab. The
communications path may be Ethernet, Token-Ring, FDDI, or any
other network supported by HACMP. There must also be a non-IP
serial network -- either RS232, target mode SSA, or heartbeat over
disk. The minimum requirement is that there are at least four shared
disks (SCSI, Fibre Channel, or SSA) connected to a shared bus so that
[Diagram: the generic two-node cluster; each node has a 4.8 GB rootvg.]
__ 1. Write down your team number here: ____. In these lab exercises you must replace
the symbol # with your team number unless otherwise noted.
__ 2. Log in as root on both of your cluster nodes. The root password will be provided by
your instructor.
__ 3. Identify and record in the cluster components worksheet the device names and
location codes of the disk adapters.
» lsdev -Cc adapter
__ 4. Identify and record in the cluster components worksheet the device names and
location codes of the external disks (hdisks and pdisk). Note: The external disks
may not have PVIDs on them at this time.
» lspv
» lsdev -Cc disk
» smitty ssadlog
__ 5. Identify and record in the cluster components worksheet the device names and
location codes of the internal disks.
» lsdev -Cc disk
» lsdev -Cc pdisk
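The worksheet wants only the device names and location codes; those are the first and third columns of the lsdev output. A sketch against sample output (the column layout is assumed from a typical AIX system; your output will differ):

```shell
# Sample `lsdev -Cc disk` output (columns assumed: name, state, location, description)
cat > /tmp/lsdev.sample <<'EOF'
hdisk0 Available 10-80-00-4,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 10-80-00-5,0 16 Bit LVD SCSI Disk Drive
EOF

# Pull out just the device name and location code for the worksheet
awk '{ print $1, $3 }' /tmp/lsdev.sample > /tmp/lsdev.cols
cat /tmp/lsdev.cols
```

On a real node, pipe `lsdev -Cc disk` straight into the awk command instead of using the sample file.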
__ 6. The storage needs to be divided into two volume groups. Size of the volume groups
is not important. In a real environment, disks should be mirrored and quorum issues
addressed. Here the emphasis is on the operation of HACMP not how the storage is
organized. You should have four disks so feel free to set up a mirror on one of the
volume groups. Different methods of configuring the disks are going to be used
throughout the exercises. Decide on the organization, but only create the volume
groups when directed to.
__ 7. Identify and update the cluster planning worksheets with the names of 2 shared
volume groups. Use the following names or choose your own.
» shared_vg_a
» shared_vg_b
__ 8. Identify and update in the cluster component worksheets the LV component names
needed to have a shared file system in each of the two volume groups. Select
names for the logical volumes, jfs logs, and filesystems. Use the following names or
choose your own.
» data lv’s shared_jfslv_a, shared_jfslv_b
» jfslog lv’s shared_jfslog_a, shared_jfslog_b
» file systems shared_fs_a, shared_fs_b
__ 9. Now add just the storage information to the generic cluster diagram of your
cluster. This diagram can be found in Appendix A (there are two blank copies after
the filled-in one: one for use in class, the other to take home). Alternatively, you
may simply compare the information on your component worksheets to the
filled-in worksheet at the beginning of Appendix A.
• Only fill in what you know -- the LVM information -- at the bottom of the diagram.
GO NOW TO EXERCISE 3. You will return to Part 2 after the lecture for the unit on
network planning.
Internal Disk Worksheet (complete one per node; example entry shown):
***Internal Disk***  16 Bit LVD SCSI Disk Drive  hdiskX  10-80-00-4,0
Internal Disk 1 ______________________________________________
Internal Disk 2 ______________________________________________
Internal Disk 3 ______________________________________________
Table 3: Shared Components Worksheet
Component Description Value
Shared vg 1 ---------------------N/A-----------------
Shared jfs log 1 --------------------N/A------------------
Shared jfs lv 1 --------------------N/A------------------
Shared filesystem 1 --------------------N/A------------------
-mount point --------------------N/A------------------
Shared vg 2 --------------------N/A------------------
Shared jfs log 2 --------------------N/A------------------
Shared jfs lv 2 --------------------N/A------------------
Shared filesystem 2 --------------------N/A------------------
-mount point --------------------N/A------------------
REPLACEMENT node2:
Service Label/address
Hardware Address ---------------------N/A-----------------
Introduction
The next phase in our scenario is to provide the storage for the highly
available application. We require a filesystem to store the Web pages
on that can be accessed by each machine when that machine is the
active node.
To support passing a filesystem between nodes there must be a
volume group, a logical volume for the data, and a logical volume for
the jfs log. There are several methods to accomplish this task, two of
which are explored in the exercises: first, a manual creation to
emphasize the necessary steps in the process; second, in the later
C-SPOC exercise, an automated cluster-aware method.
Required Materials
• Cluster Planning Worksheets and cluster diagram from the
previous exercise.
• Shared disk storage connected to both nodes.
END OF LAB
Introduction
This section establishes the communication networks required for
implementing HACMP. Networking is an important component of
HACMP, so all related aspects are configured and tested. The
information used in this exercise is derived from the previous exercise.
© Copyright IBM Corp. 1998, 2004 Exercise 4. Network Setup and Test 4-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide with Hints
Required Materials
• Cluster Planning Worksheets and cluster diagram from exercise 2.
» vi /etc/hosts
» 127.0.0.1 loopback localhost halifax# toronto#
Note: Assumes the suggested hostnames are used (client addresses are covered
in Exercise 6).
» 192.168.#1.1 halifax#-if1
» 192.168.#2.1 halifax#-if2
» 192.168.#3.1 halifax#-per
» 192.168.#3.10 appA#-svc
» 192.168.#1.2 toronto#-if1
» 192.168.#2.2 toronto#-if2
» 192.168.#3.2 toronto#-per
» 192.168.#3.20 appB#-svc
»
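Since every address and label embeds the team number, the whole /etc/hosts block can be generated rather than typed by hand. A sketch, with TEAM=1 standing in for your team number (the lab's "#" placeholder):

```shell
# Generate the lab /etc/hosts entries from the team number
TEAM=1
cat > /tmp/hosts.add <<EOF
192.168.${TEAM}1.1 halifax${TEAM}-if1
192.168.${TEAM}2.1 halifax${TEAM}-if2
192.168.${TEAM}3.1 halifax${TEAM}-per
192.168.${TEAM}3.10 appA${TEAM}-svc
192.168.${TEAM}1.2 toronto${TEAM}-if1
192.168.${TEAM}2.2 toronto${TEAM}-if2
192.168.${TEAM}3.2 toronto${TEAM}-per
192.168.${TEAM}3.20 appB${TEAM}-svc
EOF
cat /tmp/hosts.add
# On each node, append /tmp/hosts.add to /etc/hosts
```

Generating the block once and appending it on both nodes keeps the two hosts files identical, which matters for takeover.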
__ 10. Verify name resolution and connectivity on BOTH nodes for all IP labels.
» host halifax#-if1
» ping halifax#-if1
» host halifax#-if2
» ping halifax#-if2
» host toronto#-if1
» ping toronto#-if1
» host toronto#-if2
» ping toronto#-if2
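The eight host/ping checks above can be collapsed into one loop. A sketch, assuming team number 1 (the labels resolve only after /etc/hosts is populated as in the previous step):

```shell
# Check name resolution and reachability for every lab label in one pass;
# only the failures are reported
for label in halifax1-if1 halifax1-if2 toronto1-if1 toronto1-if2; do
    host "$label" >/dev/null 2>&1 || echo "unresolved: $label"
    ping -c 1 "$label" >/dev/null 2>&1 || echo "unreachable: $label"
done > /tmp/net.check
cat /tmp/net.check
```

An empty /tmp/net.check means every label resolved and answered a ping.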
Using tty
__ 14. On both nodes, check the device configuration of the unused tty device. If the tty
device does not exist, create it. If it does exist, ensure that a getty is not spawned
or, better still, delete it and redefine it.
» smitty tty -> change/show tty or add tty
» Enable login = disable
» baud rate = 9600
» parity= No
» bits = 8
» stop bits = 1
Using SSA
__ 16. If using target-mode SSA for your non-IP network, check that the prerequisites are
in place: a unique node number must be set and the device driver must be
installed. If not, add them.
» lsdev -C | grep ssa
» lsattr -El ssar
» lscfg -vl ssa0
» lslpp -L devices.ssa.tm.rte (If not installed on both nodes, ask your
instructor whether it can be added and where the needed information or
resources are.)
• Install missing software.
• Install usable microcode (if required).
» chdev -l ssar -a node_number=<a unique number> (different on each
node)
» run cfgmgr -- must be run on the first node, then the second node, then the
first node.
» ls -l /dev | grep ssa -- verify the existence of the .tm and .im device files
__ 17. Test the non IP communication using SSA.
» Node A: cat < /dev/tmssa<number>.tm <-- node number of node B
» Node B: cat <filename> > /dev/tmssa<number>.im <-- node number of
node A
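The .tm/.im device pair is simply a one-way byte channel between the nodes. The same reader/writer pattern from the step above can be rehearsed locally with a FIFO standing in for the device pair (illustration only; this does not touch SSA hardware):

```shell
# A FIFO stands in for the /dev/tmssaN.tm (reader) / /dev/tmssaN.im (writer) pair
mkfifo /tmp/tmssa.fifo
cat < /tmp/tmssa.fifo > /tmp/tmssa.out &   # "node A": cat < /dev/tmssaN.tm
echo "hello from node B" > /tmp/tmssa.fifo # "node B": cat file > /dev/tmssaN.im
wait                                       # reader exits when the writer closes
cat /tmp/tmssa.out
rm -f /tmp/tmssa.fifo
```

On the real devices, the number in the device name is the *other* node's node number, exactly as the step notes.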
END OF LAB
© Copyright IBM Corp. 1998, 2004 Exercise 5. HACMP Software Installation 5-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide with Hints
__ 13. (Optional) It would be a good idea to set up your /.profile to include paths to the
HACMP commonly used commands so that you don’t have to keep entering full path
names in the later lab exercises.
» PATH=$PATH:/usr/es/sbin/cluster:/usr/es/sbin/cluster/utilities
» PATH=$PATH:/usr/es/sbin/cluster/etc
» PATH=$PATH:/usr/es/sbin/cluster/diag
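The same PATH additions can be made idempotent, so re-sourcing /.profile does not grow PATH on every login. A sketch using the paths above:

```shell
# Add each HACMP directory to PATH only if it is not already there
for d in /usr/es/sbin/cluster /usr/es/sbin/cluster/utilities \
         /usr/es/sbin/cluster/etc /usr/es/sbin/cluster/diag; do
    case ":$PATH:" in
        *:"$d":*) ;;               # already present; skip
        *) PATH="$PATH:$d" ;;
    esac
done
export PATH
```

Run it twice and PATH is unchanged the second time, which is the point of the case guard.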
__ 14. (Very Optional) If the nodes have a tape subsystem attached, now would be a good
time for a mksysb backup.
__ 15. Ensure Part 2 is also performed for your other node toronto#.
END OF LAB
Introduction
Our scenario has a Web server to be made highly available. We are
required to test the availability traits of the Web server. This exercise
creates a client to test from.
Required Materials
HACMP planning sheets.
AIX bonus pack
Client machine
cluster.adt.es
cluster.es (choose only the three client filesets)
cluster.license
cluster.man.en_US.es
cluster.msg.en_US.es (choose only the client fileset)
» Change to the directory with the filesets
» smitty install_all
__ 9. Install the ptf1 updates
» Change to the directory with the filesets
» smitty update_all
__ 10. In order to use clstat.cgi, verify that httpdlite is running and that Netscape is
available on this machine. If not ask your instructor.
» ps -ef | grep httpdlite
» netscape file:///usr/lpp/bos.sysmgt/mkcd.README.html
__ 11. Verify Netscape starts and can display a URL, like
file:///usr/lpp/bos.sysmgt/mkcd.README.html
The next three steps prepare you to use clinfoES from the client machine after HACMP is
started in the next exercise.
__ 12. Copy the clstat.cgi script from /usr/es/sbin/cluster to the /var/docsearch/cgi-bin
directory.
» cd /var/docsearch/cgi-bin
» cp /usr/es/sbin/cluster/clstat.cgi ./
__ 13. Verify that the file /var/docsearch/cgi-bin/clstat.cgi is world-executable (755 or
rwxr-xr-x)
» chmod +x clstat.cgi
» ls -al clstat.cgi
__ 14. Test access to clstat.cgi using the URL
http://localhost:49213/cgi-bin/clstat.cgi <-- you should get a window with the
message “Could not initialize clinfo connection”.
__ 15. Put the cluster nodes’ IP addresses (that is, halifax#-per and toronto#-per) into the
/usr/es/sbin/cluster/etc/clhosts file. Make sure you can ping these addresses.
__ 16. Reboot and do the ping tests to verify that this client machine functions as expected.
END OF LAB
Exercise Review/Wrapup
The client is now all set and ready to go, with communication and name resolution
checked.
Introduction
The scenario is expanding: you now create a custom resource group.
This is the beginning of making an application highly available.
Required Materials
Cluster planning worksheets.
• Were the application start and stop scripts copied over? ________________
• Was the volume group imported to the other node? ____________________
Use the command cldisp | more to answer the following questions:
• What is the cluster name? ______________________________________
• What is the resource group name? _______________________________
• What is the startup policy? ______________________________________
• What is the fallback policy?______________________________________
• What is the vg resource name (if any)? _____________________________
• What is the non-IP network name (if any)? ___________________________
• On what enX is halifax#-if1? _____________________________________
• What is the ip network name? ____________________________________
• Were the start/stop scripts copied over? ____________________________
__ 10. So were you impressed? _________________________________
__ 11. You can now add the IP network and non-IP network names, which we promised
would be generated by HACMP, to your component worksheets and/or the cluster
diagram if you want to.
__ 12. Return to your administrative node (halifax#).
__ 13. Define an additional Non-IP RS232 or a TMSSA network. The lab environment may
help you decide. Note that a network is automatically created when you choose the
pair of devices that form the endpoints of the network.
» smitty hacmp
» Select, ‘Extended Configuration’
» Select Extended Topology Configuration
» Select ‘Configure HACMP Communication Interfaces/Devices’
» Add Communication Interfaces/Devices
» Select ‘Add Discovered Communication Interface and Devices’
» Select ‘Communication Devices’ from the list
» Select, using F7, the Point-to-Point Pair of Discovered Communication
Devices (either a /dev/tty# pair or a TMSSA# pair).
__ 14. Execute the command cltopinfo and see that the additional non-IP network was
configured. Add this name to the worksheet and/or diagram.
__ 15. Add a persistent node address for each node in the cluster -- select ‘Configure
HACMP Persistent Node IP Label/Addresses’ from the ‘Extended Topology
Configuration’ menu,
» Configure a Persistent Node IP Label/Address
» Select Add a Persistent Node IP Label/Address
» Select a node from the list, press Enter
» Select (using F4) the network name and IP Label/Address -- the Network
Name will be the same Network Name that the interfaces and service labels
belong to. The suggested IP labels are names of the form XXX-per.
» Repeat this step for the other node.
__ 16. Synchronize the changes -- using the F3 key, traverse back to the Extended
Configuration smit screen.
» Select Verify and Synchronize HACMP Configuration
Review the output upon completion, looking for any errors or warnings. Errors must
be corrected before continuing; warnings should simply be reviewed and noted.
__ 17. Check to see that your persistent addresses were created. If not then wait until the
cluster is started in Part 3 below and then check again.
» netstat -i
__ 18. Take about 10 minutes to review the Startup, Fallover, and Fallback policies using
the F1 key on the Add a Resource Group menu. When you are ready, proceed to
Part 3.
» smitty hacmp --> Initialization and Standard Configuration
-->Configure HACMP Resource Groups -->Add a Resource Group
__ 25. Now start Netscape and make sure that the URL to clstat.cgi is working properly.
• The URL is http://localhost:49213/cgi-bin/clstat.cgi
• You should now see a window with cluster information displayed. Be patient if
this window shows that the cluster is unstable.
• Take a moment to familiarize yourself with what you are looking at. Click on the
resource group name app#
• You will use this session to monitor the failover testing that comes next (or you
can run clstat on one of your cluster nodes)
__ 26. Now go to your administrative node (halifax#) and stop it gracefully. Watch what
happens in the clstat browser (be patient -- it may take 2 minutes).
» smitty clstop <-- stop halifax# gracefully
__ 27. Now start HACMP and clinfo on BOTH nodes
» smitty clstart <-- start halifax# and toronto# and clinfo
__ 28. Use the lsvg command to see that the shared vg is varied on in passive mode on the
other node (toronto#).
» lsvg shared_vg_a
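The field to look for is VG PERMISSION. A sketch against sample lsvg output (the exact layout of your output may differ from this assumed sample):

```shell
# Sample of `lsvg shared_vg_a` output on the non-owning node (format assumed)
cat > /tmp/lsvg.sample <<'EOF'
VOLUME GROUP:  shared_vg_a    VG PERMISSION:  passive-only
EOF

# Passive mode means the VG is varied on but not writable on this node
grep -q 'passive-only' /tmp/lsvg.sample && echo "varied on in passive mode"
```

On the real node, run `lsvg shared_vg_a` and check the VG PERMISSION field directly.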
END OF LAB
Introduction
The intention is not to become Web server programmers, but simply to
add an existing application to the HACMP environment. This
demonstrates one way to add an application to the HACMP
environment.
Required Materials
A running cluster
The AIX 5L Expansion pack
__ 6. Check if the http filesets listed below are installed. If not, ask your instructor. On many
class images they may be found in the directory /usr/sys/inst.images/web-appl.
Otherwise you may need the AIX 5L Expansion Pack CD.
» http_server.base
» http_server.admin
» http_server.html
» lslpp -L http_server*
» http_server.man (OPTIONAL man pages)
__ 7. On the other node (toronto#), repeat the previous step. Once installed, delete all of
the information in the directory /usr/HTTPServer/htdocs (only on this node!).
» cd /usr/HTTPServer/htdocs
» rm -r ./*
» This is because when the filesystem fails over this will be covered by the
shared filesystem /usr/HTTPServer/htdocs.
__ 8. Go back to the halifax# node. In the directory /usr/HTTPServer/conf/, edit httpd.conf
and change the “ServerName” variable to be the same as the service IP label
(appA#-svc).
Note: The hostname must be resolvable; that is, host hostname should return a
good answer. If the hostname is not resolvable, add the hostname to the 127.0.0.1
address as an alias. If in doubt, ask the instructor. Remember to do this on both
nodes; otherwise, takeover will not succeed.
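The loopback-alias fix from the note can be sketched as a small edit script. The hosts file and hostname below are samples; on the nodes the edit targets /etc/hosts, and halifax1 stands in for your hostname:

```shell
HN=halifax1   # hypothetical hostname for the sketch
echo '127.0.0.1 loopback localhost' > /tmp/hosts.sample

# Add the hostname as an alias of 127.0.0.1 only if it appears nowhere in the file
if ! grep -qw "$HN" /tmp/hosts.sample; then
    sed "s/^127\.0\.0\.1.*/& $HN/" /tmp/hosts.sample > /tmp/hosts.fixed
fi
cat /tmp/hosts.fixed
```

After the edit, `host $HN` should resolve on both nodes.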
__ 9. Use ftp to put a copy of the /usr/HTTPServer/conf/httpd.conf file on the toronto#
node.
» ftp toronto#
» put /usr/HTTPServer/conf/httpd.conf /usr/HTTPServer/conf/httpd.conf
END OF LAB
Introduction
In the scenario there are two resource groups to be made highly
available. The addition of the second resource group is done with the
C-SPOC commands with the cluster running.
- Check the physical partition SIZE and major number (C-SPOC chooses a valid
major number); set Enhanced Concurrent to true.
» Create the volume group. (on the development system there were some “ok”
error messages after the successful “has been imported” message)
__ 9. Verify the Volume Group exists on both nodes.
» lspv
» lsvg
Now that the volume group is created, it must be discovered, a resource group must be
created, and finally the volume group must be added to the resource group before any
further C-SPOC utilities can access it.
__ 10. Discover the volume group using Extended Configuration in smitty hacmp.
__ 11. Create a resource group called appB_group with the toronto# node as the highest
priority and halifax# node as the next priority.
» smitty hacmp
» Select Initialization and Standard Configuration
» Select Configure HACMP Resource Groups
» Select Add a Resource Group
» Enter the resource group name appB_group from the planning worksheets.
The participating node names must also be entered -- enter toronto# first.
Take the defaults for the policies.
__ 12. Add the volume group to the resource group
» Return (F3) to the menu Configure HACMP Resource Groups then
» Select Change/Show Resources for a Resource Group (standard)
» Select appB_group
» Enter the volume group name using F4.
__ 13. Synchronize the Cluster.
» smitty hacmp
» Select Initialization and Standard Configuration
» Select Verify and Synchronize HACMP Configuration
__ 14. Once synchronized, the Volume Group is varied online, on the owning node
(toronto#). Wait for this to happen. Then on your administrative node halifax# use
C-SPOC to add a jfs log shared logical volume to the shared_vg_b. The name
should be shared_jfslog_b, the LV type should be jfslog, and use 1 PP.
» smitty hacmp
» Select Initialization and Standard Configuration
» Select Configure Resources to make Highly Available
» Select Configure Volume Groups, Logical Volumes and Filesystems
» Select Shared Logical Volumes
» Select Add a Shared Logical Volume
» From the list provided, choose the entry for appB_group shared_vg_b
» smit vg (mirrorvg)
» ensure /shared_fs_b is mounted
» chfs (see man page example)
» lsvg -l shared_vg_b (see new logical volume/filesystem)
» lslv -p hdiskX | grep USED (on both disks -- look for stale)
» umount the new file system
» rmfs
» On the disk that had stale partitions, redo the lslv command (the stale
partitions should now be gone)
» start cluster on toronto#.
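The stale-partition check in the steps above can be sketched as a small filter (the partition map below is illustrative; on a real node you would pipe the output of `lslv -p hdiskX` into it instead):

```shell
# Sketch: scan an lslv -p partition map for stale partitions.
has_stale() {
  # stdin = lslv -p output
  grep -q "STALE" && echo "stale partitions found" || echo "map is clean"
}

# Illustrative sample map; replace with:  lslv -p hdiskX | has_stale
sample_map="USED USED USED USED
USED STALE USED USED"

printf '%s\n' "$sample_map" | has_stale
# -> stale partitions found
```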
END OF LAB
Introduction
To enhance the scenario create two additional resource groups to be
made highly available. The addition of these resource groups and their
behavior modification is done with the C-SPOC commands with the
cluster running.
© Copyright IBM Corp. 1998, 2004 Exercise 10. HACMP Extended Features 10-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide with Hints
» Exit smit
» Verify cluster stopped (lssrc -g cluster on both nodes)
» smit hacmp
» Select Extended Configuration
» Select Extended Resource Configuration
» Select Configure Resource Group Run-Time Policies
» Select Configure Distribution Policy for Resource Groups
» Change the value to network (notice the deprecated message).
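The “verify cluster stopped” check above (`lssrc -g cluster` on both nodes) can be scripted; a minimal sketch using illustrative lssrc-style output (subsystem states are “active” or “inoperative”):

```shell
# Sketch: confirm cluster services are stopped by parsing lssrc -g
# cluster output. Succeeds only if no subsystem reports " active".
cluster_stopped() {
  # stdin = lssrc -g cluster output
  ! grep -q " active"
}

# Illustrative sample output; replace with:  lssrc -g cluster | cluster_stopped
sample="Subsystem Group PID Status
clstrmgrES cluster inoperative
clinfoES cluster inoperative"

if printf '%s\n' "$sample" | cluster_stopped; then
  echo "cluster services stopped"
fi
```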
__ 11. Synchronize the cluster. Using the F3 key, traverse back to the ‘Extended
Configuration’ smit screen.
» Select, Extended Verification and Synchronization
» Notice the menu option Automatically correct errors found during verification.
You only see this option when the cluster is down.
Let’s have a look at configuring a settling timer, which allows you to modify the behavior
of the Fallback To Higher Priority Node In The List fallback policy so that there are not
two online operations if you bring up the secondary node first.
__ 26. On your administrative node (halifax#), create a delayed fallback timer policy for 30
minutes from now (instructor may modify this time)
» Write down the current time ______________
» Make sure both nodes are using the same time (setclock toronto#-if1).
» smitty hacmp
» Extended Configuration... Extended Resource Configuration
» --> Configure Resource Group Run-Time Policies
» --> --> Configure Delayed Fallback Timer Policies
» --> --> --> Add a Delayed Fallback Timer Policy
» use the following values:
daily, name=my_delayfbt, hour/min=30 min from current time
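A quick way to compute the hour/minute value 30 minutes ahead, assuming GNU date is available (AIX’s own date command lacks the `-d` option, so on the lab systems you would add the 30 minutes by hand):

```shell
# Sketch: compute the hour/minute to enter for a delayed fallback timer
# 30 minutes from now. Assumes GNU date (-d relative dates); not the
# AIX date command.
timer=$(date -d '+30 minutes' '+%H %M')
echo "enter hour/minute: $timer"
```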
__ 27. Add the fallback timer policy to the resource group appC_group
» smitty hacmp
» Extended Configuration... Extended Resource Configuration
» ... HACMP Extended Resource Group Configuration
» ... ... Change/Show Resources and Attributes for a Resource Group
» Select appC_group
» Fill in the field below as indicated:
» Fallback Timer Policy [my_delayfbt] (use F4)
__ 28. Synchronize
__ 35. At the time set for the Delayed Fallback Timer, appC_group should move back to
halifax# (you should see activity from the tail command)
» Execute clRGinfo to verify that appC_group is online on halifax#.
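The clRGinfo verification can be scripted; a minimal sketch that parses clRGinfo-style output (the column layout below is simplified and illustrative -- verify it against the actual output of /usr/es/sbin/cluster/utilities/clRGinfo on your cluster):

```shell
# Sketch: check whether a resource group is ONLINE on a given node by
# parsing clRGinfo-style "group state node" lines.
rg_online_on() {
  # $1 = resource group, $2 = node, stdin = clRGinfo output
  awk -v g="$1" -v n="$2" \
    '$1 == g && $2 == "ONLINE" && $3 == n { ok = 1 } END { exit ok ? 0 : 1 }'
}

# Illustrative sample; replace with:  clRGinfo | rg_online_on appC_group halifax
sample="appC_group ONLINE halifax
appC_group OFFLINE toronto"

printf '%s\n' "$sample" | rg_online_on appC_group halifax &&
  echo "appC_group is online on halifax"
```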
__ 36. On your administrative node (halifax#), remove the name of the Delayed Fallback
Timer (my_delayfbt) from the resource group appC_group (you can keep the policy
definition if you want).
» smitty hacmp
» Extended Configuration... Extended Resource Configuration
» ... HACMP Extended Resource Group Configuration
» ... ... Change/Show Resources and Attributes for a Resource Group
__ 37. Reset the Settling time to 0 (from the menu ‘Configure Resource Group Run-Time
Policies’)
» smitty hacmp
» Select, Extended Configuration
» Select, Extended Resource Configuration
» Select, Configure Resource Group Run-Time Policies
__ 38. Synchronize.
END OF LAB
Exercise Review/Wrapup
The first part of the exercise looked at using C-SPOC to add a new resource to the
cluster.
© Copyright IBM Corp. 1998, 2004 Exercise 11. IPAT via Replacement and HWAT 11-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide with Hints
» Enter the Resource Group name (appR_group) and set the participating
nodes (use F4 to choose the nodes). Remember the priority order: the first
node listed is considered the ‘home’ or owner node. Use the default policies.
__ 11. Add Resources to the Resource Group. Using the F3 key, traverse back to the
HACMP extended Resource Group Configuration smit screen.
» Select, Change/Show Resources and Attributes for a Resource Group
» In the appropriate fields, use F4 to choose a Service IP Label and a Volume
Group. For the purposes of this lab, application servers are not required. You
may add them if you wish.
__ 12. Synchronize the cluster. Using the F3 key, traverse back to the Extended
Configuration smit screen. Select Extended Verification and Synchronization
» Review the results for any errors.
__ 13. Start HACMP on the toronto# node.
» smitty clstart (choose only toronto# and start clinfo)
» monitor the /tmp/hacmp.out during startup.
__ 14. Verify the appR_group did not come online because of the startup policy.
» /usr/es/sbin/cluster/clRGinfo
__ 15. Start HACMP on the halifax# node.
» smitty clstart (choose only halifax# and start clinfo)
__ 16. Verify that the appR_group is online on halifax#
» /usr/es/sbin/cluster/clRGinfo
__ 20. On the halifax# node generate a swap adapter event. Be aware that you need to do
this fairly quickly before the arp cache times out.
» ifconfig enX down (the interface on which the service label is configured).
__ 21. Check the contents of the arp cache on the client, compare the results with the
previous iteration of the command.
» arp -a
__ 22. The hardware address should have updated in the arp cache on the client without
any intervention.
Note: If the entry is not in the arp cache when the gratuitous ARP is broadcast, it is
ignored.
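The before/after comparison of the client’s arp cache entry can be sketched as follows (both entries below are made-up placeholder values, shown only to illustrate the expected change in the hardware address):

```shell
# Sketch: compare the arp cache entry for the service address before
# and after the swap_adapter event. Both values are placeholders; on
# the real client you would capture them with:  arp -a | grep appR-repl
before="appR-repl (192.168.1.10) at 0:4:ac:62:72:49"
after="appR-repl (192.168.1.10) at 0:4:ac:17:2b:a4"

if [ "$before" != "$after" ]; then
  echo "hardware address updated by gratuitous ARP"
fi
```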
Changing parameters of the service IP label via a DARE may result in releasing resource
group appR_group.
» Select ‘Extended Verification and Synchronization’
» clRGinfo (shows appR_group offline)
__ 28. Bring the appR_group online using the C-SPOC menu. If, on the client, there is no
arp cache entry for the appR-repl service address, then ping the appR-repl service
address.
» Select C-SPOC from the main smit hacmp menu
» Select HACMP Resource Group and Application Management
» Select Bring a Resource Group Online
» Select ‘appR_group offline halifax#’
» BE CAREFUL -- select Restore_Node_Priority_Order
» Accept the next menu
__ 29. Verify that the alternate hardware address is now configured on the interface for the
appR#-repl service address.
» netstat -i
__ 30. Fail the halifax# node in your favorite manner.
__ 31. Check that the halifax service address is on the toronto# node and observe the
hardware address associated with that service address
» netstat -i
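Extracting the hardware address from netstat -i output can be sketched as below (AIX prints link-level addresses in dotted form such as 0.4.ac.62.72.49; the sample line is illustrative, so check the real column layout on your system):

```shell
# Sketch: pull the dotted link-level (hardware) address for an
# interface out of netstat -i output by finding the field that splits
# into six dot-separated parts (an IPv4 address splits into four).
hw_addr() {
  # $1 = interface name, stdin = netstat -i output
  awk -v ifc="$1" '$1 == ifc {
    for (i = 2; i <= NF; i++)
      if (split($i, p, ".") == 6) { print $i; exit }
  }'
}

# Illustrative sample line; replace with:  netstat -i | hw_addr en0
sample="en0 1500 link#2 0.4.ac.62.72.49 2214 0 1385 0 0"

printf '%s\n' "$sample" | hw_addr en0
# -> 0.4.ac.62.72.49
```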
Exercise Review/Wrapup
This exercise looked at cascading resource groups, and how to configure both cascading
without Fallback and Inactive Takeover. It also covered setting up and testing Hardware
Address Takeover.
This exercise also looked at rotating resource groups.
© Copyright IBM Corp. 1998, 2004 Exercise 12. Network File System (NFS) 12-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide with Hints
END OF LAB
© Copyright IBM Corp. 1998, 2004 Exercise 13. Error Notification 13-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Exercises Guide with Hints
END OF LAB
[Cluster Planning Diagram worksheet: fill in the client (user community) hostname, if1 interface, and service alias; for each node, the if1, if2, and persistent IP labels, IP addresses, and hardware addresses; the network name and netmask; each node has rootvg plus a 4.8 GB shared VG; backpg.]