
HACMP Basics

History
IBM's HACMP has existed for almost 15 years. It is not actually an IBM product: IBM bought it from CLAM, which was later renamed to Availant and is now called LakeViewTech. Until August 2006, all development of HACMP was done by CLAM. Nowadays IBM does its own development of HACMP in Austin, Poughkeepsie and Bangalore.
IBM's high availability solution for AIX, High Availability Cluster Multi-Processing (HACMP), consists of two components:
• High Availability: the process of ensuring an application is available for use through the use of duplicated and/or shared resources (eliminating Single Points Of Failure - SPOFs).
• Cluster Multi-Processing: multiple applications running on the same nodes with shared or concurrent access to the data.
A high availability solution based on HACMP provides automated failure detection, diagnosis, application recovery and node reintegration. With an appropriate application, HACMP can also provide concurrent access to the data for parallel processing applications, thus offering excellent horizontal scalability.
What needs to be protected? Ultimately, the goal of any IT solution in a critical environment is to provide continuous service and data protection.
High availability is just one building block in achieving the continuous operation goal. It rests on the availability of the hardware, the software (OS and its components), the application and the network components.
The main objective of HACMP is to eliminate Single Points of Failure (SPOFs):
"...A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs)..."
Eliminate Single Point of Failure (SPOF)
Cluster object: eliminated as a single point of failure by
Node: using multiple nodes
Power source: using multiple circuits or uninterruptible power supplies
Network/adapter: using redundant network adapters
Network: using multiple networks to connect nodes
TCP/IP subsystem: using non-IP networks to connect adjoining nodes & clients
Disk adapter: using redundant disk adapters or multiple adapters
Disk: using multiple disks with mirroring or RAID
Application: add a node for takeover; configure an application monitor
Administrator: add a backup administrator or a very detailed operations guide
Site: add an additional site
Cluster Components
Here are the recommended practices for important cluster components.
Nodes
HACMP supports clusters of up to 32 nodes, with any combination of active and standby nodes. While it is possible to have all nodes in the cluster running applications (a configuration referred to as "mutual takeover"), the most reliable and available clusters have at least one standby node: one node that is normally not running any applications, but is available to take them over in the event of a failure on an active node.
Additionally, it is important to pay attention to environmental considerations. Nodes should not have a common power supply, which may happen if they are placed in a single rack. Similarly, building a cluster of nodes that are actually logical partitions (LPARs) within a single footprint is useful as a test cluster, but should not be considered for availability of production applications.
Nodes should be chosen that have sufficient I/O slots to install redundant network and disk adapters; that is, twice as many slots as would be required for single node operation. This naturally suggests that processors with small numbers of slots should be avoided. Use of nodes without redundant adapters should not be considered best practice; blades are an outstanding example of this. And, just as every cluster resource should have a backup, the root volume group in each node should be mirrored, or be on a RAID device.
Nodes should also be chosen so that when the production applications are run at peak load, there are still sufficient CPU cycles and I/O bandwidth to allow HACMP to operate. The production application should be carefully benchmarked (preferable) or modeled (if benchmarking is not feasible) and nodes chosen so that they will not exceed 85% busy, even under the heaviest expected load.
Note that the takeover node should be sized to accommodate all possible workloads: if there is a single standby backing up multiple primaries, it must be capable of servicing multiple workloads. On hardware that supports dynamic LPAR operations, HACMP can be configured to allocate processors and memory to a takeover node before applications are started. However, these resources must actually be available, or acquirable through Capacity Upgrade on Demand. The worst case situation, e.g. all the applications on a single node, must be understood and planned for.
Networks
HACMP is a network centric application. HACMP networks not only provide client access to the applications but are used to detect and diagnose node, network and adapter failures. To do this, HACMP uses RSCT, which sends heartbeats (UDP packets) over ALL defined networks. By gathering heartbeat information on multiple nodes, HACMP can determine what type of failure has occurred and initiate the appropriate recovery action. Being able to distinguish between certain failures, for example the failure of a network and the failure of a node, requires a second network! Although this additional network can be "IP based", it is possible that the entire IP subsystem could fail within a given node. Therefore, in addition there should be at least one, ideally two, non-IP networks. Failure to implement a non-IP network can potentially lead to a partitioned cluster, sometimes referred to as 'Split Brain' Syndrome. This situation can occur if the IP network(s) between nodes become severed or in some cases congested. Since each node is in fact still very alive, HACMP would conclude the other nodes are down and initiate a takeover. After takeover has occurred, the application(s) could potentially be running simultaneously on both nodes. If the shared disks are also online to both nodes, then the result could lead to data divergence (massive data corruption). This is a situation which must be avoided at all costs.
The most convenient way of configuring non-IP networks is to use Disk Heartbeating, as it removes the distance problems of rs232 serial networks. Disk heartbeat networks only require a small disk or LUN. Be careful not to put application data on these disks: although it is possible to do so, you don't want any conflict with the disk heartbeat mechanism!
Adapters
As stated above, each network defined to HACMP should have at least two adapters per node. While it is possible to build a cluster with fewer, the reaction to adapter failures is more severe: the resource group must be moved to another node. AIX provides support for Etherchannel, a facility that can be used to aggregate adapters (increase bandwidth) and provide network resilience. Etherchannel is particularly useful for fast responses to adapter/switch failures. This must be set up with some care in an HACMP cluster. When done properly, this provides the highest level of availability against adapter failure. Refer to the IBM techdocs website: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785 for further details.
Many System p servers contain built-in Ethernet adapters. If the nodes are physically close together, it is possible to use the built-in Ethernet adapters on two nodes and a "cross-over" Ethernet cable (sometimes referred to as a "data transfer" cable) to build an inexpensive Ethernet network between two nodes for heartbeating. Note that this is not a substitute for a non-IP network.
Some adapters provide multiple ports. One port on such an adapter should not be used to back up another port on that adapter, since the adapter card itself is a common point of failure. The same thing is true of the built-in Ethernet adapters in most System p servers and currently available blades: the ports have a common adapter. When the built-in Ethernet adapter can be used, best practice is to provide an additional adapter in the node, with the two backing up each other.
Be aware of the network detection settings for the cluster and consider tuning these values. In HACMP terms, these are referred to as NIM values. There are four settings per network type which can be used: slow, normal, fast and custom. With the default setting of normal for a standard Ethernet network, the network failure detection time would be approximately 20 seconds. With today's switched network technology this is a large amount of time. By switching to the fast setting, the detection time would be reduced by 50% (10 seconds), which in most cases would be more acceptable. Be careful, however, when using custom settings, as setting these values too low can cause false takeovers to occur. These settings can be viewed using a variety of techniques including: the lssrc -ls topsvcs command (from a node which is active), odmget HACMPnim | grep -p ether, and smitty hacmp.
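For example, the first two of those checks look like this on a running cluster node:
lssrc -ls topsvcs                  # shows, among other things, the HB interval and sensitivity per network
odmget HACMPnim | grep -p ether    # NIM tuning values for ethernet networks stored in the ODM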
Applications
The most important part of making an application run well in an HACMP cluster is understanding the application's requirements. This is particularly important when designing the Resource Group policy behavior and dependencies. For high availability to be achieved, the application must have the ability to stop and start cleanly and not explicitly prompt for interactive input. Some applications tend to bond to a particular OS characteristic such as a uname, serial number or IP address. In most situations, these problems can be overcome. The vast majority of commercial software products which run under AIX are well suited to be clustered with HACMP.
Application Data Location
Where should application binaries and configuration data reside? There are many arguments to this discussion. Generally, keep all the application binaries and data where possible on the shared disk, as it is easy to forget to update them on all cluster nodes when they change. This can prevent the application from starting or working correctly when it is run on a backup node. However, the correct answer is not fixed. Many application vendors have suggestions on how to set up the applications in a cluster, but these are recommendations. Just when it seems to be clear cut as to how to implement an application, someone thinks of a new set of circumstances. Here are some rules of thumb:
If the application is packaged in LPP format, it is usually installed on the local file systems in rootvg. This behavior can be overcome by bffcreate'ing the packages to disk and restoring them with the preview option. This action will show the install paths; symbolic links can then be created prior to install which point to the shared storage area. If the application is to be used on multiple nodes with different data or configuration, then the application and configuration data would probably be on local disks and the data sets on shared disk, with application scripts altering the configuration files during fallover. Also, remember that the HACMP File Collections facility can be used to keep the relevant configuration files in sync across the cluster. This is particularly useful for applications which are installed locally.
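As a sketch of the LPP trick just described (the fileset name, media device and image file name are all illustrative):
bffcreate -d /dev/cd0 -t /tmp/images app.server      # copy the package from media to disk
restore -Tqvf /tmp/images/app.server.1.0.0.0.bff     # -T lists the archive contents, i.e. the install paths
# create symbolic links from the listed rootvg paths to the shared storage area,
# then install the fileset normally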
Start/Stop Scripts
Application start scripts should not assume the status of the environment. Intelligent programming should correct any irregular conditions that may occur. The cluster manager spawns these scripts off in a separate job in the background and carries on processing. Some things a start script should do are (a minimal script sketch follows the list):
First, check that the application is not currently running! This is especially crucial for v5.4 users, as resource groups can be placed into an unmanaged state (the forced down action in previous versions). Using the default startup options, HACMP will rerun the application start script, which may cause problems if the application is actually running. A simple and effective solution is to check the state of the application on startup. If the application is found to be running, simply end the start script with exit 0.
Verify the environment. Are all the disks, file systems, and IP labels available?
If different commands are to be run on different nodes, store the executing HOSTNAME in a variable.
Check the state of the data. Does it require recovery? Always assume the data is in an unknown state, since the conditions that occurred to cause the takeover cannot be assumed.
Are there prerequisite services that must be running? Is it feasible to start all prerequisite services from within the start script? Is there an inter-resource group dependency or resource group sequencing that can guarantee the previous resource group has started correctly? HACMP v5.2 and later has facilities to implement checks on resource group dependencies, including collocation rules in HACMP v5.3.
Finally, when the environment looks right, start the application. If the environment is not correct and error recovery procedures cannot fix the problem, ensure there are adequate alerts (email, SMS, SNMP traps etc.) sent out via the network to the appropriate support administrators.
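A minimal start-script sketch covering the checks above (every name and path is illustrative, not from any particular product):
#!/bin/ksh
# exit quietly if the application is already running (e.g. after a forced down)
ps -ef | grep -v grep | grep -q myappd && exit 0
HOST=$(hostname)                    # branch on this if nodes need different commands
for fs in /app /appdata; do         # verify the environment before starting
    mount | grep -qw $fs || exit 1  # better: send an alert before giving up
done
/app/bin/myapp_recover              # always assume the data may need recovery
/app/bin/myappd &                   # finally, start the application in the background
exit 0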
Stop scripts are different from start scripts in that most applications have a documented start-up routine and not necessarily a stop routine. The assumption is: once the application is started, why stop it? Relying on the failure of a node to stop an application will be effective, but to use some of the more advanced features of HACMP the requirement exists to stop an application cleanly. Some of the issues to avoid are:
Be sure to terminate any child or spawned processes that may be using the disk resources. Consider implementing child resource groups.
Verify that the application is stopped to the point that the file system is free to be unmounted. The fuser command may be used to verify that the file system is free.
In some cases it may be necessary to double check that the application vendor's stop script did actually stop all the processes, and occasionally it may be necessary to forcibly terminate some processes. Clearly the goal is to return the machine to the state it was in before the application start script was run.
Failure to exit the stop script with a zero return code will stop cluster processing. (Note: this is not the case with start scripts!)
Remember, most vendor stop/start scripts are not designed to be cluster proof! A useful tip is to have the stop and start scripts verbosely output, using the same format, to the /tmp/hacmp.out file. This can be achieved by including the following line in the header of the script: set -x && PS4="${0##*/}"'[$LINENO] '
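And a matching stop-script sketch (again, every name is illustrative):
#!/bin/ksh
/app/bin/myapp_shutdown    # vendor stop routine, if one is documented
sleep 10
fuser -kuc /appdata        # terminate leftovers still holding the shared filesystem
fuser -c /appdata          # double check: this should now list no processes
exit 0                     # a non-zero exit here would stop cluster processing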
HACMP can be configured in 3 ways:
1. Rotating
2. Cascading
3. Mutual Failover
The cascading and rotating resource groups are the "classic", pre-HA 5.1 types. The new "custom" type of resource group has been introduced from HA 5.1 onwards.
Cascading resource group:
Upon node failure, a cascading resource group falls over to the available node with the next priority in the node priority list. Upon node reintegration into the cluster, a cascading resource group falls back to its home node by default.
Cascading without fallback:
With this option, whenever a primary node fails, the package will fail over to the next available node in the list, and when the primary node comes back online the package will not fall back automatically. We need to move the package to its home node at a convenient time.
Rotating resource group:
This is similar to cascading without fallback: whenever the package fails over to a standby node, it will never fall back to the primary node automatically; we need to move it manually at our convenience.
Mutual takeover:
With the mutual takeover option, both nodes are in active-active mode. Whenever a failover happens, the package on the failed node will move to the other active node and run alongside the already existing package. Once the failed node comes back online, we can move the package manually to that node.
Useful HACMP commands
clstat - show cluster state and substate; needs clinfo.
cldump - SNMP-based tool to show cluster state.
cldisp - similar to cldump, perl script to show cluster state.
cltopinfo - list the local view of the cluster topology.
clshowsrv -a - list the local view of the cluster subsystems.
clfindres (-s) - locate the resource groups and display status.
clRGinfo -v - locate the resource groups and display status.
clcycle - rotate some of the log files.
cl_ping - a cluster ping program with more arguments.
clrsh - cluster rsh program that takes cluster node names as argument.
clgetactivenodes - which nodes are active?
get_local_nodename - what is the name of the local node?
clconfig - check the HACMP ODM.
clRGmove - online/offline or move resource groups.
cldare - sync/fix the cluster.
cllsgrp - list the resource groups.
clsnapshotinfo - create a large snapshot of the hacmp configuration.
cllscf - list the network configuration of an hacmp cluster.
clshowres - show the resource group configuration.
cllsif - show network interface information.
cllsres - show short resource group information.
lssrc -ls clstrmgrES - list the cluster manager state.
lssrc -ls topsvcs - show heartbeat information.
cllsnode - list a node centric overview of the hacmp configuration.
HACMP log files
/usr/sbin/cluster/etc/rhosts --- to accept incoming communication from clcomdES (cluster communication enhanced security)
/usr/es/sbin/cluster/etc/rhosts
Note: if there is an unresolvable label in the /usr/es/sbin/cluster/etc/rhosts file, then all clcomdES connections from remote nodes will be denied.
cluster manager (clstrmgrES)
cluster lock daemon (cllockdES)
cluster multi peer extension communication daemon (clsmuxpdES)
clcomdES is used for cluster configuration operations such as cluster synchronisation, cluster management (C-SPOC) and Dynamic Reconfiguration (DARE) operations.
For clcomdES there should be at least 20 MB free space in the /var file system:
/var/hacmp/clcomd/clcomd.log -- requires 2 MB
/var/hacmp/clcomd/clcomddiag.log -- requires 18 MB
An additional 1 MB is required for the /var/hacmp/odmcache directory.
clverify.log is also present in the /var directory:
/var/hacmp/clverify/current/<nodename>/* contains logs from the current execution of clverify.
/var/hacmp/clverify/pass/<nodename>/* contains logs from the last passed verification.
/var/hacmp/clverify/pass.prev/<nodename>/* contains logs from the second last passed verification.
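A quick check before starting the cluster (shown for completeness):
df -m /var    # confirm that roughly 20 MB or more is free for the clcomd logs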
Steps 1 to 17 to configure HACMP
Steps to configure HACMP:
1. Install the nodes, making sure redundancy is maintained for power supplies, n/w and fiber n/ws. Then install AIX on the nodes.
2. Install all the HACMP filesets except HAview and HATivoli.
3. Install all the RSCT filesets from the AIX base CD. Make sure that the AIX and HACMP patches and server code are at the latest level (ideally recommended).
4. Check that the fileset bos.clvm is present on both nodes. This is required to make the VGs enhanced concurrent capable.
5. V.IMP: Reboot both nodes after installing the HACMP filesets.
6. Configure shared storage on both nodes. Also, in case of a disk heartbeat, assign a 1GB shared storage LUN on both nodes.
7. Create the required VGs only on the first node. The VGs can be either normal VGs or enhanced concurrent VGs. Assign a particular major number to each VG while creating the VGs, and record the major number information. To check the major number, use the command:
ls -lrt /dev | grep <vg name>
Mount automatically at system restart should be set to NO.
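For example, a shared VG could be created on node 1 along these lines (a sketch; names, PP size and major number are illustrative):
lvlstmajor                              # list the major numbers still free (run on both nodes)
mkvg -y testvg1 -V 101 -s 64 hdisk2     # -V sets the major number, -s the PP size in MB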
8. Varyon the VGs that were just created.
9. V.IMP: Create a log LV on each VG first, before creating any other new LV, and give the loglv a unique name. Initialize the loglv (this destroys its content) with: logform /dev/loglvname
Repeat this step for all VGs that were created.
10. Create all the necessary LVs on each VG.
11. Create all the necessary file systems on each LV created; you can create the mount points as per the requirement of the customer. Mount automatically at system restart should be set to NO.
12. umount all the filesystems and varyoff all the VGs.
13. Run chvg -an <vg name>: all VGs will be set to not activate automatically at system restart.
14. Go to node 2 and run cfgmgr -v to import the shared volumes.
15. Import all the VGs on node 2: use smitty importvg and import with the same major number as assigned on node 1.
16. Run chvg -an for all VGs on node 2.
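A sketch of steps 15 and 16 on node 2 (device and VG names are again illustrative):
importvg -V 101 -y testvg1 hdisk2    # same major number as used on node 1
chvg -an testvg1                     # do not activate the VG automatically at restart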
17. V.IMP: Identify the boot1, boot2, service IP and persistent IP for both nodes and make the entries in /etc/hosts.
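The /etc/hosts entries would look roughly like this; the addresses and labels below are examples only:
10.1.1.1    node1_boot1
10.1.2.1    node1_boot2
10.1.1.2    node2_boot1
10.1.2.2    node2_boot2
10.1.3.1    app1_svc     # service IP label
10.1.4.1    node1_per    # persistent IP label
10.1.4.2    node2_per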
Step 18 to configure HACMP
18. Define the cluster name.
Step 19 Define Cluster Nodes
19. Define the cluster nodes: smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure an HACMP node -> Add a node to an HACMP cluster. Define both nodes, one after the other.
Step 20 Discover HACMP config for Network settings
20. Discover HACMP config: this will import, for both nodes, all the node info, boot ips and service ips from /etc/hosts.
smitty hacmp -> Extended Configuration -> Discover hacmp related information
Step 21 Adding Communication interface
21. Add HACMP communication interfaces (Ether interfaces):
smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP networks -> Add a network to the HACMP cluster.
Select ether and press Enter. Then select diskhb and press Enter. Diskhb is your non-tcpip heartbeat.
Step 22 Adding device for Disk Heart Beat
22. Include the interfaces/devices in the ether n/w and diskhb already defined:
smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP communication interfaces/devices -> Add communication interfaces/devices.
Step 23 Adding boot IP & Disk heart beat information
23. Include all four boot ips (2 for each node) in the ether interface already defined. Then include the disk for heartbeat on both nodes in the diskhb already defined.
Step 24 Adding persistent IP
24. Add the persistent IPs:
smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP persistent node IP labels/Addresses
Step 25 Adding Persistent IP labels
25. Add a persistent ip label for both nodes.
Step 26 Defining IP labels
26. Define the service IP labels for both nodes:
smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP extended resource configuration -> Configure HACMP service IP label
Step 27 Adding Resource Group
27. Add Resource Groups:
smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP extended resource group configuration
Continue similarly for all the resource groups. The node selected first while defining the resource group will be the primary owner of that resource group; the node after that is the secondary node. Make sure you set the primary node correctly for each resource group. Also set the failover/fallback policies as per the requirement of the setup.
Step 28 Setting attributes of Resource group
28. Set attributes of the resource groups already defined. Here you actually assign the resources to the resource groups:
smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP extended resource group configuration
Step 29 Adding IP label & VG owned by Node
29. Add the service IP label for the owner node and also the VGs owned by the owner node of this resource group. Continue similarly for all the resource groups.
Step 30 & 31 Synchronize & start Cluster
30. Synchronize the cluster. This will sync the info from one node to the second node:
smitty cl_sync
31. That's it. Now you are ready to start the cluster:
smitty clstart
You can start the cluster together on both nodes or start individually on each node.
Step 32 & 33 Check for cluster Stabilize & VG varied on
32. Wait for the cluster to stabilize. You can check when the cluster is up with the following commands:
a. netstat -i
b. ifconfig -a: look out for the service ip; it will show on each node if the cluster is up.
Check whether the VGs under the cluster's RGs are varied on and the filesystems in the VGs are mounted after the cluster start. Here test1vg and test2vg are VGs which are varied on when the cluster is started, and the filesystems /test2 and /test3 are mounted when the cluster starts. /test2 and /test3 are in test2vg, which is part of the RG owned by this node.
33. Perform all the tests, such as resource take-over, node failure and n/w failure, and verify the cluster before releasing the system to the customer.
HACMP v5.x Disk Heartbeat device configuration
Creating a Disk Heartbeat device in HACMP v5.x
Introduction
This document is intended to supplement existing documentation on how to configure, test, and monitor a disk heartbeat device and network in HACMP/ES v5.x. This feature is new in v5.1, and it provides another alternative for non-ip based heartbeats. The intent of this document is to provide step-by-step directions, as they are currently sketchy in the HACMP v5.1 pubs. This will hopefully clarify several misconceptions that have been brought to my attention.
This example consists of a two-node cluster (nodes GT40 & SL55) with shared ESS vpath devices. If more than two nodes exist in your cluster, you will need N non-ip heartbeat networks, where N represents the number of nodes in the cluster (i.e. a three node cluster requires 3 non-ip heartbeat networks). This creates a heartbeat ring.
It's worth noting that one should not confuse concurrent volume groups with concurrent resource groups. And note, there is a difference between concurrent volume groups and enhanced concurrent volume groups. A concurrent resource group is one which may be active on more than one node at a time. A concurrent volume group also shares the characteristic that it may be active on more than one node at a time. This is also true for an enhanced concurrent VG; however, in a non-concurrent resource group, the enhanced concurrent VG, while it may be active and not have a SCSI reserve residing on the disk, normally has its data accessed by only one system at a time.
Pre-Reqs
In this document, it is assumed that the shared storage devices are already made available and configured to AIX, and that the proper levels of RSCT and HACMP are already installed. Since we are utilizing enhanced-concurrent volume groups, it is also necessary to make sure that bos.clvm.enh is installed. This is not normally installed as part of a HACMP installation via the installp command.
Disk Heartbeat Details
This provides the ability to use existing shared disks, regardless of disk type, to provide a serial-network-like heartbeat path. A benefit of this is that one need not dedicate the integrated serial ports for HACMP heartbeats (if supported on the subject systems) or purchase an 8-port asynchronous adapter.
This feature utilizes a special area on the disk previously reserved for "Concurrent Capable" volume groups (traditionally only for SSA disks). Since AIX 5.2 dropped support for the SSA concurrent volume groups, this fit makes it available for use. This also means that the disk chosen for serial heartbeat can be part of a data volume group. (Note Performance Concerns below.)
The disk heartbeating code went into the 2.2.1.30 version of RSCT. Some recommended APARs bring that to 2.2.1.31. If you've got that level installed, and HACMP 5.1, you can use disk heartbeating. The relevant file to look for is /usr/sbin/rsct/bin/hats_diskhb_nim. Though it is supported mainly through RSCT, we recommend AIX 5.2 when utilizing disk heartbeat.
To use disk heartbeats, no node can issue a SCSI reserve for the disk. This is because both nodes using it for heartbeating must be able to read and write to that disk. It is sufficient that the disk be in an enhanced concurrent volume group to meet this requirement. (It should also be possible to use a disk that is in no volume group for disk heartbeating. RSCT certainly won't care; but HACMP SMIT panels may not be particularly helpful in setting this up.)
Now, in HACMP 5.1 with AIX 5.1, enhanced concurrent mode volume groups can be used only in concurrent (or "online on all available nodes") resource groups. This means that disk heartbeating is useful only to people running concurrent configurations, or who can allocate such a volume group/disk (which is certainly possible, though perhaps an expensive approach). In other words, at HACMP 5.1 and AIX 5.1, typical HACMP clusters (with a server and idle standby) will require an additional concurrent resource group with a disk in an enhanced concurrent VG dedicated for heartbeat use. At AIX 5.2, disk heartbeats can exist on an enhanced concurrent VG that resides in a non-concurrent resource group. At AIX 5.2, one may also use the fast disk takeover feature in non-concurrent resource groups with enhanced concurrent volume groups. With HACMP 5.1 and AIX 5.2, enhanced concurrent mode volume groups can be used in serial access configurations for fast disk takeover, along with disk heartbeating. (AIX 5.2 requires RSCT 2.3.1.0 or later.) That is, the facility becomes usable to the average customer, without commitment of additional resource, since disk heartbeating can occur on a volume group used for ordinary filesystem and logical volume activity.
Performance Concerns with Disk Heart Beating
Most modern disks take somewhere around 15 milliseconds to service an I/O request, which means that they can't do much more than 60 seeks per second. The sectors used for disk heartbeating are part of the VGDA, which is at the outer edge of the disk, and may not be near the application data. This means that every time a disk heartbeat is done, a seek will have to be done. Disk heartbeating will typically (with the default parameters) require four (4) seeks per second. That is, each of two nodes will write to the disk and read from the disk once per second, for a total of 4 IOPS. So, if possible, a disk should be selected as a heartbeat path that does not normally do more than about 50 seeks per second. The filemon tool can be used to monitor the seek activity on a disk.
In cases where a disk must be used for heartbeating that already has a high seek rate, it may be necessary to change the heartbeat timing parameters to prevent long write delays from being seen as a failure.
The above cautions as stated apply to JBOD configurations, and should be modified based on the technology of the disk subsystem:
• If the disk used for heartbeating is in a controller that provides large amounts of cache - such as the ESS - the number of seeks per second can be much larger.
• If the disk used for heartbeating is part of a RAID set without a caching front end controller, the disk may be able to support fewer seeks, due to the extra activity required by RAID operations.
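To gauge whether a candidate disk is already too busy, its seek activity can be sampled with filemon; a minimal sketch (the output file and hdisk name are illustrative):
filemon -o /tmp/fmon.out -O pv    # start trace-based monitoring of physical volumes
sleep 30                          # let it observe a representative workload period
trcstop                           # stop the trace; filemon writes its report
grep -p hdisk3 /tmp/fmon.out      # review the per-PV section, including the seek counts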
Pros & Cons of using Disk Heart Beating
Pros:
1. No additional hardware needed.
2. Easier to span greater distances.
3. No loss in usable storage space, and can use existing data volume groups.
4. Uses enhanced concurrent vgs, which also allows for fast-disk takeover.
Cons:
1. Must be aware of the devices diskhb uses and administer the devices properly.*
2. Lose the forced down option of stopping cluster services, because of the enhanced concurrent vg usage.
*I have had a customer delete all their disk definitions and run cfgmgr again to clean up number holes in their device definition list. When they did, obviously, the device names did not come back in the same order as they were before. So the diskhb device assigned to HACMP was no longer valid, as a different device was configured using the old device name and it was not part of an enhanced concurrent vg. Hence diskhb no longer worked, and since the customer did not monitor their cluster either, they were unaware that the diskhb no longer worked.
Configuring Disk Heartbeat
As mentioned previously, disk heartbeat utilizes enhanced-concurrent volume groups. If starting with a new configuration of disks, you will want to create enhanced-concurrent volume groups, either manually or by utilizing C-SPOC. My example shows using C-SPOC, which is the best practice to use here.
If you plan to use an existing volume group for disk heartbeats that is not enhanced concurrent, then you will have to convert it to such using the chvg command. We recommend that the VG be active on only one node, and that the application not be running, when making this change. Run chvg -C vgname to change the VG to enhanced concurrent mode. Vary it off, then run importvg -L vgname on the other node to make it aware that the vg is now enhanced concurrent capable. If using this method, you can skip to the "Creating Disk Heartbeat Devices and Network" section of this document.
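A sketch of that conversion sequence (the VG and disk names are illustrative):
chvg -C datavg               # on the node where datavg is active, with the application stopped
varyoffvg datavg
importvg -L datavg hdisk4    # on the other node: refresh its ODM copy of the VG definition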
Disk and VG Preparation
To be able to use C-SPOC successfully, it is required that some basic IP based topology already exists, and that the storage devices have their PVIDs in both systems' ODMs. This can be verified by running lspv on each system. If a PVID does not exist on each system, it is necessary to run chdev -l <device> -a pv=yes on each system. This will allow C-SPOC to match up the device(s) as known shared storage devices.
In this example, vpath0 on GT40 is the same virtual disk as vpath3 on SL55.
Use C-SPOC to create an Enhanced Concurrent volume group. In the following example, since vpath devices are being used, the following smit screen path was used:
smitty cl_admin -> HACMP Concurrent Logical Volume Management -> Concurrent Volume Groups -> Create a Concurrent Volume Group with Data Path Devices and press Enter.
Choose the appropriate nodes, and then choose the appropriate shared storage devices based on pvids (vpath0 and vpath3 in this example). Choose a name for the VG (enhconcvg in this example) and the desired PP size, make sure that Enhanced Concurrent Mode is set to true, and press Enter. This will create the shared enhanced-concurrent vg needed for our disk heartbeat.
It's a good idea to verify via lspv once this has completed, to make sure the device and vg show up appropriately, as follows:
GT40# lspv
vpath0 000a7f5af78e0cf4 enhconcvg
SL55# lspv
vpath3 000a7f5af78e0cf4 enhconcvg
Creating Disk Heartbeat Devices and Network
There are two different ways to do this. Since we have already created the enhanced concurrent vg, we can use the discovery method (1) and let HA find it for us. Or we can do this manually via the Pre-defined devices method (2). Following is an example of each.
1) Creating via Discovery Method: (See Note)
smitty hacmp -> Extended Configuration -> Discover HACMP-related Information from Configured Nodes and press Enter.
This will run automatically and create a clip_config file that contains the information it has discovered. Once completed, go back to the Extended Configuration menu and choose:
Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices -> Add Discovered Communication Interface and Devices -> Communication Devices -> choose the appropriate devices (ex. vpath0 and vpath3)
Select Point-to-Point Pair of Discovered Communication Devices to Add
Move cursor to desired item and press F7. Use arrow keys to scroll.
ONE OR MORE items can be selected.
Press Enter AFTER making all selections.
# Node     Device  Device Path  Pvid
> nodeGT40 vpath0  /dev/vpath0  000a7f5af78
> nodeSL55 vpath3  /dev/vpath3  000a7f5af78
Note: Base HA 5.1 appears to have a problem when using this Discovered Devices method. If you get this error: "ERROR: Invalid node name 000a7f5af78e0cf4", then you will need apar IY51594. Otherwise you will have to create via the Pre-Defined Devices method. Once corrected, this section will be completed.
2) Creating via Pre-Defined Devices Method
When using this method, it is necessary to create a diskhb network first, then assign the disk-node pair devices to the network. Create the diskhb network as follows:
smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP cluster -> choose diskhb -> enter the desired network name (ex. disknet1) and press Enter.
smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices -> Add Pre-Defined Communication Interfaces and Devices -> Communication Devices -> choose your diskhb Network Name.
Add a Communication Device
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                        [Entry Fields]
* Device Name           [GT40_hboverdisk]
* Network Type          diskhb
* Network Name          disknet1
* Device Path           [/dev/vpath0]
* Node Name             [GT40]
For Device Name, that is a unique name you can choose. It will show up in your topology under this name, much like serial heartbeat and ttys have in the past. For the Device Path, you want to put in /dev/<device name>. Then choose the corresponding node for this device and device name (ex. GT40), and press Enter.
You will repeat this process for the other node (ex. SL55) and the other device (vpath3). This will complete both devices for the diskhb network.
Testing Disk Heartbeat Connectivity
Once the device and network definitions have been created, it is a good idea to test it and make sure communications are working properly. If the volume group is varied on in normal mode on one of the nodes, the test will probably not work.
/usr/sbin/rsct/bin/dhb_read is used to test the validity of a diskhb connection. The usage of dhb_read is as follows:
dhb_read -p devicename     //dump diskhb sector contents
dhb_read -p devicename -r  //receive data over diskhb network
dhb_read -p devicename -t  //transmit data over diskhb network
To test that disknet1, in the example configuration, can communicate from nodeB (ex. SL55) to nodeA (ex. GT40), you would run the following commands:
On nodeA, enter:
dhb_read -p rvpath0 -r
On nodeB, enter:
dhb_read -p rvpath3 -t
Note that the device name is the raw device, as designated by the "r" preceding the device name.
If the link from nodeB to nodeA is operational, both nodes will display:
Link operating normally.
You can run this again and swap which node transmits and which one receives. To make the network active, it is necessary to sync up the cluster. Since the volume group has not been added to the resource group, we will sync up once instead of twice.
Add Shared Disk as a Shared Resource
In most cases you would have your diskhb device on a shared data vg. It is necessary to add that vg into your resource group and synchronize the cluster.
smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> Extended Resource Group Configuration -> Change/Show Resources and Attributes for a Resource Group and press Enter.
Choose the appropriate resource group, enter the new vg (enhconcvg) into the volume group list and press Enter.
Return to the top of the Extended Configuration menu and synchronize the cluster.
Monitor Disk Heartbeat
Once the cluster is up and running, you can monitor the activity of the disk heartbeats (actually all heartbeats) via lssrc -ls topsvcs. An example of the output follows:
Subsystem    Group    PID    Status
topsvcs      topsvcs  32108  active
Network Name  Indx  Defd  Mbrs  St  Adapter ID    Group ID
disknet1      [ 3]  2     2     S   255.255.10.0  255.255.10.1
disknet1      [ 3]  rvpath3     0x86cd102  0x86cd14f
HB Interval = 2 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent: 229 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 217 ICMP 0 Dropped: 0
NIM's PID: 28724
Be aware that there is a grace period for heartbeats to start processing. This is normally around 60 seconds. So if you run this command quickly after starting the cluster, you may not see anything at all until heartbeat processing has started, after the grace period has elapsed.
HACMP failover scenario
HA failover scenarios
1. Graceful
For graceful failover, you can run "smitty clstop" and then select the graceful option. This will not change anything except stopping the cluster on that node.
Note: if you stop the cluster, check the status using lssrc -g cluster; sometimes the clstrmgrES daemon will take a long time to stop. DO NOT KILL THIS DAEMON. It will stop automatically after a while.
You can do this on both the nodes.
2. Takeover
For takeover, run "smitty clstop" with the takeover option; this will stop the cluster on that node and the standby node will take over the package.
You can do this on both the nodes.
3. Soft Package Failover
Run smitty cm_hacmp_resource_group_and_application_management_menu -> Move a Resource Group to Another Node -> select the package name and node name -> enter.
This will move the package from that node to the node that you have selected in the above menu. This method gives a lot of trouble in HA 4.5, whereas it runs well on HA 5.2 unless we have any apps startup issues.
You can do this on both the nodes.
4. Failover Network Adapter(s)
For this type of testing, run "ifconfig enX down"; the package IP will then fail over to the primary adapter. You cannot even see any outage or anything. We can manually (ifconfig enX up) bring it back to the original adapter, but it is better to reboot the server to bring the package back to the original node.
5. Hardware Failure (crash)
This is a standard type of testing; run the command "reboot -q" and the node will go down without stopping any apps and come up immediately. The package will fail over to the standby node within 2 min of OS downtime. (Even though HA failover is fast, some apps will take a long time to start.)
Specifying the default gateway on a specific interface in HACMP
When you're using HACMP, you usually have multiple network adapters installed and thus multiple network interfaces to deal with. If AIX configured the default gateway on a wrong interface (like on your management interface instead of the boot interface), you might want to change this, so network traffic isn't sent over the management interface. Here's how you can do this:
First, stop HACMP or do a take-over of the resource groups to another node; this will avoid any problems with applications when you start fiddling with the network configuration.
Then open up a virtual terminal window to the host on your HMC. Otherwise you would lose the connection as soon as you drop the current default gateway.
Now you need to determine where your current default gateway is configured. You can do this by typing: lsattr -El inet0 and netstat -nr. The lsattr command will show you the current default gateway route and the netstat command will show you the interface it is configured on. You can also check the ODM: odmget -q"attribute=route" CuAt.
Now, delete the default gateway like this:
lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW
chdev -l inet0 -a delroute=${GW}
If you would now use the route command to specify the default gateway on a specific interface, like this:
route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX
you will have a working entry for the default gateway. But... the route command does not change anything in the ODM. As soon as your system reboots, the default gateway is gone again. Not a good idea.
A better solution is to use the chdev command:
chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]
This will set the default gateway to the first interface available.
To specify the interface, use:
chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]
Substitute the correct interface for enX in the command above.
If you previously used the route add command, and after that you use chdev to enter the default gateway, then this will fail. You have to delete it first by using route delete 0, and then give the chdev command.
Afterwards, check with lsattr -El inet0 and odmget -q"attribute=route" CuAt whether the new default gateway is properly configured. And of course, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check if the default gateway remains configured on the correct interface. And start up HACMP again!
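Putting it together, the whole procedure looks roughly like this (en1 and 192.168.1.254 are example values):
lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW
chdev -l inet0 -a delroute=${GW}                       # remove the old default route from the ODM
chdev -l inet0 -a addroute=net,-hopcount,0,if,en1,,0,192.168.1.254
lsattr -El inet0; netstat -nr                          # verify, then reboot to confirm it persists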
HACMP 5asics
HACMP Basics
History
IBM's HACMP exists for almost 15 years. It's not actually an IBM
product, they ou!ht it from C"AM, #hich #as later renamed to
A$ailant and is no# called "a%e&ie#'ech. (ntil au!ust )**+, all
de$elopment of HACMP #as done y C"AM. ,o#adays IBM does it's
o#n de$elopment of HACMP in Austin, Pou!h%eepsie and Ban!alore
IBM's hi!h a$ailaility solution for AI-, Hi!h A$ailaility Cluster Multi
Processin! .HACMP/, consists of t#o components0
1High Availability0 'he process of ensurin! an application is
a$ailale for use throu!h the use of duplicated and2or shared
resources .eliminatin! 3in!le Points 4f 5ailure 6 3P45's/
.Cluster Multi-Processing0 Multiple applications runnin! on the
same nodes #ith shared or concurrent access to the data.
A hi!h a$ailaility solution ased on HACMP pro$ides automated
failure detection, dia!nosis, application reco$ery and node
reinte!ration. 7ith an appropriate application, HACMP can also
pro$ide concurrent access to the data for parallel processin!
applications, thus offerin! excellent hori8ontal scalaility.
7hat needs to e protected9 (ltimately, the !oal of any I' solution
in a critical en$ironment is to pro$ide continuous ser$ice and data
protection.
'he Hi!h A$ailaility is :ust one uildin! loc% in achie$in! the
continuous operation !oal. 'he Hi!h A$ailaility is ased on the
a$ailaility hard#are, soft#are .43 and its components/, application
and net#or% components.
'he main o:ecti$e of the HACMP is to eliminate 3in!le Points of
5ailure .3P45's/
;<A fundamental desi!n !oal of .successful/ cluster desi!n is the
elimination of sin!le points of failure (SPOFs)<=
Eliminate Single Point of Failure (SPOF)
Cluster >liminated as a sin!le point of failure
o!e "sing multi#le no!es
Po#er 3ource (sin! Multiple circuits or uninterruptile
,et#or%2adapter (sin! redundant net#or% adapters
,et#or% (sin! multiple net#or%s to connect nodes.
'CP2IP 3usystem (sin! non?IP net#or%s to connect ad:oinin!
nodes @ clients
Ais% adapter (sin! redundant dis% adapter or multiple adapters
Ais% (sin! multiple dis%s #ith mirrorin! or BAIA
Application Add node for ta%eo$erC confi!ure application monitor
Administrator Add ac%up or e$ery $ery detailed operations !uide
3ite Add additional site.
Cluster Com#onents
Here are the recommended practices for important cluster
components.
o!es
HACMP supports clusters of up to D) nodes, #ith any comination of
acti$e and standy nodes. 7hile it
is possile to ha$e all nodes in the cluster runnin! applications .a
confi!uration referred to as Emutual
ta%eo$erE/, the most reliale and a$ailale clusters ha$e at least
one standy node ? one node that is normally
not runnin! any applications, ut is a$ailale to ta%e them o$er in
the e$ent of a failure on an acti$e
node.
Additionally, it is important to pay attention to en$ironmental
considerations. ,odes should not ha$e a
common po#er supply ? #hich may happen if they are placed in a
sin!le rac%. 3imilarly, uildin! a cluster
of nodes that are actually lo!ical partitions ."PABs/ #ith a sin!le
footprint is useful as a test cluster, ut
should not e considered for a$ailaility of production applications.
,odes should e chosen that ha$e sufficient I24 slots to install
redundant net#or% and dis% adapters.
'hat is, t#ice as many slots as #ould e reFuired for sin!le node
operation. 'his naturally su!!ests that
processors #ith small numers of slots should e a$oided. (se of
nodes #ithout redundant adapters
should not e considered est practice. Blades are an outstandin!
example of this. And, :ust as e$ery cluster
resource should ha$e a ac%up, the root $olume !roup in each node
should e mirrored, or e on a
$A%& !evice'
,odes should also e chosen so that #hen the production
applications are run at pea% load, there are still
sufficient CP( cycles and I24 and#idth to allo# HACMP to operate.
'he production application
should e carefully enchmar%ed .preferale/ or modeled .if
enchmar%in! is not feasile/ and nodes chosen
so that they #ill not exceed G5H usy, e$en under the hea$iest
expected load.
,ote that the ta%eo$er node should e si8ed to accommodate all
possile #or%loads0 if there is a sin!le
standy ac%in! up multiple primaries, it must e capale of
ser$icin! multiple #or%loads. 4n hard#are
that supports dynamic "PAB operations, HACMP can e confi!ured
to allocate processors and memory to
a ta%eo$er node efore applications are started. Ho#e$er, these
resources must actually e a$ailale, or
acFuirale throu!h Capacity (p!rade on Aemand. 'he #orst case
situation 6 e.!., all the applications on
a sin!le node 6 must e understood and planned for.
Networks
HACMP is a network-centric application. HACMP networks not only provide client access to the applications but are also used to detect and diagnose node, network and adapter failures. To do this, HACMP uses RSCT, which sends heartbeats (UDP packets) over ALL defined networks. By gathering heartbeat information on multiple nodes, HACMP can determine what type of failure has occurred and initiate the appropriate recovery action. Being able to distinguish between certain failures, for example the failure of a network and the failure of a node, requires a second network! Although this additional network can be "IP based", it is possible that the entire IP subsystem could fail within a given node. Therefore, in addition there should be at least one, ideally two, non-IP networks. Failure to implement a non-IP network can potentially lead to a partitioned cluster, sometimes referred to as 'Split Brain' syndrome. This situation can occur if the IP network(s) between nodes becomes severed or, in some cases, congested. Since each node is, in fact, still very much alive, HACMP would conclude the other nodes are down and initiate a takeover. After takeover has occurred, the application(s) could potentially be running simultaneously on both nodes. If the shared disks are also online to both nodes, the result could be data divergence (massive data corruption). This is a situation which must be avoided at all costs.
The most convenient way of configuring non-IP networks is to use disk heartbeating, as it removes the distance limitations of RS232 serial networks. Disk heartbeat networks only require a small disk or LUN. Be careful not to put application data on these disks: although it is possible to do so, you don't want any conflict with the disk heartbeat mechanism!
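As a sketch of how a disk heartbeat path is commonly verified (the shared disk name hdisk3 is an assumption; dhb_read is the RSCT test utility):

# On node A: listen for heartbeats on the shared disk
/usr/sbin/rsct/bin/dhb_read -p hdisk3 -r
# On node B: transmit heartbeats over the same disk
/usr/sbin/rsct/bin/dhb_read -p hdisk3 -t
# Both ends should report that the link is operating normally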
Adapters
As stated above, each network defined to HACMP should have at least two adapters per node. While it is possible to build a cluster with fewer, the reaction to adapter failures is more severe: the resource group must be moved to another node. AIX provides support for EtherChannel, a facility that can be used to aggregate adapters (increase bandwidth) and provide network resilience. EtherChannel is particularly useful for fast responses to adapter/switch failures. This must be set up with some care in an HACMP cluster. When done properly, this provides the highest level of availability against adapter failure. Refer to the IBM techdocs website: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785 for further details.
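For illustration only, a minimal sketch of creating an EtherChannel from the command line (the adapter names ent0/ent1 are assumptions; smitty etherchannel is the usual route):

# Aggregate ent0 and ent1 into a new EtherChannel pseudo-adapter
mkdev -c adapter -s pseudo -t ibm_ech \
      -a adapter_names=ent0,ent1 -a mode=standard
# The resulting entX device is then configured with the service IP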
Many System p (TM) servers contain built-in Ethernet adapters. If the nodes are physically close together, it is possible to use the built-in Ethernet adapters on two nodes and a "cross-over" Ethernet cable (sometimes referred to as a "data transfer" cable) to build an inexpensive Ethernet network between two nodes for heartbeating. Note that this is not a substitute for a non-IP network.
Some adapters provide multiple ports. One port on such an adapter should not be used to back up another port on that adapter, since the adapter card itself is a common point of failure. The same thing is true of the built-in Ethernet adapters in most System p servers and currently available blades: the ports have a common adapter. When the built-in Ethernet adapter can be used, best practice is to provide an additional adapter in the node, with the two backing up each other.
Be aware of the network failure detection settings for the cluster and consider tuning these values. In HACMP terms, these are referred to as NIM values. There are four settings per network type which can be used: slow, normal, fast and custom. With the default setting of normal for a standard Ethernet network, the network failure detection time would be approximately 20 seconds. With today's switched network technology this is a large amount of time. By switching to a fast setting, the detection time would be reduced by 50% (10 seconds), which in most cases would be more acceptable. Be careful, however, when using custom settings, as setting these values too low can cause false takeovers to occur. These settings can be viewed using a variety of techniques, including the lssrc -ls topsvcs command (from a node which is active), odmget HACMPnim | grep -p ether, and smitty hacmp.
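For convenience, the two command-line checks just mentioned (run on an active cluster node):

lssrc -ls topsvcs                 # shows the heartbeat intervals currently in use
odmget HACMPnim | grep -p ether   # shows the configured NIM values for ethernet networks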
Applications
The most important part of making an application run well in an HACMP cluster is understanding the application's requirements. This is particularly important when designing the Resource Group policy behavior and dependencies. For high availability to be achieved, the application must have the ability to stop and start cleanly and not explicitly prompt for interactive input. Some applications tend to bond to a particular OS characteristic such as a uname, serial number or IP address. In most situations, these problems can be overcome. The vast majority of commercial software products which run under AIX are well suited to being clustered with HACMP.
Application Data Location
Where should application binaries and configuration data reside? There are many arguments to this discussion. Generally, keep all the application binaries and data where possible on the shared disk, as it is easy to forget to update them on all cluster nodes when they change. This can prevent the application from starting or working correctly when it is run on a backup node. However, the correct answer is not fixed. Many application vendors have suggestions on how to set up the applications in a cluster, but these are recommendations. Just when it seems to be clear-cut as to how to implement an application, someone thinks of a new set of circumstances. Here are some rules of thumb:
If the application is packaged in LPP format, it is usually installed on the local file systems in rootvg. This behavior can be overcome by bffcreate'ing the packages to disk and restoring them with the preview option. This action will show the install paths; symbolic links can then be created prior to install which point to the shared storage area. If the application is to be used on multiple nodes with different data or configuration, then the application and configuration data would probably be on local disks and the data sets on shared disk, with application scripts altering the configuration files during fallover. Also, remember that the HACMP File Collections facility can be used to keep the relevant configuration files in sync across the cluster. This is particularly useful for applications which are installed locally.
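As a hedged sketch of the preview technique (the fileset name appX.rte, the staging directory /stage and the link target are hypothetical):

# Copy the install images from media into a staging directory
bffcreate -d /dev/cd0 -t /stage all
# Preview (-p) the install to see the target paths without installing
installp -a -p -d /stage appX.rte
# Create symbolic links pointing the revealed paths at shared storage,
# then run the real install (without -p)
ln -s /sharedvg/appX /usr/lpp/appX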
Start/Stop Scripts
Application start scripts should not assume the status of the environment. Intelligent programming should correct any irregular conditions that may occur. The cluster manager spawns these scripts off in a separate job in the background and carries on processing. Some things a start script should do are:
- First, check that the application is not currently running! This is especially crucial for v5.4 users, as resource groups can be placed into an unmanaged state (the forced down action, in previous versions). Using the default startup options, HACMP will rerun the application start script, which may cause problems if the application is actually running. A simple and effective solution is to check the state of the application on startup. If the application is found to be running, simply end the start script with exit 0.
- Verify the environment. Are all the disks, file systems, and IP labels available?
- If different commands are to be run on different nodes, store the executing HOSTNAME in a variable.
- Check the state of the data. Does it require recovery? Always assume the data is in an unknown state, since the conditions that occurred to cause the takeover cannot be assumed.
- Are there prerequisite services that must be running? Is it feasible to start all prerequisite services from within the start script? Is there an inter-resource group dependency or resource group sequencing that can guarantee the previous resource group has started correctly? HACMP v5.2 and later has facilities to implement checks on resource group dependencies, including collocation rules in HACMP v5.3.
- Finally, when the environment looks right, start the application. If the environment is not correct and error recovery procedures cannot fix the problem, ensure there are adequate alerts (email, SMS, SNMP traps etc.) sent out via the network to the appropriate support administrators.
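A minimal ksh sketch of such a start script follows; the application name appX, its paths and the alert address are all assumptions for illustration, not a definitive implementation:

#!/bin/ksh
# Hypothetical HACMP application start script for "appX"
APP=/sharedvg/appX/bin/appX            # assumed binary location

# 1. Exit quietly if the application is already running (exit 0, as above)
if ps -ef | grep -v grep | grep -q "$APP" ; then
    exit 0
fi

# 2. Verify the environment: the shared file system must be available
if ! df /sharedvg/appX >/dev/null 2>&1 ; then
    echo "appX start: /sharedvg/appX unavailable on $(hostname)" |
        mail -s "appX start failed" admins@example.com   # assumed alert path
    exit 1
fi

# 3. Store the executing hostname for node-specific commands
NODE=$(hostname)

# 4. Environment looks right: start the application and return promptly
"$APP" &
exit 0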
Stop scripts are different from start scripts in that most applications have a documented start-up routine and not necessarily a stop routine. The assumption is: once the application is started, why stop it? Relying on a failure of a node to stop an application will be effective, but to use some of the more advanced features of HACMP the requirement exists to stop an application cleanly. Some of the issues to avoid are:
- Be sure to terminate any child or spawned processes that may be using the disk resources. Consider implementing child resource groups.
- Verify that the application is stopped to the point that the file system is free to be unmounted. The fuser command may be used to verify that the file system is free.
- In some cases it may be necessary to double-check that the application vendor's stop script did actually stop all the processes, and occasionally it may be necessary to forcibly terminate some processes. Clearly the goal is to return the machine to the state it was in before the application start script was run.
- Failure to exit the stop script with a zero return code will stop cluster processing. * Note: this is not the case with start scripts!
Remember, most vendor stop/start scripts are not designed to be cluster-proof! A useful tip is to have the stop and start scripts verbosely output, using the same format, to the /tmp/hacmp.out file. This can be achieved by including the following line in the header of the script: set -x && PS4="${0##*/}"'[$LINENO] '
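And a matching ksh sketch of a stop script (again, names and paths are hypothetical); it asks politely before forcing termination, and always exits 0:

#!/bin/ksh
set -x && PS4="${0##*/}"'[$LINENO] '   # verbose trace, as suggested above
APPDIR=/sharedvg/appX                  # assumed shared file system

# Ask the application to stop cleanly first (vendor stop command assumed)
"$APPDIR"/bin/appX_stop
sleep 10

# Terminate any child or spawned processes still holding the file system
# (fuser -k sends SIGKILL to every process using files under $APPDIR)
fuser -k "$APPDIR" >/dev/null 2>&1

# Confirm the file system is now free to be unmounted
fuser "$APPDIR"

# Always exit 0: a non-zero return code would stop cluster processing
exit 0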
AIX Security Checklist
AIX Environment Procedures
The best way to approach this portion of the checklist is to do a comprehensive physical inventory of the servers. Serial numbers and physical location would be sufficient.
____Record server serial numbers
____Record the physical location of the servers
Next we want to gather a rather comprehensive list of both the AIX and pSeries inventories. By running the next 4 scripts we can gather the information for analysis.
____Run these 4 scripts: sysinfo, tcpchk, nfschk and nethwchk. (See Appendix A for the scripts; a minimal sketch of this style of collector follows the checklists below.)
____sysinfo:
____Determine active volume groups on the servers: lsvg -o
____List physical volumes in each volume group: lsvg -p "vgname"
____List logical volumes for each volume group: lsvg -l "vgname"
____List physical volume information for each hard disk:
____lspv hdiskx
____lspv -p hdiskx
____lspv -l hdiskx
____List server software inventory: lslpp -L
____List server software history: lslpp -h
____List all hardware attached to the server: lsdev -C | sort -d
____List system name, nodename, LAN network number, AIX release, AIX version and machine ID: uname -x
____List all system resources on the server: lssrc -a
____List inetd services: lssrc -t 'service name' -p 'process id'
____List all host entries on the servers: hostent -S
____Name all nameservers the servers have access to: namerslv -Is
____Show status of all configured interfaces on the server: netstat -i
____Show network addresses and routing tables: netstat -nr
____Show interface settings: ifconfig
____Check user and group system variables
____Check users: usrck -t ALL
____Check groups: grpck -t ALL
____Run tcbck to verify whether it is enabled: tcbck
____Examine the AIX failed logins: who -s /etc/security/failedlogin
____Examine the AIX user log: who /var/adm/wtmp
____Examine the processes from users logged into the servers: who -p /var/adm/wtmp
____List all user attributes: lsuser ALL | sort -d
____List all group attributes: lsgroup ALL
____tcpchk:
____Confirm the tcp subsystem is installed: lslpp -l | grep bos.net
____Determine if it is running: lssrc -g tcpip
____Search for .rhosts and .netrc files: find / -name .rhosts -print ; find / -name .netrc -print
____Check for rsh functionality on the host: cat /etc/hosts.equiv
____Check for remote printing capability: cat /etc/hosts.lpd | grep -v #
____nfschk:
____Verify NFS is installed: lslpp -L | grep nfs
____Check NFS/NIS status: lssrc -g nfs | grep active
____Check to see if it is an NFS server and what directories are exported: cat /etc/xtab
____Show hosts that export NFS directories: showmount
____Show what directories are exported: showmount -e
____nethwchk:
____Show network interfaces that are connected: lsdev -Cc if
____Display active connections on boot: odmget -q value=up CuAt | grep name | cut -c10-12
____Show all interface status: ifconfig -a
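The Appendix A scripts are not reproduced here; as a rough sketch (the script name and output path are assumptions), a sysinfo-style collector might simply bundle the commands above:

#!/bin/ksh
# Hypothetical sysinfo-style inventory collector
OUT=/tmp/sysinfo.$(hostname).$(date +%Y%m%d)
{
    echo "=== Active volume groups ===" ; lsvg -o
    echo "=== Software inventory ==="   ; lslpp -L
    echo "=== Devices ==="              ; lsdev -C | sort -d
    echo "=== System identity ==="      ; uname -x
    echo "=== Subsystems ==="           ; lssrc -a
    echo "=== Interfaces ==="           ; netstat -i
    echo "=== Routing tables ==="       ; netstat -nr
} > "$OUT" 2>&1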
Root level access
____Limit users who can su to another UID: lsuser -f ALL
____Audit the su log: cat /var/adm/sulog
____Verify /etc/profile does not include the current directory
____Lock down cron access:
____To allow root only: rm -i /var/adm/cron/cron.deny and rm -i /var/adm/cron/cron.allow
____To allow all users: touch cron.allow (if the file does not already exist)
____To allow a user access: touch /var/adm/cron/cron.allow then echo "UID" >> /var/adm/cron/cron.allow
____To deny a user access: touch /var/adm/cron/cron.deny then echo "UID" >> /var/adm/cron/cron.deny
____Disable direct remote root access: add rlogin=false to root in the /etc/security/user file, or through smit
____Limit the $PATH variable in /etc/environment. Use the users' .profile instead.
Authorization/authentication administration
____Report all password inconsistencies without fixing them: pwdck -n ALL
____Report all password inconsistencies and fix them: pwdck -y ALL
____Report all group inconsistencies without fixing them: grpck -n ALL
____Report all group inconsistencies and fix them: grpck -y ALL
____Browse the /etc/security/passwd, /etc/passwd and /etc/group files weekly
S"%&+S9%&
XXXXBe$ie# all 3(IA23LIA pro!rams o#ned y root, daemon, and
in.
XXXXBe$ie# all 3>'(IA pro!rams0 find 2 ?perm ?1*** 6print
XXXXBe$ie# all 3>'LIA pro!rams0 find 2 ?perm ?)*** 6print
XXXXBe$ie# all stic%y it pro!rams0 find 2 ?perm ?D*** 6print
XXXX3et user .profile in 2etc2security2.profile
Permissions structures
____System directories should have 755 permissions at a minimum
____Root system directories should be owned by root
____Use the sticky bit on the /tmp and /usr/tmp directories.
____Run a checksum (md5) against all /bin, /usr/bin, /dev and /usr/sbin files.
____Check device file permissions:
____disk, storage, tape and network devices (should be 600), owned by root.
____tty devices (should be 622), owned by root.
____/dev/null should be 666.
____List all hidden files in these directories (the .files).
____List all writable directories (use the find command).
____$HOME directories should be 710
____$HOME .profile or .login files should be 600 or 640.
____Look for un-owned files on the server: find / -nouser -print. Note: do not remove any /dev files.
____Do not use r-type commands (rsh, rlogin, rcp and tftp) or .netrc or .rhosts files.
____Change /etc/hosts file permissions to 660 and review its contents weekly.
____Check for both tcp/udp failed connections to the servers: netstat -p tcp ; netstat -p udp.
____Verify the contents of /etc/exports (the NFS exports file).
____If using ftp, make this change to the /etc/inetd.conf file to enable logging:
ftp stream tcp6 nowait root /usr/sbin/ftpd ftpd -l
____Set NFS mounts to -ro (read only) and only to the hosts that need them.
____Consider using extended ACLs (please review the acl man pages).
____Before making the network connection, collect a full system file listing and store it off-line:
ls -Ra -la > /tmp/allfiles.system
____Make use of the strings command to check on files: strings /etc/hosts | grep Kashmir
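A sketch of the checksum baseline mentioned above (csum ships with recent AIX levels; the output path is an assumption):

# Record MD5 checksums of key system binaries
find /bin /usr/bin /usr/sbin -type f -print | while read f
do
    csum -h MD5 "$f"
done > /tmp/baseline.md5
# Store /tmp/baseline.md5 off-line and diff it against future runs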
Recommendations
Remove unnecessary services
By default the Unix operating system gives us 1024 well-known service ports to connect to; we want to parse this down to a more manageable value. There are 2 files in particular that we want to parse. The first is the /etc/services file itself. A good starting point is to eliminate all unneeded services and add services back as you need them. Below is a screenshot of an existing ntp server's /etc/services file on one of my lab servers.
#
# Network services, Internet style
#
ssh 22/udp
ssh 22/tcp
auth 113/tcp authentication
sftp 115/tcp
ntp 123/tcp # Network Time Protocol
ntp 123/udp # Network Time Protocol
#
# UNIX specific services
#
login 513/tcp
shell 514/tcp cmd # no passwords used
Parse the /etc/rc.tcpip file
This file starts the daemons that we will be using for the tcp/ip stack on AIX servers. By default the file will start the sendmail, snmp and other daemons. We want to parse this to reflect what functionality we need this server for. Here is the example for my ntp server.
# Start up the daemons
#
echo "Starting tcpip daemons:"
trap 'echo "Finished starting tcpip daemons."' 0
# Start up syslog daemon (for error and event logging)
start /usr/sbin/syslogd "$src_running"
# Start up Portmapper
start /usr/sbin/portmap "$src_running"
# Start up socket-based daemons
start /usr/sbin/inetd "$src_running"
# Start up Network Time Protocol (NTP) daemon
start /usr/sbin/xntpd "$src_running"
This also helps us better understand what processes are running on the server.
Remove unauthorized /etc/inittab entries
Be aware of what is in the /etc/inittab file on the AIX servers. This file works like the registry in a Microsoft environment. If an intruder wants to hide an automated script, he would want it launched here or in the cron file. Monitor this file closely.
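AIX provides dedicated commands for inspecting and editing inittab, which are safer than editing the file directly; for example (the entry identifier below is assumed):

lsitab -a              # list every inittab entry for review
rmitab suspect_entry   # remove an unauthorized entry by its identifier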
Parse the /etc/inetd.conf file
This is the AIX system file that starts system services, like telnet, ftp, etc. We also want to closely watch this file to see if there are any services that have been enabled without authorization. If you are using ssh, for example, this is what the inetd.conf file should look like: because we use other means of connection, this file is not used in my environment and should not be of use to you. This is why ssh should be used for all administrative connections into the environment. It provides an encrypted tunnel, so connection traffic is secure. In the case of telnet, it is very trivial to sniff the UID and password.
## protocol. "tcp" and "udp" are interpreted as IPv4.
##
## service socket protocol wait/ user server server program
## name type nowait program arguments
##
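To disable a service such as telnet, the usual steps are to comment out its line and refresh inetd (chsubserver is an alternative; the example below edits by hand):

# In /etc/inetd.conf, comment out the telnet line:
# telnet  stream  tcp6  nowait  root  /usr/sbin/telnetd  telnetd -a
# Then tell inetd to re-read its configuration:
refresh -s inetd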
Edit /etc/rc.net
This is the network configuration file used by AIX. This is the file you use to set your default network route along with your no (network options) attributes. Because the servers will not be used as routers to forward traffic, and we do not want loose source routing used against us, we will be making a few changes in this file. A lot of them protect against DoS and DDoS attacks from the internet. They also protect against ACK and SYN attacks on the internal network.
##################################################################
# Changes made on 06/07/02 to tighten up socket states on this
# server.
##################################################################
if [ -f /usr/sbin/no ] ; then
/usr/sbin/no -o udp_pmtu_discover=0   # stops autodiscovery of MTU
/usr/sbin/no -o tcp_pmtu_discover=0   # on the network interface
/usr/sbin/no -o clean_partial_conns=1 # clears incomplete 3-way conn.
/usr/sbin/no -o bcastping=0           # protects against smurf icmp attacks
/usr/sbin/no -o directed_broadcast=0  # stops packets to broadcast add.
/usr/sbin/no -o ipignoreredirects=1   # prevents loose
/usr/sbin/no -o ipsendredirects=0     # source routing
/usr/sbin/no -o ipsrcrouterecv=0      # attacks on
/usr/sbin/no -o ipsrcrouteforward=0   # our network
/usr/sbin/no -o ip6srcrouteforward=0  # from using indirect
/usr/sbin/no -o icmpaddressmask=0     # dynamic routes
/usr/sbin/no -o nonlocsrcroute=0      # to attack us from
/usr/sbin/no -o ipforwarding=0        # stops server from acting like a router
fi
Securing root
Change the /etc/motd banner:
This computer system is the private property of XYZ Insurance. It is for authorized use only. All users (authorized or non-authorized) have no explicit or implicit expectation of privacy.
Any or all use of this system and all files on this system may be intercepted, monitored, recorded, copied, audited, inspected and disclosed to XYZ Insurance's management personnel.
By using this system, the end user consents to such interception, monitoring, recording, copying, auditing, inspection and disclosure at the discretion of such personnel. Unauthorized or improper use of this system may result in civil and/or criminal penalties and administrative or disciplinary action, as deemed appropriate by said personnel. By continuing to use this system, the individual indicates his/her awareness of and consent to these terms and conditions of use.
LOG OFF IMMEDIATELY if you do not agree to the provisions stated in this warning banner.
Modify /etc/security/user
root:
loginretries = 5 - failed retries until the account locks
rlogin = false - disables remote login access to a root shell; you must su from another UID
admgroups = system
minage = 0 - minimum aging is no time value
maxage = 4 - maximum aging is set to 4 weeks (about 30 days)
umask = 22
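The same attributes can be applied with chuser instead of editing the file by hand, e.g.:

chuser rlogin=false loginretries=5 minage=0 maxage=4 root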
Tighten up /etc/security/limits
This attribute should be changed to guard against a runaway resource hog: such an orphaned process can grow to use an exorbitant amount of disk space. To prevent this we can set the ulimit value here.
default:
#fsize = 2097151
fsize = 8388604 - sets the soft file size limit; fsize is counted in 512-byte blocks, so this is about 4 GB.
Variable changes in /etc/profile
Set the $TMOUT variable in /etc/profile. This will cause an open shell to close after 15 minutes of inactivity. It works in conjunction with the screensaver to prevent an open session from being used to damage the server or, worse, corrupt data on the server.
# Automatic logout, include in export line if uncommented
TMOUT=900
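Making the variable read-only stops users from simply unsetting it; a possible export line:

TMOUT=900 ; export TMOUT ; readonly TMOUT   # ksh enforces the idle timeout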
Sudo is your friend...
This is a nice piece of code that system administrators can use in order to allow "root-like" functionality. It allows a non-root user to run system binaries or commands. The /etc/sudoers file is used to configure exactly what the user can do. The service is configured and running on ufxcpidev. The developers are running a script called changeperms in order to tag their .ear files with their own ownership attributes.
First we set up sudo to allow root-like, or "superuser doer", access to sxnair.
# sudoers file.
#
# This file MUST be edited with the 'visudo' command as root.
#
# See the sudoers man page for the details on how to write a sudoers file.
#
# Host alias specification
# User alias specification
# Cmnd alias specification
# User privilege specification
root ALL=(ALL) ALL
sxnair,jblade,vnaidu ufxcpidev=/bin/chown * /usr/WebSphere/AppServer/installedApps/*
#
#
# Override the built-in default settings
Defaults syslog=auth
Defaults logfile=/var/log/sudo.log
For more details, please see the XYZ Company Insurance Work Report that I compiled, or visit this URL: http://www.courtesan.com/sudo/.
Tighten user/group attributes
Change /etc/security/user
These are some of the changes to the /etc/security/user file that will promote a more hardened configuration of default user attributes at your company.
default:
umask = 077 - defines the default umask; 077 makes new files accessible only to the owning UID
pwdwarntime = 7 - days of password expiration warnings
loginretries = 5 - failed login attempts before the account is locked
histexpire = 52 - defines how long a password cannot be re-used
histsize = 20 - defines how many previous passwords the system remembers
minage = 2 - minimum number of weeks a password is valid
maxage = 8 - maximum number of weeks a password is valid
maxexpired = 4 - maximum time in weeks a password can be changed after it expires
HACMP log files
/usr/sbin/cluster/etc/rhosts --- used to accept incoming communication from clcomdES (cluster communication enhanced security)
/usr/es/sbin/cluster/etc/rhosts
Note: If there is an unresolvable label in the /usr/es/sbin/cluster/etc/rhosts file, then all clcomdES connections from remote nodes will be denied.
cluster manager (clstrmgrES)
cluster lock daemon (cllockdES)
cluster SMUX peer daemon (clsmuxpdES)
The clcomdES daemon is used for cluster configuration operations such as cluster synchronization, cluster management (C-SPOC) and dynamic reconfiguration (DARE) operations.
For clcomdES there should be at least 20 MB of free space in the /var file system:
/var/hacmp/clcomd/clcomd.log -- requires 2 MB
/var/hacmp/clcomd/clcomddiag.log -- requires 18 MB
An additional 1 MB is required for the /var/hacmp/odmcache directory.
clverify.log is also present under /var:
/var/hacmp/clverify/current/<nodename>/ contains logs from the current execution of clverify
/var/hacmp/clverify/pass/<nodename>/ contains logs from the last passed verification
/var/hacmp/clverify/pass.prev/<nodename>/ contains logs from the second-to-last passed verification
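A quick way to confirm the daemon is healthy and that /var has the headroom described above (assuming HACMP v5.x, where clcomdES runs under SRC control):

lssrc -s clcomdES   # the daemon should show as active
df -m /var          # confirm at least ~20 MB free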