
Björn Rodén (roden@ae.ibm.com)
http://www.ibm.com/systems/services/labservices/
http://www.ibm.com/systems/power/support/powercare/

Thanks to: Kunal L, Rakesh S, Rajeev N, Steve D, Paul M, Bernd B, Gary C, Dino Q, et al

High Availability with PowerHA and GPFS

Copyright IBM Corporation 2015


Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
9.0
Session Objectives
This session discusses considerations for designing and deploying highly and continuously available IT systems, with:
PowerHA SystemMirror Standard Edition from 7.1.3
General Parallel File System (GPFS) from 3.5

Objective: You will learn how to approach high availability and recovery planning and deployment.

Copyright IBM Corporation 2015 2


Business challenges & needs
Information management for business processes needs to
Ensure appropriate level of service
Manage risks (mitigate, ignore, transfer)
Reduce cost (CAPEX/OPEX)

40% of companies that suffer a massive data loss will never reopen (1)
93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster (2)

Reference: (1) Disaster Recovery Plans and Systems Are Essential, Gartner Group, 2001
Reference: (2) US National Archives and Records Administration

Copyright IBM Corporation 2015 3


Business Continuity in IT perspective

Business Continuity: Ability to adapt and respond to risks, as well as opportunities, in order to maintain continuous business operations

High Availability: The attribute of a system to provide service during defined periods, at acceptable or agreed upon levels, masking unplanned outages

Disaster Recovery: Capability to recover a data center at a different site if the primary site becomes inoperable

Continuous Operations: The attribute of a system to continuously operate and mask planned outages

Copyright IBM Corporation 2015 4


What protection is the solution expected to provide?

High Availability (local or single system failure, data loss): human error, corruption, software error, component failures, single system failures
Metro Distance Recovery (local disaster): electric grid failure, HVAC or power failures, human error, burst water pipe, building fire, architectural failures, gas explosion
Global Distance Recovery (regional disaster, compliance): electric grid failure, floods, hurricanes, earthquakes, tornados, tsunamis, warfighting, terrorist attack

Copyright IBM Corporation 2015 5


IT Availability Life cycle

DESIGN > BUILD > OPERATE > REPLACE

architecture, solution design, deployment, governance, system maintenance and change management, skill building, migration and decommissioning

A lot to analyze, plan, do and check

Copyright IBM Corporation 2015 6


Key IT Availability Metrics

Copyright IBM Corporation 2014


What are your key Availability Requirements?

Recovery Time Objective (RTO)


How long can you afford to be without your systems?

Recovery Point Objective (RPO)


How much data can you afford to recreate or lose?

Maximum Time To Restart/Recover (MTTR)


How long until services are restored for the users?

Degree of Availability (Coverage Requirement)


What annual percentage of a given time period should the business service be available?

Copyright IBM Corporation 2015 8


Notes on Degree of Availability
IT service availability can be measured in percentage of a given time period when the
business service is available for its intended purpose
Usually expressed with a number of nines (9) over a year (rounded):
99% => 88 hours/year
99.9% => 9 hours/year
99.95% => 4 1/2 hours/year
99.99% => 52 min/year
99.999% => 5 min/year
99.9999% => 1/2 min/year

IT system vs. IT service (ripple effect)


e.g. IT service dependent on five IT systems, if all target levels are met but not at the same time:
PROBABILITY: (99.9*99.9*99.5*99.5*99.0)/100^4 => 97.82% or 191-192 h/period
MINIMUM: min(99.9, 99.9, 99.5, 99.5, 99.0) => 99.00% or 88 h/period
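The ripple-effect numbers above can be reproduced from any shell with bc (a sketch; the five figures are the example target levels):
echo "scale=4; (99.9*99.9*99.5*99.5*99.0)/(100^4)" | bc   # combined availability => 97.8165 (%)
echo "scale=4; (100-97.8165)*8760/100" | bc               # potential downtime => ~191.3 hours/year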

Determine the time period for the degree of availability


Are time for planned maintenance excluded during the year?
Such as planned service windows and/or fixed number of days per month/quarter
How many hours are used per year
Calendar year hours
8760 h for 365 days non-leap years
8784 h for 366 days leap years
Decided amount of time per year (for global coverage across 24 time zones, add one day)
365 days (non-leap): with global coverage add 24 h => 366 days/year or 8784 h
366 days (leap): with global coverage add 24 h => 367 days/year or 8808 h

Copyright IBM Corporation 2015 9


Common Availability and Disaster requirements

High Availability:
RPO zero (or near zero) data loss
RTO measured in minutes at the most
NRO zero
PRO zero from UPS & generator
Coverage Requirement (e.g. 24x7 / 24x365)
Degree of Availability (e.g. 99.9% or ~9h/year)
No single point-of-failure (SPOF) at system level
Geographic affinity (Metro distance)
Automatic failover/continuance/recovery to redundant components, including application components, up to in-flight transaction integrity

Disaster Tolerance:
RPO near zero data loss (may require manual recovery of orphaned data)
RTO/NRO measured in hours, days, weeks
PRO depends on generator fuel storage
Maximum Tolerable Period of Degraded Operations
Maximum Time To Restart/Recover (MTTR)
Business Process Recovery Objective (BPRO)
No single point-of-failure (SPOF) at data center level
Geographic dispersion (Global distance)
Declaring disaster is a management decision
Rotating site swap or periodic site swap
Full or partial swap

Your Recovery Objectives - Example
[Figure: recovery timeline from checkpoint in time (RPO), through outage and system repair (RTO), to minimum service delivery and service delivery at 100%, with new business resuming]

PRO Power Recovery Objective
NRO Network Recovery Objective
DOT Degraded Operations Tolerance

Copyright IBM Corporation 2015 10


Identify Points of Failure

Copyright IBM Corporation 2014


Redundancy and Single Points of Failure (SPOF)

Find the SPOF

Your major goal throughout the planning process is to eliminate single points of failure and verify redundancy.

A single point of failure exists when a critical Service function is provided by a single component. If that component fails, the Service has no other way of providing that function, and the application or service dependent on that component becomes unavailable.

[Figure: layers to examine for SPOFs -- ISP (external), Enterprise environment, Site environment (FW/IPS, routers, UPS, generator), Data Centre environment (servers, storage, network switches, MAN/WAN, Local Area Network, Storage Area Network), and the per-server stack: Application, Middleware, Operating System & System Software, Kernel stack, Logical/Virtual Machine, Physical Machine, Hypervisor, Hardware (cores, cache, nest), Storage]

http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.plangd/ha_plan_over_ppg.htm

Copyright IBM Corporation 2015 12


Cluster partitioning, aka node isolation or split brain 1/2
Cluster partitioning, aka node isolation or split brain, is a failure situation where
more than one server acts as a primary.
Partitioning occurs when a cluster node stops receiving all interconnecting heartbeat traffic from its
peer-node, and assumes that the peer-node has failed.
Due to the lack of synchronization, a split brain situation is problematic and can cause undesirable behaviour,
such as data corruption.
Once the peer-node is determined to be down due to lack of heartbeats, both nodes on each side of the
cluster attempt to take over resources (if so configured) from a node that is actually still active and running.
When the interconnection is restored and heartbeats resume, the cluster will merge; at this point the cluster manager identifies that a partitioning has occurred, and the cluster node with the highest node number will stop itself immediately.
During partitioning, if both nodes have acquired their respective peer-node's resource groups and have had applications running with users connected and updating data for the same application on both nodes separately, data integrity is lost.

Copyright IBM Corporation 2015 14


Cluster partitioning, aka node isolation or split brain 2/2
Common approaches regarding cluster partitioning:
Maximize independent interconnects between sites
Use multiple IP and non-IP interconnects for cluster node heartbeats, with all physical links provided separately, and
well isolated from failure at the same time, such as:
Dual IP-networks (LAN), each over separate physical adapters and network switches, and interconnection between
cluster node sites.
Dual non-IP-networks (SAN), each over separate physical adapters and network switches, and interconnection
between cluster node sites.
Consider using a third network interconnect for heartbeat only between nodes; for example, if the primary interconnections between nodes/sites use DWDM, use a non-land-based link or a VPN over an ISP connection.

Use third site as tie breaker


Using the tie-breaker concept, where a third-site disk or node is used to determine the surviving partition.
Optimally, also use a separate physical interconnect from each cluster node site to the third site.
For PowerHA refer to:
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.admngd/ha_admin_mergesplit_policy_713.htm
For GPFS refer to:
http://www-03.ibm.com/systems/resources/configure-gpfs-for-reliability.pdf

Classify node-failure as site-down event and/or start secondary by operator


Active site declares itself down and expects that the secondary site will take over the failed services; the secondary site takes over services if communication is lost to the previously active site.
Active site declares itself down, and the secondary site is started by an operator.

Accept as-is
Decide that partitioning is unlikely to occur, that the cost of redundancy is too high, and accept longer downtime relying on backup restore in case of data inconsistency.

NOTE: External access to cluster nodes can still be available, even if site interconnects fail between the cluster nodes.
Copyright IBM Corporation 2015 15
PowerHA SystemMirror
PowerHA SystemMirror Edition basics
PowerHA SystemMirror for AIX Standard Edition
Cluster management for the data center
Monitors, detects and reacts to events
Multiple heartbeat channels between the systems
> Network
> SAN
> Central Repository
Enables automatic switch-over
SAN shared storage clustering
Smart Assists
HA agent support: Discover, Configure, and Manage
Resource Group Management: Advanced Relationships
Support for Custom Resource Management
Out of the box support for DB2, WebSphere, Oracle, SAP, TSM, LDAP, IBM HTTP, etc
PowerHA SystemMirror for AIX Enterprise Edition
Cluster management for the Enterprise (Disaster Tolerance)
Multi-site cluster management
Automated or manual confirmation of swap-over
Third site tie-breaker support
Separate storage synchronization
Metro Mirror, Global Mirror, GLVM, HyperSwap with DS8800 (<100KM)

Copyright IBM Corporation 2015 17


PowerHA SystemMirror support
PowerHA 6.1 End of Support (EOS): 30-Apr-2015 (extended from 30-Sep-2014)
End of Support (EOS) is the last date on which IBM will deliver standard support services for a
given version/release of a product.
Any further service support extensions can be found on this website:
http://www-01.ibm.com/software/support/aix/lifecycle/index.html

R=Rolling Upgrade
S=Snapshot Upgrade
O=Offline Upgrade

Copyright IBM Corporation 2015 18


Some key changes in PowerHA 7.1 vs. 6.1
Architectural changes from PowerHA 6.1 (CAA/RSCT, heartbeating, RG)
PowerHA 7.1 is built on Cluster Aware AIX (CAA) functionality, which provides fundamental clustering
capabilities in the base operating system. PowerHA 6.1.0 uses Reliable Scalable Clustering Technology
(RSCT) as its clustering framework.
PowerHA 7.1.3 requires AIX 6.1 TL9 SP1 or AIX 7.1 TL3 SP1
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347
Cluster Aware AIX (CAA) manages the heartbeats in 7.1, not RSCT as in 6.1
CAA uses a repository disk to store configuration information persistently; the disk must be shared by all cluster
nodes.
Event management with AHAFS in 7.1
Event management is handled by using AIX pseudo file-system architecture Autonomic Health Advisor File
System (AHAFS), not cluster manager and RSCT
IP multicast or unicast TCP with gossip protocol in 7.1, not unicast UDP IP as in 6.1
With 7.1.3 unicast (TCP) is the default option in addition to multicast
Non-IP networks diskhb, mndhb, rs232 etc removed from 7.1
No IPAT via Replacement (HW Address Takeover / HWAT) in 7.1
Some restrictions on changing hostname in 7.1
Communication Path to a node can be set from 7.1.2 (IP address mapping to hostname)
Eased further in 7.1.3 (capability to dynamically modify the host name of a clustered node)
Smart Assist technology improved and extended for 7.1.3

Copyright IBM Corporation 2015 19


Basic implementation flow PowerHA 7.1
1. Plan for network, storage, and application
Eliminate single points of failure.
2. Define, prepare and configure the infrastructure
Application planning, and start and stop scripts
Networks (IP interfaces, /etc/hosts, non-IP devices)
Storage (adapters, LVM volume group, filesystem)
3. Install the PowerHA filesets
4. Configure the PowerHA environment:
Topology:
Cluster, node names, PowerHA IP networks, Repository Disk and SFWcomm
Multicast or unicast network for heartbeat
Cluster Aware AIX (CAA) cluster
Resources, resource groups, attributes:
Resources: Application server, service label, volume group
Resource group: Identify name, nodes, policies
Add attributes: Application server, service label, VG, filesystem.
5. Synchronize, save configuration (snapshot)
6. Start/stop cluster services
7. Verify, test configuration

Copyright IBM Corporation 2015 20


Building a dual node PowerHA cluster
1. Baseline each cluster node (software levels & configuration files)
2. Check that all disk devices have reservation_policy set to no_reserve (NPIV on LPAR, VSCSI on VIOS); see the chdev sketch after the command examples below:
lsdev -Cc disk -F name | xargs -I{} lsattr -El {} -a reservation_policy
3. Correlate disks and paths between cluster nodes using PVID, UUID, UDID:
lspv -u and lsmpio (or equivalent)
4. Create a cluster (clmgr add cluster)
5. Add service IP (clmgr add service_ip)
6. Define application controller (clmgr add application_controller)
7. Create resource group (clmgr add rg)
8. Verify and synchronize cluster (clmgr sync cluster)
9. Start cluster
clmgr command:
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.3/com.ibm.powerha.admngd/clmgr_cmd.htm

root@stglbs1:/: clmgr add cluster CL1 repository=hdisk99,hdisk98 nodes=CL1N1,CL1N2


root@stglbs1:/: clmgr add service_ip CL1VIP network=net_ether_01
root@stglbs1:/: clmgr add application_controller AC1 \
startscript="/ha/start.sh" stopscript="/ha/stop.sh"
root@stglbs1:/: clmgr add rg RG1 nodes=CL1N1,CL1N2 startup=ohn fallback=nfb \
service_label=CL1VIP volume_group=cl1vg1 application=AC1
root@stglbs1:/: clmgr sync cluster
root@stglbs1:/: clmgr start cluster
root@stglbs1:/: clmgr query cluster
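If any disk reports a different reservation policy in step 2, it can be changed per disk before cluster creation (a sketch; hdisk2 is a placeholder, and a busy disk may need chdev -P plus a reboot):
root@stglbs1:/: chdev -l hdisk2 -a reservation_policy=no_reserve
root@stglbs1:/: lsattr -El hdisk2 -a reservation_policy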

Copyright IBM Corporation 2015 22


PowerHA 7.1 with dual node single/dual site

Baseline
Ordinary run-of-the-mill dual node cluster
Using Mirror Pools for LVM mirroring
Single Virtual Ethernet adapter per node, backed by the same VIOS SEA LAGG
Set "Communication Path to Node" to the cluster node's hostname network interface (using the IP address and symbolic hostname from /etc/hosts)
netmon.cf configured for ping outside the box from the partition (cluster file): /usr/es/sbin/cluster/netmon.cf
rhosts configured with the cluster nodes (cluster file): /etc/cluster/rhosts
netsvc.conf configured with DNS (system file): /etc/netsvc.conf
Single or dual SAN fabric
If dual sites, within a few km distance for minimal latency and throughput degradation
Single LAN with ISL
If dual sites, use VLAN spanning

If the cluster node (partition) has multiple Virtual Ethernet adapters, set the "Communication Path to Node" to the IP address and Virtual Ethernet network interface device which maps to the hostname.

[Figure: PowerHA cluster with LPARs HA1 and HA2, LVM mirroring across single or dual enterprise storage]

Copyright IBM Corporation 2015 23


PowerHA 7.1 with dual node single/dual site

Multicast between nodes

Multicast is optional from 7.1.3
Default with 7.1.3 is TCP unicast
If desired, verify multicast is working between nodes before creating the 7.1 cluster
Multicast IP can be set manually, or CAA will assign one based on the node's lower 24 IP address bits with an upper multicast octet of 228, such as: 192.1.2.3 => 228.1.2.3
Check assigned multicast IP:
lscluster -i | grep -i multi
Test with the mping command:
Start receiver first: mping -r -c 100
Start sender: mping -s -c 100
Use the -a <multicastip> flag to set the multicast address to be used by mping
Customer network teams seem to usually prefer to use unicast TCP for IP heartbeating instead of multicast.
Multi-homed nodes can set the network to private with CAA and it will not be used for heartbeating:
clmgr modify network <network> PUBLIC=private
lscluster -i

http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.trgd/ha_trgd_test_multicast.htm
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.admngd/clmgr_cmd.htm
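A minimal end-to-end verification sketch using the commands above (228.1.2.3 is the example multicast address; start the receiver before the sender):
# On node HA2 (receiver):
mping -r -a 228.1.2.3 -c 100
# On node HA1 (sender):
mping -s -a 228.1.2.3 -c 100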
Copyright IBM Corporation 2015 24
PowerHA 7.1 with dual node single/dual site

Repository Disk

The cluster repository disk is used as the central repository for the cluster configuration data.
When CAA is configured with repos_loss mode set to assert and CAA loses access to the repository disk, the system automatically shuts down.
Access from all nodes and paths.
Start with ~10 GB for up to 32 nodes (min=512 MB, max=460 GB, thin provisioning is supported).
Direct access by CAA only, raw disk I/O.
Define a spare for the repos disk.
Verify the disk reserve attribute is set to no_reserve.
Do not manually write to the repos disk!
Check repos disk status:
/usr/es/sbin/cluster/utilities/clmgr query repository
/usr/lib/cluster/clras lsrepos
/usr/lib/cluster/clras dumprepos
/usr/lib/cluster/clras dumprepos -r <reposdisk>
/usr/lib/cluster/clras dpcomm_status

If IP heartbeating fails, cluster nodes will keep alive if the repository disk is accessible from all nodes.

http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.clusteraware/claware_repository.htm
https://www.ibm.com/developerworks/community/blogs/6eaa2884-e28a-4e0a-a158-7931abe2da4f/entry/powerha_caa_repository_disk_management
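If the active repository disk fails, a prepared spare can be swapped in (a sketch, assuming the clmgr "replace repository" action available in recent PowerHA 7.1 releases; hdisk97 is a placeholder for the spare disk):
root@stglbs1:/: clmgr replace repository hdisk97
root@stglbs1:/: /usr/es/sbin/cluster/utilities/clmgr query repository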
Copyright IBM Corporation 2015 25
PowerHA 7.1 with dual node single/dual site

Storage Framework

Fibre Channel adapters with target mode support only
On fcsX: tme=yes
On fscsiX: dyntrk=yes and fc_err_recov=fast_fail
Enable the new settings (reboot)
All physical FC adapters' WWPNs zoned (TM-ZONE)
One fabric supported with SFWcomm
For dual fabric, it is supposed to work; if it does not work with your implementation and system software levels, please open a PMR with IBM Support
LPM does not migrate the SFWcomm configuration
It is recommended that SAN communication be reconfigured after LPM is performed
Uses datalink layer communication over VLAN between the AIX cluster node and the VIOS with the physical FC adapters
Check SFWcomm status:
lscluster -i
sfwinfo -a
clras sancomm_status

If the IP heartbeat and repository disk are not sufficient to meet heartbeat requirements, also enable SFWcomm.

http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.clusteraware/claware_comm_setup.htm
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.concepts/ha_concepts_ex_san.htm
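A minimal sketch of the adapter settings listed above (fcs0/fscsi0 are example device names; -P defers the change until the next reboot):
root@stglbs1:/: chdev -l fcs0 -a tme=yes -P
root@stglbs1:/: chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P
root@stglbs1:/: shutdown -Fr     # reboot to activate the deferred attribute changes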
Copyright IBM Corporation 2015 26
PowerHA IP heartbeating over VIOS SEA

Network heartbeating is used as a reliable means of monitoring an adapter's state over


a long period of time.
When heartbeating is broken, a decision has to be made as to whether the local adapter has gone bad, or
the neighbor (or something between them) has a problem.
The local node only needs to take action if the local adapter is the problem; if its own adapter is good,
then we assume it is still reachable by other clients regardless of the neighbor's state (the neighbor is
responsible for acting on its local adapters failures).
This decision (local vs remote bad) is made based on whether any network traffic can be seen on
the local adapter, using the inbound byte count of the interface.
Where Virtual Ethernet is involved, this test becomes unreliable since there is no way to distinguish
whether inbound traffic came in from the VIO server's connection to the outside world, or just from a
neighbouring VIO client (This is a design point of VIO that its virtual adapters be indistinguishable to the
LPAR from a real adapter).
Configure netmon.cf for Virtual Ethernet and single adapter PowerHA cluster node network adapters.

Considerations regarding multiple IP heartbeat networks over Virtual Ethernet


1. For dual node single site clusters, one IP network is ordinarily sufficient, if backed by dual VIOS SEA as per PowerVM
Virtualization Best Practice the base/boot IP and service IPs can be on the same routable subnet.
2. Using an additional Virtual Ethernet over the same VIOS Shared Ethernet Adapter does not improve redundancy
3. Using an additional Virtual Ethernet over a different Shared Ethernet Adapter on the same VIOS does not improve redundancy
4. Using an additional Virtual Ethernet over a different Shared Ethernet Adapter on a different VIOS might improve redundancy
(also depending on network switch and routing layer)
With separate hypervisor virtual switches for each
With partition link aggregation in network backup interface mode
Can use a single PowerHA/CAA IP heartbeat network also for SRIOV/dedicated adapters
With dual SRIOV ports from separate Ethernet adapters assigned to each cluster node (in separate servers), each
port connected to a separate network switch and configured with link aggregation in Network Interface Backup mode.
Copyright IBM Corporation 2015 27
Cluster Topology Configuration netmon.cf facility

Without this feature, network link and network switch failure will not be properly detected by the cluster node.

For single adapter PowerHA cluster node network adapters, use the netmon.cf configuration file:
/usr/es/sbin/cluster/netmon.cf

When netmon needs to stimulate the network to ensure adapter function, it sends ICMP ECHO requests to each IP address.
After sending the request to every address, netmon checks the inbound packet count before determining whether an adapter has failed or not.

Specify remote hosts that are not in the cluster configuration, that can be accessed from PowerHA interfaces, and that reply consistently to ICMP ECHO without delay, such as default gateways and equivalent.

Up to 32 different targets can be provided for each interface; if *any* given target is pingable, the adapter will be considered up (ICMP ECHO).

Format:
!REQD <owner> <target>

Parameters:
!REQD : An explicit string; it *must* be at the beginning of the line (no leading spaces).
<owner> : The interface this line is intended to be used by; that is, the code monitoring the adapter specified here will determine its own up/down status by whether it can ping any of the targets specified in these lines. The owner can be specified as a hostname, IP address, or interface name. In the case of hostname or IP address, it *must* refer to the boot name/IP (no service aliases). In the case of a hostname, it must be resolvable to an IP address or the line will be ignored. The string "!ALL" will specify all adapters.
<target> : The IP address or hostname you want the owner to try to ping. As with normal netmon.cf entries, a hostname target must be resolvable to an IP address in order to be usable.

http://www-01.ibm.com/support/docview.wss?uid=isg1IZ01332
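A minimal netmon.cf sketch following the format above (en0 and the gateway addresses are placeholders):
!REQD en0 192.168.10.1
!REQD en0 192.168.10.2
With these two lines, en0 is considered up as long as either gateway answers ICMP ECHO.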
Copyright IBM Corporation 2015 28
Basic PowerHA cluster functionality verification

Verify PowerHA cluster functionality


After system functionality verification (file systems, users, network, backup, etc)
Before or after cluster application server verification (start/stop/monitor integration hardening)
Before end-to-end application resiliency verification (environment/enterprise wide failure scenarios)

Procedure | Actions (examples) | Expected outcome | Actual outcome


Reboot both NODE1 & NODE2 and restart HA on both
RG stop on NODE1 w/RG on NODE1 clRGmove -d
RG start on NODE1 clRGmove -u
RG stop on NODE1 w/RG on NODE1 clRGmove -d
RG start on NODE2 clRGmove -u
RG stop on NODE2 w/RG on NODE2 clRGmove -d
RG move from NODE2 to NODE1 w/RG on NODE2 clRGmove -m
RG move from NODE1 to NODE2 w/RG on NODE1 clRGmove -m
IP Failure test NODE1 ifconfig en# down w/ RG on NODE1
Reintegrate NODE1 ifconfig en# up on NODE1
IP Failure test NODE2 ifconfig en# down w/ RG on NODE2
Reintegrate NODE2 ifconfig en# up on NODE2
IP Failure test NODE1&NODE2 ifconfig en# down on NODE1 & NODE2
Reintegrate NODE1 & NODE2 ifconfig en# up on NODE1 & NODE2
Stop of PowerHA on NODE1 w/ migration to NODE2 cl_clstop
Re-start PowerHA on NODE1 to reintegrate
Stop of PowerHA on NODE2 w/ migration to NODE1 cl_clstop
Re-start PowerHA on NODE2 to reintegrate
SAN Availability Test on NODE1 SAN Admin
Reintegrate NODE1 SAN SAN Admin
SAN Availability Test on NODE2 SAN Admin
Reintegrate NODE2 SAN SAN Admin
HMC Power Off of NODE1 w/ RG on NODE1 chsysstate
HMC Activate of NODE1 & Restart Power-HA on NODE1 chsysstate
HMC Power Off of NODE2 w/ RG on NODE2 chsysstate
HMC Activate of NODE2 & Re-start Power-HA on NODE2 chsysstate
Reboot both NODE1 & NODE2 and restart HA on both chsysstate
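As an illustration of the clRGmove actions referenced in the table above (a sketch; RG1, NODE1 and NODE2 are the example names, full path /usr/es/sbin/cluster/utilities/clRGmove):
clRGmove -g RG1 -n NODE1 -d      # take RG1 offline on NODE1
clRGmove -g RG1 -n NODE1 -u      # bring RG1 online on NODE1
clRGmove -g RG1 -n NODE2 -m      # move RG1 to NODE2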

Copyright IBM Corporation 2015 30


GPFS
IBM General Parallel File System (GPFS)
The IBM General Parallel File System (GPFS) is a cluster file system.
GPFS provides concurrent access to a single file system or set of file systems from multiple nodes.
GPFS nodes can all be SAN attached or a mix of SAN and network attached.
This enables high performance access to this common set of data to support a scale-out solution
or provide a High Availability platform.

Number of nodes:
Up to 1530 (AIX)
Up to 9620 (Linux/x86)
Up to 64 (Windows)

File system
Maximum file system size 2^99 bytes (architecture)
Current tested limit is ~18 PB file systems
Maximum file size equals file system size
2^64 files per file system (architecture)
Current tested limit is 9 giga files
2048 disks in a file system
256 file systems per cluster

Copyright IBM Corporation 2015 32


GPFS licensing and support
The GPFS Server license permits the licensed node to perform GPFS management functions such as
cluster configuration manager, quorum node, manager node, and Network Shared Disk (NSD) server.
In addition, the GPFS Server license permits the licensed node to share GPFS data directly through any
application, service protocol or method such as Network File System (NFS), Common Internet File System
(CIFS), File Transfer Protocol (FTP), or Hypertext Transfer Protocol (HTTP).
The GPFS Client license permits exchange of data between nodes that locally mount the same GPFS
file system.
No other export of the data is permitted.
The GPFS Client may not be used for nodes to share GPFS data directly through any application, service,
protocol or method, such as NFS, CIFS, FTP, or HTTP. For these functions, a GPFS Server license would be
required.

http://www-01.ibm.com/software/support/aix/lifecycle/index.html

http://www-01.ibm.com/support/knowledgecenter/SSFKCN/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html

Copyright IBM Corporation 2015 33


GPFS Global Namespace and Network Shared Disk (NSD)
GPFS provides simultaneous file access from multiple nodes
Using a global namespace, shared file system access among GPFS cluster nodes
High recoverability and data availability through replication, ability to make changes to mounted file system
NSD is the name describing disks used by GPFS, and can be defined on various block level device types
Each node has a GPFS daemon
Performs all I/O operations and buffer management
Dynamic Discovery
During GPFS node initialization the GPFS daemon will attempt to read all local disks searching for NSD disks
If a NSD is discovered locally, then the local storage NSD path will be used
If the local path fails, then the network NSD server path will be used provided a primary and/or secondary server
have been defined for the NSD
If a local NSD path becomes available (again), then the local path will be used
GPFS uses a network interface for IP communication
Do not use a hostname alias for the interconnect; multiple subnets can be used (4.1)
GPFS uses a sophisticated token management system
Providing data consistency while allowing multiple independent paths to the same file by the same name
from anywhere in the cluster.
Quorum
Is a way for cluster nodes to decide whether it is safe to continue I/O operations in the case of a
communication failure, and is used to prevent a cluster from becoming partitioned.
Node Quorum (majority or set by mmchconfig minQuorumNodes)
Node Quorum with Tiebreaker Disks (1 or 3)
GPFS manager functions (configuration manager)
One active cluster manager per cluster
One active file system manager per file system
One active metanode per open file (data integrity)

[Figure: NSD access paths -- nodes with direct SAN-attached disk access and nodes with network (NSD server) access]

Copyright IBM Corporation 2015 34
GPFS Direct Attached

[Figure: Direct Attached GPFS servers -- each node runs App + GPFS Server with SAN-attached disk, storage split across Failure Group #1 and Failure Group #2]
Typical 2-8 GPFS Server nodes for commercial high availability clusters
Copyright IBM Corporation 2015 35


Typical NSD Use Cases

[Figure: three deployment patterns -- Direct Attached (App + GPFS Server nodes with SAN-attached disk), Network Attached (LAN-attached App + GPFS Server acting as NSD client over the network to direct-attached GPFS servers), and network protocol file serving (CIFS, NFS, HTTP, FTP protocol clients in front of GPFS servers), which can be combined for multiple-level file serving]

Copyright IBM Corporation 2015 36


Configure single site GPFS cluster and filesystem(s)
With all SAN disks directly accessible from 2-8 GPFS nodes:
1. Verify (on each node)
Firmware/microcode, device drivers, multi-path drivers, and other software levels and tunables as
recommended by the storage vendor
Consistent hostname resolution between nodes
Time synchronization between nodes (preferred)
SSH between the nodes without password prompt or /etc/motd display from ssh login (~/.hushlogin)
Ensure disks are in no_reserve, tuned and accessible from all GPFS nodes (read/write)
2. Install GPFS LPPs (on each node)
AIX: gpfs.base, gpfs.docs.data, gpfs.msg.en_US and prereqs
3. Create GPFS cluster (from one node)
a) Create the GPFS cluster (mmcrcluster)
b) Enable GPFS license for server nodes (mmchlicense)
c) Start the GPFS cluster (mmstartup)
d) Verify status of GPFS cluster (mmgetstate)
4. Create GPFS file system (from one node)
a) Format GPFS disks (NSDs) for file systems (mmcrnsd)
i. Stop GPFS cluster (mmshutdown)
ii. Update GPFS with quorum tiebreak NSDs, and optional tuning parameters (mmchconfig)
iii. Start GPFS cluster (mmstartup)
b) Create the GPFS file system (mmcrfs)
c) Mount the file system (mmmount)

The GPFS concept of failure groups provides synchronous mirroring (replication) of up to three copies (from GPFS 3.5)

Copyright IBM Corporation 2015 37


Building a dual node GPFS cluster
1. Create the GPFS cluster (mmcrcluster)
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmcrcluster.htm

2. Enable GPFS license for the server node (mmchlicense)


http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmchlicense.htm

3. Start the GPFS server on the node (mmstartup)


http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmstartup.htm

4. Verify status of GPFS cluster (mmgetstate)


http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmgetstate.htm

root@stglbs1:/: mmcrcluster -N stglbs1:manager-quorum -p stglbs1 -r /usr/bin/ssh -R /usr/bin/scp


Sat Nov 1 02:40:10 GST 2014: 6027-1664 mmcrcluster: Processing node stglbs1
mmcrcluster: Command successfully completed
mmcrcluster: 6027-1254 Warning: Not all nodes have proper GPFS license designations.
Use the mmchlicense command to designate licenses as needed.

root@stglbs1:/: mmchlicense server --accept -N stglbs1


The following nodes will be designated as possessing GPFS server licenses:
stglbs1
mmchlicense: Command successfully completed

root@stglbs1:/: mmstartup -a
Sat Nov 1 02:40:46 GST 2014: 6027-1642 mmstartup: Starting GPFS ...

root@stglbs1:/: mmgetstate -a

Node number Node name GPFS state


------------------------------------------
1 stglbs1 active

Copyright IBM Corporation 2015 38


Building a dual node GPFS cluster
1. Add a node to the GPFS cluster (mmaddnode)
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmaddnode.htm

2. Enable GPFS license for the node (mmchlicense)


http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmchlicense.htm

3. Start the GPFS server on the node (mmstartup)


http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmstartup.htm

Adding a node to the cluster in this manner is not required; the
additional node(s) can be included already at cluster creation time.

root@stglbs1:/: mmaddnode -N stglbs2


Sat Nov 1 02:46:04 GST 2014: 6027-1664 mmaddnode: Processing node stglbs2
mmaddnode: Command successfully completed
mmaddnode: 6027-1254 Warning: Not all nodes have proper GPFS license designations.
Use the mmchlicense command to designate licenses as needed.
mmaddnode: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

root@stglbs1:/: mmchlicense server --accept -N stglbs2


The following nodes will be designated as possessing GPFS server licenses:
stglbs2
mmchlicense: Command successfully completed
mmchlicense: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

root@stglbs1:/: mmstartup -N stglbs2


Sat Nov 1 02:46:57 GST 2014: 6027-1642 mmstartup: Starting GPFS ...

Copyright IBM Corporation 2015 39


Building a dual node GPFS cluster
1. Enable the node as secondary configuration server in the GPFS cluster (mmchcluster)
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmchcluster.htm

2. Enable the node as both quorum and manager (mmchnode)


http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmchnode.htm

root@stglbs1:/: mmchcluster -s stglbs2


mmchcluster: GPFS cluster configuration servers:
mmchcluster: Primary server: stglbs1
mmchcluster: Secondary server: stglbs2
mmchcluster: Command successfully completed

root@stglbs1:/: mmchnode --quorum --manager -N stglbs2


Sat Nov 1 02:50:38 GST 2014: 6027-1664 mmchnode: Processing node stglbs2
mmchnode: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

Copyright IBM Corporation 2015 40


Building a dual node GPFS cluster
Verify status of GPFS cluster (mmlscluster)
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmlscluster.htm

root@stglbs1:/: mmlscluster

GPFS cluster information


========================
GPFS cluster name: gpfscl1
GPFS cluster id: 5954771470676922314
GPFS UID domain: stglbs1
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp

GPFS cluster configuration servers:


-----------------------------------
Primary server: stglbs1
Secondary server: stglbs2

Node Daemon node name IP address Admin node name Designation


---------------------------------------------------------------------
1 stglbs1 10.22.226.204 stglbs1 quorum-manager
2 stglbs2 10.22.226.205 stglbs2 quorum-manager

Copyright IBM Corporation 2015 41


Building a dual node GPFS cluster
Set some GPFS daemon tunables (mmchconfig)
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmchconfig.htm
The values are specific to this example and are not generic recommendations
GPFS tuning for Oracle, please refer to:
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_oracle.htm

root@stglbs1:/: mmchconfig maxMBpS=1200


mmchconfig: Command successfully completed
mmchconfig: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

root@stglbs1:/: mmchconfig prefetchThreads=150


mmchconfig: Command successfully completed
mmchconfig: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

root@stglbs1:/: mmchconfig worker1Threads=96


mmchconfig: Command successfully completed
mmchconfig: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

root@stglbs1:/: mmchconfig pagepool=6g


mmchconfig: Command successfully completed
mmchconfig: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
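The same tunables can also be set in a single invocation, since mmchconfig accepts a comma-separated attribute list (a sketch reusing the example values above):
root@stglbs1:/: mmchconfig maxMBpS=1200,prefetchThreads=150,worker1Threads=96,pagepool=6g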

Copyright IBM Corporation 2015 42


Building a dual node GPFS cluster
Check the configuration of the GPFS cluster (mmlsconfig)
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmlsconfig.htm

root@stglbs1:/: mmlsconfig
Configuration data for cluster gpfscl1.stglbs1:
-----------------------------------------
myNodeConfigNumber 1
clusterName gpfscl1.stglbs1
clusterId 5954771470676922314
autoload no
dmapiFileHandleSize 32
minReleaseLevel 3.5.0.11
maxMBpS 1200
prefetchThreads 150
worker1Threads 96
pagepool 6g
adminMode central

File systems in cluster gpfscl1.stglbs1:


----------------------------------
(none)

Copyright IBM Corporation 2015 43


Building a dual node GPFS cluster
Create GPFS NSDs from LUNs (mmcrnsd)
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmcrnsd.htm

Declare tiebreaker disks (mmchconfig)


http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmchconfig.htm
The cluster needs to be down during this operation.

Input file BEFORE mmcrnsd processing:
%nsd:
device=hdisk21
servers=stglbs1,stglbs2
%nsd:
device=hdisk22
servers=stglbs1,stglbs2

Input file AFTER mmcrnsd processing:
%nsd: nsd=gpfs42nsd
device=hdisk21
servers=stglbs1,stglbs2
%nsd: nsd=gpfs43nsd
device=hdisk22
servers=stglbs1,stglbs2

The servers are the hostnames for network access to the NSD instead of direct SAN attachment, used if direct disk access fails.

root@stglbs1:/: mmcrnsd -F 2nsd.txt

mmcrnsd: Processing disk hdisk21
mmcrnsd: Processing disk hdisk22
mmcrnsd: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

root@stglbs1:/: mmlsnsd -d "gpfs42nsd;gpfs43nsd"

File system Disk name NSD servers


---------------------------------------------------------------------------
(free disk) gpfs42nsd stglbs1,stglbs2
(free disk) gpfs43nsd stglbs1,stglbs2

root@stglbs1:/: mmshutdown -a; mmchconfig tiebreakerDisks="gpfs42nsd"; mmstartup -a
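A quick verification that the tiebreaker is set and both nodes are active again (a sketch, assuming mmlsconfig accepts an attribute name):
root@stglbs1:/: mmlsconfig tiebreakerDisks
root@stglbs1:/: mmgetstate -a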

Copyright IBM Corporation 2015 44


Building a dual node GPFS cluster
Create GPFS filesystem (mmcrfs)
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_mmcrfs.htm
http://www-01.ibm.com/support/knowledgecenter/SSFKCN/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html

Plan the GPFS filesystem configuration settings !


root@stglbs1:/: mmcrfs /gpfs/data10 gdata10 -F 2nsd.txt -A yes -B 512k -n 4

GPFS: 6027-531 The following disks of gdata10 will be formatted on node stglbs1:
gpfs42nsd: size 1073741824 KB
gpfs43nsd: size 1073741824 KB
GPFS: 6027-540 Formatting file system ...
GPFS: 6027-535 Disks up to size 8.8 TB can be added to storage pool system.
Creating Inode File
Creating Allocation Maps
Creating Log Files
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
GPFS: 6027-572 Completed creation of file system /dev/gdata10.
mmcrfs: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

root@stglbs1:/: mmlsnsd -d "gpfs42nsd;gpfs43nsd"

File system Disk name NSD servers


---------------------------------------------------------------------------
gdata10 gpfs42nsd stglbs1,stglbs2
gdata10 gpfs43nsd stglbs1,stglbs2
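To complete the flow from the single-site checklist (step 4c), a minimal sketch for mounting the new file system on all nodes and listing where it is mounted:
root@stglbs1:/: mmmount gdata10 -a
root@stglbs1:/: mmlsmount gdata10 -L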

Copyright IBM Corporation 2015 45


Get the ducks in a row

Copyright IBM Corporation 2014


Get the ducks in a row

Know why
Business and regulatory requirements
Services, Risks, Costs
Key Performance Indicators (KPIs)
Understand how
Architect, Design, Plan
Can implement
Build, verify, inception, monitor, maintain, skill-up
Will govern
Service and Availability management
Change, Incident and problem management
Security and Performance management
Capacity planning
Migrate, replace and decommission

Copyright IBM Corporation 2015 47


Thank you! Tack!

Björn Rodén
roden@ae.ibm.com
http://www.linkedin.com/in/roden
Copyright IBM Corporation 2015
Continue growing your IBM skills

ibm.com/training provides a
comprehensive portfolio of skills and career
accelerators that are designed to meet all
your training needs.

Training in cities local to you - where and


when you need it, and in the format you want
Use IBM Training Search to locate public training classes
near to you with our five Global Training Providers
Private training is also available with our Global Training
Providers

Demanding a high standard of quality


view the paths to success
Browse Training Paths and Certifications to find the
course that is right for you

If you can't find the training that is right for you


with our Global Training Providers, we can help.
Contact IBM Training at dpmc@us.ibm.com
Global Skills
Initiative

Copyright IBM Corporation 2015 49


Page intentionally left blank

Copyright IBM Corporation 2015 50


Balance business impact vs. solution costs

Consider the whole solution lifecycle

[Figure: balancing Down Time Costs (Business Impact) against Solution Costs (CAPEX/OPEX) over Business Recovery Time -- the Total Cost Balance(1) point weighs needs & requirements, down time costs, solution costs and risk]

(1): Quick Total Cost Balance (TCB) = TCO or TCA + Business Down Time Costs

Copyright IBM Corporation 2015 51


Brief systematic approach
IT services continuity with Availability governance focus:
1. Identify critical business processes (from BIA/BCP)
2. Identify risk & threats (from BIA/BCP)
3. Identify business impacts & costs (from BIA/BCP)
4. Identify/Decide acceptable levels of service, risk, cost (from BIA/BCP)
----------------------------------------------------------------------------------------------
5. Define availability categories and classifying business applications according to business impact of
unavailability
6. Architect Availability infrastructure
7. Design solution from Availability architecture
8. Plan Availability solution implementation
9. Build Availability solution
10. Verify Availability solution
11. Operate and Maintain deployed Availability solution
----------------------------------------------------------------------------------------------
12. Validate Availability solution SLO, implementation, design and architecture
13. Decommission/Migrate/Replace

BIA Business Impact Analysis


BCP Business Continuity Plan
SLO Service Level Objectives

Copyright IBM Corporation 2015 52


Review your Availability Architecture
Is the Availability Architecture still in place?
Or might it have been altered when performing changes for:
Servers
Storage
Networks
Data Centres
Software upgrades
IT Service Management
Staffing
External suppliers and vendors
Assumption:
The longer an IT environment is exposed to opportunities for human error, the higher the risk of deviation between reality (facts on the ground) and the Availability Architecture (the map)
Key areas:
Redundancy and Single Points of Failure (SPOF)
Communication flow and Server Service Dependencies
Local Area Network and Storage Area Network cabling
Application, system software and firmware currency
Staff attrition, mobility and cross skill focus

Copyright IBM Corporation 2015 53


Identify critical IT resources information flow perspective

Business process information flow

[Figure: Information providing systems (depend-on) -> CORE SYSTEMS -> Information receiving systems (needed-by); each has its own degree of availability, with buffer time between them. Don't forget the providing and receiving systems.]

Copyright IBM Corporation 2015 54


Identify critical IT resources deployment connectivity
perspective
Protocols (colors):
RMI / IIOP
HTTP / HTTPS
CIFS
NFS
LPD / IPP
MQ
DB2
JDBC
Java serializing

Copyright IBM Corporation 2015 55


Application, data and access resiliency notes

Application
Application restart after node failure (stop-start)
active / standby (automatic/manual)
Application concurrency (scale out)
active / active (separate or shared transaction tracking)

Data
Single site, single or dual storage
Storage based controlled by host (Hyperswap)
Host based (LVM mirroring/GPFS)
Database based (transaction replication)
Dual site, dual storage
Storage based (Metro/Global mirror)
Host based (GLVM/GPFS)
Database based (transaction replication)
Access
Primary site entry
Automated or manual redirection
Multi site concurrent entry
Automated or manual load balancing

Copyright IBM Corporation 2015 56


Architecting for Business Continuity

Can use BCI Good Practice(1) or similar, or just start with:
1. Develop contingency planning policy
2. Perform Business Impact Analysis
3. Identify preventive controls
4. Develop recovery strategies
5. Develop IT contingency plan

Focus on business purpose

Note that Business Continuity Management (BCM) encompasses much more than IT Continuity. Some national and international standards and organizational recommendations:
(1) BCI, Good Practice, http://www.thebci.org/
(2) DRII, Professional Practices, http://www.drii.org/
(3) ITIL IT Service Continuity: Continuity management is the process by which plans are put in place and managed to ensure that IT Services can recover and continue should a serious incident occur.
(4) ISO Information Security and Continuity, ISO 17799/27001
(5) US NIST Contingency Planning Guide for Information Technology Systems, NIST 800-34
(6) British Standard for Business Continuity Management: BS 25999-1:2006
(7) British Standard for Information and Communications Technology Continuity Management: BS 25777:2008 (Paperback)
(8) BITS Basnivå för informationssäkerhet (baseline information security), https://www.msb.se/RibData/Filer/pdf/24855.pdf

Note:
- ITIL Availability Management: To optimize the capability of the IT infrastructure, services and supporting organization to deliver a cost effective and sustained level of availability enabling the business to meet their objectives.
- COBIT DS4 Ensure Continuous Service: objectives are control over the IT process to ensure continuous service that satisfies the business requirement for IT of ensuring minimal business impact in the event of an IT service interruption.
Copyright IBM Corporation 2015 57
Architecting for IT Service Continuity

Can use TOGAF ADM(1) to bring clarity and understanding from an enterprise perspective on the availability/continuity requirements for different IT services

Focus on IT design & governance

(1) The Open Group Architecture Framework (TOGAF) Architecture Development Method (ADM) is a step-by-step approach to developing an enterprise architecture. The term "enterprise" in the context of "enterprise architecture" can be used to denote both an entire enterprise or just a specific domain within the enterprise. http://www.opengroup.org/

Copyright IBM Corporation 2015 58


Controlling IT Service Continuity

Can use COBIT DS4(1) to bring clarity and understanding from an enterprise perspective on the availability/continuity requirements for different IT services

Focus on control of IT processes

http://www.itgi.org/

[Figure: COBIT framework cube -- IT Governance and Resource Management]

(1) The IT Governance Institute (ITGI) Control Objectives for Information and related Technology (COBIT) is an international unifying framework that integrates all of the main global IT standards, including ITIL, CMMI and ISO17799, which provides good practices, representing the consensus of experts, across a domain and process framework and presents activities in a manageable and logical structure, focused on control.

Copyright IBM Corporation 2015 59


Migrating to
PowerHA 7.1.3

Copyright IBM Corporation 2014


Eliminating SPOF by using redundant components
Cluster components | To eliminate as single point of failure | PowerHA SystemMirror supports
Nodes | Use multiple nodes | Up to 16.
Power sources | Use multiple circuits or uninterruptible power supplies | As many as needed.
Networks | Use multiple networks to connect nodes | Up to 48.
Network interfaces, devices, and labels | Use redundant network adapters | Up to 256.
TCP/IP subsystems | Use networks to connect adjoining nodes and clients | As many as needed.
Disk adapters | Use redundant disk adapters | As many as needed.
Controllers | Use redundant disk controllers | As many as needed.
Disks | Use redundant hardware and disk mirroring, striping, or both | As many as needed.
Applications | Assign a node for application takeover, configure an application monitor, and configure clusters with nodes at more than one site | Flexible configuration policies for high availability within a site and between sites.
Sites | Use more than one site for disaster recovery | Up to two sites.
Resource groups | Use resource groups to specify how a set of entities should perform | Up to 64 per cluster.
Cluster resources | Use multiple cluster resources | Up to 128 for the clinfo daemon (more can exist).
Virtual I/O Server (VIOS) | Use redundant VIOS | As many as needed.
HMC | Use redundant HMC | Up to 2.
Managed system hosting a cluster node | Use separate managed systems for each cluster node | Up to 16.
Cluster repository disk | Use RAID protection | One active repository disk per site, with the ability to replace the disk after a failure. You must have a spare disk available to replace the failed repository disk in the live cluster.

http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.plangd/ha_plan_eliminate_spf.htm

Copyright IBM Corporation 2015 61


Migration process to PowerHA 7.1.3 from 6.1
1. Verify current PowerHA 6.1 availability functionality
Run cluster verification and make sure no errors are reported
2. Verify PowerHA 7.1 preconditions, heartbeat networks and SPOFs
3. AIX upgrade
Upgrade all nodes in the cluster to AIX 6.1 TL9 SP1 or AIX 7.1 TL3 SP1 or higher
Leverage altdisk install and rotating one node at a time
http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.install/doc/insgdrf/alt_disk_migration.htm

4. Migrate the PowerHA 6.1 cluster


New install and configure
Design and install PowerHA cluster from scratch.
Rolling migration
You can upgrade a PowerHA cluster while keeping your applications running and available, during the upgrade
process, a new version of the software is installed on each cluster node while the remaining nodes continue to
run the earlier version.
Offline upgrade
This type of migration involves bringing down the entire PowerHA cluster, reconfiguring the active cluster to fit,
installing the new PowerHA and restarting cluster services one node at a time.
Snapshot upgrade
This type of migration involves bringing down the entire PowerHA cluster, reconfiguring the snapshot
configuration, installing the new PowerHA and restarting cluster services one node at a time.

5. Verify cluster and high availability functionality


Cluster system functionality tests
Component failure tests
Failure scenario tests

Copyright IBM Corporation 2015 62


Todo before migration
Software levels for currency
Upgrade AIX and RSCT to supporting levels and ensure that the same level of cluster software (including PTFs) are
on all nodes before beginning a migration
AIX 6.1 TL9 SP1
AIX 7.1 TL3 SP1
RSCT 3.1.2 or later
Ensure that the PowerHA cluster software is committed (not applied)
When performing a rolling migration, all nodes in the cluster must be upgraded to the new base release before
applying any updates for that release
Run cluster verification and make sure no errors are reported
Take a snapshot of the cluster configuration
Backup and mksysb
The "Communication Path to Node" on
Use the /usr/sbin/clmigcheck tool the PowerHA cluster nodes must be set
to an IP-address mapping to the
hostname.
7.1 AIX 6.1 TL6+ AIX 6.1 RSCT 3.1.0.0 or higher All cluster node hostnames must be
AIX 7.1 AIX 7.1 RSCT 3.1.0.0 resolved locally using the /etc/hosts file
7.1.1 AIX 6.1 TL7 SP2 RSCT 3.1.2.0 or higher for both AIX 6.1 and 7.1 (IP address and label), use netsvc.conf,
AIX 7.1 TL1 SP2 irs.conf or NSORDER in
/etc/environment to set the order.
7.1.2 AIX 6.1 TL8 SP1 RSCT 3.1.2.0 or higher for both AIX 6.1 and 7.1
AIX 7.1 TL2 SP1 Pre-7.1.3: After you have synchronized
the initial cluster configuration, it is not
7.1.3 AIX 6.1 TL9 SP1 RSCT 3.1.2.0 or higher for both AIX 6.1 and supported to change the hostname or IP
AIX 7.1 TL3 SP1 7.1 resolution of the hostname.

http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.insgd/ha_install_required_aix.htm
Copyright IBM Corporation 2015 63
Todo before migration
Verify cluster conditions and settings
Use clstat to review the cluster state and to make certain that the cluster is in a stable state
Review the /etc/hosts file on each node to make certain it is correct
Review the /etc/netsvc.conf (equiv) file on each node to make certain it is correct
Review the /usr/es/sbin/cluster/netmon.cf file on each node to make certain it is correct
After AIX Version 6.1.6, or later is installed, enter the fully qualified host name of every node in the cluster in
the /etc/cluster/rhosts file
Take a snapshot of the cluster configuration and save off customized scripts, such as start, stop,
monitor and event script files
Remove configurations which can't be migrated
Configurations with IPAT via replacement or hardware address takeover (MAC address)
Configurations with heartbeat via IP aliasing
Configurations with non-IP networking, such as RS232, TMSCSI/SSA, DISKHB or MNDHB
Configurations which use other than Ethernet for network communication, such as FDDI, ATM, X25, TokenRing
Note that clmigcheck doesn't flag an error if DISKHB network is found and PowerHA migration utility automatically
takes care of removing that network
SAN storage for Repository Disk and Target Mode
The repository is stored on a disk that must be SAN attached and zoned to be shared by every node in the cluster and
only the nodes in the cluster and not part of a volume group
SAN zoning of FC adapters WWPN for Target Mode communication
Multicast IP address for the monitoring technology (optional)
You can explicitly specify multicast addresses, or one will be assigned by CAA
Ensure that multicast communication is functional in your network topology before migration
Note that from PowerHA 7.1.3 unicast is default

Copyright IBM Corporation 2015 64


clmigcheck tool (1/2)

clmigcheck tool is part of base AIX from 6.1 TL6 or 7.1 (/usr/sbin/clmigcheck)
An interactive tool that verifies the current cluster configuration, checks for unsupported elements, and collects
additional information required for migration
Saves migration check to file /tmp/clmigcheck/clmigcheck.log
You must run this command on all cluster nodes, one node at a time, before installing PowerHA 7.1.3
When the clmigcheck command is run on the last node of the cluster before installing PowerHA 7.1.3, the CAA
infrastructure will be started (check with lscluster -m command).

----------[PowerHA System Mirror Migration Check] -------------


Please select one of the following options:
1 = Check ODM configuration.
2 = Check snapshot configuration.
3 = Enter repository disk and multicast IP addresses.

Select one of the above, "x" to exit or "h" for help:

Copyright IBM Corporation 2015 65


clmigcheck tool (2/2)

Option 1
Checks configuration data (/etc/es/objrepos) and provides errors and warnings if there are any elements
in the configuration that must be removed manually.
In that case, the flagged elements must be removed, cluster configuration verified and synchronized, and
clmigcheck must be rerun until the configuration data check completes without errors.
Option 2
Checks a snapshot (present in /usr/es/sbin/cluster/snapshots) and provides error information if there are
any elements in the configuration that will not migrate.
Errors checking the snapshot indicate that the snapshot cannot be used as it is for migration, and
PowerHA do not provide tools to edit a snapshot.
Option 3
Queries for additional configuration needed and saves it in a file in /var on every node in the cluster.
When option 3 is selected from the main screen, you will be prompted for repository
disk and multicast dotted decimal IP addresses.
Newer versions of AIX have an updated /usr/sbin/clmigcheck command that asks to select "Unicast" or
"Multicast".

Use either option 1 or option 2 successfully before running option 3, which collects and
stores configuration data in the node file /var/clmigcheck/clmigcheck.txt, which is used
when PowerHA 7.1.3 is installed.

Copyright IBM Corporation 2015 66


Rolling Migration Overview Steps
1. Stop cluster services on one node (move rg as needed)
2. Upgrade AIX (if needed) and reboot
Also install additional CAA filesets, bos.cluster and bos.ahafs
3. Verify /etc/hosts and /etc/netsvc.conf (and /usr/es/sbin/cluster/netmon.cf)
4. Update /etc/cluster/rhosts
Enter cluster node hostname IP addresses. Only one IP address per line.
5. refresh -s clcomd
6. Execute clmigcheck (option 1, then option 3)
7. Upgrade PowerHA
Install base level install images and complete upgrade procedures
Then come back and apply the latest SPs on top of it. This can be done non-disruptively.
8. Review the /tmp/clconvert.log file
9. Restart cluster services (move rg back if needed)
10. Repeat steps above for each node (minus the additional options on clmigcheck)
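A minimal sketch of steps 4-6 on the node being migrated (node1 and the two addresses are placeholders for the cluster node hostnames and their IPs):
root@node1:/: cat /etc/cluster/rhosts
10.1.1.11
10.1.1.12
root@node1:/: refresh -s clcomd
root@node1:/: /usr/sbin/clmigcheck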

Copyright IBM Corporation 2015 67


Further reading

PowerHA for AIX


http://www-03.ibm.com/systems/power/software/availability/aix/index.html
PowerHA for AIX Version Compatibility Matrix
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347
PowerHA Hardware Support Matrix
http://w3-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
PowerHA 7.1 Infocenter
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.navigation/powerha_pdf.htm
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.insgd/ha_install_offline_61to710.htm
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.insgd/ha_install_upgrade_snapshot_61to71x.htm
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.insgd/ha_install_rolling_migration_61to710.htm

What's new in PowerHA 7.1


http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.powerha.navigation/powerha_whatsnew.htm
PowerHA 7.1.3 Release Notes
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS5241
PowerHA 7.1.3 Announcement letter
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&subtype=CA&htmlfid=897/ENUS213-416
IBM PowerHA SystemMirror for AIX 7.1.3 Enhancements
http://www.redbooks.ibm.com/abstracts/tips1097.html
IBM PowerHA cluster migration
http://www.ibm.com/developerworks/aix/library/au-aix-powerha-cluster-migration/

Copyright IBM Corporation 2015 68


GPFS Documentation
GPFS Infocenter, FAQ, and Redbooks
http://www-01.ibm.com/support/knowledgecenter/SSFKCN/gpfs_welcome.html
https://www.google.com.sa/search?q=site%3Aredbooks.ibm.com+GPFS
If you have any comments, suggestions or questions regarding the information provided in the FAQ you can send
email to gpfs@us.ibm.com.
GPFS developerWorks
http://www.ibm.com/developerworks/forums/forum.jspa?forumID=479&categoryID=13
http://www.ibm.com/developerworks/wikis/display/hpccentral/General+Parallel+File+System+%28GPFS%29
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20
(GPFS)/page/GPFS%20Wiki
GPFS support and software support lifecycle pages
http://www-03.ibm.com/systems/software/gpfs/resources.html
http://www-947.ibm.com/support/entry/portal/Overview/Software/Other_Software/General_Parallel_File_System
http://www-01.ibm.com/software/support/lifecycleapp/PLCSearch.wss?q=General+Parallel+File+System+for+AIX
GPFS 3.5 announcement letter
http://www-01.ibm.com/common/ssi/rep_ca/7/897/ENUS212047/ENUS212047.PDF

GPFS 4.1 announcement letter
http://www-01.ibm.com/common/ssi/rep_ca/9/877/ENUSZP14-0099/ENUSZP14-0099.PDF
GPFS LPP sample files
/usr/lpp/mmfs/samples/
IBM Flash storage
http://www-03.ibm.com/systems/storage/flash/

Copyright IBM Corporation 2015 69


IBM Systems Lab Services and Training
