
Enabling High Availability Using DB2 HADR and
IBM PowerHA in an SAP Environment

- with application monitoring and failover automation
Applies to:
SAP NetWeaver 7.0 or higher on DB2 9.5 or higher for Linux, UNIX, and Windows.
Summary
You want to use the DB2 High Availability Disaster Recovery (HADR) feature to improve the availability of
your SAP systems. In your landscape, IBM PowerHA/HACMP is the chosen cluster environment instead of
IBM Tivoli SA MP. This paper describes how to automate IBM DB2 HADR failover using IBM
PowerHA/HACMP in a two-node cluster SAP environment, including application monitoring and failover
automation capability.
Author: Tao (Stephen) Sun

Author Bio
Tao joined SAP in 2007 as a support consultant, providing development support for performance
problems to SAP customers on IBM DB2 for Linux, UNIX, and Windows in the APJ, US, and EMEA regions.
He currently works on technical architecture and infrastructure design topics, covering data center design,
general infrastructure, scalability and load balancing, high availability and disaster recovery solutions, SAP
platforms including SAP HANA, virtualization, and cloud deployment. He can be reached at
stephen.sun@sap.com.
TABLE OF CONTENTS
1 INTRODUCTION
1.1 High Availability for SAP NetWeaver
1.2 DB2 HADR
1.3 IBM PowerHA
2 PLANNING AND DESIGNING A TWO-NODE IBM POWERHA CLUSTER WITH DB2 HADR IN AN SAP ENVIRONMENT
2.1 System Architecture
2.2 Sample Test System Configuration
3 IMPLEMENTATION OF THE TWO-NODE POWERHA CLUSTER WITH DB2 HADR IN AN SAP ENVIRONMENT
3.1 General Setup Procedures for Systems with DB2 HADR
3.1.1 Install SAP NetWeaver System with DB2 Database without DB2 HADR
3.1.2 Create a DB2 HADR Standby Database
3.1.3 Configure Database to Enable DB2 HADR
3.2 General IBM PowerHA Cluster Configuration Steps
3.3 Sample IBM PowerHA Configuration for SAP NetWeaver with DB2 HADR
3.3.1 Application Server Configuration
3.3.2 Custom Application Monitor
3.3.3 Resources and Attributes for Sample Resource Group
3.4 Recommended High Availability Test and Verification Scenarios
4 FURTHER DISCUSSIONS ON SPECIFIC TOPICS
4.1 Virtual IP or Automatic Client Reroute
4.2 DB2 HADR State Flow, Potential Issue for Automatic Failover
4.3 Alternative IBM PowerHA-based DB2 HADR Failover Automation Solutions
4.3.1 Solution with Custom Application Monitor but without Instance Failover Capability
4.3.2 Parallel Resource Groups with Monitoring for Both DB2 Primary and DB2 Standby Databases
4.3.3 Variations of the Solution Proposed in this Paper
4.4 Script Adjustments due to Version Changes
4.4.1 Adding "Peer Window Only" Option
4.4.2 Mandatory Script Adjustments as of DB2 10.1
5 SCRIPTS FOR THE SAMPLE IBM POWERHA CONFIGURATION
5.1 IBM PowerHA Scripts
5.2 SAP CS and ERS Instance or Start Profiles
6 RELATED CONTENT
Change History
Version 1.0 (January 2017) Initial version
1 INTRODUCTION
1.1 High Availability for SAP NetWeaver

System outages can cost companies millions of dollars. Outages derive from Single Points of Failure
(SPOFs) such as network, hardware, and software failures, or even human errors. High Availability (HA) is the
requirement to maximize system availability. From an end user's perspective, the technology behind HA is
hidden.
Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are common terms for specifying
requirements for HA/DR solutions. RPO defines how much business data can be lost in terms of business
activity time, or, phrased differently, how far back in time you can go to resume application processing. It
is the target tolerance threshold for lost business transactions. RTO focuses on how fast the application can
be restarted and brought back online, so it is the target maximum time to resume application functionality.
In an HA context, the usual demand is an RPO of zero (no data loss) and an RTO of several minutes.
The HA solution is expected to provide fully automated failover for an active-passive cluster, or
continuous availability for an active-active cluster, with no data loss so that business can
continue.
SAP addresses these challenges within the SAP NetWeaver platform. SAP NetWeaver is a combination of
several technical components, forming a technology platform to run SAP Business Suite or SAP BW
systems. Depending on the SAP NetWeaver products and versions, the SAP NetWeaver platform has three
kinds of installation variants:
SAP NetWeaver ABAP (ABAP execution environment only)
SAP NetWeaver Java (Java / Java EE execution environment only)
SAP NetWeaver Dual Stack (both execution environments in a single system)
Each variant includes the following:

A single Central Services (CS) instance for the communication between application server instances,
and application locks handling
One or multiple Application Server (AS) instances serving user requests
One database management system
Global file systems for all instances, such as /sapmnt/<SID> and /usr/sap/trans
The SAP application locks (enqueue lock table) information of the enqueue server needs to be retained after
unexpected enqueue server failures. In order to preserve the enqueue locks, a duplicated enqueue table is
needed. The SAP Enqueue Replication Server (ERS) does exactly this.
An SAP NetWeaver system can natively scale by adding more AS instances to form an SAP NetWeaver AS
cluster, so AS instances are not Single Points of Failure (SPOFs). However, at least two AS instances need to
run on different physical servers to protect against hardware failures. All other components, namely the SAP
CS instance, the database management system, and the global file systems, are classified as SPOFs, because
exactly one copy of each exists for each SAP NetWeaver system. To keep the setup simple, we recommend
protecting only the components classified as SPOFs, using component-specific features or third-party
cluster solutions, either a failover cluster or an active-active cluster.
1.2 DB2 HADR

IBM DB2 for Linux, UNIX, and Windows (LUW) is one of the supported database management systems of
SAP NetWeaver. DB2 High Availability Disaster Recovery (HADR) is a DB2 LUW feature for high availability
and disaster recovery. It is the preferred solution for HA and DR on DB2. It replicates logged activities from a
primary database to one or more standby databases. In case the primary database fails, a standby database
takes over as the new primary. This feature was first introduced in DB2 LUW version 8.2 with a
single standby. Starting with DB2 LUW 10.1, it supports multiple standby databases. As of DB2 LUW 10.5, it
can run in a DB2 pureScale environment. This makes DB2 LUW capable of providing a complete Disaster
Recovery (DR), High Availability (HA), and Continuous Availability solution in a single, easily manageable
feature for both partial and complete site failures. This paper focuses only on the HA part of the DB2 HADR
feature.
DB2 HADR is based on transaction log shipping and replay. You initialize the standby database with a
backup or split mirror image of the primary database. You then configure and start DB2 HADR on the
primary and standby databases. The primary ships its transaction log data to the standby via a TCP
connection, and the standby continuously replays the log records. The DB2 HADR synchronization modes
(SYNC, NEARSYNC, ASYNC, SUPERASYNC), set with the database configuration parameter HADR_SYNCMODE,
control the level of protection against transaction loss during log shipping. To keep the
standby databases in sync with the primary, the primary must wait for the standby to acknowledge (ACK)
before it can commit. This requires SYNC or NEARSYNC mode. The only difference between the two is
that in SYNC mode the log data must be written to disk on the standby, whereas in NEARSYNC mode
it only needs to be received in memory on the standby.
Note: The key difference between HA and DR is the Recovery Point Objective (RPO) definition. While DR
normally accepts a reasonable RPO, HA requires an RPO of zero, that is, zero data loss.
SYNC mode can have a performance impact on the application workload even with sufficient network
bandwidth and low network latency. Therefore, SAP recommends using NEARSYNC, as it provides
adequate data protection with a smaller performance impact.
The primary does not wait indefinitely for the standby to acknowledge before it can commit, but only up to
the limit defined by the hadr_timeout and hadr_peer_window database configuration parameters.
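As an illustration of the parameters named above, the following minimal sketch only prints the DB2 command line that would set the synchronization mode together with the two timing parameters; it does not execute anything. The function name is hypothetical, and the database name DST and the values 120 and 300 (seconds) used in the usage note are assumptions for illustration, not recommendations from this paper.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the DB2 CLP command that sets
# the HADR synchronization mode, timeout, and peer window for one database.
# Arguments: <database> <syncmode> <hadr_timeout_seconds> <hadr_peer_window_seconds>
# The function body runs in a subshell, so its variables stay local.
hadr_sync_cfg_cmd() (
    db=$1; mode=$2; timeout=$3; window=$4
    case $mode in
        SYNC|NEARSYNC|ASYNC|SUPERASYNC) ;;
        *) echo "unknown HADR_SYNCMODE: $mode" >&2; exit 1 ;;
    esac
    echo "db2 update db cfg for $db using HADR_SYNCMODE $mode HADR_TIMEOUT $timeout HADR_PEER_WINDOW $window"
)
```

For example, hadr_sync_cfg_cmd DST NEARSYNC 120 300 prints a single command line that the DB2 instance owner could review and then run on both the primary and the standby. Printing instead of executing keeps the sketch safe to try on any host.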
1.3 IBM PowerHA

IBM PowerHA SystemMirror (formerly also known as HACMP - High Availability Cluster Multiprocessing, or
IBM PowerHA) provides HA solutions on IBM AIX and IBM i operating systems. In this document, we just
use IBM PowerHA as common name for this cluster manager on IBM AIX operating systems. By default, IBM
Tivoli SA MP is the high availability cluster solution bundled with DB2 HADR. But in some companies, IBM
PowerHA/HACMP is the chosen standard cluster manager for all solutions, so you would need to use IBM
PowerHA/HACMP for DB2 HADR failover automation (also for other components) instead of IBM Tivoli SA
MP.
IBM PowerHA relies on IBM's Reliable Scalable Cluster Technology (RSCT). RSCT includes a daemon
called group services (grpsvcs) that coordinates the response to events in the cluster. Up to IBM PowerHA
V6.1, RSCT also monitored cluster nodes, networks, and network adapters for failures using the topology
services daemon (topsvcs). As of IBM PowerHA V7.1, RSCT still provides a coordinated response between
nodes, but monitoring and communication are provided by the Cluster Aware AIX (CAA) infrastructure. CAA
is a clustering infrastructure built into the operating system and exploited by RSCT and PowerHA. CAA
replaces the function provided by Topology Services (topsvcs) in RSCT in previous releases of IBM
PowerHA.
IBM PowerHA depends on the redundancy of hardware components, so make sure the redundancy is
guaranteed on network, storage, and server level. After that, to keep specific business applications highly
available, resource, resource group, application server/controller, and application monitoring concepts are
employed. A resource is any component that is required to bring up a service application, and it can contain
many things: file systems, volume groups, IP addresses, Network File System (NFS) mounts, applications.
To bring up an application, a set of resources is usually required. This set is called the resource group. The
application server/controller is used to control the start and stop behaviors of certain applications. For each
application, you can also create application monitoring methods to perform automatic checks on the
application to verify that the application functions work correctly. If an application monitoring method
fails, the cluster either moves the resource group to another node or only sends a notification. In some
cases, multiple applications are distributed across the cluster nodes, which becomes a problem if
dependencies or start/stop sequences between the applications are not taken into account. To
address this, PowerHA allows you to define the start and stop order of the resource groups and
restrictions on how they are brought online on the nodes: parent/child dependencies, location
dependencies, and start-after/stop-after policies.
2 PLANNING AND DESIGNING A TWO-NODE IBM POWERHA CLUSTER WITH DB2 HADR IN AN
SAP ENVIRONMENT
2.1 System Architecture
Figure 1: IBM PowerHA System Topology with Redundancy
Figure 1 shows an example topology with redundancy at the network, storage, and server levels.
Technically, only one storage subsystem is needed, as modern storage subsystems normally provide
sufficient HA features for both processing and data redundancy. With an additional storage subsystem,
the DB2 primary and standby databases can allocate their storage space from different storage
subsystems, so even if the storage subsystem for the DB2 primary database becomes completely
unavailable, the database service can still be resumed very quickly without a database restore.
Figure 2: Components Deployment within the IBM PowerHA Cluster
As per our recommendations, only SPOFs are protected, namely SAP CS/ERS instance, DB2, and NFS for
global file systems, using IBM PowerHA. For SAP Application Server (AS) instances, we just install multiple
SAP AS instances across two servers. In the sample setup, all SPOFs are configured in one IBM PowerHA
resource group, so in case of a failover/fallover, all active components are always together. The SAP CS,
NFS services, and DB2 HADR database are accessed with one single virtual IP address. SAP ERS
instances are installed locally on each host.
2.2 Sample Test System Configuration

Hardware configuration:
Storage subsystem: IBM System Storage DS8700
Server type: IBM Power 595 9119-FHA
Server capacity for each server: 25 CPU cores, 200GB physical memory
Network: FC for SAN storage, and Gigabit Ethernet
Software configuration:
AIX 6.1 TL8 SP03
IBM PowerHA SystemMirror 6.1 SP12
IBM DB2 LUW 9.7 FP9
System Identifier:
SID, sid, <SID>, <sid> may be used as placeholder for system identifiers in SAP environments.
DST is used for SID in some sample configurations.
3 IMPLEMENTATION OF THE TWO-NODE POWERHA CLUSTER WITH DB2 HADR IN AN SAP
ENVIRONMENT
3.1 General Setup Procedures for Systems with DB2 HADR
3.1.1 Install SAP NetWeaver System with DB2 Database without DB2 HADR
As a first step, install an SAP NetWeaver-based system following the standard SAP installation procedure.
You can find the latest versions of the SAP installation guides on SAP Service Marketplace at
http://service.sap.com/instguides.
In our sample setup, the SAP CS and DB2 database need to be installed on the same virtual IP (start SWPM
using option sapinst SAPINST_USE_HOSTNAME=<virtual IP>). On AIX, the ifconfig command can be used
to manually assign and delete virtual IPs before they are available via IBM PowerHA configuration, such as
ifconfig en0 inet 10.10.10.101 netmask 255.255.255.0 alias
ifconfig en0 inet 10.10.10.101 delete
3.1.2 Create a DB2 HADR Standby Database

To create a DB2 HADR standby database for SAP systems, it is recommended to perform a homogeneous
system copy (database copy method) using SAP Software Provisioning Manager (SWPM). A custom
installation must be selected to be able to manually enter the same user IDs and group IDs used in the
primary server. SWPM also provides DB2 HADR-specific installation options to integrate IBM Tivoli System
Automation for Multiplatforms for DB2 (IBM TSA MP) as cluster type, but in our scenario, this TSA MP option
must NOT be selected. The SAP homogeneous system copy guides can be found at the same link as the
SAP installation guides.
Before the SAP homogeneous system copy, the SAP global file system directories /sapmnt/<SID> should be
mounted on the DB2 standby host, if it is not already mounted there. A homogeneous system copy creates
all users, sets up the environment, installs the database software, creates the instance on the standby, and
then prompts the user to restore the database from a backup from the primary database. If you prefer to do
this manually, you can perform the following steps (for DB2 9.7):
Manually create all users and groups, and make sure the user environments are the same as on the
primary server.
Install the same version and Fix Pack of the database software using the db2setup executable as on
the primary database. You can use the graphical user interface, or silent mode as follows:
<installation_media>/AIX_64/ESE/disk1/db2setup -i en -l /tmp/db2setup.log -t /tmp/db2setup.trc -r /tmp/unix_ese.rsp
The sample unix_ese.rsp file for DB2 V9.7 contains the following:
PROD=ENTERPRISE_SERVER_EDITION
FILE=/db2/db2<sid>/db2_software
LIC_AGREEMENT=ACCEPT
INSTALL_TYPE=TYPICAL
LANG=EN
CONFIG_ONLY=NO
INSTALL_ITMA=NO
INSTALL_TSAMP=NO
Create the DB2 instance on the standby server

The following command can be used:
/db2/db2<sid>/db2_software/instance/db2icrt -s ESE -a SERVER_ENCRYPT -u db2<sid> db2<sid>
Install DB2 license using db2licm command
Maintain the database ports in the /etc/services file

The database ports in the /etc/services file from the standby database host must be maintained the
same as on the primary database host. See the following example:
sapdb2<SID> 5912/tcp
<SID>_HADR_1 5951/tcp # DB2 HADR log shipping
<SID>_HADR_2 5952/tcp # DB2 HADR log shipping

DB2_db2<sid> 5914/tcp
DB2_db2<sid>_1 5915/tcp
DB2_db2<sid>_2 5916/tcp
DB2_db2<sid>_END 5917/tcp
Update the DB2 registry variables and database manager configuration on the standby server
Set the most important DB2 registry variables as follows (important: only set other registry variables if
you are advised by SAP to do so):
db2set DB2_WORKLOAD=SAP
db2set "DB2ENVLIST=INSTHOME SAPSYSTEMNAME dbs_db6_schema DIR_LIBRARY LIBPATH"
Update the database manager configuration on the standby server to match the primary server.
Restore the database from a backup taken from the primary database (or using split-mirror/snapshot
backup technology) and put the database in rollforward pending status.
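For the /etc/services step above, a quick consistency check on each host can catch a missing entry before DB2 HADR is configured. The following is a minimal sketch, not part of the original setup: the function name is hypothetical, and it only checks the first five service names from the example list, assuming each entry has the form "<name> <port>/tcp".

```shell
#!/bin/sh
# Hypothetical helper: print every expected service name that is missing
# from the given services file.
# Arguments: <services_file> <SID in upper case> <sid in lower case>
# The function body runs in a subshell, so its variables stay local.
missing_hadr_services() (
    file=$1; SID=$2; sid=$3
    for name in "sapdb2${SID}" "${SID}_HADR_1" "${SID}_HADR_2" \
                "DB2_db2${sid}" "DB2_db2${sid}_END"; do
        # An entry counts as present if the first field equals the name
        # and the second field looks like <port>/tcp.
        awk -v n="$name" \
            '$1 == n && $2 ~ /^[0-9]+\/tcp$/ { found = 1 } END { exit !found }' \
            "$file" || echo "$name"
    done
)
```

Calling missing_hadr_services /etc/services DST dst on the primary and on the standby host prints nothing when all checked entries exist. It does not compare port numbers between the two hosts; that comparison still has to be done separately.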
3.1.3 Configure Database to Enable DB2 HADR

Configure the DB2 HADR buffer size according to the system workload. The buffer must be able to absorb
peaks in the primary database workload. For example, set it to 4 GB, the recommended soft limit for
systems with high workload, using the command db2set DB2_HADR_BUF_SIZE=1048575.
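The relationship between the value 1048575 and the 4 GB limit can be checked with simple arithmetic. The sketch below assumes that DB2_HADR_BUF_SIZE is counted in log pages of 4096 bytes each; this page size is an assumption that is consistent with the figures above, and the function name is hypothetical.

```shell
#!/bin/sh
# Convert a DB2_HADR_BUF_SIZE value (a number of log pages) into bytes.
# Assumption: one log page is 4096 bytes.
hadr_buf_bytes() {
    echo $(( $1 * 4096 ))
}
```

Under this assumption, hadr_buf_bytes 1048575 prints 4294963200, which is one page short of 4 GiB (4294967296 bytes).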
Update the following database configuration parameters accordingly for both the primary and the standby to
enable DB2 HADR:
HADR_LOCAL_HOST
HADR_LOCAL_SVC
HADR_REMOTE_HOST
HADR_REMOTE_SVC
HADR_REMOTE_INST
HADR_TIMEOUT
HADR_PEER_WINDOW
INDEXREC
LOGINDEXBUILD
HADR_SPOOL_LIMIT
Note: HADR_PEER_WINDOW and the "PEER WINDOW ONLY" option of the TAKEOVER HADR command
were introduced with DB2 V9.5. HADR_SPOOL_LIMIT has been available since DB2 10.1.
Start and synchronize DB2 HADR. DB2 HADR must be started on the standby first and then on the
primary using the START HADR command. DB2 HADR is now enabled, and the standby begins to replay
the logs to catch up with the primary. The "db2pd -d <SID> -HADR" command can be used to monitor whether
DB2 HADR has reached PEER state. After DB2 HADR initialization, the DB2 activate and deactivate
commands are recommended for starting and stopping DB2 HADR instead of START HADR and STOP
HADR, except when role changes are needed.
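To check for PEER state from a script instead of reading the db2pd output by eye, the role and state can be extracted with a small filter. The following is a minimal sketch with a hypothetical function name. It assumes the output layout that the sample monitor script in this paper also relies on, in which a header line beginning with "Role" and "State" is followed by one line holding the values; other DB2 releases may print this information differently, in which case the filter would need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: read "db2pd -d <SID> -hadr" output on stdin and print
# "<role> <state>", for example "Primary Peer".
# Assumption: a header line whose first two fields are "Role" and "State"
# is immediately followed by the line that holds the values.
hadr_role_state() {
    awk '$1 == "Role" && $2 == "State" { getline; print $1, $2; exit }'
}
```

A possible use on the sample system would be su - db2dst -c "db2pd -d dst -hadr" | hadr_role_state, repeated until the second word is Peer. The filter prints nothing if the expected header line is not found, so an empty result should be treated as "unknown" rather than as a failure of DB2 HADR.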
3.2 General IBM PowerHA Cluster Configuration Steps

To establish and configure an IBM PowerHA cluster, the following general steps need to be performed (they
might vary slightly depending on version and Service Pack levels, and, for example, whether you use smitty
sysmirror instead of smitty hacmp):
Create a Cluster
Use the following menu path and enter the cluster name:
smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure an HACMP Cluster > Add/Change/Show an HACMP Cluster
Add Cluster Nodes
Use the following menu path, fill in the node name, and select the "Communication Path to Node":
smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Nodes > Add a Node to the HACMP Cluster
Add Network(s)
Use the following menu path, fill in the network name, and select the network type and corresponding
settings:
Networks > Add a Network to the HACMP Cluster
Add Communication Interfaces and Devices
Use the following menu path, fill in the network name, network type, node name, and network interface
or device path:
Networks > Configure HACMP Communication Interfaces/Devices > Add Communication Interfaces/Device > Add Pre-defined Communication Interfaces and Devices > Communication Interfaces
or
Networks > Configure HACMP Communication Interfaces/Devices > Add Communication Interfaces/Device > Add Pre-defined Communication Interfaces and Devices > Communication Devices
Add Service (virtual) IPs
Use the following menu path and fill in the network name and IP label/address:
smitty hacmp > Extended Configuration > Extended Resource Configuration > HACMP Extended Resources Configuration > Configure HACMP Service IP Labels/Addresses > Add a Service IP Label/Address > Configurable on Multiple Nodes
Add Application Servers
Use the following menu path and fill in the server name, start script, and stop script:
Resources Configuration > Configure HACMP Applications Servers > Add an Application Server
Add Application Monitors
Use the following menu path and fill in customized application monitor details:
Resources Configuration > Configure HACMP Application Servers > Configure HACMP Application Monitoring > Configure Custom Application Monitors > Add a Custom Application Monitor
Add Resource Groups
Use the following menu path and fill in the resource group name, participating nodes, and startup, fallover,
and fallback policies:
Resource Group Configuration > Add a Resource Group
Change Resource Groups
Use the following menu path and associate the defined virtual IPs (service IP labels/addresses),
application servers, volume groups, and corresponding NFS cross-mounting options:
Resource Group Configuration > Change/Show Resources and Attributes for a Resource Group
Verify and Synchronize Cluster
Use the following menu path and choose the desired options:
smitty hacmp > Extended Configuration > Extended Verification and Synchronization
Start and Stop Cluster Services
Use the following menu path and choose the desired options:
smitty hacmp > System Management (C-SPOC) > Manage HACMP Services > Start Cluster Services
or
smitty hacmp > System Management (C-SPOC) > Manage HACMP Services > Stop Cluster Services
3.3 Sample IBM PowerHA Configuration for SAP NetWeaver with DB2 HADR
The following is a sample PowerHA configuration from the demo and test environment.
3.3.1 Application Server Configuration
Figure 3: Sample PowerHA Application Server app_eccdb Configuration
The Start Script /hacmp/new_hadr/db2_hadr_start.sh contains:

#!/bin/ksh
sh /hacmp/new_hadr/r3startCS.sh
sh /db2/db2dst/db2_software/samples/hacmp/rc.hadr.start db2dst db2dst dst verbose
The Stop Script /hacmp/new_hadr/db2_hadr_stop.sh contains:

#!/bin/ksh
DB2PID=$(ps -ef | grep db2sysc | grep -v grep | awk '{print $2}')
sh /hacmp/new_hadr/r3stopCS.sh
if [ ! -z "${DB2PID}" ]; then
sh /db2/db2dst/db2_software/samples/hacmp/rc.hadr.stop db2dst db2dst dst verbose
fi
The start and stop scripts call DB2 standard sample scripts for the DB2 HADR environment. We have made
slight adjustments to one of the standard scripts, see section 4.4. Additionally, since we integrate SAP CS,
ERS and DB2 HADR in one PowerHA resource group, the start and stop scripts also control the start and
stop of the SAP CS and ERS instances. The complete contents of all scripts are shown in chapter 5.
Note: You need to replace db2dst and dst with your DB2 instance name and DBSID accordingly. This
applies to all the following chapters of this document.
3.3.2 Custom Application Monitor
Figure 4: Sample PowerHA Custom Application Monitor eccdb_AM for Application Server app_eccdb
The DB2 monitor script /hacmp/new_hadr/db2_hadr_monitor.sh contains:

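#!/bin/ksh
# Instance state as reported by db2gcf -s, for example "Available"
DB2STATE=$(su - db2dst -c "db2gcf -s" | awk '{print $4}' | tr "." " " | awk '{print $1}' | tail -1)
# HADR role of database dst, for example "Primary"
HADRROLE=$(su - db2dst -c "db2pd -db dst -hadr" | grep -p Role | grep "[a-zA-Z]" | tail -1 | awk '{print $1}')
# Quote DB2STATE so that an empty value does not break the test
if [ "${DB2STATE}" = "Available" ]; then
    if [ "${HADRROLE}" = "Primary" ]; then
        rc=0      # instance available and database is HADR primary
    else
        rc=50     # instance available, but database is not HADR primary
    fi
else
    rc=100        # instance not available
fi
echo $rc
exit $rc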
The cleanup script /hacmp/new_hadr/db2_hadr_cleanup.sh contains:

#!/bin/ksh
su - db2dst -c "db2gcf -k"
The restart script /hacmp/new_hadr/db2_hadr_restart.sh contains:

#!/bin/ksh
su - db2dst -c "db2gcf -u"
In the above custom application monitor configuration, you can find the "Restart Count" set to zero, so once
an application failure is detected, a failover will be triggered. The cleanup and restart methods will not be
executed, but the stop script (in the Application Server configuration) will be called on the host where the
failure was identified.
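The decision logic of the monitor script above can be tried out without a running DB2 instance by separating it from the db2gcf and db2pd calls. The following is a minimal sketch with a hypothetical function name; it reproduces the return codes 0, 50, and 100 of the sample monitor and takes the instance state and the HADR role as plain arguments.

```shell
#!/bin/sh
# Decision logic of the sample monitor as a pure function.
# Arguments: <DB2 instance state> <HADR role>
# Prints the return code the monitor would exit with:
#   0   instance available and database is HADR primary
#   50  instance available, but database is not HADR primary
#   100 instance not available
monitor_rc() {
    if [ "$1" = "Available" ]; then
        if [ "$2" = "Primary" ]; then echo 0; else echo 50; fi
    else
        echo 100
    fi
}
```

For example, monitor_rc Available Standby prints 50. With the "Restart Count" set to zero as described above, either of the two non-zero results would lead to the failover behavior explained in the preceding paragraph.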
Note: When application monitoring is configured, the resource group online operation will call the
corresponding monitor to check whether the application is online already. So after the above monitor is
configured, the DB2 HADR primary database should not be activated (but the DB2 instance should be
started) before the PowerHA resource group online operation. Otherwise, the Application Server Start Script
needs to be manually executed after the resource group is online.
A failure of the DB2 HADR primary database triggers a failover of both the DB2 HADR primary and SAP
CS/ERS, but the sample custom application monitor does not contain SAP CS and SAP ERS monitoring
functionality. Instead, the SAP native monitoring and restart capabilities for SAP CS and ERS can be leveraged.
Customize the command option for the SAP message server, enqueue server, and enqueue replication
server to start with "Restart" in the instance profile (or start profile, see SAP Notes 1898687 and 768727) as
follows:
Restart_Program_00 = local $(_MS) pf=$(_PF)
Restart_Program_01 = local $(_EN) pf=$(_PF)
Restart_Program_00 = local $(_ER) pf=$(_PFL) NR=$(SCSID)
With this command option, SAP message server, enqueue server, and enqueue replication server will be
configured with self-monitoring and restart capabilities. Although we have SAP CS and ERS running on the
same host, if the enqueue server and enqueue replication server do not fail at the same time, the restart of
the enqueue server will not result in lost enqueue locks.
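Whether a given profile already uses the "Restart" form for these three programs can be checked with a simple search. The following is a minimal sketch with a hypothetical function name; it assumes that the entries to be changed are written as Start_Program_NN and reference the $(_MS), $(_EN), or $(_ER) macros shown above.

```shell
#!/bin/sh
# Hypothetical helper: print profile entries that still use Start_Program_NN
# for the message server, enqueue server, or enqueue replication server.
# Argument: <path to instance or start profile>
# Returns 0 if at least one such entry was found, non-zero otherwise.
unprotected_cs_programs() {
    grep -E '^[[:space:]]*Start_Program_[0-9]+[[:space:]]*=.*\$\(_(MS|EN|ER)\)' "$1"
}
```

An empty result means that no such Start_Program entry was found in the profile. The check is purely textual and does not verify that the Restart_Program entries themselves are complete or correct.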
For the failover situation, SAP CS and ERS will be restarted on the standby node, so enqueue locks are lost
then. But since at the same time, open DB2 transactions are rolled back after failover, the loss of enqueue
locks is a desired behavior in a two-node cluster configuration. Otherwise, a manual enqueue lock cleanup
may be needed after failover.
3.3.3 Resources and Attributes for Sample Resource Group
Figure 5: Sample Resources and Attributes for Resource Groups
After specifying startup, fallover and fallback policies in the "Add a Resource Group" step, configure the
defined virtual IPs (service IP labels/addresses), application servers, volume groups, and corresponding NFS
cross-mounting options as explained above in "Change/Show Resources and Attributes for a Resource
Group". Specifically, the SAP global file systems /sapmnt/<SID> and /usr/sap/trans can be handled in this
configuration step if you want to use the integrated HA NFS services from IBM PowerHA.
3.4 Recommended High Availability Test and Verification Scenarios

The following are test scenarios performed on the sample configuration. For different solution designs, the
test scenario actions and expected behavior may slightly vary, but we recommend these minimum test
scenarios for similar purposes:
No. Test Scenario Name Test Scenario Description Expected Behavior and Results
1 Complete system start Preconditions: Expected behavior:

The system stopped and the cluster The DB2 primary database, NFS
nodes are shut down. service, SAP CS, and ERS are
Actions:
13
Start up cluster nodes. online on cluster node 1 (primary

Start the DB2 instances (both) and node).
activate the DB2 HADR standby The DB2 standby database is online
(only). on cluster node 2 (secondary node).
Bring the PowerHA cluster service DB2 HADR could reach peer state in
and resource groups online. short time.
Start the SAP application server The SAP system can be accessed
instances and other monitoring normally.
services.
2 Complete system stop Preconditions: Expected behavior:

The SAP system works normally. The system stopped without
All instances are up and running. exception.
Actions: The cluster nodes are shut down
Stop the SAP application server without exception.
instances.
Bring the PowerHA cluster service
and resource groups offline.
Deactivate the DB2 HADR standby
database and stop the DB2
instance.
Shut down the PowerHA cluster
nodes.
3 Stop SAP and DB2 Preconditions: Expected behavior:

instances (excluding The SAP system works normally. SAP and DB2 instances are shut
PowerHA related All instances are up and running. down without exception.
services) Actions: Resource groups and NFS services
Suspend the PowerHA Customer (and corresponding mounts) are still
Application Monitor. online.
Stop the SAP application server
instances.
Deactivate the DB2 HADR primary
instance.
Deactivate the DB2 HADR standby
instance.
4 Start SAP and DB2 instances (excluding PowerHA related services)
Preconditions:
SAP and DB2 instances are offline.
Resource groups and NFS services (and corresponding mounts) are online.
Actions:
Start the DB2 instance and activate the DB2 HADR standby database.
Start the DB2 instance and activate the DB2 HADR primary database.
Start the SAP application server instances.
Resume the PowerHA Customer Application Monitor.
Expected behavior:
The DB2 primary database, NFS service, SAP CS, and ERS are online on cluster node 1 (primary node).
The DB2 standby database is online on cluster node 2.
DB2 HADR can reach peer state in a short time.
The SAP system can be accessed normally.
5 Switchover (move) PowerHA resource group
Preconditions:
The SAP system works normally.
All instances are up and running.
DB2 HADR is in peer state and the standby database is running on node 2.
Actions:
Move the resource group to node 2 where the DB2 HADR standby database is running.
Expected behavior:
The DB2 primary database, NFS service, SAP CS, and ERS are moved to cluster node 2 where the DB2 HADR
standby database was running originally.
The SAP system can be accessed normally after switchover.
Follow-up action:
Bring the DB2 standby database manually online on cluster node 1 and make sure DB2 HADR reaches peer
state again.
6 Fallback (move) PowerHA resource group
Preconditions:
The SAP system works normally.
All instances are up and running.
DB2 HADR is in peer state and the standby database is running on node 1.
Actions:
Move the resource group back to node 1 where the DB2 HADR standby database is currently running.
Expected behavior:
The DB2 primary database, NFS service, SAP CS, and ERS are moved back to cluster node 1.
The SAP system can be accessed normally after fallback.
Follow-up action:
Bring the DB2 standby database manually online on cluster node 2 and make sure DB2 HADR reaches peer
state again.
7 Service failure simulation: SAP CS exits abnormally (on node 1)
Preconditions:
The SAP system works normally.
All instances are up and running.
SAP CS is running on node 1.
Actions:
Find the PID for SAP message server and enqueue server processes using the ps command.
Terminate the corresponding process using the kill -9 <pid> command.
Expected behavior:
The killed processes are recovered automatically.
No enqueue locks are lost.

8 Service failure simulation: SAP ERS exits abnormally (on node 1)
Preconditions:
The SAP system works normally.
All instances are up and running.
SAP ERS is running on node 1.
Actions:
Find the PID for SAP enqueue replication server processes using the ps command.
Terminate the corresponding process using the kill -9 <pid> command.
Expected behavior:
The killed process is recovered automatically.
Enqueue replication resumes.

9 Service failure simulation: SAP CS exits abnormally (on node 2)
Preconditions:
The SAP system works normally.
All instances are up and running.
SAP CS is running on node 2.
Actions:
Find the PID for SAP message server and enqueue server processes using the ps command.
Terminate the corresponding process using the kill -9 <pid> command.
Expected behavior:
The killed processes are recovered automatically.
No enqueue locks are lost.

10 Service failure simulation: SAP ERS exits abnormally (on node 2)
Preconditions:
The SAP system works normally.
All instances are up and running.
SAP ERS is running on node 2.
Actions:
Find the PID for SAP enqueue replication server processes using the ps command.
Terminate the corresponding process using the kill -9 <pid> command.
Expected behavior:
The killed process is recovered automatically.
Enqueue replication resumes.
11 Service failure simulation: DB2 HADR primary exits abnormally
Preconditions:
The SAP system works normally.
All instances are up and running.
DB2 HADR is in peer state.
Actions:
Find the PID for DB2 primary instance processes using the ps command.
Terminate the corresponding process using the kill -9 <pid> command.
Expected behavior:
The DB2 primary database, NFS service, SAP CS, and ERS fail over to the other cluster node where the
DB2 HADR standby database was running originally.
The SAP system can be accessed normally after the failover.
Follow-up action:
Bring the DB2 standby database manually online on the cluster node where the DB2 primary was
originally running and make sure DB2 HADR reaches peer state again.

12 Service failure simulation: DB2 HADR standby exits abnormally
Preconditions:
The SAP system works normally.
All instances are up and running.
DB2 HADR is in peer state.
Actions:
Find the PID for DB2 standby instance processes using the ps command.
Terminate the corresponding process using the kill -9 <pid> command.
Expected behavior:
The SAP system can be accessed normally after a while (determined by hadr_peer_window+hadr_timeout).
The HADR standby database is offline.
Follow-up action:
Bring the DB2 standby database manually online on the cluster node where the DB2 standby was
originally running and make sure DB2 HADR reaches peer state again.
13 System failure simulation: AIX/LPAR/Server exits abnormally (primary node)
Preconditions:
The SAP system works normally.
All instances are up and running.
DB2 HADR is in peer state.
Actions:
Simulate a system crash on the primary node as follows:
(1) halt -q
(2) LPAR deactivated
(3) power off server
Expected behavior:
The DB2 primary database, NFS service, SAP CS, and ERS fail over to the other cluster node where the
DB2 HADR standby database was running originally.
The SAP system can be accessed normally after the failover.
Follow-up action:
Start up the failing primary cluster node.
Bring the DB2 standby database manually online on the failing cluster node where DB2 primary was
originally running and make sure DB2 HADR reaches peer state again.
14 System failure simulation: AIX/LPAR/Server exits abnormally (standby node)
Preconditions:
The SAP system works normally.
All instances are up and running.
DB2 HADR is in peer state.
Actions:
Simulate a system crash on the standby node as follows:
(1) halt -q
(2) LPAR deactivated
(3) power off server
Expected behavior:
The SAP system can be accessed normally after a while (determined by hadr_peer_window).
The HADR standby database is offline.
Follow-up action:
Start up the failing standby cluster node.
Bring the DB2 standby database manually online on the cluster node where the DB2 standby was
originally running and make sure DB2 HADR reaches peer state again.
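The service failure simulations above start by looking up PIDs with the ps command. The following is a minimal sketch of that lookup, not part of the original scripts: pids_matching is a hypothetical helper, and the sample ps output (PIDs, profile paths) is made up for illustration. The process names ms.sapDST_ASCS00, en.sapDST_ASCS00, and er.sapDST_ERS10 follow the naming used in the instance profiles in section 5.2.

```shell
#!/bin/sh
# pids_matching is a hypothetical helper: it reads `ps -ef` style output on
# stdin and prints the PID (second column) of every line matching the
# extended regular expression passed as $1. The header line is skipped.
pids_matching() {
    awk -v pat="$1" 'NR > 1 && $0 ~ pat { print $2 }'
}

# Illustrative ps -ef excerpt; PIDs and paths are invented for this example.
sample_ps='     UID     PID    PPID   C    STIME    TTY  TIME CMD
  dstadm  123456  100001   0 10:00:00      -  0:01 ms.sapDST_ASCS00 pf=/usr/sap/DST/SYS/profile/DST_ASCS00_DST-ERP-SVC
  dstadm  123460  100001   0 10:00:00      -  0:02 en.sapDST_ASCS00 pf=/usr/sap/DST/SYS/profile/DST_ASCS00_DST-ERP-SVC
  dstadm  123470  100002   0 10:00:01      -  0:00 er.sapDST_ERS10 pf=/usr/sap/DST/SYS/profile/DST_ERS10_dsthost1'

# Message server and enqueue server PIDs (SAP CS failure simulation).
printf '%s\n' "$sample_ps" | pids_matching '(ms|en)[.]sapDST'
```

On a live node, the same helper could be fed with ps -ef directly, for example ps -ef | pids_matching 'er[.]sapDST' for the ERS case. Writing the dot as [.] keeps the pattern from matching its own awk command line, which replaces the usual grep -v grep step.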
4 FURTHER DISCUSSIONS ON SPECIFIC TOPICS
4.1 Virtual IP or Automatic Client Reroute

DB2 HADR provides the Automatic Client Reroute (ACR) feature that allows a client application to continue
(after reconnection) its work with minimal interruption after a DB2 HADR failover. Technically, there are two
methods to maintain the client connectivity of the SAP systems for a DB2 HADR failover or takeover:
Using a Virtual IP (VIP)
The primary database server always binds a Virtual IP to the network interface. After a
failover/switchover, the Virtual IP is moved to the network interface of the original standby database
server (the new primary database server).
Using Automatic Client Reroute (ACR)
The client is configured to know the two database servers. If the database client cannot connect to the
configured primary database server, the database client tries to connect to the configured standby
(alternate) server.
In SAP NetWeaver systems, there are several SAP work processes (for the ABAP stack) or threads (for the
Java stack). Each of them maintains its own database connection, so technically each work process or
thread can connect to a different database. With ACR configured, this can actually happen in a split-brain
scenario: some work processes/threads reconnect to the new primary database server, while other work
processes/threads stay connected to the old primary database server. The VIP method does not permit this
scenario because all work processes/threads always connect to the database through the VIP, which the
cluster manager guarantees to be unique.
We strongly recommend using VIP instead of ACR. For information about the limited support of ACR, see
SAP Note 1568539.
4.2 DB2 HADR State Flow, Potential Issue for Automatic Failover
After DB2 HADR starts, the standby database is in one of the following five states: local catchup, remote
catchup pending, remote catchup, peer, or disconnected peer. These states are determined by the log
shipping status. The following figure shows the DB2 HADR state change flow for the standby database:
Figure 6: DB2 HADR state change flow for standby database
For more information regarding the DB2 HADR state change flow of standby databases for your DB2
version, refer to the corresponding IBM DB2 Knowledge Center, for example, for DB2 11.1 at
http://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.ha.doc/doc/c0011748.html.
In different states, the allowed TAKEOVER HADR command options (with or without BY FORCE clause) and
the corresponding takeover behavior are different. Also depending on your DB2 version, the behavior of the
TAKEOVER HADR command for each possible state and option combination can slightly vary. For more
information, see the IBM Knowledge Center at
http://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0011553.html.
For a manual takeover, this is not a problem because you can verify the status and deliberately take the
desired action. However, to give an automated (unattended) takeover/failover the highest chance of success,
the PowerHA cluster script should use the BY FORCE option whenever a normal TAKEOVER HADR
command is not allowed. A forced takeover in turn raises the question of data consistency. Therefore, you
have to combine the BY FORCE clause with the PEER WINDOW ONLY option.
With the PEER WINDOW ONLY option, you can ensure greater data consistency. However, there are still
situations in which data loss can happen:
If the primary database remains active past the time when the peer window expires, and if the primary
database still has no connection to the standby database, the primary database moves out of
disconnected peer state and resumes processing transactions independently.
In NEARSYNC mode, if the standby database fails after acknowledging receipt of transaction logs
from the primary database but before writing that transaction log information to disk, then that
transaction log information, in the log receive buffer, might be lost.
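The choice between a normal and a forced takeover can be sketched as follows. This is only an illustration of the decision described above, under the assumption that the standby's HADR state is already known; choose_takeover_cmd is a hypothetical helper that prints the command instead of running it, and it is not taken from the IBM sample scripts.

```shell
#!/bin/sh
# choose_takeover_cmd is a hypothetical helper (illustration only).
# $1 = database name, $2 = HADR state of the standby as reported by db2pd
# (any letter case, for example "Peer" or "PEER").
choose_takeover_cmd() {
    db=$1
    state=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    case "$state" in
        PEER)
            # Standby is in peer state: a normal takeover is requested.
            echo "db2 takeover hadr on db $db"
            ;;
        *)
            # Any other state: force the takeover, but restrict it to the
            # peer window to reduce the risk of data loss.
            echo "db2 takeover hadr on db $db by force peer window only"
            ;;
    esac
}

# Print (do not execute) the command for two example states.
choose_takeover_cmd dst Peer
choose_takeover_cmd dst DISCONNECTED_PEER
```

Printing the command keeps the sketch safe to run anywhere. In a cluster script, the printed command would be executed as the DB2 instance owner, as the rc.hadr.start sample does with su.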
To ensure data consistency and successful automatic failover, if the DB2 HADR is out of PEER state, you
should be alerted as soon as possible. The following script db2_hadr_peercheck.sh can be used to monitor
the DB2 HADR state:
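#!/bin/ksh
# Check whether the DB2 instance is running on this node
DB2PID=$(ps -ef | grep db2sysc | grep -v grep | awk '{print $2}')
if [ ! -z "${DB2PID}" ]; then
    # Instance is up: read the HADR state from db2pd
    HADRSTATE=$(su - db2dst -c "db2pd -db dst -hadr" | grep -p State | grep "[a-zA-Z]" | tail -1 | awk '{print $2}')
    if [ "${HADRSTATE}" = "Peer" ]; then
        rc=0
    else
        rc=50
    fi
else
    # Instance is down
    rc=100
fi
echo $rc
exit $rc
The second-to-last line can be changed to echo $rc > some_output_file.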
With the general monitoring tools, you can periodically check this output file for alerts. The return codes have
the following meanings:
0: DB2 HADR is in PEER state
50: DB2 HADR is not in PEER state, and the DB2 instance on the monitoring node is up
100: DB2 HADR is not in PEER state and the DB2 instance on the monitoring node is down.
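As a sketch of how a monitoring tool might evaluate these return codes, the following maps each code to an alert text. describe_hadr_rc and the severity labels are assumptions made for this example; they are not part of the original scripts.

```shell
#!/bin/sh
# describe_hadr_rc is a hypothetical helper: it maps a return code of
# db2_hadr_peercheck.sh to a message. Exit status 0 means "no alert",
# exit status 1 means "raise an alert".
describe_hadr_rc() {
    case "$1" in
        0)   echo "OK: DB2 HADR is in PEER state"; return 0 ;;
        50)  echo "WARNING: DB2 HADR is not in PEER state (DB2 instance is up)"; return 1 ;;
        100) echo "CRITICAL: DB2 HADR is not in PEER state (DB2 instance is down)"; return 1 ;;
        *)   echo "UNKNOWN: unexpected return code '$1'"; return 1 ;;
    esac
}

# Example: evaluate a code as it would be read from the output file.
describe_hadr_rc 50 || true
```

If the peer check script writes its code to a file, the file content can be passed in with describe_hadr_rc "$(cat some_output_file)". The catch-all branch treats an empty or damaged file as an alert rather than as success.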
The option PEER WINDOW ONLY has been introduced as of DB2 V9.5, so the solution described in this
paper can be applied to all DB2 versions starting from DB2 V9.5.
4.3 Alternative IBM PowerHA-based DB2 HADR Failover Automation Solutions
4.3.1 Solution with Custom Application Monitor but without Instance Failover Capability
The paper "Automating IBM DB2 UDB HADR with HACMP" (see chapter Related Content at the end)
introduces a solution to automate IBM DB2 HADR with IBM PowerHA. This automation works fine for the
following scenarios: (1) a node/server failure; (2) a primary DB2 instance failure followed by a successful
DB2 instance restart on the same node.
There are still cases that the described configuration cannot handle, for example, when the primary DB2
instance fails and the triggered DB2 instance restarts fail as well. This mechanism is quite similar to what
is described in section 3.3.2 for SAP CS and ERS instances.
4.3.2 Parallel Resource Groups with Monitoring for Both DB2 Primary and DB2 Standby Databases
Section 7.8 "DB2 HADR cluster solution" of the IBM document "IBM PowerHA SystemMirror Standard
Edition 7.1.1 for AIX Update" describes a solution to automate IBM DB2 HADR with IBM PowerHA with one
parallel resource group for DB2 instances and two resource groups for DB2 Primary and Standby databases
respectively. This solution provides additional functions to monitor the DB2 standby database instead of
monitoring the DB2 primary database only.
Figure 7: Parallel Resource Groups with Monitoring for both DB2 Primary and DB2 Standby Databases
The above figure shows:

The DB2 instance is defined with a parallel resource group with the following startup policy: Online On
All Available Nodes
The DB2 Primary database and DB2 Standby database are controlled by two separate resource
groups with corresponding application monitoring and an Online On Different Node (OODN)
dependency between them.
In this OODN dependency, the DB2 Primary database resource group has a higher priority, that is,
when both resource groups are online, the DB2 Primary database resource group can be moved to the
node where the DB2 Standby database is online. Then, the DB2 Standby database will be moved to
the other node. The DB2 Standby database resource group is not allowed to be online on the node
where the DB2 Primary database resource group is online.
The DB2 Primary database resource group is bound with one virtual IP.
4.3.3 Variations of the Solution Proposed in this Paper

In chapters 2 and 3, a solution with one resource group for a two-node cluster is proposed. There can be
several similar variations based on this, two of which are described in this section.
Figure 8: Components deployment with two resource groups for a two-node cluster
The above figure shows one variation of the solution proposal: You separate the SAP application resource
(SAP NFS, CS/ERS) from the DB2 HADR resource group into its own resource group. Then each resource
group can have its own independent failover/switchover capability. The Parent/Child resource group
dependency is added, so in case a DB2 HADR resource group (Parent) failover/switchover occurs, the SAP
application resource group (Child) will also be restarted. Otherwise, a manual cleanup of SAP enqueue
entries may be needed due to database transaction rollbacks.
Figure 9: Components deployment with separate PowerHA clusters for Application and Database
For better scalability and performance, SAP SPOF components and DB2 can be split into individual resource
groups for each resource as shown above. In this configuration, IBM PowerHA will perform the SAP CS and
ERS monitoring and automatic recovery. This means that the native restart capability of SAP CS and ERS as
described in section 3.3.2 should be disabled. To do so, you can either use IBM PowerHA Smart Assist for
SAP NetWeaver if you are using PowerHA 7.1 or higher, or refer to the IBM documentation
(https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power+Systems/page/SAP+and+PowerHA)
listed in chapter Related Content for previous HACMP/PowerHA releases. A manual cleanup of
SAP enqueue entries may be needed due to database transaction rollbacks.
4.4 Script Adjustments due to Version Changes

4.4.1 Adding "Peer Window Only" Option
In section 3.3, the IBM PowerHA application server configuration calls the standard default scripts. The
sample start script is /db2/db2dst/db2_software/samples/hacmp/rc.hadr.start. As of SAP Basis release
7.0 SR3, the default DB2 software installation directory in SAP environments is
/db2/db2<dbsid>/db2_software.
The IBM-provided script does not contain the "PEER WINDOW ONLY" option. As previously mentioned, we
highly recommend that you add this option as of DB2 V9.5. You have to adjust line 142 of the script
rc.hadr.start. The original content of lines 138~142 is as follows:
elif [ $rc -eq 40 ]; then

# this instance is standby not peer
# Must takeover by force to bring up HADR in Peer state
logger -i -p notice -t $0 "NOTICE: Takeover by force issued"
su - ${instance_to_start?} -c "db2 takeover hadr on db ${DB2HADRDBNAME?} by force"
After the adjustment, it should look like this:
elif [ $rc -eq 40 ]; then

# this instance is standby not peer
# Must takeover by force to bring up HADR in Peer state
logger -i -p notice -t $0 "NOTICE: Takeover by force issued"
su - ${instance_to_start?} -c "db2 takeover hadr on db ${DB2HADRDBNAME?} by force peer window only"
If you are using the solution mentioned in section 4.3.2, you need to maintain the configuration file
sapha_SID.cfg. Most probably, the default cluster scripts from IBM SAP International Competence Center
(ISICC) have the following content in line 150:
PEER_WINDOW=""#peer window only" # change to empty may result in data loss!

Change it to
PEER_WINDOW="peer window only" # change to empty may result in data loss!
4.4.2 Mandatory Script Adjustments as of DB2 10.1

As of DB2 10.1, the db2pd output format has changed. The IBM PowerHA application server configuration
scripts, the customer application monitoring script, and the DB2 HADR peer status check script discussed
above therefore work only up to and including DB2 9.7. As of DB2 10.1, the following changes should be
implemented:
Adjust the default IBM PowerHA script rc.hadr.monitor

In the sample script configuration (/db2/db2dst/db2_software/samples/hacmp/rc.hadr.monitor), lines
104~153 originally are as follows:
# Use db2pd instead of SNAPSHOT if possible ...

su - ${instance_to_monitor} -c "db2pd -hadr -db ${DB2HADRDBNAME?}" \
| grep ${grepFlags?} Role | grep ${grepFlags?} State \
| grep "[a-zA-Z]" \
| tail -1 | awk '{print $1 "\n" $2}' > $temp_snap
if [ -r $temp_snap ]; then
hadr_role=$(head -1 $temp_snap)
hadr_state=$(tail -1 $temp_snap)
rm -f $temp_snap
fi
if [[ -z $hadr_role ]]; then
# Any problems with db2pd output, we'll resort to snapshot API
su - ${instance_to_monitor} -c "db2 get snapshot for all on ${DB2HADRDBNAME?}" \
| egrep "(^ [ ]+State[ ]+=[ ]+)|(^[ ]+Role[ ]+=[ ]+)" \
| sort | awk '{print $3}' > $temp_snap
if [ -r $temp_snap ]; then
hadr_role=$(head -1 $temp_snap)
hadr_state=$(tail -1 $temp_snap)
rm -f $temp_snap
fi
fi
# hadr_role & hadr_state should now be set
if [ ! -z "$hadr_role" ]; then
if [ ! -z "$hadr_state" ]; then
if [ "$hadr_role" = "Primary" ]; then

if [ "$hadr_state" = "Peer" ]; then
rc=10
else
rc=20
fi
elif [ "$hadr_role" = "Standby" ]; then

if [ "$hadr_state" = "Peer" ]; then
rc=30
else
rc=40
fi
else
rc=100
fi
fi
fi
fi
Adjust the lines to look as follows:
# Use db2pd instead of SNAPSHOT if possible ...

#su - ${instance_to_monitor} -c "db2pd -hadr -db ${DB2HADRDBNAME?}" \
#| grep ${grepFlags?} Role | grep ${grepFlags?} State \
#| grep "[a-zA-Z]" \
#| tail -1 | awk '{print $1 "\n" $2}' > $temp_snap
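# Adapt due to db2pd format change
su - ${instance_to_monitor} -c "db2pd -hadr -db ${DB2HADRDBNAME?}" \
| grep ${grepFlags?} HADR_ROLE \
| grep "[a-zA-Z]" \
| awk '{print $3}' > $temp_snap
su - ${instance_to_monitor} -c "db2pd -hadr -db ${DB2HADRDBNAME?}" \
| grep ${grepFlags?} HADR_STATE \
| grep "[a-zA-Z]" \
| awk '{print $3}' >> $temp_snap
if [ -r $temp_snap ]; then
hadr_role=$(head -1 $temp_snap)
hadr_state=$(tail -1 $temp_snap)
rm -f $temp_snap
fi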
#Upper Case for "Primary / Standby / Peer" to "PRIMARY / STANDBY / PEER"

if [ ! -z "$hadr_role" ]; then
if [ ! -z "$hadr_state" ]; then
if [ "$hadr_role" = "PRIMARY" ]; then

if [ "$hadr_state" = "PEER" ]; then
rc=10
else
rc=20
fi
elif [ "$hadr_role" = "STANDBY" ]; then

if [ "$hadr_state" = "PEER" ]; then
rc=30
else
rc=40
fi
else
rc=100
fi
fi
fi
fi
Adjust the Customer Application Monitor script

Adjust the script /hacmp/new_hadr/db2_hadr_monitor.sh as follows:
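#!/bin/ksh
# Check whether the DB2 instance is running on this node
DB2PID=$(ps -ef | grep db2sysc | grep -v grep | awk '{print $2}')
if [ ! -z "${DB2PID}" ]; then
    # Instance is up: read the HADR role (DB2 10.1 or higher db2pd format)
    HADRROLE=$(su - db2dst -c "db2pd -db dst -hadr" | egrep 'HADR_ROLE' | awk '{print $3}' | tr "." " " \
    | awk '{print $1}' | tail -1)
    if [ "${HADRROLE}" = "PRIMARY" ]; then
        rc=0
    else
        rc=50
    fi
else
    # Instance is down
    rc=100
fi
echo $rc
exit $rc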
Adjust the DB2 HADR peer state check script

Adjust the script /hacmp/new_hadr/db2_hadr_peercheck.sh as follows:
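#!/bin/ksh
# Check whether the DB2 instance is running on this node
DB2PID=$(ps -ef | grep db2sysc | grep -v grep | awk '{print $2}')
if [ ! -z "${DB2PID}" ]; then
    # Instance is up: read the HADR state (DB2 10.1 or higher db2pd format)
    HADRSTATE=$(su - db2dst -c "db2pd -db dst -hadr" | egrep 'HADR_STATE' | awk '{print $3}' | tr "." " " \
    | awk '{print $1}' | tail -1)
    if [ "${HADRSTATE}" = "PEER" ]; then
        rc=0
    else
        rc=50
    fi
else
    # Instance is down
    rc=100
fi
echo $rc
exit $rc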
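The field extraction used by the adjusted scripts can be tried out without a running database. The following sketch assumes the "NAME = VALUE" layout that the scripts above rely on (value in the third column). hadr_field is a hypothetical helper, and the sample text is a shortened, illustrative excerpt rather than real db2pd output.

```shell
#!/bin/sh
# hadr_field is a hypothetical helper: it reads db2pd -hadr output in the
# DB2 10.1 "NAME = VALUE" layout on stdin and prints the value of the field
# named in $1. Only the first exact match of the field name is used.
hadr_field() {
    awk -v name="$1" '$1 == name && $2 == "=" { print $3; exit }'
}

# Illustrative sample only; real db2pd output contains many more fields.
sample_output='
                            HADR_ROLE = PRIMARY
                        HADR_SYNCMODE = NEARSYNC
                           HADR_STATE = PEER
'

role=$(printf '%s\n' "$sample_output" | hadr_field HADR_ROLE)
state=$(printf '%s\n' "$sample_output" | hadr_field HADR_STATE)
echo "role=$role state=$state"
```

Comparing the field name exactly, instead of using a substring match, avoids picking up other lines that merely contain the same text. Note that the scripts above take the last matching line (tail -1) while this sketch takes the first. Which one is appropriate depends on the actual db2pd output of your DB2 version and should be verified there.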
5 SCRIPTS FOR THE SAMPLE IBM POWERHA CONFIGURATION
5.1 IBM PowerHA Scripts

In the sample configuration, the IBM PowerHA scripts listed below are placed in directory /hacmp/new_hadr.
See section 4.4.2 ("Mandatory Script Adjustments as of DB2 10.1") for information about version
dependencies.
No. | Script Name | Script Description | Comment
1 | db2_hadr_start.sh | PowerHA Application Server Start Script | Version dependent for called script
2 | db2_hadr_stop.sh | PowerHA Application Server Stop Script | Version dependent for called script
3 | db2_hadr_monitor.sh | PowerHA Customer Application Monitor Monitor Method | Version dependent
4 | db2_hadr_cleanup.sh | PowerHA Customer Application Monitor Cleanup Method | NA
5 | db2_hadr_restart.sh | PowerHA Customer Application Monitor Restart Method | NA
6 | db2_hadr_peercheck.sh | DB2 HADR Peer state check | Version dependent
7 | r3startCS.sh | Start SAP ASCS and ERS instances | NA
8 | r3startERS.sh | Start SAP ERS instance | NA
9 | r3stopCS.sh | Stop SAP ASCS and ERS instances | NA
10 | r3stopERS.sh | Stop SAP ERS instance | NA
Script db2_hadr_start.sh
#!/bin/ksh
sh /db2/db2dst/db2_software/samples/hacmp/rc.hadr.start db2dst db2dst dst verbose
Script db2_hadr_stop.sh
#!/bin/ksh
DB2PID=$(ps -ef | grep db2sysc | grep -v grep | awk '{print $2}')
if [ ! -z "${DB2PID}" ]; then
sh /db2/db2dst/db2_software/samples/hacmp/rc.hadr.stop db2dst db2dst dst verbose
fi
Script db2_hadr_monitor.sh
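#!/bin/ksh
# Check whether the DB2 instance is running on this node
DB2PID=$(ps -ef | grep db2sysc | grep -v grep | awk '{print $2}')
if [ ! -z "${DB2PID}" ]; then
    # Instance is up: read the HADR role from db2pd
    HADRROLE=$(su - db2dst -c "db2pd -db dst -hadr" | grep -p Role | grep "[a-zA-Z]" | tail -1 | awk '{print $1}')
    if [ "${HADRROLE}" = "Primary" ]; then
        rc=0
    else
        rc=50
    fi
else
    # Instance is down
    rc=100
fi
echo $rc
exit $rc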
Script db2_hadr_cleanup.sh
#!/bin/ksh
su - db2dst -c "db2gcf -k"
Script db2_hadr_restart.sh
#!/bin/ksh
su - db2dst -c "db2gcf -u"
Script db2_hadr_peercheck.sh
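#!/bin/ksh
# Check whether the DB2 instance is running on this node
DB2PID=$(ps -ef | grep db2sysc | grep -v grep | awk '{print $2}')
if [ ! -z "${DB2PID}" ]; then
    # Instance is up: read the HADR state from db2pd
    HADRSTATE=$(su - db2dst -c "db2pd -db dst -hadr" | grep -p State | grep "[a-zA-Z]" | tail -1 | awk '{print $2}')
    if [ "${HADRSTATE}" = "Peer" ]; then
        rc=0
    else
        rc=50
    fi
else
    # Instance is down
    rc=100
fi
echo $rc
exit $rc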
Script r3startCS.sh
#!/bin/ksh
######################################################################################
# Module: /hacmp/new_hadr/r3startCS.sh
#
# Description: Use to start ASCS and ERS.
# Program Logic:
# 1) start ASCS on local node;
# 2) start ERS on local node.
######################################################################################
SAPSID=DST
SAPADM=dstadm
OUT=/tmp/hacmp.out
DB2ADM=db2dst
ABAPCS=ASCS00
ABAPCSNUM=00
ABAPERS=ERS10
ABAPERSNUM=10
SVCHOST=DST-ERP-SVC
PRIHOST=dsthost1
SECHOST=dsthost2
echo "==============================================================" | tee -a ${OUT}

echo "start of /hacmp/new_hadr/r3startCS.sh on " `hostname` `date` | tee -a ${OUT}
#Step 1) start ASCS on local node;

echo "r3startCS Step 1 start......" | tee -a ${OUT}
echo "Start ABAP ASCS Service on " `hostname` `date` | tee -a ${OUT}
ps -ef | grep en.sap | fgrep -v grep > /dev/null 2>&1

if [ $? != 0 ]
then
/usr/bin/su - ${SAPADM} -c "/sapmnt/${SAPSID}/exe/startsap r3 ${ABAPCS} ${SVCHOST}" | tee -a ${OUT}
else
echo "There are ABAP ASCS Service processes running on `hostname`" | tee -a ${OUT}
fi
echo "r3startCS Step 1 end......\n" | tee -a ${OUT}
#Step 2) start ERS on local node;

echo "r3startCS Step 2 start......" | tee -a ${OUT}
echo "Start ABAP ERS Service on " `hostname` `date` | tee -a ${OUT}
/hacmp/new_hadr/r3startERS.sh
echo "r3startCS Step 2 end......" | tee -a ${OUT}
echo "End of /hacmp/new_hadr/r3startCS.sh on " `hostname` `date` | tee -a ${OUT}

echo "==============================================================\n\n" | tee -a ${OUT}
Script r3startERS.sh
#!/bin/ksh
######################################################################################
#
# Description: Use to start ERS.
# Program Logic: Start ERS instance on local node.
######################################################################################
SAPSID=DST
SAPADM=dstadm
OUT=/tmp/hacmp.out
ABAPERS=ERS10
ABAPERSNUM=10
echo "==============================================================" | tee -a ${OUT}

echo "start of /hacmp/new_hadr/r3startERS.sh on " `hostname` `date` | tee -a ${OUT}
# start ERS on local node;

echo "r3startERS start......" | tee -a ${OUT}
echo "Start ABAP ERS Service on " `hostname` `date` | tee -a ${OUT}
ps -ef | grep er.sap | fgrep -v grep > /dev/null 2>&1
if [ $? != 0 ]
then
/usr/bin/su - ${SAPADM} -c "/sapmnt/${SAPSID}/exe/startsap r3 ${ABAPERS}" | tee -a ${OUT}
else
echo "There are ABAP ERS Service processes running on `hostname`" | tee -a ${OUT}
fi
echo "r3startERS end......\n" | tee -a ${OUT}
echo "End of /hacmp/new_hadr/r3startERS.sh on " `hostname` `date` | tee -a ${OUT}

echo "==============================================================\n\n" | tee -a ${OUT}
Script r3stopCS.sh
#!/bin/ksh
######################################################################################
#
# Description: Use to stop ASCS on local node.
# Program Logic:
# 1) stop ERS on local node;
# 2) stop ASCS on local node;
######################################################################################
SAPSID=DST
SAPADM=dstadm
OUT=/tmp/hacmp.out
DB2ADM=db2dst
ABAPCS=ASCS00
ABAPCSNUM=00
ABAPERS=ERS10
ABAPERSNUM=10
SVCHOST=DST-ERP-SVC
PRIHOST=dsthost1
SECHOST=dsthost2
echo "==============================================================" | tee -a ${OUT}

echo "start of /hacmp/new_hadr/r3stopCS.sh on " `hostname` `date` | tee -a ${OUT}
#Step 1) stop ERS on local node;

/hacmp/new_hadr/r3stopERS.sh
#Step 2) stop ASCS on local node;

echo "r3stopCS Step 2 start......" | tee -a ${OUT}
echo "Stop ABAP ASCS Service on " `hostname` `date` | tee -a ${OUT}
ps -ef | grep ${ABAPCS} | fgrep -v grep > /dev/null 2>&1
if [ $? = 0 ]
then
echo "Begin to stop ABAP ASCS Service...." | tee -a ${OUT}
/usr/bin/su - ${SAPADM} -c "/sapmnt/${SAPSID}/exe/stopsap r3 ${ABAPCS} ${SVCHOST}" | tee -a ${OUT}
else
echo "ABAP ASCS Service had been stopped..." | tee -a ${OUT}
fi
ps -ef | grep ${ABAPCS} | fgrep -v grep > /dev/null 2>&1
if [ $? = 0 ]
then
SAPSEPID=`ps -ef | grep ${ABAPCS} | fgrep -v grep | awk '{ print $2 }'`
for i in ${SAPSEPID}
do
kill -9 $i
done
fi
/usr/bin/su - ${SAPADM} -c "/sapmnt/${SAPSID}/exe/cleanipc ${ABAPCSNUM} remove" | tee -a ${OUT}
echo "r3stopCS Step 2 end......\n" | tee -a ${OUT}
echo "End of /hacmp/new_hadr/r3stopCS.sh on " `hostname` `date` | tee -a ${OUT}

echo "==============================================================\n\n" | tee -a ${OUT}
Script r3stopERS.sh
#!/bin/ksh
######################################################################################
# Module: /hacmp/new_hadr/r3stopERS.sh
#
# Description: Use to stop ERS.
# Program Logic: stop ERS on local node;
######################################################################################
SAPSID=DST
SAPADM=dstadm
OUT=/tmp/hacmp.out
ABAPERS=ERS10
ABAPERSNUM=10
echo "==============================================================" | tee -a ${OUT}

echo "start of /hacmp/new_hadr/r3stopERS.sh on " `hostname` `date` | tee -a ${OUT}
# stop ERS on local node;

echo "r3stopERS start......" | tee -a ${OUT}
echo "Stop ABAP ERS Service on " `hostname` `date` | tee -a ${OUT}
ps -ef | grep ${ABAPERS} | fgrep -v grep > /dev/null 2>&1
if [ $? = 0 ]
then
echo "Begin to stop ABAP ERS Service...." | tee -a ${OUT}
/usr/bin/su - ${SAPADM} -c "/sapmnt/${SAPSID}/exe/stopsap r3 ${ABAPERS}" | tee -a ${OUT}
else
echo "ABAP ERS Service had been stopped..." | tee -a ${OUT}
fi
ps -ef | grep ${ABAPERS} | fgrep -v grep > /dev/null 2>&1
if [ $? = 0 ]
then
SAPSEPID=`ps -ef | grep ${ABAPERS} | fgrep -v grep | awk '{ print $2 }'`
for i in ${SAPSEPID}
do
kill -9 $i
done
fi
/usr/bin/su - ${SAPADM} -c "/sapmnt/${SAPSID}/exe/cleanipc ${ABAPERSNUM} remove" | tee -a ${OUT}
echo "r3stopERS end......\n" | tee -a ${OUT}
echo "End of /hacmp/new_hadr/r3stopERS.sh on " `hostname` `date` | tee -a ${OUT}

echo "==============================================================\n\n" | tee -a ${OUT}
5.2 SAP CS and ERS Instance or Start Profiles

As mentioned in section 3.3.2, to enable SAP native monitoring and restart capabilities for SAP CS and ERS
instances, the corresponding instance profiles (or start profiles) can be set as follows:
SAP CS instance profile

SAPSYSTEMNAME = DST
SAPSYSTEM = 00
INSTANCE_NAME = ASCS00
DIR_CT_RUN = $(DIR_EXE_ROOT)/run
DIR_EXECUTABLE = $(DIR_INSTANCE)/exe
SAPLOCALHOST = DST-ERP-SVC
DIR_PROFILE = $(DIR_INSTALL)/profile
_PF = $(DIR_PROFILE)/DST_ASCS00_DST-ERP-SVC
SETENV_00 = DIR_LIBRARY=$(DIR_LIBRARY)
SETENV_01 = LD_LIBRARY_PATH=$(DIR_LIBRARY):%(LD_LIBRARY_PATH)
SETENV_02 = SHLIB_PATH=$(DIR_LIBRARY):%(SHLIB_PATH)
SETENV_03 = LIBPATH=$(DIR_LIBRARY):%(LIBPATH)
SETENV_04 = PATH=$(DIR_EXECUTABLE):%(PATH)
#-----------------------------------------------------------------------
# Copy SAP Executables
#-----------------------------------------------------------------------
_CPARG0 = list:$(DIR_CT_RUN)/scs.lst
Execute_00 = immediate $(DIR_CT_RUN)/sapcpe$(FT_EXE) pf=$(_PF) $(_CPARG0)
_CPARG1 = list:$(DIR_CT_RUN)/sapcrypto.lst
#-----------------------------------------------------------------------
# Start SAP message server
#-----------------------------------------------------------------------
_MS = ms.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_02 = local rm -f $(_MS)
Execute_03 = local ln -s -f $(DIR_EXECUTABLE)/msg_server$(FT_EXE) $(_MS)
Restart_Program_00 = local $(_MS) pf=$(_PF)
#-----------------------------------------------------------------------
# Start SAP enqueue server
#-----------------------------------------------------------------------
_EN = en.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_04 = local rm -f $(_EN)
Execute_05 = local ln -s -f $(DIR_EXECUTABLE)/enserver$(FT_EXE) $(_EN)
Restart_Program_01 = local $(_EN) pf=$(_PF)
#-----------------------------------------------------------------------
# SAP Message Server parameters are set in the DEFAULT.PFL
#-----------------------------------------------------------------------
ms/standalone = 1
ms/server_port_0 = PROT=HTTP,PORT=81$$
#-----------------------------------------------------------------------
# SAP Enqueue Server
#-----------------------------------------------------------------------
enque/table_size = 64000
enque/snapshot_pck_ids = 1600
enque/server/max_query_requests = 5000
enque/server/max_requests = 5000
enque/async_req_max = 5000
enque/server/threadcount = 4
rdisp/enqname = $(rdisp/myname)
ssl/ssl_lib = $(DIR_EXECUTABLE)$(DIR_SEP)$(FT_DLL_PREFIX)sapcrypto$(FT_DLL)
sec/libsapsecu = $(ssl/ssl_lib)
ssf/ssfapi_lib = $(ssl/ssl_lib)
SETENV_05 = SECUDIR=$(DIR_INSTANCE)/sec
enque/server/replication = true
SAP ERS instance profile on dsthost1

SAPSYSTEMNAME = DST
SAPSYSTEM = 10
INSTANCE_NAME = ERS10
DIR_PROFILE = $(DIR_INSTANCE)/profile
_PF = $(DIR_PROFILE_SYSTEM)/DST_ERS10_dsthost1
SETENV_05 = DB2_CLI_DRIVER_INSTALL_PATH=$(DIR_EXECUTABLE)/db6_clidriver
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
_CPARG0 = list:$(_PF).lst
_CPARG1 = source:$(DIR_PROFILE_SYSTEM)
_CPARG2 = target:$(DIR_PROFILE)
Execute_00 = immediate $(DIR_CT_RUN)/sapcpe$(FT_EXE) pf=$(_PF) $(_CPARG0) $(_CPARG1) $(_CPARG2)
DIR_PROFILE_SYSTEM = $(DIR_INSTALL)/profile
_PFL = $(DIR_PROFILE)/DST_ERS10_dsthost1
#-----------------------------------------------------------------------
# Settings for enqueue monitoring tools (enqt, ensmon)
#-----------------------------------------------------------------------
enque/process_location = REMOTESA
#-----------------------------------------------------------------------
# standalone enqueue details from (A)SCS instance
#-----------------------------------------------------------------------
SCSID = 00
SCSHOST = DST-ERP-SVC
enque/serverinst = $(SCSID)
enque/serverhost = $(SCSHOST)
Autostart = 0
enque/enrep/hafunc_implementation = script
#-----------------------------------------------------------------------
# Start enqueue replication server
#-----------------------------------------------------------------------
_ER = er.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_03 = local rm -f $(_ER)
Execute_04 = local ln -s -f $(DIR_EXECUTABLE)/enrepserver$(FT_EXE) $(_ER)
SAP ERS instance profile on dsthost2

SAPSYSTEMNAME = DST
SAPSYSTEM = 10
INSTANCE_NAME = ERS10
DIR_PROFILE = $(DIR_INSTANCE)/profile
_PF = $(DIR_PROFILE_SYSTEM)/DST_ERS10_dsthost2
SETENV_05 = DB2_CLI_DRIVER_INSTALL_PATH=$(DIR_EXECUTABLE)/db6_clidriver
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
_CPARG0 = list:$(_PF).lst
_CPARG1 = source:$(DIR_PROFILE_SYSTEM)
_CPARG2 = target:$(DIR_PROFILE)
Execute_00 = immediate $(DIR_CT_RUN)/sapcpe$(FT_EXE) pf=$(_PF) $(_CPARG0) $(_CPARG1) $(_CPARG2)
DIR_PROFILE_SYSTEM = $(DIR_INSTALL)/profile
_PFL = $(DIR_PROFILE)/DST_ERS10_dsthost2
#-----------------------------------------------------------------------
# Settings for enqueue monitoring tools (enqt, ensmon)
#-----------------------------------------------------------------------
enque/process_location = REMOTESA
#-----------------------------------------------------------------------
# standalone enqueue details from (A)SCS instance
#-----------------------------------------------------------------------
SCSID = 00
SCSHOST = DST-ERP-SVC
enque/serverinst = $(SCSID)
enque/serverhost = $(SCSHOST)
Autostart = 0
enque/enrep/hafunc_implementation = script
#-----------------------------------------------------------------------
# Start enqueue replication server
#-----------------------------------------------------------------------
_ER = er.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_03 = local rm -f $(_ER)
Execute_04 = local ln -s -f $(DIR_EXECUTABLE)/enrepserver$(FT_EXE) $(_ER)
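
The ERS profiles on the two nodes are meant to differ only in host-specific values. A quick way to confirm this after manual editing is to compare the parameters of both files. The following Python sketch illustrates the idea on an excerpt of the two listings; the function names are hypothetical, and in practice the text would be read from the profile files rather than from embedded strings.

```python
# Excerpts of the two ERS profiles shown in this appendix.
DSTHOST1 = """\
_PF = $(DIR_PROFILE_SYSTEM)/DST_ERS10_dsthost1
_PFL = $(DIR_PROFILE)/DST_ERS10_dsthost1
enque/process_location = REMOTESA
SCSID = 00
SCSHOST = DST-ERP-SVC
Autostart = 0
enque/enrep/hafunc_implementation = script
"""

DSTHOST2 = """\
_PF = $(DIR_PROFILE_SYSTEM)/DST_ERS10_dsthost2
_PFL = $(DIR_PROFILE)/DST_ERS10_dsthost2
enque/process_location = REMOTESA
SCSID = 00
SCSHOST = DST-ERP-SVC
Autostart = 0
enque/enrep/hafunc_implementation = script
"""


def parse_profile(text):
    """Return {parameter: value} for every 'name = value' line."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def diff_profiles(first, second):
    """Return {parameter: (value_in_first, value_in_second)} where they differ."""
    names = sorted(set(first) | set(second))
    return {n: (first.get(n), second.get(n))
            for n in names if first.get(n) != second.get(n)}


def host_specific_only(differences, host_a, host_b):
    """True if every difference disappears once host_a is renamed to host_b."""
    return all(
        a is not None and b is not None and a.replace(host_a, host_b) == b
        for a, b in differences.values()
    )


differences = diff_profiles(parse_profile(DSTHOST1), parse_profile(DSTHOST2))
for name, (value1, value2) in differences.items():
    print(f"{name}: {value1} | {value2}")
print("host-specific only:",
      host_specific_only(differences, "dsthost1", "dsthost2"))
```

For the excerpts above, only `_PF` and `_PFL` are reported, and both differ solely in the host name. Any other reported parameter would indicate that the two profiles have drifted apart.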
6 RELATED CONTENT
SAP Note 101809 - DB6: Supported Versions and Fix Pack Levels
SAP Note 1555903 - DB6: Supported DB2 Database Features
SAP Note 1612105 - DB6: FAQ for DB2 High Availability Disaster Recovery (HADR)
SAP Note 1568539 - DB6: HADR - Virtual IP or Automatic Client Reroute
SAP Note 60843 - DB6: High Availability for DB2 using SA MP
SAP Note 1154013 - DB6: DB problems in HADR environment
SAP Note 1168456 - DB6: Support Process and End of Support Dates for IBM DB2 LUW
SAP Note 1260217 - DB6: Software Components Contained in DB2 License from SAP
SAP Note 1365982 - DB6: Current "db6_update_db/db6_update_client" script
SAP Note 1898687 - Merge start profile with instance profile (Linux/Unix OS)
SAP Note 768727 - Automatic restart functions in sapstart for processes
Best Practices - DB2 High Availability Disaster Recovery
High Availability and Disaster Recovery Options for DB2 for Linux, UNIX, and Windows
Enabling Database High Availability Using DB2 HADR and IBM Tivoli SA MP in an SAP Environment
Automating IBM DB2 UDB HADR with HACMP
IBM PowerHA SystemMirror Standard Edition 7.1.1 for AIX Update, Section 7.8 DB2 HADR cluster solution
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power+Systems/page/SAP+and+PowerHA
http://www.ibm.com/developerworks/aix/library/au-powerhaintro/
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR
www.sap.com/contactsap
© 2017 SAP SE or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.
The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable
for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality
mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future developments, products, and/or platform directions and functionality are
all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation
to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are
cautioned not to place undue reliance on these forward-looking statements, and they should not be relied upon in making purchasing decisions.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other
countries. All other product and service names mentioned are the trademarks of their respective companies. See http://www.sap.com/corporate-en/legal/copyright/index.epx for additional trademark
information and notices.

Version 1.0 (January 2017) Initial version
