http://www.redbooks.ibm.com
SG24-5343-00
IBM
International Technical Support Organization
October 1999
Take Note!
Before using this information and the product it supports, be sure to read the general information in
Appendix C, “Special Notices” on page 97.
This edition applies to SAP R/3 on DB2 for OS/390, SAP R/3 Release 4.0B, OS/390 Release 2.6 (5645-001), AIX
Release 4.3.1 (5765-603), and IBM DATABASE 2 Server for OS/390 Version 5.1 (DB2 for OS/390) Release 4.1
(5655-DB2), and to all subsequent releases and modifications until otherwise indicated in new editions or
Technical Newsletters.
When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any
way it believes appropriate without incurring any obligation to you.
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The Team That Wrote This Redbook . . . . . . . . . . . . . . . . . . . . . . . . . xi
Comments Welcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
vi SAP R/3 on DB2 for OS/390: Disaster Recovery
This redbook will help you plan and install a disaster recovery solution for SAP
R/3 on DB2 for OS/390. It is one in a series of redbooks about the SAP R/3 on
DB2 for OS/390 solution and the associated environment.
Many companies are migrating their critical business applications from the
legacy mainframe environment to SAP R/3. As these companies depend on
information and processes managed by SAP R/3, the availability of the SAP R/3
system, even in the case of a disaster, is essential.
Thanks to the following people for their invaluable contributions to this project:
IBM Poughkeepsie
Mary Ellen Cowles
Mary Ann Ritosa
IBM Germany
Namik Hrle
Andreas Maier
Comments Welcome
Your comments are important to us!
This chapter provides a general description of SAP R/3 and an overview of the
architecture of that system. The specific solution SAP R/3 on DB2 for OS/390 is
then explained in terms of the SAP R/3 architecture. Since DB2 for OS/390 is
such an important part of this solution, features of DB2 for OS/390 are introduced
as a foundation for later chapters. For more details about SAP R/3 on DB2 for
OS/390, see Implementing SAP R/3 in an OS/390 Environment Using AIX and
Windows NT Application Servers, SG24-4945, SAP R/3 on DB2 for OS/390:
Planning Guide SAP R/3 Release 4.0B, SC33-7962, and SAP R/3 on DB2 for
OS/390: Connectivity Guide, SC33-7965.
SAP R/3 customers can use SAP-supplied utilities to add more machines for
application and presentation services to the existing SAP R/3 system. Thus, SAP
R/3 can support centralized or decentralized computing with its distributed
client/server architecture.
These applications are contained within one database, but not all are used by
any one company. Some companies use only one of the applications; others
have several SAP R/3 systems for different applications. The way an SAP R/3
system is managed therefore depends on the criticality of the applications that
the company uses.
In the Industry Solutions (IS) area, SAP AG makes a broad range of specific
industry solutions available. These are based on standard SAP R/3 applications.
To see SAP AG offerings for industries, consult the SAP Web site at:
http://www.sap-ag.de
This should lead to a page for “Industry Solutions” currently found at:
http://www.sap-ag.de/products/industry/index.htm
For any specific application package, SAP AG should be consulted regarding
availability on a specific hardware or operating system configuration.
An SAP R/3 LUW starts when the transaction starts. As the transaction changes
data, all updates are consolidated through VB Protocol entries in a database
The execution of the transaction now moves to the update task, which processes
all of the entries in VBLOG (or the VB tables) for this SAP LUW. All of the
changes to all of the databases modified by this SAP LUW are made at this time,
in a single DB2 unit of work. The entries in the VBLOG (or the VB tables) are
deleted in this same unit of work.
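The update-task flow described above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in database and assumes a hypothetical, heavily simplified `vblog` table and `balances` application table; these are not the real SAP VB table layouts, and the functions `record_update` and `run_update_task` are illustrative names only. What the sketch does show is the key property: the application changes and the deletion of the VBLOG entries happen in one database unit of work.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical, simplified VBLOG table and one
    application table (these are not the real SAP table layouts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vblog (luw_id TEXT, acct TEXT, delta INTEGER)")
    conn.execute("CREATE TABLE balances (acct TEXT PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO balances VALUES (?, ?)", [("A", 100), ("B", 0)])
    conn.commit()
    return conn


def record_update(conn: sqlite3.Connection, luw_id: str, acct: str, delta: int) -> None:
    """Dialog step: the change is only recorded as a VBLOG entry;
    the application table is not touched yet."""
    conn.execute("INSERT INTO vblog VALUES (?, ?, ?)", (luw_id, acct, delta))
    conn.commit()


def run_update_task(conn: sqlite3.Connection, luw_id: str) -> None:
    """Update task: apply all VBLOG entries of one SAP LUW and delete them
    in a single database unit of work (all or nothing)."""
    with conn:  # commits if the block succeeds, rolls back on an exception
        rows = conn.execute(
            "SELECT acct, delta FROM vblog WHERE luw_id = ?", (luw_id,)
        ).fetchall()
        for acct, delta in rows:
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE acct = ?",
                (delta, acct),
            )
        conn.execute("DELETE FROM vblog WHERE luw_id = ?", (luw_id,))


conn = make_db()
record_update(conn, "LUW1", "A", -30)
record_update(conn, "LUW1", "B", 30)
run_update_task(conn, "LUW1")
print(conn.execute("SELECT acct, amount FROM balances ORDER BY acct").fetchall())
# [('A', 70), ('B', 30)]
```

Because the deletion of the VBLOG rows shares the unit of work with the updates, a failure inside `run_update_task` leaves both the application table and the VBLOG entries unchanged.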
After a failure, DB2 will recover the database to a consistent state, rolling back
those units of work that were “in flight” at the time of failure, and committing
those units of work that had completed. Note that this database state, even
though consistent from the DB2 point of view, may have incomplete SAP LUWs.
The application server host must be restarted to process the VBLOG data,
backing out changes for business transactions that are not complete (that is,
have not executed a COMMIT WORK).
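The distinction between DB2 consistency and SAP LUW completeness can be sketched as follows. The model is a simplification under stated assumptions: each DB2 unit of work is tagged with a hypothetical `sap_luw` identifier, and the set of SAP LUWs that reached COMMIT WORK is known. DB2 restart keeps only committed units of work; any SAP LUW that still has committed DB2 work but never issued COMMIT WORK is incomplete from the business point of view.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class UnitOfWork:
    uow_id: str
    committed: bool  # True if DB2 logged a commit before the failure
    sap_luw: str     # SAP LUW this DB2 unit of work belongs to (hypothetical link)


def db2_restart(uows: Iterable[UnitOfWork]) -> List[UnitOfWork]:
    """DB2 restart: committed units of work survive, in-flight ones are backed out."""
    return [u for u in uows if u.committed]


def incomplete_sap_luws(surviving: Iterable[UnitOfWork], commit_work_issued: Set[str]) -> List[str]:
    """SAP LUWs with committed DB2 work that never reached COMMIT WORK.
    The database is consistent for DB2, but these business transactions are
    incomplete and must be backed out when the application server restarts."""
    return sorted({u.sap_luw for u in surviving} - commit_work_issued)


uows = [
    UnitOfWork("U1", committed=True, sap_luw="LUW1"),
    UnitOfWork("U2", committed=True, sap_luw="LUW2"),
    UnitOfWork("U3", committed=False, sap_luw="LUW2"),
]
surviving = db2_restart(uows)
print([u.uow_id for u in surviving])             # ['U1', 'U2']
print(incomplete_sap_luws(surviving, {"LUW1"}))  # ['LUW2']
```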
Through its support of UNIX interfaces via the OS/390 UNIX System Services,
OS/390 becomes a database server for SAP R/3, allowing you to profit from
client/server technology benefits such as distributed processing and extensive
scalability. SAP R/3 application programs and user data, including data and
process models, are stored on the database server. SAP R/3 uses DB2 for
OS/390 as the database server, which can manage large amounts of data on
behalf of many users.
IBM adds an OS/390 UNIX System Services program, called Integrated Call Level
Interface (ICLI), which passes DB2 data to the SAP R/3 Database Interface
(DBIF). Communication services support TCP/IP for general access to the
Internet.
The strengths that OS/390 and System/390 bring to the SAP R/3 environment
include:
• Reliability, availability, and serviceability
SAP R/3 customers need continuous data availability and integrity. OS/390
reliability and availability are unsurpassed, and OS/390 has a history of unmatched
security and integrity. SAP R/3 benefits from these underlying
characteristics.
• Scalability
DB2 UDB for OS/390 Version 6 has become available since this redbook was
begun. This version has several features that particularly apply to
backup/restore processing, such as parallelism in the COPY and RECOVER
utilities, fast log apply, and parallel index build. We discuss those features in
Chapter 4, “Backup/Recovery Considerations in Disaster Recovery” on page 45
and Chapter 5, “Restarting from Remote Locations” on page 61 when the
features are particularly important in the steps we have recommended. You
should be aware, however, that the practical experience and work that was the
basis for this book was done with DB2 for OS/390 V5.
Figure 4 on page 9 shows important components for SAP R/3 on DB2 for
OS/390:
DBIF The Database Interface (DBIF) of SAP R/3 has been modified to
support DB2 for OS/390. The DBIF resides on the application server
and is responsible for accepting SQL statements from the
applications. DBIF then forwards the SQL to the Database Service
Layer (DBSL).
DBSL The Database Service Layer (DBSL) of SAP R/3 has the responsibility
of adapting SQL to the specific requirements of a DBMS (in this case,
DB2 for OS/390). Additionally, the DBSL forwards the adapted SQL to
the appropriate communication software (in this case, ICLI).
ICLI For communication with the DB2 for OS/390 database service, the
DBSL uses a component called the Integrated Call Level Interface
(ICLI). ICLI consists of a client component and a server component, which allow
AIX and Windows NT application servers to access an OS/390
database server remotely across a network. The DBSL uses only a
subset of database functions and ICLI delivers exactly that subset.
The server component of ICLI is a program based on OS/390 UNIX
System Services; the client component is a Keep Alive executable
along with a program that either resides in an AIX shared library
(ibmiclic.o) or a Windows NT dynamic link library. The ICLI
components are provided as a part of OS/390 UNIX System Services;
users should consult SAP notes to determine the ICLI level (and IBM
service identifiers) they require. Figure 5 on page 11 shows how an
ICLI connection between application server and OS/390 database
server is established.
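The layering of DBIF, DBSL, and ICLI described above can be sketched as a chain of objects, each handing work to the next. This is an illustrative model only: the class names mirror the component names, but the operation set `ICLI_OPERATIONS`, the `adapt` rule, and the `fake_server` transport are hypothetical placeholders, not the real SAP or IBM interfaces. The restricted operation set stands in for the statement that ICLI delivers exactly the subset of database functions the DBSL uses.

```python
from typing import Callable

# Hypothetical operation set: stands in for "the subset of database functions
# the DBSL uses"; the real ICLI function list is not shown here.
ICLI_OPERATIONS = frozenset({"execute", "commit", "rollback"})


class IcliClient:
    """Client half of ICLI on the application server."""

    def __init__(self, send: Callable[[str, str], str]) -> None:
        self._send = send  # transport to the ICLI server on OS/390 (e.g. TCP/IP)

    def call(self, op: str, payload: str = "") -> str:
        if op not in ICLI_OPERATIONS:
            raise ValueError(f"ICLI does not provide operation {op!r}")
        return self._send(op, payload)


class Dbsl:
    """Adapts SQL to DB2 for OS/390 and forwards it to the ICLI client."""

    def __init__(self, icli: IcliClient) -> None:
        self._icli = icli

    @staticmethod
    def adapt(sql: str) -> str:
        # Placeholder adaptation (hypothetical): drop a trailing ';' and
        # collapse whitespace. The real DBSL rules are DBMS specific.
        return " ".join(sql.rstrip().rstrip(";").split())

    def execute(self, sql: str) -> str:
        return self._icli.call("execute", self.adapt(sql))


class Dbif:
    """Accepts SQL from SAP R/3 application programs."""

    def __init__(self, dbsl: Dbsl) -> None:
        self._dbsl = dbsl

    def submit(self, sql: str) -> str:
        return self._dbsl.execute(sql)


def fake_server(op: str, payload: str) -> str:
    """Stand-in for the ICLI server: echoes the request it receives."""
    return f"{op}:{payload}"


dbif = Dbif(Dbsl(IcliClient(fake_server)))
print(dbif.submit("SELECT *\n  FROM T ;"))  # execute:SELECT * FROM T
```

Keeping the transport as an injected callable mirrors the split between the client component on AIX or Windows NT and the server component on OS/390.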
The most advanced connectivity products now available are Gigabit Ethernet and
the software/hardware products provided in OSA Express.
Note
All information about FDDI LANs in this chapter also applies to Gigabit
Ethernet (GbE) LANs.
The ESCON channel option is now available for a connection between OS/390
and AIX or OS/390 and Windows NT. Figure 6 shows examples of possible
physical connections.
Both High-Speed UDP and Enhanced ESCON are UDP-based protocols. Neither
is recommended now that TCP/IP has become available.
1.6.2.1 TCP/IP
With TCP/IP (Transmission Control Protocol/Internet Protocol) support, the
flexibility of SAP R/3 configurations is considerably improved. You can use any
network supported by IBM TCP/IP that meets the SAP R/3 speed requirements.
This allows you to use the same protocol for channel connections, FDDI, GbE, or
other connectivity hardware. Thus you may have less expensive secondary
connections (such as using FDDI as a backup for a channel connection) without
changing the controlling software or protocol.
TCP/IP can be used over an ESCON channel (if an ESCON channel adapter is
installed in the application server). TCP/IP can also be used over GbE LANs or
FDDI LANs through an OSA-2 adapter on S/390 (if the appropriate LAN adapter is
installed on the application server). No special device drivers are required;
however, when you install the LAN adapter or the channel adapter you may be
instructed to install a supplied device driver rather than one of those on your AIX
installation medium.
TCP/IP also has expandability advantages over the other two protocols. As new
devices, architectures, and features are created, TCP/IP usually contains the
required support earlier than specialized software does.
Note that while High-Speed UDP is based on UDP, you will still have to configure
the normal full-function TCP/IP stack for it to work. TCP/IP is necessary during
the installation process for SAP R/3 since FTP is used. During normal operation
of the SAP R/3 system, TCP/IP is used for performance monitoring and CCMS
remote job submission. The logical file system support in OS/390 UNIX System
Services allows several different back-end stacks to coexist. The OS/390 system
transparently directs standard inbound and outbound data to the correct
This chapter discusses disaster recovery planning in general. Not all topics
mentioned here will be covered in the rest of the manual.
It may be that reduced levels of service are acceptable, for some or all of the
critical applications, in the event of a disaster. Service levels for disaster mode
need to be negotiated, defined and then formally documented in a service level
agreement. Statements should be included to cover aspects such as recovery
targets, availability, capacity, and performance.
In some of the above situations, the IT equipment may still be intact and usable,
but simply inaccessible. With preplanning, you may be able to run the data
center from a remote location for a short period of time.
Recent statistics on the most common types of disasters that occur show that
hardware failures and natural disasters such as hurricanes are the most
common causes of disasters (see Figure 7).
It stands to reason that the more preventive measures are put in place, the less
chance there is of a situation resulting in a disaster for an organization.
However, no matter how sophisticated these preventive measures are, there will
always be some risk of an outage. Reducing the risk of disaster further means
increasing the cost of prevention.
Certain measures should be taken to help prevent disasters from affecting your
organization or to minimize their impact when they are unavoidable. The best
way to determine which measures your organization should include is to conduct
a risk analysis to determine where the organization's major vulnerabilities are.
A risk analysis identifies the major threats to the organization, as they relate
to its site and the city in which the site is located, and the probability of each
threat.
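One common way to turn a risk analysis into a ranked list is to multiply the probability of each threat by the loss it would cause. The sketch below assumes that convention (the "impact" figure is an addition, not something this chapter prescribes), and every number in it is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Threat:
    name: str
    annual_probability: float  # estimated chance of occurring in a given year
    impact: float              # estimated loss if it occurs (any currency unit)

    @property
    def expected_annual_loss(self) -> float:
        return self.annual_probability * self.impact


def rank_threats(threats: List[Threat]) -> List[Threat]:
    """Largest expected annual loss first."""
    return sorted(threats, key=lambda t: t.expected_annual_loss, reverse=True)


# Invented figures for illustration only.
threats = [
    Threat("fire in the computer room", 0.005, 8_000_000),
    Threat("hardware failure", 0.2, 500_000),
    Threat("hurricane", 0.01, 5_000_000),
]
print([t.name for t in rank_threats(threats)])
# ['hardware failure', 'hurricane', 'fire in the computer room']
```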
The minimum preventive measures are good on-site recovery procedures to help
avoid having routine problems escalate into a disaster. Examples are spare
hardware, regular backups, and a skilled operations staff. In addition, you may
In order to develop a disaster recovery plan, the following steps are needed:
This book focuses on disaster recovery technology and therefore covers the
technical aspects of the recovery strategy (step 3). It assumes that the business
requirements and their associated IT requirements have been defined, and it
does not cover testing in detail.
The overall aim is to develop a disaster recovery solution that exactly matches
the requirements of the business. However, it is important to remember that
some compromises may be needed, as requirements may conflict. The four
main factors that need to be traded off in any solution are:
• Type of disasters that need to be covered
• Amount of data that can be lost
• Speed of recovery
• Overall costs
(See Figure 9 on page 20.) Throughout the design process, you need to be
aware of these factors and balance them to develop a solution that best meets
the business needs at an acceptable cost. This trade-off triangle is discussed in
more detail in Fire in the Computer Room - What Now?, SG24-4211.
The scope influences all of the other major design elements such as the location
of the recovery site, the ownership of the recovery site, the configuration of the
recovery facilities, and the processes for maintaining or recovering data at the
recovery site.
The backup and recovery methodology must be aligned with the business
objectives. The business requirement drives the solution. For example, a
requirement to have service reestablished within three hours dictates different
backup and recovery options from those possible when the application must be
operational within 24 hours. The methods used to back up the data, the way it is
transported and stored off-site, and the techniques for recovering the data, will
There are many different processes for backing up and recovering data and
there are several hardware/software products and features that support the
different processes. All processes and products have their relative merits, and
most organizations use a combination of approaches to cover their critical data.
There are certain key factors that influence which is the best option. Among
these factors are:
• Type and amount of data
• Frequency of backups
• Speed of recovery
• Level of currency after recovery
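The interplay of speed of recovery, level of currency, and cost can be expressed as a simple selection rule: discard every option that misses either target, then take the cheapest of the rest. The sketch below assumes that rule; the option names are generic descriptions of approaches discussed in this chapter, and all hours and cost rankings are hypothetical placeholders for your own measurements.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    recovery_hours: float   # time until service is reestablished
    data_loss_hours: float  # most recent work that may be lost
    relative_cost: int      # rough ranking, higher means more expensive


def choose_option(options: Sequence[RecoveryOption], max_recovery_hours: float,
                  max_data_loss_hours: float) -> Optional[RecoveryOption]:
    """Cheapest option that meets both the speed and the currency target."""
    eligible = [o for o in options
                if o.recovery_hours <= max_recovery_hours
                and o.data_loss_hours <= max_data_loss_hours]
    return min(eligible, key=lambda o: o.relative_cost, default=None)


# Hypothetical figures, to be replaced by your own measurements and quotes.
OPTIONS = [
    RecoveryOption("image copies and archive logs on tape", 24, 24, 1),
    RecoveryOption("tape backups plus electronic log vaulting", 24, 0.1, 2),
    RecoveryOption("remote DASD mirroring", 1, 0, 3),
]
print(choose_option(OPTIONS, 24, 24).name)  # image copies and archive logs on tape
print(choose_option(OPTIONS, 24, 1).name)   # tape backups plus electronic log vaulting
print(choose_option(OPTIONS, 3, 1).name)    # remote DASD mirroring
```

A `None` result means no option meets the targets, which is the signal to revisit either the requirements or the budget.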
System data consists largely of the system software that is normally purchased
from vendors and tailored. It is typically isolated on specific disk volumes and is
changed only when service is applied or new releases are installed.
Application data includes all of the data that is needed to run the applications. It
is the most volatile, the most valuable, and the most challenging to recreate.
This data is typically spread across numerous volumes; it is common to find data
from multiple applications residing on the same volume, and sometimes also
co-residing with noncritical data. When considering backup and recovery
options, it may be necessary to subdivide this data into “DBMS managed data”
and “non-DBMS managed data.” The speed and currency targets are usually
most stringent for application data, and may also be different for individual
applications. A variety of processes may need to be employed, as frequent
backups will be required to enable fast recovery and minimal data loss.
In an SAP R/3 on DB2 for OS/390 environment, system data consists of the
OS/390, AIX and/or Windows NT environment. The infrastructure data consists of
the DB2 for OS/390 system and the SAP R/3 executables on the application
servers; the application data consists of the tables.
If there are dependencies, then backup and recovery of this data must be
coordinated, to ensure that data integrity is maintained. In an SAP R/3 on DB2
for OS/390 environment, that means that the backups of all components must be
at the same level; for example, the backup of the application servers must have
the same release as the SAP R/3 data in the SAP R/3 objects stored in the
database.
For DB2 for OS/390, the data contained in tables used by SAP R/3 must also be
at the same level as the SAP R/3 objects.
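The level requirement above lends itself to a simple check before a backup set is accepted for disaster recovery. The sketch assumes a hypothetical inventory that maps each backed-up component to its SAP R/3 release, with the `"database"` entry taken as the reference level; the component names are placeholders.

```python
from typing import Dict, List


def mismatched_components(levels: Dict[str, str], reference: str = "database") -> List[str]:
    """Names of backed-up components whose release level differs from the
    level of the SAP R/3 objects stored in the database."""
    expected = levels[reference]
    return sorted(name for name, level in levels.items() if level != expected)


# Hypothetical inventory of one backup set: component -> SAP R/3 release.
backup_set = {
    "database": "4.0B",
    "central instance": "4.0B",
    "dialog instance 1": "3.1I",
}
print(mismatched_components(backup_set))  # ['dialog instance 1']
```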
If volume dumps are used, the procedure for point-in-time recovery in an SAP
R/3 on DB2 for OS/390 environment is described in Database Administration
Experiences: SAP R/3 on DB2 for OS/390, SG24-2078. The procedure requires
that DB2 for OS/390 be stopped, so that data is consistent. Information about
stopping DB2 for OS/390 is found in 4.4.1.1, “Establishing a Point of Consistency”
on page 51.
Concurrent Copies: Point-in-time copies can also be created with little impact
to applications. The combination of DFSMS/MVS and the 3990 Storage
Controllers supports a concurrent copy function, which can be used for all data. It
can be invoked by specifying the keyword CONCURRENT on the DB2 for OS/390
COPY utility. A very small outage (seconds) occurs while the extents for all the
named table spaces are marked. Then DB2 starts the table spaces for normal
read/write access as the data sets are dumped. Tracking and registration of the
copies occurs through updates to a DB2 system catalog table.
Virtual concurrent copy (or “SnapShot”) routines are also invoked by the
CONCURRENT keyword of the DB2 for OS/390 COPY utility. A very small outage
(seconds) occurs while the data sets are “snapped” to certain work data sets.
Then DB2 for OS/390 starts the data sets for normal read/write access as the
data sets are dumped. Tracking and registration of the copies occur through
updates to a DB2 for OS/390 system catalog table.
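The reason applications can resume read/write access while the data sets are still being dumped is that the copy reflects the data as it was at the moment of initialization. The sketch below models that idea with copy-on-write: before a not-yet-dumped track is changed, its original contents are saved to a side file. It is a simplified illustration of the principle, not the DFSMS or SnapShot implementation, and the class and method names are hypothetical.

```python
from typing import Dict, List, Set


class ConcurrentCopySession:
    """Simplified copy-on-write model of a point-in-time copy; not the
    DFSMS implementation."""

    def __init__(self, volume: List[str]) -> None:
        # Initialization is the short outage: from here on, the copy must
        # reflect the data exactly as it is at this moment.
        self.volume = volume                 # live data, one entry per track
        self.side_file: Dict[int, str] = {}  # track -> contents at copy time
        self.dumped: Set[int] = set()        # tracks already written to the copy

    def write(self, track: int, data: str) -> None:
        """Application update while the dump is still running."""
        if track not in self.dumped and track not in self.side_file:
            self.side_file[track] = self.volume[track]  # save original first
        self.volume[track] = data

    def dump_track(self, track: int) -> str:
        """Return the track as it was at initialization time."""
        self.dumped.add(track)
        return self.side_file.pop(track, self.volume[track])


session = ConcurrentCopySession(["A1", "B1", "C1"])
backup = [session.dump_track(0)]
session.write(0, "A2")  # track 0 already dumped: nothing to save
session.write(2, "C2")  # track 2 not yet dumped: original saved first
backup += [session.dump_track(1), session.dump_track(2)]
print(backup)          # ['A1', 'B1', 'C1']
print(session.volume)  # ['A2', 'B1', 'C2']
```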
The minimum amount of data loss that could occur is that which resides on the
last active log at the local site.
RRDF: For DB2 for OS/390 it is possible to maintain a real-time copy of log data
at a remote site. As log blocks are written at the prime site, they are also
transmitted to the recovery site and vaulted. The data loss for DBMS data can
be almost eliminated. One product that performs this function is Remote
Recovery Data Facility (RRDF), distributed by E-Net Software. Once at the
recovery site, the log records are normally stored as RRDF archives until they
are needed, when they are converted to DB2 archive logs. More information
about RRDF may be obtained at:
http://www.ubiquity.com.au/content/production/suppliers/enet/rrdf01.htm
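The difference in exposure between shipping archive logs and transmitting each log block as it is written can be put into numbers with a small sketch. It counts log blocks and assumes the best case for archive shipping (every completed archive has already reached the recovery site), so the loss is whatever is still only on the active log. The function names and figures are hypothetical.

```python
def blocks_lost_archive_shipping(blocks_written: int, blocks_per_archive: int) -> int:
    """Best case when only archive logs go off-site: every completed archive
    has already arrived, so the loss is what is still on the active log."""
    return blocks_written % blocks_per_archive


def blocks_lost_log_vaulting(blocks_written: int, blocks_in_transit: int) -> int:
    """Each log block is sent as it is written; only blocks still in transit
    at the moment of the disaster are lost."""
    return min(blocks_written, blocks_in_transit)


print(blocks_lost_archive_shipping(10_250, 1_000))  # 250
print(blocks_lost_log_vaulting(10_250, 2))          # 2
```

In practice, archive logs also spend time in transport, so the real exposure with archive shipping is usually larger than this best case.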
The most appropriate option will depend on the required recovery speed, data
currency and the cost. With all electronic transfer options, there are
performance and distance considerations.
Information about PPRC and XRC can be found in Remote Copy Administrator's
Guide and Reference, SC35-0169. This redbook also contains considerations for
choosing between PPRC and XRC. Planning considerations for both PPRC and
XRC can be found in Planning for IBM Remote Copy, SG24-2595. PPRC and XRC
operations with the 2105 Enterprise Storage Server are explained in
Implementing the Enterprise Storage Server in Your Environment, SG24-5420.
If your recovery site is owned by a third party, the agreement with the third-party
supplier must include the level of readiness.
System startup and shutdown activities and all system console interactions can
now be largely automated and handled remotely. Automated Tape Libraries can
be used to almost totally eliminate manual tape handling. There will, however,
always be manual activities that need to be performed. You must define what
skills and how many people are needed to manage the recovery site.
When designing and implementing a remote recovery site, there are three basic
options for managing and operating it:
• Operations personnel at both sites
With operations personnel at both sites, recovery is usually easier and
faster, but this will typically need more people and therefore will be more
costly.
• Recovery site unmanned and operated remotely
The recovery site could be remotely operated, on a day-to-day basis, from
the prime site. In the event of a disaster, operations personnel would have
to be transported to the recovery center or a backup operations command
center. The travel time may be on the critical path of the recovery.
As the distance between the prime site and the recovery site increases, so does
the protection against environmental disasters.
If, however, very fast recovery or minimal data loss or both are required, then
the distance between the prime site and the recovery site may need to be short.
There are technical distance limitations associated with some data backup and
recovery methodologies that support advanced data readiness solutions. The
limits frequently change as new hardware solutions become available. At the
time this book was published, the PPRC limitation was 43 km with ESCON
connections, 100 km with FICON connections. For more information on PPRC
consult Planning for IBM Remote Copy, SG24-2595.
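Part of the reason short distances are needed for minimal data loss is physics: with synchronous remote copy, each write waits for the remote site. Light in optical fiber travels at roughly 200,000 km/s, or about 5 microseconds per kilometer one way, so every round trip adds about 10 microseconds per kilometer. The sketch below computes that propagation-only lower bound (real protocols add further exchanges and device overhead) and checks a distance against the PPRC limits quoted above.

```python
FIBER_DELAY_US_PER_KM = 5.0  # light in optical fiber covers roughly 200,000 km/s

# Limits quoted in this chapter; they change as new hardware becomes available.
PPRC_LIMIT_KM = {"ESCON": 43, "FICON": 100}


def min_added_write_latency_ms(distance_km: float, round_trips: int = 1) -> float:
    """Lower bound on the delay a synchronous remote copy adds to each write:
    propagation time only, ignoring protocol and device overheads."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM * round_trips / 1000.0


def pprc_supported(distance_km: float, link: str) -> bool:
    return distance_km <= PPRC_LIMIT_KM[link]


for km in (10, 43, 100):
    print(km, "km:", min_added_write_latency_ms(km), "ms")
print(pprc_supported(60, "ESCON"), pprc_supported(60, "FICON"))  # False True
```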
If the disaster recovery plan involves having personnel working at the recovery
site, then there are special considerations. First, the time taken to assemble and
If the business depends on physical interaction between the central IT site and
other locations, this also is a recovery consideration. One such interaction is
print output distribution.
Sufficient processing power, storage, and disk capacity must be provided, but the
design must also cover all other hardware and software components that are
required to run the critical applications. Special consideration must be given to
dependencies on specific device types and features, microprogram levels,
software levels, and so on. The same basic approaches that are used for
planning, designing, and managing the prime site configuration should be used
for the recovery configuration. In particular, allowance should be made for
future growth of the business applications when planning the capacity and
performance aspects of the configuration.
As SAP R/3 on DB2 for OS/390 is a multi-tier configuration, the capacity planning
for the backup hardware must be done for all components. The database server
on S/390 is the central part of the installation and therefore must be planned
carefully, whereas the application servers are more or less replaceable. This
means that you can have fewer application servers in your backup site than in
your production site. Moreover, you can have application servers on different
platforms or you can install new application servers in the case of a disaster.
Performance is an extremely important consideration when planning the backup
configuration. For more information about hardware configurations see 3.1,
“Hardware” on page 35.
As well as defining what resources are required, you must also define when the
resources are actually needed, with consideration to the sequence of events in
the recovery process. The most critical applications may start running alongside
the restore processes for the less critical data, and this may actually generate a
temporary peak capacity requirement. Some applications may not be recovered
and fully operational for several days or even weeks, and certain resources
could be provided later.
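The temporary peak mentioned above can be found by laying the recovery phases out on a timeline and summing the capacity of the phases that overlap. The sketch below does this with a sweep over start and end events; the phases, hours, and capacity units are hypothetical, and a phase is treated as active from its start hour up to, but not including, its end hour.

```python
from typing import Iterable, Optional, Tuple


def peak_capacity(phases: Iterable[Tuple[float, float, int]]) -> Tuple[int, Optional[float]]:
    """phases: (start_hour, end_hour, capacity_units), active on [start, end).
    Returns the peak combined requirement and the hour at which it starts."""
    events = []
    for start, end, units in phases:
        events.append((start, units))
        events.append((end, -units))
    # At equal hours, releases (negative) sort before new demands.
    events.sort()
    current = peak = 0
    peak_at = None
    for hour, delta in events:
        current += delta
        if current > peak:
            peak, peak_at = current, hour
    return peak, peak_at


# Hypothetical recovery timeline (hours after the disaster, capacity units).
phases = [
    (0, 12, 60),    # restore critical data
    (12, 72, 50),   # run critical applications
    (12, 36, 40),   # restore less critical data alongside them
    (36, 200, 30),  # run less critical applications
]
print(peak_capacity(phases))  # (90, 12)
```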
When designing for ongoing data transfer between the two sites on a day-to-day
basis, you must consider the bandwidth requirements and profiles of the types of
data that will be transmitted. Requirements can be diverse, ranging from remote
operations messaging to extended channel connections for remote backup
devices.
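A first estimate of a bandwidth requirement is simple arithmetic: the volume to move, converted to megabits, divided by the time window, and scaled up for busy periods. The sketch below assumes decimal units (1 GB = 8,000 Mbit) and a hypothetical peak-to-average factor; the 36 GB volume is an invented example, not a measured SAP R/3 figure.

```python
def required_mbit_per_s(gigabytes: float, window_hours: float = 24.0,
                        peak_factor: float = 1.0) -> float:
    """Average bandwidth (Mbit/s) needed to move `gigabytes` (decimal GB) within
    `window_hours`, multiplied by a peak-to-average factor for busy periods."""
    megabits = gigabytes * 8 * 1000
    return megabits / (window_hours * 3600) * peak_factor


print(required_mbit_per_s(36))                                # about 3.33 Mbit/s
print(required_mbit_per_s(36, window_hours=8, peak_factor=2))  # 20.0
```

The same volume needs very different links depending on whether it trickles out over the whole day or must be transmitted in a nightly window.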
The simplest option is to run the entire workload at the prime site and ensure
that there is sufficient resource ready at the recovery site. In many cases,
however, a business cannot justify an idle recovery site dedicated to disaster
recovery.
Another option is to separate the noncritical workload and run it at the recovery
site. In the event of a disaster at the prime site, the noncritical workload is
displaced and the critical workload is recovered using the resources at the
recovery site. Usually, all workload eventually becomes critical as an outage
continues, and therefore some provision for recovery of the noncritical workload
must also be made.
If it is not possible to define critical and noncritical workload, at least test and
quality-assurance systems can be run at the recovery site. These systems are
usually made inactive in disaster situations. Note that this possibility provides a
convenient method of practicing for disaster; as a part of normal test and quality
assurance functions, the resources to be used in disaster recovery are
exercised.
This chapter describes the options you have for substituting the components
of an SAP R/3 on DB2 for OS/390 environment in the case of a disaster. There
are several factors that must be considered when planning the disaster recovery
configuration for your environment. These factors are:
• Technical feasibility
When planning the disaster recovery configuration you must be aware of
technical constraints. As an example, consider the connection from an
application server to the database server: it is technically possible to
communicate over a LAN such as standard Ethernet or token-ring, but the speed
and load requirements of SAP R/3 rule out those connections. The recovery
site must be able to function for critical
applications at reasonable speeds and with minimal differences from the
central site. Otherwise, there is a risk that users will hold transactions or
use manual procedures; the recovery site becomes useless.
The technical feasibility in an SAP R/3 on DB2 for OS/390 environment is
discussed in this chapter. While the user of SAP R/3 may choose to declare
certain business functions noncritical, it is not possible to separate SAP R/3
data for the applications. Therefore, in this redbook we will assume that all
data from the DB2 for OS/390 database server must be recovered. This
means that the recovery site needs to have essentially the same capacity as
is maintained at the local site.
• Performance needed in the case of a disaster
This factor takes into account that many companies can do their business
with less capacity in the first period of a post-disaster scenario. For
example, they can define certain key users that must have access to the
systems within a defined time (such as 24 hours) while the rest of the users
must regain access after a longer period (such as five days or even after the
whole environment is recovered to the original place). It is essential that the
key users are able to perform the critical business of the company. This
chapter also discusses the role of performance when planning the disaster
recovery configuration (for example, using a smaller S/390 system or fewer
application servers).
• Recovery time
The requirements for recovery time for an SAP R/3 on DB2 for OS/390
environment can range from minutes to several days. This means that some
companies might need a high availability solution spread over two sites
using mirrored DASD (see High Availability Considerations: SAP R/3 on DB2
for OS/390, SG24-2003), whereas others can recover their environment from
tape on dedicated backup hardware (for example from a vendor) in the case
of a disaster. In this redbook we focus on recovery of the DB2 database
server in a ready-to-roll-forward environment, using tape archives and image
copies. In our experience, this solution is the most commonly employed.
Backup is achieved without disruption and there is no requirement for
bandwidth or hardware on a daily basis at the recovery site. Hardware costs
are moderate compared to those that use mirrored DASD. Transport is
assumed to be manual. In 5.4, “Advanced Disaster Recovery Planning” on
page 85, we give some hints regarding how the recovery time can be
reduced by adding more resources or changing procedures.
3.1 Hardware
In this section we discuss the hardware configuration that is needed at the
recovery site to recover an SAP R/3 on DB2 for OS/390 environment. The
configuration described here will support recovery from tape backups and
archive logs for the DB2 for OS/390 database server. This means the recovery
time might be as long as 24 hours, depending on the availability of the recovery
site and the amount of data that must be restored.
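How much of that window is used depends mainly on three components: rebuilding the base system, restoring image copies, and applying archive logs. The sketch below adds them up under a strong assumption, flagged in the code, that restore throughput scales linearly with the number of parallel recovery jobs (true only while tape drives and channels are not the bottleneck). All inputs in the example are hypothetical.

```python
def estimate_recovery_hours(data_gb: float, restore_gb_per_hour: float,
                            parallel_jobs: int, log_gb: float,
                            log_apply_gb_per_hour: float,
                            fixed_hours: float) -> float:
    """Rough elapsed time for a ready-to-roll-forward recovery.

    fixed_hours: rebuilding OS/390, subsystems and DB2 before data recovery starts.
    Assumes restore throughput scales linearly with the number of parallel jobs,
    which holds only while tape drives and channels are not the bottleneck.
    """
    restore = data_gb / (restore_gb_per_hour * parallel_jobs)
    log_apply = log_gb / log_apply_gb_per_hour
    return fixed_hours + restore + log_apply


# Hypothetical inputs: 300 GB of DB2 data, 20 GB of log to apply.
print(estimate_recovery_hours(300, 25, 4, 20, 5, 6))  # 13.0
```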
If your business needs faster recovery even in the case of a disaster and
ready-to-roll-forward (recovery from tape) is not an option for you, refer to
High Availability Considerations: SAP R/3 on DB2 for OS/390, SG24-2003.
The premise of DB2 for OS/390 disaster recovery is that the local environment is
recreated at the recovery site. This means that the local OS/390 and all
subsystems that are used at the local site are recovered in the event of disaster.
This can include such subsystems as DFSMS, Security Server, integrated catalog
facility (ICF) catalogs, tape management subsystem, and ancillary subsystems.
Your OS/390 capacity planning personnel normally take care of calculating the
capacity and arranging the configuration at the recovery site to support those
business-critical applications that you need to run at the recovery site.
For SAP R/3 on DB2 for OS/390 we assume that only SAP R/3 data is stored on
the affected DB2 for OS/390. From the standpoint of the DB2 for OS/390
subsystem(s) that are the database servers for SAP R/3, all data will be
recovered and used in a post-disaster environment. Therefore, the DASD used
by DB2 should have similar capacity at both sites. The implementation of
DFSMS may allow greater flexibility in DASD devices. Tape drives used for
archives must be able to read the archive logs created at the local site. Your
hardware staff will know which devices are necessary.
If your local processor supports hardware compression and the equipment at the
recovery site does not, the compression is simulated by OS/390 software at the
Up to now, we have discussed DB2 for OS/390 in general, and have not
considered more than one subsystem. From now on we will divide our DB2 for
OS/390 recovery discussion into two environments: data sharing and non-data
sharing. Each procedure will be described under its own topic; if you do not use
DB2 data sharing, then you can skip topics that address it.
There is no option to disable DB2 data sharing for disaster recovery. All DB2
members must be started and must complete restart in order to release all
retained locks. Then all but one member can be stopped and recovery can
complete using the same procedures as are used in a non-data sharing
environment, described in 5.3, “Remote Site Recovery from Disaster at a Local
Site” on page 64.
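The required ordering for a data sharing group can be captured as a short checklist generator. The sketch below only encodes the sequence stated above: start every member, wait for each to complete restart so that retained locks are released, then stop all but one member and continue as in a non-data sharing environment. The member names `DB1G` and `DB2G` are hypothetical, and the step texts are descriptions, not DB2 command syntax.

```python
from typing import List


def data_sharing_recovery_steps(members: List[str], survivor: str) -> List[str]:
    """Order of steps before recovery can continue as in a non-data sharing
    environment. Member names are placeholders."""
    if survivor not in members:
        raise ValueError(f"{survivor} is not a member of the group")
    steps = [f"Start member {m}" for m in members]
    # Every member must complete restart so that its retained locks are released.
    steps += [f"Wait for {m} to complete restart" for m in members]
    steps += [f"Stop member {m}" for m in members if m != survivor]
    steps.append(f"Continue recovery on {survivor} as in a non-data sharing environment")
    return steps


for step in data_sharing_recovery_steps(["DB1G", "DB2G"], survivor="DB1G"):
    print(step)
```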
Shared DASD is required for the following items of all DB2 for OS/390 members:
• Active logs
• Bootstrap data sets
• DB2 data
• DASD archive logs
Tape drives used for image copies and archives should be accessible to all
members, as well.
Since the central instance is essential for the SAP R/3 on DB2 for OS/390
environment, it must be treated with special focus. That means that there must
be a detailed disaster recovery plan, including the description of backup
hardware and backup/recovery procedures specifically for the central instance.
The dialog instances can be treated in the same way as the central instance or
they can be reinstalled in the case of a disaster. For more information about
backup and recovery procedures of application servers see Chapter 4,
“Backup/Recovery Considerations in Disaster Recovery” on page 45.
The disk space needed is determined by the production servers and can only be
reduced for the backup servers when you mirror the data. Normally, data is not
mirrored in a post-disaster environment.
When planning the backup site's network infrastructure, normally the same
architecture is used as in the prime site. If the architecture is different at the
backup site (for example, because an existing site with existing network
infrastructure is used) you may need different communication adapters in your
backup servers.
The normal disaster recovery plan for application servers is to take backups
from the servers and to restore them on the same architecture. Generally it is
possible to choose a different architecture for your recovery site from the one
you have in normal operation. This means you can decide to have your
application servers running on stand-alone RS/6000 machines instead of on
RS/6000 SP, or you can choose Windows NT instead of AIX.
If you choose to take backups of your RS/6000 machines for disaster recovery,
please keep in mind that at the moment there are two general architectures of
RS/6000: PCI-bus and Micro Channel. Backups made from one architecture are
not easily transferable to the other architecture. Therefore, you should generally
choose the architecture of your production server for your backup server. This
is not a problem in an RS/6000 SP environment, because Parallel System Support
Programs (PSSP) control uses the correct drivers for the components.
Special focus is needed when you use ESCON channels, since ESCON channel
hardware is supported only by certain application servers.
3.2 Software
All SAP R/3 applications are stored in the database. This means that restoring
the DB2 for OS/390 data is the main part of the disaster recovery procedure.
Besides DB2 for OS/390, there are SAP R/3 executables on the application
servers that need to be available. The backup and recovery procedures are
described in Chapter 4, “Backup/Recovery Considerations in Disaster Recovery”
on page 45. In this section we list the software you need at your recovery site.
3.2.1 OS/390
Since DB2 for OS/390 ready-to-roll-forward disaster recovery means recreating
the local environment at the recovery site, your OS/390 staff must provide all the
software used at the local site on which DB2 for OS/390 depends.
Your DBA staff may use tools for data management or performance monitoring in
the DB2 for OS/390 environment. In addition to the specific vendor product
libraries, some of the functions of these tools are stored in DB2 tables, so you
must make provision for copying and restoring those tables.
The installation media are essential when your disaster recovery plan includes
installation of new application servers or if you need to install new application
server operating system software.
To be able to work with the R/3 system, a license is required. The license is
installed from the central instance. To install it, you must obtain from SAP AG a
license key that depends on a system-specific keycode. The license key itself
is stored in the database.
Because the license depends on the hardware of the central instance, you need a
new license on the backup machine. The license for the backup machine can be
installed in two ways:
1. You can install a temporary license key after recovery; this allows you to
work with the R/3 system for four weeks. The temporary license is
independent of the hardware and can be used during tests as well as in a
post-disaster scenario. If, in the case of a disaster, you need to run your
SAP R/3 system longer than four weeks on the backup machine, you have
enough time to get the license key from SAP AG; include getting this key as
an item in your disaster recovery plan.
2. If you recover on dedicated hardware in the case of a disaster, you can
install a stationary license key. It is possible to install several licenses (for
different host candidates running the message server). The R/3 system will
search for the current license. The license key must be updated for new
releases.
For more information about license keys see R/3 Installation on UNIX DB2 for
OS/390, Material Number 51002659.
Backup and recovery tools are normally needed to recover your data at the
recovery site, so they must be included in your disaster recovery plan.
In some environments archiving tools are only needed once a month, but in
other environments archiving is done continuously and without the archiving you
might run out of disk space. In the latter case you should also include the
archiving tools in your disaster recovery plan.
3.3 Communications
In a multi-tier environment like SAP R/3 on DB2 for OS/390, communication
between the components is the basis for all operations. Therefore, you must
plan the disaster recovery for communications carefully.
The network topology should provide alternate paths between host locations
(primary site and disaster recovery site) and all remote user locations. Note that
local users to the primary site are remote users to the recovery site. In the case
of a disaster, there must be provision for connections of primary site users to the
recovery site, and operating instructions for any different procedures they
perform must be provided. Figure 12 on page 43 shows the network
connections in a post-disaster situation.
Of course, redundant paths are more expensive than switching but they offer a
faster recovery time, whereas switching the network is cheaper but takes longer.
As the switching of the network can be done in parallel to the SAP R/3 restore,
you must calculate the duration of restoring your SAP R/3 on DB2 for OS/390
environment and the switching time to decide if switching the network in the
case of a disaster is an option for you. The overall objective is to meet the
required recovery time determined in the business impact analysis.
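The trade-off above reduces to a simple comparison, sketched below as a planning aid. This is a minimal model under stated assumptions: the function name and the hour-based inputs are hypothetical, and it assumes the network switch starts at the same moment as the restore and runs fully in parallel with it, so the elapsed time is set by whichever finishes last.

```python
def switching_meets_objective(restore_hours: float,
                              switch_hours: float,
                              required_recovery_hours: float) -> bool:
    """Return True if switching the network at disaster time still meets the
    required recovery time from the business impact analysis.

    Assumption (hypothetical model): the network switch starts together with
    the SAP R/3 on DB2 for OS/390 restore and runs fully in parallel with it,
    so the elapsed time is set by whichever of the two finishes last.
    """
    if min(restore_hours, switch_hours, required_recovery_hours) < 0:
        raise ValueError("durations must not be negative")
    elapsed_hours = max(restore_hours, switch_hours)
    return elapsed_hours <= required_recovery_hours


if __name__ == "__main__":
    # Invented figures for illustration only, not measurements.
    # A 6-hour switch is hidden behind a 10-hour restore: a 12-hour target holds.
    print(switching_meets_objective(10, 6, 12))   # True
    # A 14-hour switch dominates: redundant paths may be worth their cost.
    print(switching_meets_objective(10, 14, 12))  # False
```

If the switch cannot start until some part of the restore is finished, the two durations add up instead, and the comparison must be adjusted accordingly.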
This chapter describes in detail the backup procedures for disaster recovery for
the components of an SAP R/3 on DB2 for OS/390 environment, such as DB2 for
OS/390 and the application servers on the different platforms. The backup that
will be described is restricted to that required for a ready-to-roll-forward
environment (tape image copies and archive logs).
As is the case with any software package, backup copies of system and
application software must be available at the recovery site. Backup and
recovery requirements for an SAP R/3 on DB2 for OS/390 installation are somewhat
different from traditional software packages in that the SAP R/3 application
programs reside in the SAP R/3 database. Those program modules that are
resident are backed up as a matter of course when the SAP R/3 database is
copied. What remains are files and executables that reside outside of the SAP
R/3 database, and include:
• System software (OS/390, AIX, Windows NT, DB2 for OS/390)
• DB2 for OS/390 Catalog and Directory
• Central Instance/Application Servers
• Transport and Correction System
Important
To provide for effective recovery in the event of failure, any restoration must
guarantee that every component in the SAP R/3 environment is logically
consistent with the others in terms of the content and structure at a given
point in time.
For more information on maintaining backups and planning for recovery, please
refer to the following publications:
• BC SAP Database Administration Guide: DB2 for OS/390, Material Number
51001015
• DB2 UDB for OS/390 V6 Administration Guide, SC26-9003
• DB2 UDB for OS/390 V6 Utility Guide and Reference, SC26-9015
• R/3 Installation on UNIX DB2 for OS/390, Material Number 51002659
• Database Administration Experiences: SAP R/3 on DB2 for OS/390, SG24-2078
These utilities are more fully described in DB2 UDB for OS/390 V6 Utility Guide
and Reference, SC26-9015.
A major advantage of the COPY utility is that DB2 for OS/390 records backup
activity in the DB2 for OS/390 catalog, and uses this information in the event that
recovery is required.
You can use a SHRLEVEL CHANGE copy for a valid point-in-time recovery
only if you are recovering to a QUIESCE point. You can also use COPY SHRLEVEL
CHANGE copies with the conditional restart technique of prior point-in-time recovery.
Important
When the SHRLEVEL REFERENCE option of COPY is used, the point of
consistency for the tablespace is at the point in time that the COPY begins,
since DB2 for OS/390 will not permit modification to the tablespace for the
duration of the COPY. As with the DB2 for AIX implementation, a tablespace
copied in this manner is logically consistent at the time that the copy began.
In this redbook, we do not assume this method is used to back up the DB2 for
OS/390 data. It can be used effectively for backing up program libraries.
Once set up, the application server changes only when the profiles are adjusted
or new releases are installed. For that reason, you only need to back up
the application servers after installation and after changes. Most SAP R/3
users take a backup after changes and also once a month, to
ensure that there is always a backup that is not older than one month.
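The backup policy described above can be expressed as a small check. This is a sketch only: the function name is hypothetical, and "one month" is approximated here as 30 days, which is an assumption rather than a value from SAP or IBM.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical policy value: "one month" approximated as 30 days.
MAX_BACKUP_AGE = timedelta(days=30)


def backup_due(last_backup: date,
               last_change: Optional[date],
               today: date,
               max_age: timedelta = MAX_BACKUP_AGE) -> bool:
    """Decide whether an application server needs a new backup.

    A backup is due if the server was changed (profiles adjusted or a new
    release installed) after the last backup, or if the last backup is
    older than max_age.
    """
    changed_since_backup = last_change is not None and last_change > last_backup
    too_old = today - last_backup > max_age
    return changed_since_backup or too_old
```

For example, with a backup taken on 1 October and no later change, `backup_due(date(1999, 10, 1), None, date(1999, 10, 20))` returns False; a profile change on 5 October makes it return True.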
The backup from the rootvg can easily be made by using SMIT (smit mksysb) or
the command mksysb.
The backup from the SAP R/3 environment can be made in several ways. You
can create a tar tape containing the filesystems, you can back up the whole
volume group in which the filesystems reside, or you can use the SAP tool
backoffl. Which of these procedures you use depends on your company's
strategy.
If you use ADSM to back up the application servers, this needs special attention.
The ADSM server must then be included in the disaster recovery strategy and
must be in place before you restore the SAP R/3 environment.
In many installations the SAP R/3 environment is included in the rootvg. Then
you only need to make the backup from the rootvg.
In most cases the images of the RS/6000 SP nodes reside on external disks of
the control workstation. This means that a complete backup of the control
workstation's environment is necessary. To achieve this, you need to run mksysb
from the control workstation to back up the operating environment. You also
need to run tar to back up all volume groups to have backups of the images.
If you use ADSM to back up the node images or the SAP R/3 environment, this
needs special attention. The ADSM server must then be included in the disaster
recovery strategy and must be in place before you restore the SP and SAP R/3
environment.
As with AIX, you should build backups of all logical disks required for your
application server operation.
When you want to restore your complete SAP R/3 landscape to a specific point
in time, you need to back up the Transport and Correction directories that reside
on one of the application servers at the same time as the database. This
ensures that all corrections and transports made up to this time can be found in
the Transport and Correction System and can be processed correctly.
Database Recovery: A DB2 for AIX system is backed up and recovered at either
the database level or the tablespace level. In contrast, the SAP R/3 database in
a DB2 for OS/390 environment is neither backed up nor recovered at the DB2 for
OS/390 database level. Rather, image copies are taken at the tablespace,
partition, or data set level. Recovery of the DB2 for OS/390 databases that make
up an SAP R/3 database, therefore, is not done at the database level, which is
also in contrast to the DB2 for AIX implementation.
Tablespace Recovery: Both a DB2 for AIX system and a DB2 for OS/390 system
can be backed up and recovered at the tablespace level. DB2 for OS/390
provides an additional level of granularity by allowing backup and recovery to
occur at the partition level. Refer to DB2 for OS/390 V5 Utility Guide and
Reference for details.
Attention
In R/3, unless you definitely know otherwise, you should assume that there is
only one logically related set of objects: all the R/3 databases and the
associated DB2 subsystem catalog and directory.
Important
The use of ADSM is not an alternative at this point, since ADSM does not
support the backup of DB2 for OS/390 tablespaces.
To decide which method you will use to establish your quiesce point, you must
first evaluate the characteristics and the corresponding options available with
each quiesce point alternative in terms of your recovery objectives. In summary,
each alternative establishes a quiesce point:
• ARCHIVE LOG MODE(QUIESCE) will establish a system-wide quiesce point and
record it in the bootstrap data set where it can be accessed by the PRINT
LOG MAP utility (DSNJU004). You also have control over the command
timeout value, and can thus control the effect that this command has on user
update processing.
• ARCHIVE LOG produces an archive log without consistency. When such a log
is used in disaster recovery, consistency to the last completed unit of work is
achieved during restart at the recovery site. There is no effect on local
users when the archive is produced.
Finally, since the process of taking a quiesce point may affect user access for
the duration of the operation, you must decide the best time to take it. You must
also decide which has the higher priority: uninterrupted user access, or the
establishment of the point-in-time recovery quiesce point itself. For flexibility
and to minimize the impact to SAP R/3 users, you should consider the use of the
ARCHIVE LOG command. Additional information may be found in BC SAP Database
Administration Guide: DB2 for OS/390 (51001015).
For disaster recovery, you will predefine the time and consistency point to which
you wish to recover. This point is likely to be based on a business relationship
within the applications (that is, the logical end of a business process).
The next section, 4.4.2, “Point-in-Time Recovery Using DB2 Conditional Restart,”
describes an alternative, where an inconsistent point is created without
disruption to local users. It is made a consistent point at the recovery site
through log truncation.
You should also be aware that it is important to obtain the current version of
OSS note 83000 from SAP regularly. This note is updated with recommendations on
backup and recovery and is the master reference on that topic from SAP developers.
The first three steps listed are new and will receive the major part of our
attention here. Once those steps are complete, the remainder of this scenario
will be described in 5.3, “Remote Site Recovery from Disaster at a Local Site” on
page 64.
The list of candidate points of consistency might have an entry for each hour in
the day or for each minute in the day. For each entry in the list, we have a
timestamp and the corresponding DB2 Log RBA. This allows you to map a
specific time to a DB2 Log RBA. For data sharing users it is a Log Record
Sequence Number (LRSN) that serves the same purpose.
How do you build a list of timestamps and the associated Log RBAs? You start
by defining a dummy database and tablespace. This will be a real DB2 database
and tablespace, but there will be no activity against the dummy tablespace. SAP
R/3 will not know about this tablespace.
You should be aware that the use of the dummy tablespace is for convenience
only; you can also look for checkpoint records that are stored with a time and
RBA. Most installations will have checkpoint records at 10-15 minute intervals,
so those records should provide usable references that map RBAs to a specific
time.
Select the Candidate Point of Consistency That Best Meets Your Requirements:
You must decide the point at which you want to recover if a disaster occurs.
This part cannot be automated. Suppose you decide that 6:00 PM will
become your system-wide recovery time; that is, you want to take your system
back to that date and time.
You have one more task. Query SYSIBM.SYSCOPY for the dummy tablespace
entry before 6:00 PM (or look for checkpoint records at that time as noted
previously). Once you determine that entry from the list, note the DB2 Log RBA.
This can be provided from the local site by adding it to other recovery
information which will be sent to the recovery site, either in list form or through
creation of a separate data set.
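Looking up the entry before the chosen time in a time-ordered list is a binary search. The sketch below assumes the list has already been extracted (from the SYSIBM.SYSCOPY rows for the dummy tablespace, or from checkpoint records) into (timestamp, RBA) pairs; the function name and the sample timestamps and RBAs are invented for illustration. It follows the text in selecting the entry strictly before the chosen time.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Tuple

# (timestamp, log RBA) for each candidate point of consistency, sorted by time.
Candidate = Tuple[datetime, int]


def latest_candidate_before(candidates: List[Candidate],
                            target: datetime) -> Candidate:
    """Return the most recent candidate strictly before target."""
    times = [ts for ts, _ in candidates]
    idx = bisect_left(times, target)   # number of entries earlier than target
    if idx == 0:
        raise LookupError("no candidate point of consistency before target")
    return candidates[idx - 1]


# Invented sample data: half-hourly entries on one day.
candidates = [
    (datetime(1999, 10, 15, 17, 0), 0x1A2B3000),
    (datetime(1999, 10, 15, 17, 30), 0x1A2F8000),
    (datetime(1999, 10, 15, 18, 0), 0x1A3C1000),
]
ts, rba = latest_candidate_before(candidates, datetime(1999, 10, 15, 18, 0))
print(ts, f"{rba:012X}")   # 1999-10-15 17:30:00 00001A2F8000
```

The entry stamped exactly 6:00 PM is skipped because the text asks for the entry before the chosen time; using `bisect_right` instead would accept an exact match.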
There is probably data inconsistency at the Log RBA you identified. You are
running an active SAP R/3 system and it is likely that at the time you have
identified, there was work in process (or in-flight units of recovery). However,
you can make that Log RBA a true point of consistency.
By doing a DB2 conditional restart, you can make the Log RBA you identified
into a point of consistency. You will use the CHANGE LOG INVENTORY DB2
utility to create a conditional restart control using the following statement:
CRESTART CREATE,FORWARD=YES,BACKOUT=YES,ENDRBA=XXXX
where XXXX is the true point of consistency you determined from your
SYSIBM.SYSCOPY query. For more information on the CHANGE LOG
INVENTORY DB2 utility, see DB2 UDB for OS/390 V6 Utility Guide and Reference,
SC26-9015.
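Because only the RBA varies in the CRESTART statement shown above, generating the statement from the RBA noted at the local site may help avoid typing errors during a recovery. This is a sketch under an assumption: the log RBA is treated as a 6-byte value written as 12 hexadecimal digits, matching the nnnnnnnnn000 form used later in this redbook. The function name is hypothetical, and the surrounding DSNJU003 job is not shown.

```python
def crestart_statement(end_rba: int) -> str:
    """Build the CRESTART control statement for CHANGE LOG INVENTORY
    (DSNJU003), with end_rba written as 12 hexadecimal digits.

    Assumption: the log RBA is a 6-byte value (12 hex digits).
    """
    if not 0 <= end_rba < 1 << 48:
        raise ValueError("log RBA must fit in 6 bytes (12 hex digits)")
    return f"CRESTART CREATE,FORWARD=YES,BACKOUT=YES,ENDRBA={end_rba:012X}"


print(crestart_statement(0x1A2F8000))
# CRESTART CREATE,FORWARD=YES,BACKOUT=YES,ENDRBA=00001A2F8000
```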
The conditional restart will cause DB2 to truncate the log at your true point of
consistency. Log entries beyond that point will be disregarded. Additionally,
DB2 will remove from SYSLGRNX and SYSCOPY any entries that occurred after
the true point of consistency.
Recover All Tablespaces to the True Point of Consistency: After the conditional
restart, this will be a recovery to currency and not a recovery to an RBA
(recovery to an RBA is common in most point-in-time recovery scenarios).
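The reason a recovery to currency lands on the chosen point can be illustrated with a toy model: once the later history is removed, the most recent remaining copy of each tablespace predates the truncation point. This is a heavily simplified, hypothetical sketch. `CopyEntry` is an illustrative stand-in for a SYSIBM.SYSCOPY row, the tablespace names are invented, and it assumes entries at or after ENDRBA are removed, in line with the later restart step that discards log records with an RBA greater than or equal to the CRESTART value. It shows only the choice of base copy, not the log apply that RECOVER also performs.

```python
from typing import Dict, List, NamedTuple


class CopyEntry(NamedTuple):
    """Hypothetical, heavily simplified stand-in for a SYSIBM.SYSCOPY row."""
    tablespace: str
    start_rba: int


def truncate_history(entries: List[CopyEntry], end_rba: int) -> List[CopyEntry]:
    """Drop entries recorded at or after the truncation point."""
    return [e for e in entries if e.start_rba < end_rba]


def latest_copies(entries: List[CopyEntry]) -> Dict[str, CopyEntry]:
    """Most recent remaining entry per tablespace: the base a recovery to
    currency would start from in this model."""
    latest: Dict[str, CopyEntry] = {}
    for e in entries:
        current = latest.get(e.tablespace)
        if current is None or e.start_rba > current.start_rba:
            latest[e.tablespace] = e
    return latest


history = [
    CopyEntry("SAPDB.TS1", 0x1000),
    CopyEntry("SAPDB.TS2", 0x2000),
    CopyEntry("SAPDB.TS1", 0x5000),   # taken after the chosen point
]
kept = truncate_history(history, end_rba=0x4000)
print(latest_copies(kept)["SAPDB.TS1"].start_rba == 0x1000)  # True
```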
Recover/Rebuild All Indexes on the Tables That Have Been Reset to the Prior
Point of Consistency: The indexes must be made consistent with the data.
Important
Recovery time can be significantly reduced with a procedure that identifies
those tablespaces and indexes that actually changed after the RBA noted.
Recovery of only those tablespaces and indexes is necessary. You should
investigate the possibility of writing a program that performs a DB2 log scan
to find such tablespaces (remembering to recover the tablespaces and all the
indexes that exist on tables in those tablespaces). Some vendors provide
DB2 log analysis programs that have this function.
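The selection logic such a log-scan program would need can be sketched as follows, assuming the scan has already produced (RBA, tablespace) pairs and that a mapping from each tablespace to all indexes on its tables is available. Both inputs, the function name, and the object names in the test are hypothetical; reading the DB2 log itself is not shown. The sketch counts a record exactly at the noted RBA as a change, which errs on the side of recovering an object.

```python
from typing import Dict, Iterable, Set, Tuple


def objects_to_recover(log_records: Iterable[Tuple[int, str]],
                       point_rba: int,
                       indexes_by_tablespace: Dict[str, Set[str]]
                       ) -> Tuple[Set[str], Set[str]]:
    """Return (tablespaces, indexes) that must be recovered.

    log_records: hypothetical (rba, tablespace) pairs from a DB2 log scan.
    A tablespace needs recovery if it was updated at or after point_rba,
    the RBA noted for the point of consistency.
    indexes_by_tablespace: all indexes on tables in each tablespace.
    """
    changed = {ts for rba, ts in log_records if rba >= point_rba}
    indexes: Set[str] = set()
    for ts in changed:
        # Recover every index on every table in a changed tablespace.
        indexes |= indexes_by_tablespace.get(ts, set())
    return changed, indexes
```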
Execute Transaction SM13 on the SAP R/3 System: After the SAP R/3 Central
Instance is started, execute transaction SM13 to review aborted updates.
Resolve all aborted updates before the SAP R/3 system is opened for productive
use.
Note: Transaction SM13 should be executed as part of your daily activities.
Since this scenario contains a conditional restart, anyone using it must first
practice it. An improperly done conditional restart usually results in the failure
of the disaster recovery attempt.
DB2 for OS/390 has an installation option called SITE TYPE, and it is intended for
disaster recovery. The two choices are LOCALSITE and RECOVERYSITE. It is
designed to allow DB2 for OS/390 to call for the relevant image copy at the
correct site without unnecessary operational intervention. We assume the local
site is LOCALSITE and the site where the recovery is performed is
RECOVERYSITE.
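Conceptually, SITE TYPE acts as a filter on which registered image copies are eligible. The model below is hypothetical: the `made_for` tag stands in for the distinction DB2 records between copies produced through COPYDDN (local site) and RECOVERYDDN (recovery site), as described in the image copy steps of this section. The data set names are invented, and the sketch does not reproduce how DB2 stores this information.

```python
from typing import List, NamedTuple, Optional

SITE_TYPES = ("LOCALSITE", "RECOVERYSITE")


class ImageCopy(NamedTuple):
    dsname: str
    made_for: str   # hypothetical tag: "LOCALSITE" (COPYDDN) or "RECOVERYSITE" (RECOVERYDDN)
    start_rba: int


def copy_for_site(copies: List[ImageCopy], site_type: str) -> Optional[ImageCopy]:
    """Return the most recent image copy made for the given SITE TYPE, if any."""
    if site_type not in SITE_TYPES:
        raise ValueError(f"unknown SITE TYPE: {site_type}")
    eligible = [c for c in copies if c.made_for == site_type]
    return max(eligible, key=lambda c: c.start_rba, default=None)
```

With SITE TYPE set to RECOVERYSITE, a newer local-only copy is ignored and the latest recovery-site copy is chosen, which is why both sets of copies are produced in one COPY invocation.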
Following is a list of essential disaster recovery elements and the steps you
need to take to create them. For ease of use, we assume that all data sets are
cataloged and will be tracked using an ICF catalog.
• Image copies
1. Make copies of all the SAP R/3 tablespaces, any vendor tools that are
used by your DBA group, and the DB2 catalog/directory, preferably in
that order. We assume they are made at least daily to assure adequate
performance at the recovery site.
Use the COPY utility to make copies for the local subsystem and
additional copies for disaster recovery. They can be made with one
invocation of the COPY utility: specify DD names on the COPYDDN option to
produce the copies for the local site and on the RECOVERYDDN option to
produce the copies for the recovery site.
Do not produce the copies by invoking COPY twice.
2. Send the image copies to the recovery site.
3. Record this activity at the recovery site when the image copies are
received.
• Archive logs
1. Make copies of the archive logs for the recovery site.
Use the ARCHIVE LOG command to archive all current DB2 active log data
sets.
There is an exposure if you take COPY 2 of the archive to the recovery
site. If the first copy of an archive becomes unreadable, then the second
copy is requested. DB2 will wait indefinitely until the second copy is
mounted, which can create logistical problems. A secondary problem is
that under certain unusual circumstances, COPY 2 may not be produced,
For disaster recovery to be successful, all copies and reports must be updated
and sent to the recovery site daily. Data will be up to date through the last
If the SAP R/3 environment is not stored in the rootvg, you need to recover the
SAP R/3 environment according to the backup procedure. This means you need
to recover it with a tool such as tar, backoffl, or others.
After restoring all data you must adjust some parameters, for example the
network environment, paging space, and date/time.
If the SAP R/3 environment is not stored in the rootvg of the nodes, you need to
recover the SAP R/3 environment according to the backup procedure. This
means you need to recover it with a tool such as tar, backoffl, or others.
After restoring all data you must adjust some parameters, for example the
network environment, paging space, and date/time. You must also recognize
that the names and IP addresses of application servers may be required to be
changed (for example, when the application servers are on a different LAN) and
they may use different network servers (name servers and gateways), so the
network connections must be reconfigured.
Network reconfiguration may require some SAP R/3 parameter changes if the
name of the central instance application server or the database server is
different from normal operation.
For the detailed installation procedure see R/3 Installation on UNIX DB2 for
OS/390, Material Number 51002659 and Implementing SAP R/3 in an OS/390
Environment Using AIX and Windows NT Application Servers, SG24-4945.
To reduce the recovery time, you can make a backup of the application server
environment on your recovery machines after the first test and use the backup
tapes to restore in the case of a disaster. Of course, you need to create new
tapes when a new release is installed.
This chapter addresses restart actions when a central site becomes unavailable
and a remote site must assume those functions.
Planning Note
In this case the application server load characteristics at the recovery site
will be different from the central site. You should review your decisions
regarding the number of SAP R/3 batch, dialog, and update processes that
are assigned to the individual application server machines.
In some cases, application server machines may be brought to the remote site
only when disaster use is necessary. If that is true in your case, remember that
system copies of central site application servers will be required. However,
some configuration time will be necessary in addition to restoring the copy:
1. TCP/IP system names and IP addresses will be different (to allow the central
site machines to be in the network as repair is done). SAP R/3 definitions
that specify these names and addresses will require modification.
2. Hardware configurations may be slightly different.
3. Connectivity information (such as gateway names and name servers) will be
different.
The enterprise must recognize that the disk space requirement will not be
lessened; all SAP R/3 tables and views must be kept available.
Important
Where this scenario differs from that of a later release of DB2 for OS/390, or
a later PTF, use the most current one.
The assumption of the original scenario is that you will be recovering to the end
of the last archive log you have off-site. While this method gives you the most
current data, it probably does not make “business” sense.
We have assumed that you intend to use the point-of-consistency you developed
at the local site as described in 4.4.1.1, “Establishing a Point of Consistency” on
page 51. We also assume you have the results of the query of
SYSIBM.SYSCOPY for the dummy tablespace, and the log RBA or ENDLRSN to
which you wish to recover your environment. The method used to truncate the
DB2 log is the same, a conditional restart. The process of determining the
ENDRBA or ENDLRSN is different for each end point: the end of the archive log,
as described in DB2 UDB for OS/390 V6 Administration Guide, SC26-9003, or of
the point-of-consistency, which is described in the following pages.
Redbook Difference
While the point-of-consistency method described in 4.4.2, “Point-in-Time
Recovery Using DB2 Conditional Restart” on page 53 will not necessarily use
the most current archive log, that log will be used as a base point to identify
the archive log that contains our point-of-consistency.
For data sharing users, begin at 5.3.2, “Steps to Recover (Data Sharing Only)” on
page 72.
b. Use the change log inventory utility (DSNJU003) to register this latest
archive log tape data set in the archive log inventory of the BSDS just
restored. This is necessary since the BSDS image on an archive log
tape does not reflect the archive log data set residing on that tape.
Redbook Recommendation
This is the first mention of the change log inventory utility. Various
change log inventory control statements are developed through the
rest of the substeps of this step. Though they are explained in this
procedure separately, you can run them all in one batch invocation of
change log inventory.
c. Use the change log inventory utility to adjust the active logs:
1) Use the DELETE option of the change log inventory utility (DSNJU003)
to delete all active logs in the BSDS. Use the BSDS listing produced
in the step above to determine the active log data set names.
2) Use the NEWLOG statement of the change log inventory utility
(DSNJU003) to add the active log data sets to the BSDS. Do not
specify a STARTRBA or ENDRBA value in the NEWLOG statement. This
indicates to DB2 that the new active logs are empty.
If you are using dual BSDSs, make sure both of them are included in the
jobs.
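As a sketch, the statements for substeps 1) and 2) can be generated from the active log data set names taken from the BSDS listing. The data set names below are invented and the function name is hypothetical. Check the DELETE DSNAME= and NEWLOG DSNAME=...,COPY1 forms against the change log inventory description in DB2 UDB for OS/390 V6 Utility Guide and Reference before use. The JCL, including the DD statements for both BSDSs, is not shown.

```python
from typing import List, Sequence


def active_log_reset_statements(active_logs: Sequence[str],
                                copy: int = 1) -> List[str]:
    """Build change log inventory (DSNJU003) statements that first delete the
    listed active log data sets from the BSDS and then add them back.

    NEWLOG is written without STARTRBA or ENDRBA, which tells DB2 that the
    active logs are empty. copy selects COPY1 or COPY2 for dual logging.
    """
    if copy not in (1, 2):
        raise ValueError("copy must be 1 or 2")
    deletes = [f"DELETE DSNAME={dsn}" for dsn in active_logs]
    adds = [f"NEWLOG DSNAME={dsn},COPY{copy}" for dsn in active_logs]
    return deletes + adds


# Hypothetical data set names; take the real ones from the BSDS listing.
for stmt in active_log_reset_statements(["DSNC.LOGCOPY1.DS01",
                                         "DSNC.LOGCOPY1.DS02"]):
    print(stmt)
```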
DSN1LOGP gives a report. For sample output and information about how to
read it, see Section 3 of DB2 UDB for OS/390 V6 Utility Guide and Reference,
SC26-9015.
a. Run the DSNTINST CLIST in UPDATE mode. See Section 2 of DB2 UDB for
OS/390 V6 Installation Guide, GC26-9008.
b. To defer processing of all databases select Databases to Start
Automatically from panel DSNTIPB. You are presented with panel
DSNTIPS. Type DEFER in the first field, ALL in the second and press Enter.
You are returned to DSNTIPB.
c. To specify where you are recovering select Operator Functions from
panel DSNTIPB. You are presented with panel DSNTIPO. Type
RECOVERYSITE in the SITE TYPE field. Press Enter to continue.
d. To optionally specify which archive log to use, select Operator Functions
from panel DSNTIPB. You are presented with panel DSNTIPO. Type YES
in the READ ARCHIVE COPY 2 field if you are using dual archive logging
and want to use the second copy of the archive logs. Press Enter to
continue. (This applies to DB2 UDB for OS/390 Version 6 or above.)
Redbook Recommendation
To enable fast log apply for recovery and restart, select Active Log
Data Set Parameters from panel DSNTIPB. You are presented with
panel DSNTIPL. Type 100 in the LOG APPLY STORAGE field to
reserve 100 MB storage in the DBM1 address space and press Enter.
(This applies to DB2 UDB for OS/390 Version 6 or above.)
DB2 discards any log information in the bootstrap data set and the active
logs with an RBA greater than or equal to nnnnnnnnn000 as listed in the
CRESTART statements above.
Use the print log map utility to verify that the conditional restart control
record that you created in the previous step is active.
10. Enter the command START DB2 ACCESS(MAINT).
Even though DB2 marks all tablespaces for deferred restart, log records are
written so that in-abort and inflight units of recovery are backed out.
In-commit units of recovery are completed, but no additional log records are
written at restart to cause this. This happens when the original redo log
records are applied by the RECOVER utility.
At the primary site, DB2 probably committed or aborted the inflight units of
recovery, but you have no way of knowing.
During restart, DB2 accesses two tablespaces that result in DSNT501I,
DSNT500I, and DSNL700I resource unavailable messages, regardless of
DEFER status. The messages are normal and expected. You can ignore
them.
The return code accompanying the message might be one of the following,
although other codes are possible:
00C90081 This return code occurs if there is activity against the object
during restart as a result of a unit of recovery or pending
writes. In this case the status shown as a result of -DISPLAY
is STOP,DEFER.
00C90094 Since the tablespace is currently only a defined VSAM data
set, it is in an unexpected state to DB2.
00C900A9 This code indicates that an attempt was made to allocate a
deferred resource.
11. Resolve the indoubt units of recovery.
The RECOVER utility, which you will soon invoke, will fail on any tablespace
that has indoubt units of recovery. Because of this, you must resolve
indoubt units of recovery first.
Determine the proper action to take (commit or abort) for each unit of
recovery. To resolve indoubt units of recovery see “Resolving Indoubt
Threads” in DB2 UDB for OS/390 V6 Administration Guide, SC26-9003. From
an install SYSADM authorization ID, enter the RECOVER INDOUBT command for
all affected transactions.
If you attempt this from an MVS console, you will receive messages resulting
from an attempt to do authorization checking when no tables exist yet.
Redbook Recommendation
After you identify any indoubt units of recovery, it is safe to issue RECOVER
INDOUBT ACTION(ABORT). The rationale is that you have already lost work;
the loss of one more unit of work is not significant.
3. DSNDB01.SYSLGRNX
• -TERM UTILITY on SYSLGRNX (job step 5)
a. Run the DSNTINST CLIST in UPDATE mode. See Section 2 of DB2 UDB
for OS/390 V6 Installation Guide, GC26-9008.
b. From panel DSNTIPB select Databases to Start Automatically. You are
presented with panel DSNTIPS. Type RESTART in the first field, ALL in the
second and press Enter. You are returned to DSNTIPB.
c. (For DB2 UDB for OS/390 Version 6 and above.) You must keep SITE
TYPE as RECOVERYSITE until all user objects are recovered, so do not
change this parameter now.
d. Reassemble DSNZPxxx using job DSNTIJUZ (produced by the CLIST
started in the first step).
18. Stop and start DB2.
Note: Do not use ACCESS(MAINT), because you want other users to perform
the next recoveries.
19. Make a full image copy of the catalog and directory.
20. Recover user tablespaces (any tools tablespaces as well as SAP R/3 on DB2
for OS/390 tablespaces). See 5.3.2.1, “What to Do about Utilities in
Progress” on page 82 for information on how to recover tablespaces on
which utilities were running. You cannot restart a utility at the recovery site
that was interrupted at the disaster site. You have already terminated
utilities running against user tablespaces in step 12d on page 69.
a. Issue the SQL query
SELECT * FROM SYSIBM.SYSTABLEPART WHERE STORTYPE='E';
to determine which, if any, of your tablespaces are user-managed. To
allocate user-managed tablespaces, use the access method services
DEFINE CLUSTER command.
b. If your user tablespaces are STOGROUP-defined, and if the volume serial
numbers at the recovery site are different from those at the local site,
use ALTER STOGROUP to change them in the DB2 catalog.
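For step b, the statement can be derived from the local and recovery-site volume lists. This is only a sketch: the function, storage group name, and volume serial numbers are hypothetical, and the ADD VOLUMES and REMOVE VOLUMES clauses should be verified against the SQL reference for your DB2 release before use. Volumes are added before old ones are removed so that the storage group is never left without volumes.

```python
from typing import Optional, Sequence


def alter_stogroup_sql(stogroup: str,
                       local_volumes: Sequence[str],
                       recovery_volumes: Sequence[str]) -> Optional[str]:
    """Build an ALTER STOGROUP statement that swaps local-site volume serials
    for recovery-site ones. Returns None if the volume lists already match."""
    add = [v for v in recovery_volumes if v not in local_volumes]
    remove = [v for v in local_volumes if v not in recovery_volumes]
    clauses = []
    if add:
        clauses.append(f"ADD VOLUMES ({', '.join(add)})")
    if remove:
        clauses.append(f"REMOVE VOLUMES ({', '.join(remove)})")
    if not clauses:
        return None
    return f"ALTER STOGROUP {stogroup} {' '.join(clauses)};"
```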
After these archive logs are registered, use the print log map utility
(DSNJU004) with the GROUP option to list the contents of all the BSDSs. You
get output that includes the start and end LRSN and RBA values for the
latest active log data sets (shown as NOTREUSABLE).
Note: If there is a discrepancy among the print log map reports as to
the number of members in the group, record the one that shows the
highest number. (This is an unlikely occurrence.) This is the DB2 that
must be started first.
a. Run the DSNTINST CLIST in UPDATE mode. See Section 2 of DB2 UDB
for OS/390 V6 Installation Guide, GC26-9008.
b. To defer processing of all databases, select Databases to Start
Automatically from panel DSNTIPB. You are presented with panel
DSNTIPS. Type DEFER in the first field and ALL in the second, and press Enter.
You are returned to DSNTIPB.
c. To specify where you are recovering, select Operator Functions from
panel DSNTIPB. You are presented with panel DSNTIPO. Type
RECOVERYSITE in the SITE TYPE field. Press Enter to continue.
d. To optionally specify which archive log to use, select Operator Functions
from panel DSNTIPB. You are presented with panel DSNTIPO. Type YES
in the READ ARCHIVE COPY 2 field if you are using dual archive logging
and want to use the second copy of the archive logs. Press Enter to
continue. (This applies to DB2 UDB for OS/390 Version 6 and above.)
Redbook Recommendation
To enable fast log apply for recovery and restart, select Active Log
Data Set Parameters from panel DSNTIPB. You are presented with
panel DSNTIPL. Type 100 in the LOG APPLY STORAGE field to
reserve 100 MB of storage in the DBM1 address space, and press Enter.
(This applies to DB2 UDB for OS/390 Version 6 and above.)
DB2 discards any log information in the bootstrap data set and the active
logs with an LRSN greater than nnnnnnnnnnnn as listed in the CRESTART
statements above.
Use the print log map utility to verify that the conditional restart control
record that you created in the previous step is active.
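The truncation rule compares LRSN values numerically. The Python sketch below only illustrates that comparison (DB2 performs the actual truncation during conditional restart). It assumes a 6-byte LRSN written as 12 hexadecimal digits, matching the nnnnnnnnnnnn placeholder above, and uses hypothetical (LRSN, label) pairs in place of real log records.

```python
def split_at_endlrsn(records, endlrsn):
    """records: list of (lrsn_hex, label) pairs (hypothetical sample data).
    endlrsn: the truncation point, written as 12 hex digits (6 bytes).
    Returns (kept, discarded): records whose LRSN is greater than endlrsn
    are discarded; all others are kept."""
    if len(endlrsn) != 12:
        raise ValueError("expected a 6-byte LRSN written as 12 hex digits")
    limit = int(endlrsn, 16)
    kept, discarded = [], []
    for lrsn, label in records:
        # Compare numerically so that letter case in the hex digits does not matter.
        target = discarded if int(lrsn, 16) > limit else kept
        target.append((lrsn, label))
    return kept, discarded


kept, discarded = split_at_endlrsn(
    [("00A1B2C3D4E5", "last good record"), ("00a1b2c3d4e6", "after ENDLRSN")],
    "00A1B2C3D4E5",
)
print(discarded)  # -> [('00a1b2c3d4e6', 'after ENDLRSN')]
```

A record whose LRSN equals the truncation value is kept, because only values greater than it are discarded.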
11. Start one DB2 with ACCESS(MAINT). DB2 will prompt you to start each
additional DB2 subsystem in the group.
If there is a discrepancy among the print log map reports as to the number
of members in the group, record the one that shows the highest number.
(This is an unlikely occurrence.) This is the DB2 that must be started first.
Redbook Recommendation
DB2 performs a group restart after the logs of all members have been
truncated. Expect to see at least one of these messages:
DSNR021I csect-name DB2 SUBSYSTEM MUST PERFORM GROUP RESTART
FOR PEER MEMBERS
DSNR022I csect-name DB2 SUBSYSTEM HAS COMPLETED GROUP RESTART
FOR PEER MEMBERS
If you see neither message, stop now and perform step 1 on page 72 of this
procedure to force off the structures and connections that prior tests left in
the coupling facility (CF). Then redo step 10, which created the CRESTART
record, before restarting DB2 again. Failing to delete structures before
restart is a common cause of disaster recovery failure for data sharing
users.
Even though DB2 marks all tablespaces for deferred restart, it writes log
records so that in-abort and inflight units of recovery are backed out.
In-commit units of recovery are also completed, but restart writes no
additional log records for them; they are completed when the RECOVER utility
applies the original redo log records.
At the primary site, DB2 probably committed or aborted the inflight units of
recovery, but you have no way of knowing.
13. If you are going to run single-system data sharing at the recovery site, stop
all DB2s but one by using the STOP DB2 command with MODE(QUIESCE).
14. To recover the catalog and directory, follow these instructions:
The RECOVER function includes RECOVER TABLESPACE, RECOVER INDEX, and
REBUILD INDEX. If you have an image copy of an index, use RECOVER INDEX. If
you do not have an image copy of an index, use REBUILD INDEX to reconstruct
the index from the recovered tablespace.
a. Recover DSNDB01.SYSUTILX. This must be a separate job step.
b. Recover all indexes on SYSUTILX. This must be a separate job step.
3. DSNDB01.SYSLGRNX
• -TERM UTILITY on SYSLGRNX (job step 5)
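The rule in step 14 (RECOVER INDEX when an image copy exists, REBUILD INDEX otherwise) can be applied mechanically when preparing utility control statements. The Python sketch below assumes a hypothetical list of (index name, has image copy) pairs, for example built from your own SYSCOPY query, and assumes the parenthesised index-name form of the statements; check the Utility Guide for your DB2 release. It produces one statement per index but does not split them into the separate job steps that substeps a and b require for SYSUTILX.

```python
def index_utility_statements(indexes):
    """indexes: list of (index name, has_image_copy) pairs (hypothetical input).
    RECOVER INDEX restores an index from its image copy; without an image
    copy, REBUILD INDEX reconstructs it from the recovered table space."""
    statements = []
    for name, has_image_copy in indexes:
        utility = "RECOVER INDEX" if has_image_copy else "REBUILD INDEX"
        statements.append(f"{utility} ({name})")
    return statements


for stmt in index_utility_statements([("SAPR3.IDXA", True), ("SAPR3.IDXB", False)]):
    print(stmt)
```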
a. Run the DSNTINST CLIST in UPDATE mode. See Section 2 of DB2 UDB
for OS/390 V6 Installation Guide, GC26-9008.
b. From panel DSNTIPB, select Databases to Start Automatically. You are
presented with panel DSNTIPS. Type RESTART in the first field and ALL in the
second, and press Enter. You are returned to DSNTIPB.
c. (For DB2 UDB for OS/390 Version 6 and above). You must keep SITE
TYPE as RECOVERYSITE until all user objects are recovered, so do not
change this parameter now.
d. Reassemble DSNZPxxx using job DSNTIJUZ (produced by the CLIST
started in the first step).
20. Stop and start DB2 (one member).
Note: Do not use ACCESS(MAINT), because you want other users to perform
the next recoveries.
21. Make a full image copy of the catalog and directory.
22. Recover user tablespaces. See 5.3.2.1, “What to Do about Utilities in
Progress” on page 82 for information on how to recover tablespaces on
which utilities were running. You cannot restart a utility at the recovery site
that was interrupted at the disaster site. You have already terminated any
utilities running against user tablespaces in item 14d on page 79.
a. Issue the SQL query
SELECT * FROM SYSIBM.SYSTABLEPART WHERE STORTYPE='E';
to determine which, if any, of your tablespaces are user-managed. To
allocate user-managed tablespaces, use the access method services
DEFINE CLUSTER command.
b. If your user tablespaces are STOGROUP-defined, and if the volume serial
numbers at the recovery site are different from those at the local site,
use ALTER STOGROUP to change them in the DB2 catalog.
Important
Remember that any inline image copy taken with COPY SPEC must be at the
recovery site or you will have to recover to a prior point in time.
CHECK DATA Terminate the utility and run it again after recovery is complete.
COPY After you enter the TERM command, DB2 places a record in the
SYSCOPY catalog table indicating that the COPY utility was
terminated. This makes it necessary for you to make a full image
copy. When you copy your environment at the completion of the
disaster recovery scenario, you fulfill that requirement.
Overview of the method: A DB2 tracker site is a separate DB2 subsystem or data
sharing group that exists solely to keep shadow copies of your primary site's
data. No independent work can be run on the tracker site.
From the primary site, you transfer the BSDS and the archive logs, and the
tracker site then runs periodic LOGONLY recoveries to keep the shadow data
current.
Important: Do not attempt to start the tracker site when you are setting it up.
Because bringing up your tracker site as the takeover site destroys the tracker
site environment, you should save your complete tracker site prior to takeover
site testing. The tracker site can then be restored after the takeover site testing,
and the tracker site recovery cycles can be resumed.
How current will the data be? That depends on the bandwidth of the link the
data flows across and the distance it must travel. In a well-tuned XRC system
with channel extenders, the delay can range from a few milliseconds to several
seconds. With channel extenders, there is no distance limit on where the
receiving control units can be placed. For devices connected through ESCON
directors, the current distance limit is 43 km. Because the data is written to
the recovery site asynchronously, no performance penalty is paid at the local
site.
The active system data mover, which receives the tracks, usually runs at the
recovery site. The tracks are written to devices with the same VOLSERs as
those at the local site. No DB2 subsystem is active there, and the data is not
accessible until a failure occurs. A consistency group can consist of a single
DB2 subsystem or a DB2 data sharing group. In the latter case, a coupling
facility must be available at the recovery site by the time the first DB2 is
started. All DB2 subsystem and user data is expected to be in the same
consistency group.
Because DB2 data sharing adds complexity, we describe each procedure in a
separate section.
a. Enter the following MVS command to display the structures for this data
sharing group:
D XCF,STRUCTURE,STRNAME=grpname*
b. For group buffer pools and the lock structure, enter the following
command to force the connections off those structures:
SETXCF FORCE,CONNECTION,STRNAME=strname,CONNAME=ALL
Connections for the SCA are not held at termination, so there are no
SCA connections to force off.
c. Delete all the DB2 coupling facility structures by using the following
command for each structure:
SETXCF FORCE,STRUCTURE,STRNAME=strname
4. The DSNZPARM members used should be those in use at the local site.
There is no DEFER ALL parameter, since the DB2 restart is not conditional.
5. If you are using the DB2 distributed data facility, run the change log
inventory utility with the DDF statement to update the LOCATION and
LUNAME values in the BSDS.
6. Start all DB2 members of the data sharing group.
7. Expect pages to be added to the logical page list (LPL) during the
forward and backward phases of DB2 restart. All data sets that were being
shared at the time of the failure are placed in the GRECP exception state.
8. After each restart has completed, message DSNR002I is displayed.
9. Issue the DISPLAY DATABASE(*) SPACE(*) RESTRICT LIMIT(*) command to list
the objects in GRECP/LPL.
10. Issue the following commands to recover the DB2 catalog and directory:
START DATABASE(DSNDB01) SPACE(*)
START DATABASE(DSNDB06) SPACE(*)
11. To recover from GRECP/LPL, issue the START commands in either of these
forms:
START DATABASE(*) SPACE(*)
This recovers any objects in GRECP/LPL status.
START DATABASE(database-name) SPACE(*)
This command can be issued for different databases on different members
for better recovery performance, if desired. Obtain the database names
from the output of the command issued in step 9. This recovery can take from
a few minutes to hours (if many thousands of data sets are in GRECP/LPL
status). The time depends on the number of data sets being shared and the
amount of log data that must be processed since the last checkpoint of each
DB2 member.
12. DB2 is now available for new work.
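The structure clean-up in substeps a to c and the START DATABASE commands in step 11 can be generated from lists you already have: the D XCF,STRUCTURE output and the step 9 display. The Python sketch below assumes the usual DB2 structure naming (groupname_LOCK1, groupname_SCA, groupname_GBPn); the group, member, and database names are hypothetical. Round-robin is only one simple way to spread the work across members; it ignores how many data sets each database holds.

```python
def xcf_cleanup_commands(structures):
    """structures: CF structure names shown by D XCF,STRUCTURE for the group.
    Returns the SETXCF commands of substeps b and c, in order."""
    commands = []
    # b: force connections off the group buffer pools and the lock structure.
    # SCA connections are not held at termination, so the SCA is skipped here.
    for name in structures:
        if not name.endswith("_SCA"):
            commands.append(f"SETXCF FORCE,CONNECTION,STRNAME={name},CONNAME=ALL")
    # c: delete every DB2 structure, including the SCA.
    for name in structures:
        commands.append(f"SETXCF FORCE,STRUCTURE,STRNAME={name}")
    return commands


def start_database_commands(databases, members):
    """Spread START DATABASE commands for the GRECP/LPL databases reported
    in step 9 round-robin across members. Returns {member: [commands]}."""
    if not members:
        raise ValueError("at least one active member is required")
    plan = {member: [] for member in members}
    for i, database in enumerate(databases):
        plan[members[i % len(members)]].append(f"START DATABASE({database}) SPACE(*)")
    return plan


for cmd in xcf_cleanup_commands(["DSNDB0G_LOCK1", "DSNDB0G_SCA", "DSNDB0G_GBP0"]):
    print(cmd)
print(start_database_commands(["SAPDB1", "SAPDB2", "SAPDB3"], ["DB1G", "DB2G"]))
```

The commands for the catalog and directory (step 10) are deliberately left out; issue them before the generated step 11 commands.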
To ensure a fast and smooth recovery, it is essential to have all required items
needed to recover the SAP R/3 on DB2 for OS/390 environment at the recovery
site. Some of the items might be kept at the recovery site, while others might be
brought to the recovery site in the case of a disaster. Of course, the latter must
be kept or stored outside the primary site to ensure that they are not affected or
damaged during the disaster. Everything at the primary site must be considered
to be lost or damaged in the case of the disaster and cannot be part of a
disaster recovery plan.
A.1 Hardware
All hardware necessary to recover your SAP R/3 on DB2 for OS/390 environment
must be installed at the recovery site or must be brought to the recovery site in
the case of a disaster. This includes:
• Database server (S/390)
• Application servers (RS/6000, RS/6000 SP or PC Server)
• Network infrastructure (cabling, hubs etc.)
• Peripherals (tape drives, printers etc.)
A.2 Software
To restore the data of the SAP R/3 on DB2 for OS/390 environment you need
backups of all data categories:
• System
• Infrastructure
• Application
Of course, which tapes you need depends on your backup and restore
procedures. When you use electronic vaulting, some of the data might already
be on DASD. Based on the architecture of DB2 for OS/390 and SAP R/3, you
need the following backups:
• OS/390 environment
• DB2 for OS/390 environment
• SAP R/3 tablespaces
• Archive logs
• JCL to recover DB2 for OS/390 environment
• Operating system environment of the application servers (AIX or Windows
NT)
• SAP R/3 environment of the application servers
• Transport and Correction System
• Complementary software as needed
Besides the backups that contain your vital, and therefore irreplaceable, data,
you should also have installation media for the software you use.
The following installation media should be at the recovery site in the case of a
disaster:
• OS/390
• DB2 for OS/390
• AIX and/or Windows NT
• SAP R/3 on DB2 for OS/390 CDs
• Communication software
• Complementary software
Make sure that the installation media have the required software level and that
there is the appropriate hardware to read the media at the recovery site.
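The A.2 lists can be kept as a simple readiness checklist. The Python sketch below is one hypothetical way to check an off-site inventory against them. The inventory format and level strings are assumptions, and the "as needed" complementary software is left out because its contents vary by installation.

```python
# Item names taken from the A.2 lists; the inventory you compare against is your own.
REQUIRED_BACKUPS = [
    "OS/390 environment",
    "DB2 for OS/390 environment",
    "SAP R/3 tablespaces",
    "Archive logs",
    "JCL to recover DB2 for OS/390 environment",
    "Operating system environment of the application servers",
    "SAP R/3 environment of the application servers",
    "Transport and Correction System",
]
REQUIRED_MEDIA = [
    "OS/390",
    "DB2 for OS/390",
    "AIX and/or Windows NT",
    "SAP R/3 on DB2 for OS/390 CDs",
    "Communication software",
]


def missing_items(required, stored_offsite):
    """Return required items that are not stored outside the primary site.
    Anything kept only at the primary site must be treated as lost."""
    stored = set(stored_offsite)
    return [item for item in required if item not in stored]


def level_mismatches(required_levels, media_levels):
    """Return {product: level found} for media whose level differs from the
    required one (None when no media for that product is on hand)."""
    return {
        product: media_levels.get(product)
        for product, level in required_levels.items()
        if media_levels.get(product) != level
    }


print(missing_items(REQUIRED_MEDIA, ["OS/390", "DB2 for OS/390"]))
```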
A.3 Skills
In the event of a disaster, the assumption that some skilled personnel will
survive is key to the recovery effort. Depending on how your recovery site is
managed and operated, the availability of skilled personnel at the recovery site
in the case of a disaster might become a critical factor. Usually, personnel of
the affected company travel to the recovery site and start the recovery when
they arrive. All travel arrangements required during a disaster must be
coordinated centrally, for example by a travel agency.
When travel becomes the critical path in the recovery time line, or additional
skilled personnel are needed, consider changing how the recovery site is
managed and operated, or engaging platform-trained contractors from outside
sources.
In all cases, it is imperative that the personnel are thoroughly trained to
recover the SAP R/3 on DB2 for OS/390 environment, especially when outside
sources are part of the recovery plan.
This redbook is directed at customers and analysts who need to plan and
implement disaster handling at sites where SAP R/3 is installed with DB2 on the
OS/390 platform as the database server. The redbook helps database
administrators, SAP basis consultants, and system programmers understand the
activities necessary for implementing a disaster recovery plan in such an
environment. The information in this publication is not intended as the
specification of any programming interfaces that are provided by any of the
products mentioned in this document. See the PUBLICATIONS section of the
appropriate IBM Programming Announcement for more information about what
publications are considered to be product documentation.
Information in this book was developed in conjunction with use of the equipment
specified, and is limited in application to those specific hardware and software
products and levels.
IBM may have patents or pending patent applications covering subject matter in
this document. The furnishing of this document does not give you any license to
these patents. You can send license inquiries, in writing, to the IBM Director of
Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785.
Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact IBM Corporation, Dept.
600A, Mail Drop 1329, Somers, NY 10589 USA.
The information contained in this document has not been submitted to any
formal IBM test and is distributed AS IS. The information about non-IBM
("vendor") products in this manual has been supplied by the vendor and IBM
assumes no responsibility for its accuracy or completeness. The use of this
information or the implementation of any of these techniques is a customer
responsibility and depends on the customer's ability to evaluate and integrate
them into the customer's operational environment. While each item may have
been reviewed by IBM for accuracy in a specific situation, there is no guarantee
that the same or similar results will be obtained elsewhere. Customers
attempting to adapt these techniques to their own environments do so at their
own risk.
Any pointers in this publication to external Web sites are provided for
convenience only and do not in any manner serve as an endorsement of these
Web sites.
AIX AS/400
DATABASE 2 DB2
DFSMS DFSMS/MVS
DFSMSdss ESCON
IBM Netfinity
OS/390 Parallel Sysplex
RAMAC RS/6000
S/390 Scalable POWERparallel Systems
SP SP1
SP2 Sysplex Timer
System/390 VTAM
VM/ESA
Microsoft, Windows, Windows NT, and the Windows 95 logo are trademarks
or registered trademarks of Microsoft Corporation.
SET and the SET logo are trademarks owned by SET Secure Electronic
Transaction LLC.