
GDOC: The Distributed Continuous Availability / Disaster Recovery Solution February 2009
GDOC
Geographically Dispersed Open Clusters

Providing Continuous Availability & Disaster Recovery for Distributed Systems
(Methodology Adapted for Symantec Storage Foundation HA/DR Technology)
Teddi Maranzano teddim@us.ibm.com Funso Daramola fdaramola@us.ibm.com Alfredo Fernandez afernand@us.ibm.com
Table of Contents
Abstract
Executive Summary
Lessons Learned about IT Survival
What is GDOC?
    Recovery Automation
    Data Replication
        Replication Modes
        Replication Technologies
    Testing
    Monitoring
        Agents
        Notification
        Management Console
        Monitoring DR Readiness
GDOC Architecture
Case Study
GDPS-GDOC Interface
IBM Global Technology Services (GTS) Offerings
Summary
Additional Information
Abstract
Geographically Dispersed Open Clusters (GDOC) is a high availability and disaster recovery solution for highly critical applications with aggressive recovery point and recovery time objectives. GDOC is based both on IBM's extensive high availability and disaster recovery experience and on high availability and disaster recovery software from Symantec. This white paper describes the GDOC solution in the context of distributed systems (such as AIX, Solaris, HP-UX, Linux, and Microsoft Windows). It also describes an interface to Geographically Dispersed Parallel Sysplex (GDPS), IBM's corresponding continuous availability / disaster recovery solution on mainframe systems (z/OS).
Executive Summary
Unlike other business investments in infrastructure and inventory that can immediately increase revenue or decrease costs, investments in disaster recovery (DR) pay off only when extended and unplanned downtime threatens the viability of the enterprise. Corporate management is therefore often reluctant to make large investments in disaster recovery unless they have personally been involved in an event that affected areas such as revenue, customer satisfaction, and regulatory compliance. Even delaying system maintenance to avoid downtime can jeopardize the operations of the enterprise when known hardware or software issues are not addressed, or when business critical applications continue to run on platforms that are no longer supported by their vendors.

Further, in the era of the global economy, many industries are under mandates to have business continuity plans in place. For example, in 2003 the Federal Reserve, the U.S. Office of the Comptroller of the Currency and the U.S. Securities and Exchange Commission (SEC) created new business continuity objectives for all financial institutions. On the international level, the Basel II Accords and the standards proposed by the International Accounting Standards Board include frameworks for managing physical risks.

What would the impact on your enterprise be if your business critical applications became unavailable? Would you lose revenue? Would you lose customers? Would you be in violation of regulatory requirements? Would your competition be able to capitalize on your extended downtime? Calculating the impact and costs of downtime and data loss is not a simple process because some of the effects are less recognizable than others. For example, while revenue lost during the outage can be measured, the impact on the enterprise's reputation and goodwill is less apparent. When the business impact analysis and risk assessment are completed, the value of investing in a disaster recovery solution will become clear.
In response to the various mandates, many large enterprises have deployed some type of recovery solution. In many cases it evolves as a piecemeal DR solution, incorporating the technologies already in-house to supplement manual processes. Also, as corporations merge, the resulting enterprise may inherit a variety of DR solutions from various vendors using very different technologies, designs and business processes. Over time these solutions grow more complex: they become expensive and labor-intensive, they often fail when tested (if they are tested at all), and they require the IT organization to learn how to manage multiple components. GDOC is designed to help the business enterprise meet these challenges by streamlining the automation, testing and management requirements of a world-class recovery solution.
Lessons Learned about IT Survival

Events such as those on September 11, 2001 in the United States, and more recent events such as the 2003 power failure in the Northeast United States and Hurricane Katrina in 2005, show how critical it is for businesses to be ready for both expected and unexpected interruptions. Various agencies, including the Federal Reserve, the Securities and Exchange Commission, and the Office of the Comptroller of the Currency, met with industry participants to discuss lessons learned about IT survival. The following is a summary of those lessons:

- Geographical separation of facilities and resources is critical to maintaining business continuity. Any resource that cannot be replaced from external sources within the Recovery Time Objective (RTO) should be available within the enterprise, in multiple locations. This applies not only to buildings and hardware resources, but also to employees and data, since planning for employee and data survival is critical. Allowing staff to work out of a home office should not be overlooked as one way of being DR ready.
- Depending on the RTO and Recovery Point Objective (RPO), which are typically expressed in hours or minutes, it may be necessary for some enterprises to implement an in-house DR solution. If this is the case, the facilities required to achieve geographical separation may need to be owned by the enterprise.
- The installed server capacity at the second data center can be used to help meet normal day-to-day data processing needs. Fallback capacity can be provided either by prioritizing workloads (production, test, development, data mining) or by implementing capacity upgrades based on changing a license agreement, rather than by installing additional capacity.
- Disk resources need to be duplicated for disk data that is mirrored.
- Recovery procedures must be well-documented, tested, maintained and available after a disaster.
- Data backup and/or data mirroring must run like clockwork all the time.
- It is highly recommended that the DR solution be based on as much automation as possible, since in a disaster one cannot assume that key skills will be readily available to restore IT services.
- The enterprise's critical service providers, suppliers and vendors may be affected by the same disaster; therefore, one must enter into a discussion with them about their DR readiness.
What is GDOC?
GDOC is IBM's services framework and methodology for architecting high availability and disaster recovery across distributed platforms such as AIX, Solaris, HP-UX, Linux, and Windows. A GDOC solution is a combination of Symantec high availability software and IBM's service delivery methodology. GDOC uses Symantec Veritas Cluster Server HA/DR as the mechanism to automate recovery, and offers a choice of many different data replication technologies.

While the focus of this white paper is on high availability and disaster recovery on the distributed platforms, there is also a Veritas Cluster Server (VCS) agent for GDPS (IBM's corresponding continuous availability / disaster recovery solution on z/OS). For enterprises that wish to automate both the distributed and mainframe platforms, recovery of both environments can be driven from a single console on z/OS, using the VCS agent for GDPS to integrate the GDOC and GDPS solutions.

In the event of an outage or disaster, it can be quite challenging to manually coordinate the isolated pockets of processes and procedures needed to achieve minimal downtime. The multiple manual processes often used to implement and coordinate recovery plans are inherently error-prone and can significantly offset the time and resources invested in even the best of IT solutions, leading to further unplanned and costly downtime. GDOC addresses these problems by automating recovery at both the primary site and the disaster recovery site (a secondary customer site or a third-party site). In addition to recovery automation, GDOC also has a proactive method for measuring DR readiness using a simple, non-disruptive testing feature of Veritas Cluster Server called Fire Drill, which uses a point-in-time copy of data at the secondary site. GDOC is an enterprise solution that automates recovery from planned and unplanned failures across multiple platforms within a single site and across sites.
Recovery Automation
The GDOC methodology places particular emphasis on automation in order to make disaster recovery more reliable. With GDOC, the business has the opportunity to make a business decision on whether or not to move operations to the DR site. This is not a decision for a piece of software or an administrator to make. The administrator's job in the event of a disaster is to inform senior management of the actual event, which business critical applications are affected, the SLAs of those applications, and the estimated time to resolve the problem that caused the event. Senior management will then make an expedited decision on whether or not to declare a disaster. If a disaster is declared, instructions will be passed back to the administrator that site operations are to be moved. The administrator can start the recovery at the DR site with just a few clicks. From this point forward the recovery is completely automated; that is, automated but not automatic.

The need for near continuous availability of business critical applications, spanning multiple OS platforms and hardware technologies, drives the importance of automation in any enterprise HA/DR solution. In GDOC, automation is used to replace several repeatable activities that would otherwise require human intervention. VCS's core engine, the High Availability Daemon (HAD), communicates with various resources within the cluster through agents. These agents monitor and orchestrate various components, and specify policies to act deterministically in case of failures and other events. VCS maintains an open framework to accommodate applications for which an agent does not currently exist. Accelerated development is possible using this standard agent framework and IBM's development experience.
Data Replication
Moving critical data between servers used to mean shuttling backup tapes to the DR site for restoration there. The notable shortcomings of this approach are extended downtime and unnecessary loss of data. Today's relatively low cost of technology makes data transmission over high speed networks and across long distances the superior choice. Choosing the best replication mode (point-in-time, asynchronous, or synchronous) depends on a thorough understanding of the business application. A Business Impact Analysis will determine how critical an application is to the business organization relative to other applications, and some of the impacts of data loss for that application. The result of the Business Impact Analysis will be two key factors in the IT Business Continuity plan: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). The RPO is a business decision. It represents how much data can be lost before the business organization begins to suffer. It is the result of combining the application's I/O profile with various replication technologies and then factoring in the network bandwidth. The RTO is a business process decision and is part of the design of the failover mechanism. It indicates how quickly the application must be restarted in order to avoid negatively impacting the business.

Replication Modes

Periodic, or point-in-time, replication creates a snapshot of the data and copies it to the target site. The application is affected only while the snapshot is being made, although it may be necessary to suspend writes in the application database. While this may be appropriate for small and relatively static files, it represents the highest exposure to data loss. Any data written to the primary volumes between snapshots will be lost.

Asynchronous replication (Figure 2) significantly reduces the exposure to data loss but may require additional infrastructure to minimize any performance impact on the application. Each write operation to the primary storage volume is almost immediately duplicated to the remote site. The application can continue processing almost immediately without waiting for an acknowledgement that the remote write operation completed successfully. The data replication technology provides data consistency by writing data at the remote site in the same order in which it was written at the local site. If an outage occurs, in-flight I/Os (data written to the primary volume but not yet written to the remote site) will be lost. Intervention by the storage administrator or database administrator should not be required to re-establish database integrity.

Synchronous replication (Figure 1) offers the least exposure to data loss, but has the highest impact on application performance. Application write operations are simultaneously sent to the volumes at the primary and remote sites.
Because the application waits for acknowledgment that both write operations completed successfully, application performance is negatively impacted as distance increases.

Two important considerations when choosing a replication technology are bandwidth and latency. Network bandwidth, also called throughput, measures the amount of data that can move over the channel per unit of time. Latency is the delay added to each write operation by the finite speed of light in optical fiber plus the delays introduced by routers and switches. Latency has an impact on the performance of write operations when synchronous replication is used. Network bandwidth must be adequate for both synchronous and asynchronous replication to achieve the recovery point objective.
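The relationship between write rate, link bandwidth and data exposure can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the one-second sampling interval and the sample write rates are assumptions, not part of GDOC or of any replication product. It tracks how much written data has not yet reached the remote site when an asynchronous link is slower than a burst of application writes.

```python
def replication_backlog(write_rates_mb_s, link_mb_s, interval_s=1.0):
    """Return the unreplicated backlog (MB) at the end of each interval.

    write_rates_mb_s: application write rate per interval (hypothetical samples).
    link_mb_s: usable replication bandwidth between the sites.
    """
    backlog = 0.0
    history = []
    for rate in write_rates_mb_s:
        # The backlog grows when writes outpace the link and shrinks otherwise;
        # it can never be negative.
        backlog = max(0.0, backlog + (rate - link_mb_s) * interval_s)
        history.append(backlog)
    return history


def drain_time_seconds(backlog_mb, link_mb_s):
    """Time to ship a backlog to the remote site if no new writes arrive."""
    return backlog_mb / link_mb_s


# Hypothetical I/O profile: a two-second burst of 80 MB/s on a 40 MB/s link.
writes = [20, 20, 80, 80, 20, 20, 20]
history = replication_backlog(writes, link_mb_s=40)
peak = max(history)                                # 80.0 MB not yet replicated
peak_drain = drain_time_seconds(peak, link_mb_s=40)  # 2.0 seconds
```

In this toy profile the peak backlog is the data that would be lost if the primary site failed at the worst moment, which is why the application's I/O profile and the network bandwidth have to be considered together when setting an RPO.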
When choosing a replication mode, consider the requirements of the application. In synchronous mode, can the application tolerate waiting for the acknowledgement of the write operation from the DR site, or will the transactions time out and fail? Also consider the distance between the primary site and the DR site. The recommended range for synchronous replication is up to 100 km.
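To see why distance matters for synchronous replication, the propagation delay can be estimated directly. The sketch below assumes light travels through optical fiber at roughly 200,000 km/s (about 5 microseconds per kilometer); the function name, the equipment delay parameter and the number of round trips per write are illustrative assumptions, and real replication protocols and network paths will differ.

```python
# Approximate speed of light in optical fiber: ~200,000 km/s = 200 km per ms.
FIBER_KM_PER_MS = 200.0


def sync_write_penalty_ms(distance_km, equipment_delay_ms=0.0, round_trips=1):
    """Estimate the delay added to one synchronous write.

    distance_km: fiber path length between the sites (not straight-line distance).
    equipment_delay_ms: assumed extra delay per round trip from routers/switches.
    round_trips: assumed number of site-to-site round trips per write.
    """
    one_way_ms = distance_km / FIBER_KM_PER_MS
    return round_trips * (2 * one_way_ms + equipment_delay_ms)


penalty_100km = sync_write_penalty_ms(100)    # 1.0 ms of pure propagation delay
penalty_1000km = sync_write_penalty_ms(1000)  # 10.0 ms
```

Under these assumptions, every synchronous write at 100 km waits at least about one millisecond longer than a local write, before any equipment delay is counted. An application issuing many small serialized writes feels this on every transaction, which is one way to reason about whether it can tolerate the wait.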
Figure 1: Synchronous Replication
[Figure: the database server at the primary site writes to the remote target at the recovery site and waits for the disaster recovery server to acknowledge the remote write before it continues processing; latency is incurred on the link between the database and the copy of the database. Examples: Veritas Volume Replicator, IBM Metro Mirror, EMC SRDF]
Figure 2: Asynchronous Replication
[Figure: the database server at the primary site does not wait on the remote write and continues processing; in parallel, asynchronously, the write is sent to the remote target at the recovery site and acknowledged by the disaster recovery server. Examples: Veritas Volume Replicator, IBM Global Mirror, EMC SRDF/A]
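The difference between the two modes shown in Figures 1 and 2 can be illustrated with a toy model. This is a minimal sketch, not how any of the named products work internally: the class, its methods and the in-memory lists standing in for storage volumes are all hypothetical. It shows that synchronous mode never has in-flight writes, while asynchronous mode can lose them but still applies remote writes in the original order.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume replicated to a recovery site."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.primary = []          # writes applied at the primary site
        self.remote = []           # writes applied at the recovery site
        self.in_flight = deque()   # async only: queued, not yet applied remotely

    def write(self, block):
        self.primary.append(block)
        if self.mode == "sync":
            # The application is acknowledged only after the remote write.
            self.remote.append(block)
        else:
            # The application continues immediately; the write is queued.
            self.in_flight.append(block)

    def drain(self, count=None):
        """Apply queued writes at the remote site in their original order."""
        pending = len(self.in_flight)
        n = pending if count is None else min(count, pending)
        for _ in range(n):
            self.remote.append(self.in_flight.popleft())

    def fail_primary(self):
        """Simulate loss of the primary site; return the writes that are lost."""
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost


async_vol = ReplicatedVolume("async")
for block in ["a", "b", "c"]:
    async_vol.write(block)
async_vol.drain(2)                      # only two writes reach the remote site
lost_async = async_vol.fail_primary()   # ["c"] was still in flight

sync_vol = ReplicatedVolume("sync")
for block in ["a", "b", "c"]:
    sync_vol.write(block)
lost_sync = sync_vol.fail_primary()     # nothing in flight, nothing lost
```

After the simulated failure the asynchronous remote copy holds a prefix of the primary's writes, in order, which is the property that lets a database restart at the recovery site without manual repair.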
Replication Technologies

GDOC is an open and flexible solution that works with various replication technologies to deliver the desired RPO and RTO. With volume-based replication, also called host-based replication, the replication technology is implemented as a software product running in the server environment, usually along with the business application. It is usually managed by the server administrators. The replication software intercepts write operations to the primary volume or file system and duplicates them to a secondary volume or file system. For asynchronous replication mode, server-based replication technologies contain the functionality to help ensure data consistency across sites. While being volume-based makes this replication technology independent of the storage subsystem, it can impact the application's performance by adding complexity and consuming valuable server CPU and memory resources.
Examples of site or campus-wide volume-based mirroring technologies include Veritas Volume Manager and AIX LVM. Remote replication technologies include Veritas Volume Replicator, Double-Take (Windows), IBM/Softek Replicator for UNIX, and AIX GLVM.

Storage-based replication is functionally similar to volume-based replication on the server in that write operations from the application result in writes to volumes in the primary storage array and writes to corresponding volumes in the secondary storage array. Many storage-based replication technologies have volume-based replication technology embedded within the storage subsystem's controller. Storage-based replication is a leading technology in the IT marketplace, and is usually managed by storage administrators. It has the advantage of not consuming server resources and can support multiple server types, but the replication function can add overhead to the storage subsystem. Additionally, it often locks the enterprise into a specific storage platform at both the primary and remote sites. Examples of storage-based replication include IBM Metro Mirror, IBM Global Mirror, EMC SRDF, and HDS TrueCopy.

Database replication, another major replication technology, is provided by specially engineered database management engines. While it works well for managing database consistency, it adds overhead and complexity to the database and it does not provide replication services for any data outside of the database. Management of database replication is usually done by database administrators, who may have to manually intervene to re-synchronize the database. Popular examples of database replication include DB2 Universal Database HA/DR, Sybase Replication, Oracle Advanced Replication, and Informix Dynamic Server.

When planning for a replication technology, consider which technologies are currently in place and whether they are performing effectively.
Weigh the time and cost to implement the technology against the ease of operation and the data integrity that it provides. In some cases, different replication technologies may be in use within an enterprise depending on the application platforms in play and the specific business requirements for a given application. Fortunately, the GDOC solution design coupled with the VCS technology can support these types of complex environments.
Symantec provides replication agents for VCS that allow Cluster Server to monitor system and application resources. Please see the Symantec Corporation Web site for more information on supported Cluster Server agents:
http://www.symantec.com/business/products/agents_options.jsp?pcid=pcat_business_cont&pvid=20_1
Testing
Testing DR capability has become a requirement in today's business environment. The behavior of a VCS cluster in a GDOC environment can be predicted and tested without the risk of any downtime. With the VCS Fire Drill tool, recovery readiness can be routinely validated without interrupting production processing at the primary site, and without the extensive planning, cost and disruption that are usually associated with traditional DR testing. VCS Fire Drill is executed at the DR site. Fire Drill builds a near copy of the production service group being tested, leaving the network components out of the copy. VCS Fire Drill uses the native snapshot capability of the replication technology being used in the GDOC environment. VCS then tests whether or not the Fire Drill copy can be started against the snapshot of the replicated data at the DR site. If it can be started, the automated start-up of the business critical application is expected to succeed in the event of a disaster. If it fails, information is provided as to why, any problems can be remediated, and the environment can be tested again. These testing features of VCS are incorporated into the design of a GDOC solution to provide rigorous and exhaustive testing of the cluster throughout the phases of deployment.
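The flow of a non-disruptive test described above can be sketched in a few lines. This is a conceptual model only and does not use or represent the VCS Fire Drill interface: the data layout, the function names, the set of resource types treated as network components, and the caller-supplied start function are all hypothetical.

```python
# Assumption: which resource types count as network components in this model.
NETWORK_TYPES = {"IP", "NIC"}


def build_fire_drill_group(service_group):
    """Copy the service group definition, leaving network resources out."""
    return [dict(res) for res in service_group if res["type"] not in NETWORK_TYPES]


def run_fire_drill(service_group, start_resource):
    """Try to start each copied resource against the snapshot; collect failures.

    start_resource(res) is a caller-supplied function returning (ok, reason).
    """
    failures = []
    for res in build_fire_drill_group(service_group):
        ok, reason = start_resource(res)
        if not ok:
            failures.append((res["name"], reason))
    return {"ready": not failures, "failures": failures}


# Hypothetical production service group and a stand-in start function.
production_group = [
    {"name": "db_ip", "type": "IP"},
    {"name": "db_mount", "type": "Mount"},
    {"name": "db_instance", "type": "Database"},
]


def fake_start(res):
    # Pretend the database cannot start because a configuration file is missing.
    if res["type"] == "Database":
        return False, "configuration file missing at DR site"
    return True, ""


report = run_fire_drill(production_group, fake_start)
```

The report names the resource that failed and why, mirroring the remediate-and-retest cycle in the text, while the production group and its network addresses are never touched.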
Monitoring
Organizations are now deploying business critical solutions with components spanning multiple system platforms. Therefore it is important to track events as they occur across such heterogeneous environments in order to monitor the health of any IT infrastructure. IT departments are usually faced with the task of delivering solutions that will provide prompt and reliable event monitoring. Installing and maintaining Veritas Cluster Server as a single product across several different Open System platforms enables you to use a single solution to manage multiple, complex integrated systems. VCS uses programs called agents¹ to monitor and recover cluster² components called resources³ based on configurable parameters.

¹ Agents are programs that manage computer resources, such as a volume group or IP address, within a node in a cluster environment. Each type of resource requires an agent. An agent can manage multiple resources of the same type.
² A VCS cluster comprises related hardware and software components called resources.
Figure 3: Veritas Cluster Server Agents

Agents

Veritas Cluster Server communicates through its agents (Figure 3) with various resources within each node in the cluster to monitor, control and recover resources. There are primarily three types of agents: bundled, enterprise and custom. Bundled agents such as NIC, Mount and Notifier are installed as part of the VCS software. Enterprise agents can be installed as optional packages and are available for major applications such as DB2, Oracle, DataGuard, WebSphere and WebLogic. These enterprise agents have predefined functions compiled into their framework that interface with the specific characteristics of their enterprise application resource type. These functions provide the interface necessary to manage complex application resources through common configurable options. Veritas Cluster Server comes with over fifty agents, and new applications are supported each quarter with the quarterly Agent Pack release, thus reducing the consulting costs that come with custom development.
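The monitor-and-recover role of an agent can be pictured with a small model. The sketch below is a hypothetical illustration and is not the VCS agent framework or its API: the class names, the restart limit policy and the boolean health check are assumptions chosen to show one agent managing several resources of a single type and acting deterministically when one of them fails.

```python
class Resource:
    """A managed entity such as a mount point or IP address (hypothetical model)."""

    def __init__(self, name, rtype):
        self.name = name
        self.rtype = rtype
        self.online = False


class Agent:
    """Manages all resources of one type on a node."""

    def __init__(self, rtype, restart_limit=1):
        self.rtype = rtype
        self.restart_limit = restart_limit   # assumed policy parameter
        self.resources = []
        self.restarts = {}

    def add(self, resource):
        if resource.rtype != self.rtype:
            raise ValueError(f"agent for {self.rtype} cannot manage {resource.rtype}")
        self.resources.append(resource)
        self.restarts[resource.name] = 0

    def monitor(self, resource):
        # A real agent would probe the resource; here we read a flag.
        return resource.online

    def bring_online(self, resource):
        resource.online = True

    def monitor_cycle(self):
        """Check every resource; restart up to restart_limit, then report a fault."""
        faulted = []
        for res in self.resources:
            if self.monitor(res):
                continue
            if self.restarts[res.name] < self.restart_limit:
                self.restarts[res.name] += 1
                self.bring_online(res)
            else:
                faulted.append(res.name)
        return faulted


mount_agent = Agent("Mount", restart_limit=1)
data_fs = Resource("/data", "Mount")
mount_agent.add(data_fs)

first_cycle = mount_agent.monitor_cycle()   # offline resource is restarted locally
data_fs.online = False                      # simulate a second failure
second_cycle = mount_agent.monitor_cycle()  # restart limit reached: fault reported
```

A fault reported by the second cycle is the kind of event a cluster engine could use to fail the application over to another node, while the first, recoverable failure is handled without escalation.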
³ Resources are entities that can be managed, such as an application, file system, or database. Resources are classified into types based on a common definition. Resources of the same type within a cluster node are managed by an agent.
If a compatible bundled or enterprise agent is not available for a resource, then a custom⁴ agent can be developed by following published guidelines⁵. Based on experience deploying GDOC HA/DR solutions, IBM has developed custom agents for several uncommon resource types that can be deployed without needing any further development.

Notification

A GDOC solution uses the Notifier agent for providing notifications on cluster events. Cluster events captured by the Notifier agent are sent to the Cluster Server Management Console, and can also be forwarded as SNMP traps to an SNMP V2 MIB compatible enterprise monitoring tool such as HP OpenView or IBM Tivoli Monitoring. The Notifier agent can also use Veritas Cluster Server triggers to simultaneously send the events as an e-mail alert to a user-supplied recipient list. During GDOC implementation, IBM works closely with clients to integrate the event notification capabilities of the GDOC solution with any existing enterprise monitoring system.

Management Console

A typical GDOC environment includes HA within a site and DR across multiple sites. The VCS Management Console is a tool that provides centralized management for an entire GDOC environment from a single console. Other features of the VCS Management Console are:

- Centralized monitoring and control
- Centralized point for deploying configuration changes
- Management of multiple cluster environments from almost anywhere
- Cluster capacity trend analysis
- One-click migration
- Failed site recovery

Also, the CLI and API interfaces of the VCS Management Console make it possible to integrate with other enterprise management solutions.
⁴ Custom agents are not supported by Symantec Technical Support.
⁵ The guidelines can be found in the Veritas Cluster Server Agent Developer's Guide.
Monitoring DR Readiness

As described earlier, regular testing at the DR site validates DR readiness. Continuous monitoring of key components provides reassurance of DR readiness between regular tests. Monitoring is typically done in the context of the existing systems management framework using existing monitoring tools. The types of enhancements to monitoring that are required for DR readiness include monitoring key files on internal disks at each site, and monitoring the state of data replication.
GDOC Architecture
Figure 4: GDOC Architecture
[Figure: a primary site for the production application and a secondary site for application recovery, joined by an active/standby VCS cluster with automated recovery, asynchronous data replication, monitoring, and a point-in-time copy. Remote recovery is enabled by adding infrastructure at the second site, data replication across sites, non-disruptive remote recovery testing using a point-in-time copy, and DR readiness monitoring.]

The GDOC architecture (Figure 4) is based on automating application startup, shutdown, and recovery from a failure using Veritas Cluster Server HA/DR. Veritas Cluster Server is used because it supports AIX, Solaris, HP-UX, Red Hat Linux, SUSE Linux, VMware, and Windows. There are four key elements to the GDOC solution:

- Redundant infrastructure within the production site and at a secondary site
- Asynchronous data replication using a supported replication technology chosen by the client
- Point-in-time copy of data used for non-disruptive disaster recovery testing
- Recovery automation using Veritas Cluster Server HA/DR
Case Study
Figure 5: Both local and geographically remote high availability
[Figure: an active/standby VCS database cluster whose Disk Group A is host-based mirrored (M1, M2) across SVC Cluster 1 and SVC Cluster 2, with asynchronous replication (Global Mirror) to a remote copy of Disk Group A. Local availability is improved by adding local clustering, host-based mirroring, and point-in-time copy. Remote availability is improved by adding data replication to the remote site, remote point-in-time copy, and remote clustering.]

Availability within a Metropolitan Area

The reference architecture in the Case Study (Figure 5) is primarily for the database server and middleware servers at the primary site only. Static servers such as Web servers can be redundant but do not need to share a data repository. Veritas Volume Manager is used to mirror the critical database files across IBM SAN Volume Controller clusters. This type of host-based mirroring prevents outages caused by a failure of any SAN component, including a storage array. The two servers shown in the diagram can be physically next to each other or they can be in adjacent buildings on the same campus (while still on the same storage area network). Veritas Cluster Server is used to protect the database or middleware (i.e., message hub) servers against server failures and to minimize downtime during planned outages. Both servers have physical access to the disks that contain the database or data store, but only one server logically controls the files at any given time. FlashCopy is used to create point-in-time copies of data that can provide a backup in case of logical data corruption. Additional point-in-time copies can also be made to refresh a test or quality assurance environment.
Both host-based mirroring and FlashCopy provide a contingency (backout capability) for major upgrades or changes to the environment.

Availability across a Wide Area

To enable rapid recovery in the event of a disaster, hardware, software, and network infrastructure is deployed at a secondary site. Data is replicated to the secondary site using storage, database, or volume-based replication. The same application recovery automation used at the primary site is extended to the secondary site. In the case of a site-level disaster the remote recovery site is used to continue processing. A point-in-time copy of the data is used at the recovery site to perform non-disruptive disaster recovery testing. Data replication is not stopped, even during a test, so that the Recovery Point Objective (RPO) service level is not suspended. The startup of the application(s), using a point-in-time copy of data, is fully automated.
GDPS-GDOC Interface
With the introduction of the Veritas Cluster Server (VCS) agent for IBM Geographically Dispersed Parallel Sysplex (GDPS) Distributed Cluster Management (DCM), the z/OS environment can now be integrated with the GDOC Open Systems environment. With this agent, GDOC can participate in coordinated, cross-platform recovery that is managed by the GDPS DCM console. The agent is installed in a global service group at either the primary or secondary site within a Veritas Cluster Server global cluster. By connecting to the GDPS DCM console, the agent provides periodic cluster status information to GDPS. The agent also executes VCS commands on behalf of the GDPS DCM console to control the VCS environment and sends trigger alerts to GDPS. For example, the agent can respond to GDPS DCM requests for Veritas Cluster Server to stop a service group or cluster, switch a cluster to a remote site, or declare a site failure.

The Veritas GDPS DCM agent requires:

- IBM GDPS 3.5 (and all requisite products)
- Veritas Cluster Server HA/DR 5.0 in a supported AIX, HP-UX, Linux or Solaris configuration

Please review the Veritas Cluster Server Agent for IBM GDPS DCM Installation and Configuration Guide for more information about the capabilities of this VCS agent.
IBM Global Technology Services (GTS) Offerings

The following GDOC services and offerings are available from IBM Global Services:
Technical Consulting Workshop (TCW)
The Technology Consulting Workshop, or TCW, is a 2-day onsite workshop that helps you gain consensus within the organization on what your requirements are, what a high level solution might be, and what the next steps should be.
There are pre-workshop activities and a pre-workshop questionnaire that enable IBM to gather as much information in advance as possible and to set expectations on which topics will be discussed during (and after) the workshop. The pre-workshop questionnaire also helps identify the necessary attendees for the workshop (those who can provide the required information).
There are also post-workshop activities that result in a TCW Summary Report provided to you that includes:
[Figure: TCW Summary Report contents (Findings, Recommendations, Next Steps, Project Plan)]
IBM Implementation Services for Geographically Dispersed Open Clusters (GDOC)
This is a multi-vendor solution designed to protect the availability of critical applications that run on UNIX, Microsoft Windows, VMware or Linux operating system based servers. GDOC is based on an Open Systems Cluster architecture spread across two or more sites with data mirrored between sites to provide high availability and disaster recovery. It is designed to provide open systems with functionality similar to what GDPS provides for the IBM System z mainframe running z/OS. This type of solution can provide a much shorter recovery time for critical business applications, and is easier than recovering from tape backup or replicating data with manually initiated recovery processes. IBM and Symantec have co-developed an integration that links GDOC and GDPS and provides benefits such as enterprise level disaster recovery and single console management and monitoring of both the mainframe and distributed disaster recovery platforms.
GDOC is a services framework and methodology that includes the integration of Veritas Cluster Server and associated software modules from Symantec Corporation. The solution comes with a base set of implementation services including:
Assessment and planning
Design
Solution build
Testing and deployment
Summary
Many of today's medium and large enterprises want to deploy in-house continuous availability or disaster recovery solutions that provide low (< 2 hours) recovery point objectives and low (< 4 hours) recovery time objectives for the most critical applications. GDOC improves local availability by designing, implementing and testing server clustering, host based mirroring, and point-in-time data copies. GDOC provides continuous availability with infrastructure at a second site, data replication across sites, non-disruptive remote recovery testing using point-in-time copy, disaster recovery readiness monitoring, and fully automated recovery. GDOC is a key component of a disaster recovery solution for the most critical applications that run on either Windows or UNIX/Linux systems. In addition to data redundancy provided by continuous data replication, GDOC provides continuous availability through the implementation of automated recovery, continuous monitoring of application recoverability and the ability to regularly test disaster recovery with minimal effort.
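The recovery objectives quoted above (RPO under 2 hours, RTO under 4 hours) can be checked against measurements from a recovery test. The following is a minimal sketch; the function name and the sample measurements are hypothetical and are not taken from any GDOC tooling.

```python
from datetime import timedelta

# Targets quoted in the summary: RPO under 2 hours, RTO under 4 hours.
RPO_TARGET = timedelta(hours=2)
RTO_TARGET = timedelta(hours=4)


def meets_objectives(replication_lag, recovery_time):
    """Return (rpo_ok, rto_ok) for one measured disaster recovery test."""
    return replication_lag < RPO_TARGET, recovery_time < RTO_TARGET


# Hypothetical measurements from a DR test, for illustration only.
result = meets_objectives(timedelta(minutes=15), timedelta(hours=1, minutes=30))
print(result)  # (True, True)
```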
Additional Information
Veritas Data Center Software: http://www.symantec.com/business/theme.jsp?themeid=datacenter
GDOC Home Page: http://www-935.ibm.com/services/us/index.wss/offering/its/a1026541
IBM Optimization and Integration Services: http://www-935.ibm.com/services/us/index.wss/offerfamily/gts/a1027708
© Copyright IBM Corporation 2009
IBM Systems and Technology Group
Route 100
Somers, New York 10589
U.S.A.
Produced in the United States of America, 02/2009
All Rights Reserved
IBM, IBM logo, AIX, DB2, DB2 Universal Database, FlashCopy, GDPS, Geographically Dispersed Parallel Sysplex, Informix, System z, WebSphere and z/OS are trademarks or registered trademarks of the International Business Machines Corporation. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Symantec, the Symantec Logo, and Veritas are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
ZSW03064-USEN-00
