Executive Summary
The idea that data is a key business asset is hardly disputable. Even so, how to assign a direct financial value to data
remains a matter of debate. In a practical sense, however, the real value a business places on its data may be inferred
from how seriously it protects that data. Data protection is one of the core elements of any Disaster Recovery plan. Maintaining
data availability is likewise essential to overall business continuity efforts. Difficult economic environments, such as the one we
have experienced in the past year, put commitment to data protection to the test, with revealing results.
A trend toward increasingly aggressive recovery time objectives (RTOs) and recovery point objectives (RPOs)
is evident across the platform. But there remains evidence that, for many firms, the appropriate technologies for
achieving their RTOs and RPOs are not in place.
Data protection and high availability are related but still distinctly different objectives. Current findings support
increasing adoption of solutions to address both areas, but keeping data and applications continuously available has
not been addressed as fully or consistently as protecting against catastrophic data loss.
Traditionally, DR capabilities have been viewed as insurance against major natural or manmade disasters. But
increasingly, businesses of all sizes are realizing the significant impact of the many mini disasters that occur quite
frequently: power or network outages, maintenance and upgrade work, and even the corruption or loss of individual
files. Technologies for recovering from and actively avoiding these events are maturing and gaining support.
IBM's introduction of Power Systems servers has begun to blur the lines between its two previously distinct
groups of midrange server customers. The advantages offered under this converged platform are attractive to
businesses across the board. Despite the current state of the global economy, many firms are managing to find
innovative and effective ways to reduce the cost and complexity of migrating to Power.
Next Steps
Evaluate the costs associated with downtime of any duration from any cause.
Set measurable goals for systems and data recovery speed and completeness (RTOs and RPOs) based upon the
needs of the business, not the capabilities of current technologies and procedures. Then, identify and work toward
implementing technologies and processes that will support your objectives effectively.
Where appropriate technologies are in place, ensure that the company's actual ability to recover from downtime or
data loss events is in fact sufficient to meet stated recovery and availability objectives. Live testing of systems and
procedures is necessary, for both the training and readiness of personnel and the evaluation of current capabilities
versus present and future needs.
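The comparison these steps call for, measured recovery results against stated objectives, can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the RecoveryTest record, its field names, and the sample figures are all hypothetical stand-ins for whatever a live recovery test actually measures.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTest:
    """Result of one live recovery test (hypothetical record layout)."""
    system: str
    rto_hours: float                 # stated recovery time objective
    rpo_hours: float                 # stated recovery point objective
    measured_recovery_hours: float   # time actually needed to restore service
    measured_data_loss_hours: float  # span of recent data that could not be recovered


def unmet_objectives(test: RecoveryTest) -> list[str]:
    """Return the objectives that the measured results failed to meet."""
    failures = []
    if test.measured_recovery_hours > test.rto_hours:
        failures.append("RTO")
    if test.measured_data_loss_hours > test.rpo_hours:
        failures.append("RPO")
    return failures


# Illustrative figures only; they are not taken from the survey.
tests = [
    RecoveryTest("order-entry", rto_hours=6, rpo_hours=1,
                 measured_recovery_hours=9, measured_data_loss_hours=0.5),
    RecoveryTest("reporting", rto_hours=24, rpo_hours=24,
                 measured_recovery_hours=10, measured_data_loss_hours=20),
]

for t in tests:
    failed = unmet_objectives(t)
    status = "meets objectives" if not failed else "misses " + ", ".join(failed)
    print(f"{t.system}: {status}")
```

Keeping the stated objectives and the measured results in the same record makes the gap visible after every test, which is the point of setting the goals from business needs first and measuring capability second.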
Foreword: Resilience Under Pressure
Without a doubt, data has become the raw material of the information economy. If companies don't know what
[their data] is worth, they can't enhance, protect or measure the value of the data to the bottom line. Data isn't a
normal commodity. It's like water out of a tap, vital to life yet so often taken for granted. 1
In our previous report on the State of Resilience on IBM Power Systems, we noted that working business leaders
understand intuitively the value of their data and know that they must daily address the practical, tangible issues
of protecting their business information from loss or corruption and putting it to work effectively, efficiently, and
profitably.
But just how that intuitive understanding is translated into pragmatic, front-line actions, in the datacenter and
throughout the business, varies from company to company, both in methods and in level of commitment and success.
Protecting the availability and even the existence of this digital water of life is certainly only one of many mission-
critical business objectives that executives must address. The final measure of business leadership is the ability to
identify and execute the right priorities with the optimum balance of resource allocation.
No one disputes the old saying "time is money." The maxim that "data is money," though not as deeply ingrained in
common business practice, is still taking increasingly firm hold in the minds of business leaders in companies of all
sizes and across all industries.
But just how strong is this belief? Well, there is nothing like adversity to test what one truly believes in. Actions taken
under pressure tell the true tale. And in reviewing the State of Resilience on IBM Power Systems over this past year,
one cannot escape the overriding reality of one of the most difficult business years ever endured. In every region of
the world, for giant global enterprises and small businesses alike, the recessionary economy has put success, even
survival, not just under pressure but under direct threat.
1 Steven Adler, from the CIO.com article "Six Steps to Data Governance Success," May 31, 2007
Historically, companies that have adopted IBM Power Systems (and all their predecessors) have done so because of the
reliability, flexibility, and sheer power that these systems provide. As we review how these same demanding, forward-
thinking companies are dealing with the issues of data, application, and business resilience, we must keep in mind the
backdrop of this past year's business environment. It is truly an informative story of resilience under pressure.
Report Focus
As part of its core mission, the Information Availability Institute (IAI) regularly surveys the IBM i and AIX customer
base, seeking to understand current and developing information availability and data protection needs. And clearly,
the range and sophistication of the tools and technologies available to businesses of all sizes to address these needs have
grown right along with the market.
The scope and depth of our survey efforts were increased this year, with additional attention to how responses differed
among companies of differing sizes. In addition to reviewing core factors surrounding information availability (RTO,
RPO, etc.), we sought to deepen our understanding of not only what technologies and methods are being applied, but
also how fully and effectively they are being implemented.
The data set used for this report includes the results of 18 surveys, conducted between September 2008 and October
2009. These surveys were conducted online using Web-based survey tools, as well as in person at gatherings of IBM i
and AIX professionals such as user groups and trade expos. In total, responses were received from nearly 4,000 technical
professionals and executives involved in the management of IBM i and AIX environments.
Responses were received from all geographies worldwide, with over 100 countries represented. Not all questions
were presented in all surveys. The most fundamental questions were asked in nearly all surveys, while questions on
some specific topics were posed in as few as one targeted research survey. To ensure validity, the results in this report
include only those questions where at least a few hundred responses were available for analysis.
Throughout this report, we will identify responses as pertaining to users of the IBM i and AIX operating systems. As a
practical matter, under these umbrella identifiers, we include all prior variants of the IBM i (i5/OS, OS/400) and do not
differentiate between, for example, AIX running on a legacy System i or System p server.
To begin our study, we asked questions about our survey participants' perceptions of their organizations' current
readiness to handle business interruptions (planned and unplanned downtime events).
1. What is your organization's recovery time objective (RTO) after a disaster or complete server
or application failure?
The State of Disaster Recovery Expectations
With regard to RTOs, we expected and in fact found a bias toward shorter timeframes as the business size increases.
Although the differences between the three sets of responses within each RTO range option were no greater
than 7 percent, looking again at the sum of responses for each group for 12 hours or less, we see the trend clearly:
63 percent for small businesses, 69 percent for medium, and 72 percent for large enterprises.
Figure 3 (RTO ranges shown: <1 hr, 1 to 6 hrs, 6 to 12 hrs, 12 to 24 hrs, 24 to 48 hrs, >48 hrs)
Key Finding
Overall, AIX and IBM i shops, both pure and mixed, reported more aggressive RTOs, with a distinct 10 percent
growth in the number of those targeting six hours or less. Data also seems to confirm that the larger a business
becomes, the more expensive downtime of any type becomes, with the result that increasingly shorter recovery
time goals are established.
2. What is your organization's recovery point objective (RPO), expressed in time? That is, what
is the maximum volume of transactions you are willing to lose as a result of a disaster or a
complete server or application failure?
But there still remains a noticeable contingent of respondents whose organizations are apparently not especially
worried about losing many hours or even a day's worth
Figure 5 (RPO responses by operating system: IBM i, AIX, Both)
Figure 6 (RPO responses by company size: Small, Medium, Large)
RPO categories in both figures: No data loss, A few min, Up to 1 hr, A few hrs, 1 day, More than 1 day
Splitting the responses by operating system, the responses from IBM i users were roughly similar to those of AIX
users. But there is some evidence that shops that use both operating systems had a somewhat stronger bias toward
No Data Loss than either of the other groups.
Finally, reviewing the results for the RPO question split by company size, we found a distinct trend toward decreasing
tolerance of data loss as company size increased.
3. How confident are you that your company's Disaster Recovery (DR) plan for IT systems is
complete, tested, and ready to execute?
The current year's results echo last year's, with very little change in any category. Though not shown here, in line
with last year, the differences in responses when split by operating system were also minimal.
Figure 7: Overall DR Plan Confidence
4. What concerns do you have about the completeness of your Disaster Recovery (DR) plan for
IT systems?
This year, we wanted to dig a bit deeper into this question of confidence in DR readiness, so we invited all respondents
who indicated less than 100 percent confidence (except those who indicated that they had no DR plan to worry about)
to indicate what specifically was causing them to worry. We offered five options:
Note that we allowed each respondent to choose as many of the concerns as they felt applied to them. Thus, the
percentage values add up to greater than 100 percent.
tabulated the results from this question with the prior question about overall confidence level. The
goal was to see if any of the specific concerns were directly related to any specific level of confidence. For
example, were those who were 90 percent confident more concerned about the ability of their staff to
execute than other groups?
Figure 8 (concerns shown include Communication Gap, Testing, and Coordination; axis from 0% to 50%)
We got a most interesting result. Both the rank order and the relative weight or strength of the five options were very
similar for all four groups. Only the number of options chosen differed. Specifically, the lower the reported
confidence, the greater the number of the five options chosen. But when all of the responses were added together, the
rank order/relative strength of the five responses did not differ significantly!
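The comparison described here can be illustrated with a short Python sketch. The tallies below are invented purely to show the mechanics (they are not the survey results), and the helper names are hypothetical. Within each confidence group, the sketch derives the rank order of the five concerns, each concern's share of all selections, and the average number of concerns chosen per respondent.

```python
# Hypothetical tallies: how many respondents in each confidence group
# selected each concern. These numbers are invented for illustration.
respondents = {"90%": 100, "50%": 100}
selections = {
    "90%": {"Testing": 40, "Outdated": 30, "Incomplete": 20,
            "Communication gap": 15, "Coordination": 10},
    "50%": {"Testing": 80, "Outdated": 60, "Incomplete": 40,
            "Communication gap": 30, "Coordination": 20},
}


def rank_order(counts):
    """Concerns sorted from most to least frequently selected."""
    return sorted(counts, key=counts.get, reverse=True)


def relative_weights(counts):
    """Each concern's share of all selections made within the group."""
    total = sum(counts.values())
    return {concern: n / total for concern, n in counts.items()}


def mean_options_chosen(counts, n_respondents):
    """Average number of concerns selected per respondent."""
    return sum(counts.values()) / n_respondents


for group, counts in selections.items():
    shares = relative_weights(counts)
    print(group, "confident")
    print("  rank order:", rank_order(counts))
    print("  shares:", {c: round(s, 2) for c, s in shares.items()})
    print("  mean options chosen:", mean_options_chosen(counts, respondents[group]))
```

In this made-up data the two groups share the same rank order and the same shares, while the less confident group selects twice as many concerns per respondent. That is the pattern the text describes: the mix of worries is stable, and only their quantity changes with confidence.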
Figure 9 (concerns by confidence level; series: 90%, 75%, 50%, 25%; categories: Incomplete, Outdated, Communication gap, Testing, Coordination)
Key Finding
Taken together, this indicates that the vast majority of IT managers and professionals, though they have DR
plans in place, are concerned that, when pressed, those plans may not execute properly or at all. They understand
implicitly that DR plans need to be tested regularly to ensure that both systems and staff are ready should
the recovery plan need to be invoked. But they concede that testing is currently not sufficient in frequency or in
completeness to assure them of success.
When we discuss High Availability and Disaster Recovery, we are really addressing a wide range of technologies
and processes, all of which are focused on protecting data from loss or corruption. For this report, we bring together
results from questions that attempt to define and measure, among other things, which of these technologies and
processes are more or less prevalent among AIX and IBM i users today.
So, to begin, we reviewed the data protection technology landscape in general. A wide range of options was presented,
and the participants were invited to check all that apply. Thus the percentages reported add up to more than 100
percent.
As we review these results, keep in mind that this was one of the questions for which we received just under 4,000
responses over the past year.
The State of Data Protection Technologies
Figure 11
This notion may well be
supported by the other split we did for this question, showing prevalence of each technology within company size.
It is clear from this chart that large enterprises employ a far more complete mix of technologies for data protection than
do medium or small businesses. They also show a distinct preference for Clustering, Logical Replication/Log Shipping
+ Failover (which would imply that dual-site High Availability is in place), and the more recently introduced Virtual
Tape Library and Cross-Site Mirroring technologies.
The one technology where all three seem to agree is in the use of CDP, the only category where small businesses
outpaced the others in uptake by percentage, albeit only slightly.
We also note that, while the use of logical replication and failover shows some small increases over that seen in past
reviews, there is still a persistent group of logical replication users who do not also implement failover, indicating that
they use replication to create a disk-based backup for data protection but are not committed to full High Availability.
6. If you experience a partial data loss on one of your servers (meaning accidental damage to or
deletion of a file, object, or library), what is your primary method (first attempt) for recovering
that data?
Our intent with this question was to understand to what extent tape backup is being relied upon to recover data. When
data recovery is required for reasons other than a complete disaster (or a serious server failure), there remains a
significant group of AIX and IBM i shops that still reach for offsite tape first.
This is both encouraging and worrisome. Given that effectively all firms reported keeping tape backups, it is
encouraging that, when data must be recovered, about 57 percent of them utilize technologies that can recover discrete
amounts of data more quickly, and usually more completely, than is possible using tape backups.
The worrisome part is that 43 percent of respondents still do not. Given the availability of such a wide range of
data protection technologies, offered at a wide range of price points, why do firms persist in relying upon tape so
completely? It is quite likely that these companies do not and in fact cannot actually meet their reported RTOs and
RPOs when data loss occurs. One assumes that if they had other options that were faster and offered more complete
recovery, they would use them first. So by relying upon tape, they face RPOs that, on average, are longer than 12 hours
(and often 24 hours and more) with RTOs that are equally undesirable.
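One way to see where figures of this size come from is a simple back-of-the-envelope model. As a rough sketch, assume backups run at a fixed interval and that a loss event is equally likely at any moment between runs. Both are simplifying assumptions introduced here, not survey findings, and the report does not state how its own estimate was derived. Under those assumptions, everything written since the last backup is lost: half an interval's worth on average, and a full interval's worth at worst.

```python
def tape_data_loss_hours(backup_interval_hours: float) -> tuple[float, float]:
    """Return (average, worst-case) data loss in hours for periodic backups.

    Assumes the loss event is uniformly distributed between backup runs,
    so the expected exposure is half the backup interval.
    """
    if backup_interval_hours <= 0:
        raise ValueError("backup interval must be positive")
    return backup_interval_hours / 2, backup_interval_hours


# Daily and every-other-day backup schedules, chosen only as examples.
for interval in (24, 48):
    average, worst = tape_data_loss_hours(interval)
    print(f"backup every {interval} h: "
          f"average loss {average:g} h, worst case {worst:g} h")
```

The model ignores the time needed to retrieve and restore an offsite tape, which adds to recovery time (RTO) rather than to the amount of data lost (RPO).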
Figure 12 (recovery methods shown include Flash Copy and Other)
7. Have you experienced unrecoverable data loss? If so, what was the primary reason that the
data was unrecoverable?
With this question, we sought to understand how the worst-case scenario of permanent loss of data actually occurs.
Keep in mind that these results come from a smaller subset of respondents, the ones willing to admit and discuss a
complete failure!
We offered four prepared responses and an open text field where respondents could specify other reasons for data loss.
The prepared options included:
No Backup Copy: Data lost before tape, flash copy or disk copy created
Bad/Unusable Backup Copy: Bad or missing backup tape/disk copy
All Copies Bad: Data damage or deletion was mirrored/replicated on all backup media
Loss Reported Too Late: Past data retention period
Taken together, it is clear that the root causes of data being permanently lost are predominantly human error, in both
the implementation and the execution of data-protection processes, and the inherent weaknesses of technologies that
provide less than real-time backup to secondary disk. Within this second group, in cases such as the corruption of tape
or disk copies, human error may contribute, but certainly for tape backups, the weakness is found in the natural limits
of the technology's capabilities: losing data between backup runs and the often debated but always feared fallibility of
tape as a backup medium.
Key Finding
Relying exclusively or even predominantly upon tape as one's primary data protection and recovery
technology is rapidly becoming more than a case of less-than-optimal business practice. The cost to any
business of losing data permanently, along with the cost of unplanned downtime while waiting for data recovery,
is avoidable through investment in technologies that simply do a better job in all respects. Given the availability of
such a wide range of data-protection technologies, the rather modest sums required to implement CDP or other
real-time disk-to-disk replication options will provide significant, near-term ROI based upon avoiding the costs
associated with data loss and downtime.
Before discussing where things stand with regard to the adoption of Power Systems servers by existing IBM
midrange system customers, we need to acknowledge the proverbial elephant in the room: the seriously negative
economic situation. This year's surveys were conducted during a 12-to-13-month period that saw one of the worst and
most widespread recessions ever. Any discussion of the adoption of new server technologies must be framed by this
overriding reality.
In late spring of 2009, we fielded a survey that focused primarily upon the impacts of the economy being felt by IT
professionals. In the end, the following single question tells the overall story rather completely.
The State of Migration to Power Systems
8. In response to the economy (recession), most companies have been forced to adjust. How
have your IT operations been affected by the economy?
9. The following is a list of benefits that companies like yours often cite as reasons for moving to
Power Systems servers. Which two (2) benefits do you believe are the MOST valuable for your
company?
A fairly significant percentage of respondents indicated that they hoped to improve or simplify their HA or DR
protection as a direct result of upgrading. Taken together with the lack of concern about obsolescence, this may
indicate that many IBM shops are looking at bringing on newer servers and configuring their combined server
collection to provide HA through replication and failover technologies.
10. What factors are preventing or delaying your company from upgrading to new Power
Systems servers?
Perhaps of greater interest was the follow-on question for those who indicated no current plans for upgrading to Power
Systems servers.
For this question, we again offered five responses covering a range of reasons for deciding against upgrading to
Power in the near future:
Figure 16
Did you use High Availability or Disaster Recovery technologies to reduce or manage your upgrade project downtime?
With this second question, we were looking for those who utilized one of the HA or DR technologies that include
real-time replication and switching capabilities. Such solutions allow upgrade build, configuration, and testing to occur
on the new server while operations continue on the old server. In some cases, despite being offline, the new server
can have its data populated and then made completely current through the same technology, ensuring that the only
downtime required is for switching users over to the new server.
In this chart, we see that for those who had appropriate HA/DR technologies in place and used them to minimize
upgrade downtime, responses about downtime goals, as expressed in terms of general RTOs, were strongly biased
to the left, with many targeting less than one hour of downtime impact. This indicates a confidence in their HA/DR
technology to help keep downtime controlled. They have high expectations.
But the actual results were not good news for these respondents. Downtime was far greater than targeted, so
much so that the overall distribution was biased to the right and the derived trend line sloped downward, indicating a
negative correlation.
Summary: The State of Resilience
At the beginning of this report, we noted that to understand the State of Resilience today we must take into account the
state of the world economy over the past year. A year ago, we noted that our survey respondents exhibited a pattern of
careful and deliberate choices among the increasing number and range of data protection, availability, and optimization
technologies. Overall, they recognized the value of data and application resilience to their success but seemed to be
calibrating their investments. Over this past year, in the face of intense financial headwinds, we find that they held their
course and continued to make progress.
As financial constraints limited operational resources, the AIX and IBM i user base did not retreat to less challenging RTOs and
RPOs; instead, it appears to have realized the need for more aggressive levels of resiliency.
Protecting data and applications has proven to be a mandate that cannot be ignored or short-changed simply because
money is tight. Yet at the same time, we find that most IT managers are worried that their DR plans may not execute
properly or completely when invoked. They cite a lack of recent or complete testing as their main worry.
While the overall profile of data protection technology options has not changed radically in the past year, there are
nonetheless some trends to note.
Tape is still the firewall, the backstop for data loss in the event of catastrophe. It shows little sign of yielding that
position to spinning disk anytime soon. But beyond that, when reviewing the adoption of the entire range of data
protection options, we see a measurable trend toward granularity and immediacy. IBM i and AIX users are increasingly
seeking real-time protection of data as it is created and changed, as well as the ability to recover from any level of data
loss or corruption. While the adoption of full High Availability is increasing, the real interest and energy are directed
toward improved data recoverability. As CDP technologies are maturing, both in capability and ease of use, they are
being quickly adopted among small, medium, and large businesses alike.
The strongly negative business environment of this past year has most surely had a dampening effect on the rate
at which IBM midrange customers are moving up to Power Systems. Undoubtedly, basic financial issues, such as
reduced cash flow and the increased cost and reduced availability of business loans, have discouraged investment
in new hardware. So too, for many firms, taking on a potentially disruptive upgrade project with reduced staff was
simply not warranted. But, as always seems to happen with technology adoption, there are among the IBM midrange
community those leaders who have shown the way to overcome the workload problem, using their High Availability
and replication technologies to measurably speed and simplify migration.
Overall, the State of Resilience is improving among IBM i and AIX customers. Under financial pressure, the most
notable gains were achieved with technologies and projects well-suited to cost containment. What remains to be seen is
whether this trend will continue or whether the path to higher resilience will lead in another direction as the economy
improves in the coming year.
About the Information Availability Institute
The Information Availability Institute (IAI) provides research and education that helps business professionals of all
disciplines to understand, evaluate, and apply information availability technologies.
Drawing upon the experience and resources of Vision Solutions, its technology partners, and independent industry
experts, the IAI is committed to identifying and communicating improvements in technologies that increase infor-
mation availability and overall business resilience across the entire enterprise.
Copyright 2009, Vision Solutions Inc. All Rights Reserved. IBM, AIX, IBM i, i5/OS, OS/400, AS/400, RS/6000, System i, System p, iSeries, pSeries, and Power Systems are trademarks or registered trademarks of International
Business Machines Corporation.
Information Availability Institute visionsolutions.com | info@visionsolutions | 1-949-253-6500