The State of Resilience on IBM Power Systems
Research Findings Based on Surveys of IBM i and AIX Users
Contents
Executive Summary
Foreword: Resilience Under Pressure
Report Focus
Research Environment and Methodology
The State of Disaster Recovery Expectations
    Recovery Time Objectives
    Recovery Point Objectives
    Disaster Recovery Confidence
    Disaster Recovery Concerns
The State of Data Protection Technologies
    Data Recovery Methods
    Partial Data Loss
    Unrecoverable Data
The State of Migration to Power Systems
    Economic Impacts on IT
    Migration to Power Systems
Summary: The State of Resilience
Executive Summary
The idea that data is a key business asset is hardly disputable. Even so, how to assign a direct financial value to data
remains a matter of debate. In a practical sense, however, the real value a business places on its data may be inferred
from how seriously it protects that data. Data protection is one of the core elements of any Disaster Recovery plan.
Maintaining data availability is likewise essential to overall business continuity efforts. Difficult economic environments
such as the one we have experienced in the past year put commitment to data protection to the test, with revealing results.
Central Issues for Executives
A trend toward increasingly aggressive recovery time objectives (RTOs) and recovery point objectives (RPOs)
is evident across the platform. But there remains evidence that, for many firms, the appropriate technologies for
achieving their RTOs and RPOs are not in place.
Data protection and high availability are related but still distinctly different objectives. Current findings support
increasing adoption of solutions to address both areas, but keeping data and applications continuously available has
not been addressed as fully or consistently as protecting against catastrophic data loss.
Traditionally, DR capabilities have been viewed as insurance against major natural or manmade disasters. But
increasingly, businesses of all sizes are realizing the significant impact of the many mini disasters that occur quite
frequently: power or network outages, maintenance and upgrade work, and even the corruption or loss of individual
files. Technologies for recovering from and actively avoiding these events are maturing and gaining support.
IBM's introduction of Power Systems servers has begun to blur the lines between its two previously distinct
groups of midrange server customers. The advantages offered under this converged platform are attractive to
businesses across the board. Despite the current state of the global economy, many firms are managing to find
innovative and effective ways to reduce the cost and complexity of migrating to Power.
Next Steps
Evaluate the costs associated with downtime of any duration from any cause.
Set measurable goals for systems and data recovery speed and completeness (RTOs and RPOs) based upon the
needs of the business, not the capabilities of current technologies and procedures. Then, identify and work toward
implementing technologies and processes that will support your objectives effectively.
Where appropriate technologies are in place, ensure that the company's actual ability to recover from downtime or
data loss events is in fact sufficient to meet stated recovery and availability objectives. Live testing of systems and
procedures is necessary, for both the training and readiness of personnel and the evaluation of current capabilities
versus present and future needs.
Foreword: Resilience Under Pressure
"Without a doubt, data has become the raw material of the information economy. If companies don't know what
[their data] is worth, they can't enhance, protect or measure the value of the data to the bottom line. Data isn't a
normal commodity. It's like water out of a tap, vital to life yet so often taken for granted." 1
Steven Adler, Director of Data Governance Solutions at IBM
In our previous report on the State of Resilience on IBM Power Systems, we noted that working business leaders
understand intuitively the value of their data and know that they must daily address the practical, tangible issues
of protecting their business information from loss or corruption and putting it to work effectively, efficiently, and
profitably.
But just how that intuitive understanding is translated into pragmatic, front-line actions, in the datacenter and
throughout the business, varies from company to company, both in methods and in level of commitment and success.
Protecting the availability and even the existence of this digital "water of life" is certainly only one of many
mission-critical business objectives that executives must address. The final measure of business leadership is the ability to
identify and execute the right priorities with the optimum balance of resource allocation.
No one disputes the old saying "time is money." The maxim that "data is money," though not as deeply ingrained in
common business practice, is still taking increasingly firm hold in the minds of business leaders in companies of all
sizes and across all industries.
But just how strong is this belief? Well, there is nothing like adversity to test what one truly believes in. Actions taken
under pressure tell the true tale. And in reviewing the State of Resilience on IBM Power Systems over this past year,
one cannot escape the overriding reality of one of the most difficult business years ever endured. In every region of
the world, for giant global enterprises and small businesses alike, the recessionary economy has put success, even
survival, not just under pressure but under direct threat.
1 Steven Adler, from the CIO.com article "Six Steps to Data Governance Success," May 31, 2007
Historically, companies that have adopted IBM Power Systems (and all their predecessors) have done so because of the
reliability, flexibility, and sheer power that these systems provide. As we review how these same demanding, forward-
thinking companies are dealing with the issues of data, application, and business resilience, we must keep in mind the
backdrop of this past year's business environment. It is truly an informative story of resilience under pressure.
Report Focus
As part of its core mission, the Information Availability Institute (IAI) regularly surveys the IBM i and AIX customer
base, seeking to understand current and developing information availability and data protection needs. And clearly,
the range and sophistication of the tools and technologies available to businesses of all sizes to address these needs have
grown right along with the market.
The scope and depth of our survey efforts were increased this year, with additional attention to how responses differed
among companies of differing sizes. In addition to reviewing core factors surrounding information availability (RTO,
RPO, etc.), we sought to deepen our understanding of not only what technologies and methods are being applied, but
also how fully and effectively they are being implemented.
Research Environment and Methodology
The data set used for this report includes the results of 18 surveys, conducted between September 2008 and October
2009. These surveys were conducted online using Web-based survey tools, as well as in person at gatherings of IBM i
and AIX professionals such as user groups and trade expos. In total, responses were received from nearly 4,000 technical
professionals and executives involved in the management of IBM i and AIX environments.
Responses were received from all geographies worldwide, with over 100 countries represented. Not all questions
were presented in all surveys. The most fundamental questions were asked in nearly all surveys, while questions on
some specific topics were posed in as few as one targeted research survey. To ensure validity, the results in this report
include only those questions where at least a few hundred responses were available for analysis.
Throughout this report, we will identify responses as pertaining to users of the IBM i and AIX operating systems. As a
practical matter, under these umbrella identifiers, we include all prior variants of the IBM i (i5/OS, OS/400) and do not
differentiate between, for example, AIX running on a legacy System i or System p server.
The State of Disaster Recovery Expectations
To begin our study, we asked questions about our survey participants' perceptions of their organizations' current
readiness to handle business interruptions (planned and unplanned downtime events).
1. What is your organization's recovery time objective (RTO) after a disaster or complete server
or application failure?
In line with results from past surveys, there is a significant bias toward shorter timeframes for recovery. Based upon the
strength observed last year in the range of responses that were less than six hours, we allowed respondents the option of
choosing "Less than 1 hour." The result is a clearer view of the increasing stringency in RTOs being reported. It is notable
that 55 percent of participants indicated RTOs of less than six hours, up from 45 percent in last year's results.
[Figure 1: RTO All. Chart of the share of respondents (0% to 35%) in each RTO range: <1 hr, 1 to 6 hrs, 6 to 12 hrs, 12 to 24 hrs, 24 to 48 hrs, >48 hrs.]
Within these results, we compared the responses from the IBM i shops to those from the AIX shops. Despite some variations
when looking at each RTO group, overall there was general consensus among the responses.

For example, summing each group's responses across categories, 72 percent of AIX respondents cited less than 12 hours as
the objective, compared to 69 percent in IBM i shops, a difference of only 3 percentage points.
A special note at this juncture: In line with IBM's direction toward hosting AIX and IBM i on a common platform (Power
Systems servers), and knowing that both operating systems could be run on System i servers previously, this year we also
analyzed results for those firms reporting that they use both AIX and IBM i in their datacenter.
For RTO, there were only slight variations between these IBM "dual citizens" and the other respondents. And 68 percent
indicated less than 12 hours as the objective, right in line with the other respondents.
[Figure 2: RTO by OS. Chart of the share of respondents (0% to 35%) in each RTO range (<1 hr, 1 to 6 hrs, 6 to 12 hrs, 12 to 24 hrs, 24 to 48 hrs, >48 hrs) for IBM i, AIX, and Both.]
We also expanded our efforts this year to include gathering additional data regarding the scale of businesses in which
participants work, based upon number of employees. We identified businesses as small, medium, and large enterprises,
with 100 and 1000 employees as the dividing lines.
With regard to RTOs, we expected and in fact found a bias toward shorter timeframes as the business size increases.
Although the differences between the three sets of responses within each RTO range option were no greater than 7 percent,
looking again at the sum of responses for each group for 12 hours or less, we see the trend clearly: 63 percent for small
businesses, 69 percent for medium, and 72 percent for large enterprises.
[Figure 3: RTO by Company Size. Chart of the share of respondents (0% to 35%) in each RTO range (<1 hr, 1 to 6 hrs, 6 to 12 hrs, 12 to 24 hrs, 24 to 48 hrs, >48 hrs) for Small, Medium, and Large businesses.]
Key Finding
Overall, AIX and IBM i shops, both pure and mixed, reported more aggressive RTOs, with a distinct 10-percentage-point
increase in the share of those targeting six hours or less. The data also seems to confirm that the larger a business
becomes, the more expensive downtime of any type becomes, with the result that increasingly shorter recovery
time goals are established.
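The year-over-year change cited for RTOs under six hours (45 percent of respondents last year, 55 percent this year) can be expressed either as a change in percentage points or as relative growth. The following is a minimal sketch of that arithmetic only; the helper name `change_in_share` is hypothetical and not part of the survey methodology.

```python
def change_in_share(previous_pct, current_pct):
    """Return (percentage-point change, relative percent change) between two shares."""
    points = current_pct - previous_pct          # difference between the two shares
    relative = 100.0 * points / previous_pct     # growth relative to the earlier share
    return points, relative


# Shares of respondents reporting an RTO of less than six hours, as cited in the report.
points, relative = change_in_share(45, 55)
print(f"{points} percentage points, {relative:.1f}% relative growth")
```

Read this way, the same shift is 10 percentage points in absolute terms and roughly a fifth more respondents in relative terms.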
2. What is your organization's recovery point objective (RPO), expressed in time? That is, what
is the maximum volume of transactions you are willing to lose as a result of a disaster or a
complete server or application failure?
In line with the prior year's results, concern expressed about data loss is greater than that expressed about downtime,
as evidenced by a definite trend toward the extreme left "No Data Loss" response. The real news here is the significant
increase in strength of the trend: 5 percent higher than last year in the No Data Loss category.
But there still remains a noticeable contingent of respondents whose organizations are apparently not especially worried
about losing many hours or even a day's worth of online data in the event of a catastrophic server failure. This may
reflect a subset of businesses that are still comfortable with relying upon manual re-entry of transactions from paper
records.
[Figure 4: RPO All. Chart of the share of respondents (0% to 40%) in each RPO range: No data loss, A few min, Up to 1 hr, A few hrs, 1 day, More than 1 day.]
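[Figure 5: RPO by OS. Chart of the share of respondents (0% to 50%) in each RPO range (No data loss, A few min, Up to 1 hr, A few hrs, 1 day, More than 1 day) for IBM i, AIX, and Both.]
[Figure 6: RPO by Company Size. Chart of the share of respondents (0% to 50%) in each RPO range (No data loss, A few min, Up to 1 hr, A few hrs, 1 day, More than 1 day) for Small, Medium, and Large businesses.]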
Splitting the responses by operating system, the responses from IBM i users were roughly similar to those of AIX
users. But there is some evidence that shops that use both operating systems had a somewhat stronger bias toward
No Data Loss than either of the other groups.
Finally, reviewing the results for the RPO question split by company size, we found a distinct trend toward decreasing
tolerance of data loss as company size increased.
3. How confident are you that your company's Disaster Recovery (DR) plan for IT systems is
complete, tested, and ready to execute?
The current year's results echo last year's, with very little change in any category. Though not shown here, in line
with last year, the differences in responses when split by operating system were also minimal.
Only about 16 percent of the respondents had full confidence (100 percent) that their DR plan was complete, tested, and
ready to go. About 50 percent of all respondents expressed at least some doubt or concern, as they rated their confidence
at 75 percent to 90 percent.
[Figure 7: Overall DR Plan Confidence. Chart of the share of respondents (0% to 30%) at each confidence level: 100%, 90%, 75%, 50%, 25%, No Plan.]
4. What concerns do you have about the completeness of your Disaster Recovery (DR) plan for
IT systems?
This year, we wanted to dig a bit deeper into this question of confidence in DR readiness, so we invited all respondents
who indicated less than 100 percent confidence (except those who indicated that they had no DR plan to worry about)
to indicate what specifically was causing them to worry. We offered five options:
Incomplete: Plan does not include all important IT systems
Outdated: May not protect current IT configuration
Communication Gap: IT staff training/knowledge incomplete
Testing: DR plan has not been tested recently or has not been tested completely
Coordination: Lack of integration between IT DR plans and DR plans of other departments
Note that we allowed each respondent to choose as many of the concerns as they felt applied to them. Thus, the
percentage values add up to greater than 100 percent.
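The effect of allowing multiple selections can be illustrated with a minimal sketch. The four responses below are invented purely for illustration and are not survey data; the function name `share_by_option` is likewise hypothetical. Each option's percentage is computed against the number of respondents, not the number of selections, which is why the column total can exceed 100 percent.

```python
from collections import Counter

# Hypothetical check-all-that-apply answers (NOT data from the surveys).
responses = [
    {"Testing", "Communication Gap"},
    {"Testing"},
    {"Incomplete", "Outdated", "Testing"},
    {"Coordination", "Communication Gap"},
]


def share_by_option(answers):
    """Percentage of respondents who selected each option."""
    counts = Counter(option for answer in answers for option in answer)
    return {option: 100.0 * n / len(answers) for option, n in counts.items()}


shares = share_by_option(responses)
total = sum(shares.values())

for option, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{option}: {pct:.0f}%")
print(f"Sum of percentages: {total:.0f}%")
```

In this invented example, four respondents made eight selections in total, so the percentages sum to 200 percent even though no single option exceeds 100 percent.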
Clearly, concern over testing leads the responses, with concern over the ability of IT staff to execute properly coming
in second.
[Figure 8: Concerns Overall. Chart of the share of respondents (0% to 50%) citing each concern: Incomplete, Outdated, Communication Gap, Testing, Coordination.]
Taking this inquiry one step further, we cross-tabulated the results from this question with the prior question about
overall confidence level. The goal was to see if any of the specific concerns were directly related to any specific level
of confidence. For example, were those who were 90 percent confident more concerned about the ability of their staff to
execute than other groups?
We got a most interesting result. Both the rank order and the relative weight or strength of the five options were very
similar for all four groups. Only the number of options chosen differed. Specifically, the lower the reported confidence,
the greater the number of the five options chosen. But when all of the responses were added together, the
rank order and relative strength of the five responses did not differ significantly.
[Figure 9: Specific Concerns by Confidence Level. Chart of the share of respondents (0% to 80%) citing each concern (Incomplete, Outdated, Communication gap, Testing, Coordination) for those reporting 90%, 75%, 50%, and 25% confidence.]
Key Finding
Taken together, this indicates that the vast majority of IT managers and professionals, though they have DR
plans in place, are concerned that, when pressed, those plans may not be executed properly or at all. They understand
implicitly that DR plans need to be tested regularly to ensure that both systems and staff are ready should
the recovery plan need to be invoked. But they concede that testing is currently not sufficient in frequency or in
completeness to assure them of success.
The State of Data Protection Technologies
When we discuss High Availability and Disaster Recovery, we are really addressing a wide range of technologies
and processes, all of which are focused on protecting data from loss or corruption. For this report, we bring together
results from questions that attempt to define and measure, among other things, which of these technologies and
processes are more or less prevalent among AIX and IBM i users today.
5. What data protection methods do you currently employ?

Over the years, data protection technologies beyond basic tape backup have been introduced, with the goal of offering
more efficient and granular management of data through the use of journaling, logical replication, clustering, disk
mirroring, continuous data protection (CDP), and more.
So, to begin, we reviewed the data protection technology landscape in general. A wide range of options was presented,
and the participants were invited to "check all that apply." Thus the percentages reported add up to more than 100
percent.
As we review these results, keep in mind that this was one of the questions for which we received just under 4,000
responses over the past year.
Several features of the results are very clear. First, tape backup is clearly the foundation of all but a minority of data
protection schemes, with between 85 and 92 percent of all AIX and IBM i shops including it in their responses.
That is not news. But of greater interest is the next tier of technologies, with an overall combined prevalence of about
30 to 35 percent: Logical Replication and Clustering. (Note that for simplicity of discussion, we include in the Logical
Replication category both the journal-based methods used under the IBM i OS and the log shipping methods commonly used
under AIX.)
[Figure 10: Data Protection by OS. Chart of the share of respondents (0% to 100%) using each technology, for IBM i, AIX, and Both: Tape Backup / Offsite Storage; Clustering; Logical Replication (Server to Server); Virtual Tape Library (VTL); Cross Site Mirroring; Logical Replication + Failover; Flash Copy; Continuous Data Protection (CDP); Geographic Mirroring; Metro Mirroring / PPRC; Global Mirroring; SRDF.]
Within these results, we find that logical replication technologies are found less frequently in AIX environments than in
IBM i environments, while clustering is far more frequently found under AIX than IBM i. This is, of course, indicative of
the differences in the usual storage architectures implemented under these operating systems, that is, shared storage for
AIX servers versus Direct Access Storage Device (DASD) for IBM i systems.
But perhaps the more interesting result from this question, beyond the AIX-to-IBM i comparisons under each technology, is
found by looking at the results from companies that have both operating systems in use. In all but two categories,
Clustering and SRDF, these "dual citizen" shops reported utilizing each technology more frequently than, or at least as
frequently as, the firms using only one or the other.
In simpler terms, the dual-OS environments employ, in general, a greater number of technologies to protect their data.
Perhaps there is in these companies a greater willingness or need to mix and match technologies to ensure against data
loss. More tools in the kit, as it were, in order to ensure that they have the right tool for the job under different
Data Recovery scenarios.
[Figure 11: Data Protection by Company Size. Chart of the share of respondents (0% to 100%) using each technology, for Small, Medium, and Large businesses: Tape Backup / Offsite Storage; Clustering; Logical Replication (Server to Server); Virtual Tape Library (VTL); Cross Site Mirroring; Logical Replication + Failover; Flash Copy; Continuous Data Protection (CDP); Geographic Mirroring; Metro Mirroring / PPRC; Global Mirroring; SRDF.]
This notion may well be
supported by the other split we did for this question, showing prevalence of each technology within company size.
It is clear from this chart that large enterprises employ a far more complete mix of technologies for data protection than
do medium or small businesses. They also show a distinct preference for Clustering, Logical Replication/Log Shipping
+ Failover (which would imply that dual-site High Availability is in place), and the more recently introduced Virtual
Tape Library and Cross-Site Mirroring technologies.
The one technology where all three seem to agree is in the use of CDP, the only category where small businesses
outpaced the others in uptake by percentage, albeit only slightly.
We also note that, while the use of logical replication and failover shows some small increases over that seen in past
reviews, there is still a persistent group of logical replication users who do not also implement failover, indicating that
they use replication to create a disk-based backup for data protection but are not committed to full High Availability.
6. If you experience a partial data loss on one of your servers (meaning accidental damage to or
deletion of a file, object, or library), what is your primary method (first attempt) for recovering
that data?
Our intent with this question was to understand to what extent tape backup is being relied upon to recover data. When
data recovery is required for reasons other than a complete disaster (or a serious server failure), there remains a
significant group of AIX and IBM i shops that still reach for offsite tape first.
This is both encouraging and worrisome. Given that effectively all firms reported keeping tape backups, it is
encouraging that, when data must be recovered, about 57 percent of them utilize technologies that can recover discrete
amounts of data more quickly, and usually more completely, than is possible using tape backups.
The worrisome part is that 43 percent of respondents still do not. Given the availability of such a wide range of
data protection technologies, offered at a wide range of price points, why do firms persist in relying upon tape so
completely? It is quite likely that these companies do not and in fact cannot actually meet their reported RTOs and
RPOs when data loss occurs. One assumes that if they had other options that were faster and offered more complete
recovery, they would use them first. So by relying upon tape, they face RPOs that, on average, are longer than 12 hours
(and often 24 hours and more) with RTOs that are equally undesirable.
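A rough sketch of why tape-only recovery implies long effective RPOs follows. It rests on assumptions that are ours rather than survey findings: backups run at a fixed interval, the most recent tape is usable, and a data loss event is equally likely at any moment between two backup runs. The function names are hypothetical.

```python
def tape_data_loss_window(backup_interval_hours):
    """Return (average, worst_case) hours of data lost when the only
    recovery source is the most recent periodic backup.

    Assumes a loss event is equally likely at any moment between runs.
    """
    worst_case = backup_interval_hours       # loss occurs just before the next run
    average = backup_interval_hours / 2      # midpoint of a uniform distribution
    return average, worst_case


def meets_rpo(backup_interval_hours, rpo_hours):
    """True only if even the worst-case loss stays within the stated RPO."""
    _, worst_case = tape_data_loss_window(backup_interval_hours)
    return worst_case <= rpo_hours


# Nightly backup (every 24 hours) compared against a one-hour RPO.
average, worst_case = tape_data_loss_window(24)
print(f"average loss: {average} h, worst case: {worst_case} h")
print("meets 1-hour RPO:", meets_rpo(24, 1))
```

Under these assumptions, a nightly tape cycle yields an average exposure of 12 hours and a worst case of 24 hours, before counting any additional delay in retrieving tapes from offsite storage.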
[Figure 12: Primary (First Attempt) Data Recovery Method. Chart of the share of respondents (0% to 50%) choosing each method: Restore from Offsite Tape; Restore from HA/DR Backup Disk; Restore from Offsite Tape and Journals Combined; Rebuild from RAID/Disk Parity Information; Retrieve from CDP Backup; Mirrored Disk: Local (same site); Mirrored Disk: Remote Site; Virtual Tape Library (VTL); Re-code or Re-enter Manually; Flash Copy; Other.]
7. Have you experienced unrecoverable data loss? If so, what was the primary reason that the
data was unrecoverable?
With this question, we sought to understand how the worst-case scenario of permanent loss of data actually occurs.
Keep in mind that these results come from a smaller subset of respondents, the ones willing to admit and discuss a
complete failure!
We offered four prepared responses and an open text field where respondents could specify other reasons for data loss.
The prepared options included:
No Backup Copy: Data lost before tape, flash copy or disk copy created
Bad/Unusable Backup Copy: Bad or missing backup tape/disk copy
All Copies Bad: Data damage or deletion was mirrored/replicated on all backup media
Loss Reported Too Late: Past data retention period
Under the option of "Other," we received a number of interesting write-in responses, including:
Backup server DB corruption
High Availability was not set up correctly, and one day's worth of data was lost
Essential libraries were missed during routine backups
New process backup not considered
Backup software did not do its job
Application error in writing data
Computer operator mistake
Inexperience and no backup procedure in place
User unable to identify files
Unable to differentiate needed from not
[Figure 13: Causes of Unrecoverable Data. Chart of the share of respondents (0% to 50%) citing each cause: No backup copy, Bad/unusable backup copy, All copies bad, Loss reported too late, Other.]
Taken together, it is clear that the root causes of data being permanently lost fall into two groups: human error, in both
the implementation and execution of data-protection processes, and the inherent weaknesses of technologies that
provide less than real-time backup to secondary disk. Within this second group, in cases such as the corruption of tape
or disk copies, human error may contribute, but certainly for tape backups, the weakness is found in the natural limits
of the technology's capabilities: losing data between backup runs and the often debated but always feared fallibility of
tape as a backup medium.
Key Finding
Relying exclusively or even predominantly upon tape as one's primary data protection and recovery
technology is rapidly becoming more than a case of less-than-optimal business practice. The cost to any
business of losing data permanently, along with the cost of unplanned downtime while waiting for data recovery,
is avoidable through investment in technologies that simply do a better job in all respects. Given the availability of
such a wide range of data-protection technologies, the rather modest sums required to implement CDP or other
real-time disk-to-disk replication options will provide significant, near-term ROI based upon avoiding the costs
associated with data loss and downtime.
The State of Migration to Power Systems
Before discussing where things stand with regard to the adoption of Power Systems servers by existing IBM
midrange system customers, we need to acknowledge the proverbial elephant in the room: the seriously negative
economic situation. This year's surveys were conducted during a 12- to 13-month period that saw one of the worst and
most widespread recessions ever. Any discussion of the adoption of new server technologies must be framed by this
overriding reality.
In late spring of 2009, we fielded a survey that focused primarily upon the impacts of the economy being felt by IT
professionals. In the end, the following single question tells the overall story rather completely.
8. In response to the economy (recession), most companies have been forced to adjust. How
have your IT operations been affected by the economy?
Clearly, businesses were (and as of this writing still are) very concerned about the economy and its impacts on revenue.
Costs relating to investments in future operations, from training, education, and staffing levels to hardware and
application projects, are being avoided or at least severely reduced. Only a small minority of respondents indicated that,
as of last spring, their operations had not been impacted.
[Figure 14: Impacts of the Economy on IT. Chart of the share of respondents (0% to 60%) reporting each impact: Training/education reduced or eliminated; Hiring postponed; Hardware projects cancelled or postponed; Application projects cancelled or postponed; Staffing cut; IT staff working much longer hours; Some IT functions outsourced; IT not impacted.]
So, acknowledging this backdrop to the situation, we nonetheless sought to understand some of the dynamics behind the
uptake of Power Systems servers among the existing IBM midrange customer base. For those who indicated in a prior question
that they definitely planned to upgrade to Power Systems servers in the future, a total of about 50 percent of all
respondents, we asked the following question.
9. The following is a list of benefits that companies like yours often cite as reasons for moving to
Power Systems servers. Which two (2) benefits do you believe are the MOST valuable for your
company?
For this question we offered five benefit statements including:
Reduced total cost/investment in hardware
Increased speed/processing capacity
Simplified/improved Disaster Recovery or High Availability protection
Reduced data center operating costs (Power/Cooling)
Continued IBM Support: Our current systems are old and will not be supported by IBM
[Figure 15: Benefits of Upgrading. Chart of the share of respondents (0% to 35%) choosing each benefit: Reduced Cost, Increased Speed/Processing Capacity, Simplified/Improved DR or HA, Reduced Operating Costs, Continued IBM Support.]
Clearly, what these companies are looking for is understandable: increased computing speed and power, at lower
cost. Almost paradoxically, given the huge emphasis on cost containment, reduction of operating costs was not one of
the main motivators for upgrading. Few were concerned about being forced to upgrade due to sheer obsolescence.
A fairly significant percentage of respondents indicated that they hoped to improve or simplify their HA or DR
protection as a direct result of upgrading. Taken together with the lack of concern about obsolescence, this may
indicate that many IBM shops are looking at bringing on newer servers and configuring their combined server
collection to provide HA through replication and failover technologies.
10. What factors are preventing or delaying your company from upgrading to new Power
Systems servers?
Perhaps of greater interest was the follow-on question for those who indicated no current plans for upgrading to Power
Systems servers.
For this question, we again offered five responses covering a range of reasons for deciding against upgrading to
Power in the near future:
Concerns About the Economy
Lack of Time: Too many other projects and priorities
Cost of upgrade/migration too high
Cost of new servers too high
Complexity of Upgrading/Migrating: insufficient in-house skills, knowledge or experience
Key Finding
Not unexpectedly, concerns over the economy (and thus revenues) topped the list of reasons for not committing
to upgrading to Power. Costs related to the purchase of the servers and to the overall cost to the company
of undertaking such a project also factored strongly. Indeed, in the months following this survey, IBM
announced new purchasing plans and rebates intended to bring the cost of acquisition more into alignment with
their current customers' real budgetary limits.
[Figure 16: Why Not Upgrading. Chart of the share of respondents (0% to 25%) citing each reason: Economy, Lack of Time, Project Cost, Server Cost, Complexity.]
Managing upgrade downtime

With regard to reducing the cost and complexity of upgrading or migrating to Power Systems, we had yet another set
of follow-on questions that were posed only to those respondents who indicated they had already upgraded/migrated
to Power. The results from those follow-on questions are instructive.

To begin with, near the beginning of the included surveys on this question, we asked all participants about their
downtime limits in terms of their RTO. Then, for those who later confirmed that they had completed an upgrade
to Power Systems, we asked two additional questions:
How much downtime did you experience as a direct result of the upgrade process?
Did you use High Availability or Disaster Recovery technologies to reduce or manage your upgrade project downtime?

Figure 17: HA Used to Reduce Upgrade Downtime (series: Downtime Goal Trend and Actual Trend, logarithmically smoothed; horizontal axis: <1 hr, 1 to 6 hrs, 6 to 12 hrs, 12 to 24 hrs, >48 hrs; vertical axis 0% to 40%)

With this second question, we were looking for those who utilized one of the HA or DR technologies that include
real-time replication and switching capabilities. Such solutions allow upgrade build, configuration, and testing to occur
on the new server while operations continue on the old server. In some cases, despite being offline, the new server
can have its data populated and then made completely current through the same technology, ensuring that the only
downtime required is for switching users over to the new server.
In this chart, we see that for those who had appropriate HA/DR technologies in place and used them to minimize
upgrade downtime, responses about downtime goals, as expressed in terms of general RTOs, were strongly biased
to the left, with many targeting less than one hour of downtime impact. This indicates a confidence in their HA/DR
technology to help keep downtime controlled. They have high expectations.
But, as evidenced by the more evenly distributed responses about their actual experience, it was clear that their
ambitious RTOs were not always fully met, even with the use of HA to reduce the downtime. Still, comparing the results
using logarithmically smoothed trend lines, there was a fairly close correlation. By and large, goals were achieved.

However, for the other group, the result was quite different. First, this group most surely includes those who have
not implemented an HA or DR solution that could help avoid downtime. It also includes those who may have had the
technology at their disposal but did not, for whatever reason, use it to help with their upgrade. Second,
in comparison with the previous group, their reported RTOs were much less aggressive. While still biased to the left,
the trend line, though still upward sloping, was much flatter. Goals were basically less aggressive.

But the actual results were not good news for these respondents. Overall, downtime was far greater than targeted,
desired, or anticipated, so much so that the overall trend was rightward biased and the derived trend line was
downward sloping, indicating a negative correlation.

Figure 18: Did Not Use HA (series: Downtime Goal Trend and Actual Trend, logarithmically smoothed; horizontal axis: <1 hr, 1 to 6 hrs, 6 to 12 hrs, 12 to 24 hrs, >48 hrs; vertical axis 0% to 40%)
Summary: The State of Resilience
At the beginning of this report, we noted that to understand the State of Resilience today we must take into account the
state of the world economy over the past year. A year ago, we noted that our survey respondents exhibited a pattern of
careful and deliberate choices among the increasing number and range of data protection, availability, and optimization
technologies. Overall, they recognized the value of data and application resilience to their success but seemed to be
calibrating their investments. Over this past year, in the face of intense financial headwinds, we find that they held their
course and continued to make progress.
As financial constraints limited operational resources, rather than retreating to less challenging RTOs and
RPOs, the AIX and IBM i user base appears to have realized the need for more aggressive levels of resiliency.
Protecting data and applications has proven to be a mandate that cannot be ignored or short-changed simply because
money is tight. Yet at the same time, we find that most IT managers are worried that their DR plans may not execute
properly or completely when invoked. They cite a lack of recent or complete testing as their main worry.
While the overall profile of data protection technology options has not changed radically in the past year, there are
nonetheless some trends to note.
Tape is still the firewall, the backstop for data loss in the event of catastrophe. It shows little sign of yielding that
position to spinning disk anytime soon. But beyond that, when reviewing the adoption of the entire range of data
protection options, we see a measurable trend toward granularity and immediacy. IBM i and AIX users are increasingly
seeking real-time protection of data as it is created and changed, as well as the ability to recover from any level of data
loss or corruption. While the adoption of full High Availability is increasing, the real interest and energy are directed
toward improved data recoverability. As CDP technologies are maturing, both in capability and ease of use, they are
being quickly adopted among small, medium, and large businesses alike.
The strongly negative business environment of this past year has most surely had a dampening effect on the rate
at which IBM midrange customers are moving up to Power Systems. Undoubtedly, basic financial issues, such as
reduced cash flow and the increased cost and reduced availability of business loans, have discouraged investment
in new hardware. So too, for many firms, taking on a potentially disruptive upgrade project with reduced staff was
simply not warranted. But, as always seems to happen with technology adoption, there are among the IBM midrange
community those leaders who have shown the way to overcome the workload problem, using their High Availability
and replication technologies to measurably speed and simplify migration.
Overall, the State of Resilience is improving among IBM i and AIX customers. Under financial pressure, the most
notable gains were achieved with technologies and projects well-suited to cost containment. What remains to be seen is
whether this trend will continue or whether the path to higher resilience will lead in another direction as the economy
improves in the coming year.
About the Information Availability Institute
The Information Availability Institute (IAI) provides research and education that helps business professionals of all
disciplines to understand, evaluate, and apply information availability technologies.
Drawing upon the experience and resources of Vision Solutions, its technology partners, and independent industry
experts, the IAI is committed to identifying and communicating improvements in technologies that increase
information availability and overall business resilience across the entire enterprise.
Copyright 2009, Vision Solutions Inc. All Rights Reserved. IBM, AIX, IBM i, i5/OS, OS/400, AS/400, RS/6000, System i, System p, iSeries, pSeries, and Power Systems are trademarks or registered trademarks of International
Business Machines Corporation.
Information Availability Institute visionsolutions.com | info@visionsolutions | 1-949-253-6500