
Business Continuity Solution Series A Vision Solutions White Paper May 2006

Understanding Downtime

Executive Summary
The flow of accurate, real-time information is at the heart of 21st century business. Today, when information is not available for any reason, business activity often stops. And when business stops, the costs mount quickly.

The great majority of system and data unavailability is the result of planned downtime that occurs due to required maintenance. The other side of the coin is unplanned downtime. This is typically perceived to be associated with natural disasters or hardware and software failures, but human error actually plays a bigger role. While unplanned downtime accounts for only about 20% of all downtime, its unexpected nature means that any single unplanned incident may be more damaging to the enterprise than an occurrence of planned downtime.

Calculating costs associated with downtime is often more difficult than it appears. Business executives can normally provide reasonable information on some of the tangible costs of downtime, such as salaries paid to idled employees, penalties incurred if service level agreements are breached, and so on. However, less easily quantifiable costs, such as the cost of lost customer loyalty or the cost of prospects that weren't converted into customers because the company's business stopped, can be much higher.

Understanding the tangible costs is just the beginning. It is equally important to identify and, as much as possible, quantify the intangible costs. These include the long-term impact of damage to reputation, brand and customer loyalty.

Business continuity is achieved by mitigating the impact of system downtime on an organization. Meeting this objective requires a clear understanding of downtime issues, causes and costs, so that business executives and system managers can arrive at an acceptable level of investment that meets the organization's business continuity and profitability goals.
This white paper discusses four primary topics and provides an Annual Cost of Downtime Worksheet that will assist you in calculating your business's downtime costs. The chief topics discussed include:

Definitions of downtime.
Various causes of downtime in business.
Factors and costs of downtime, both tangible and intangible.
Formulas for estimating the annual labor cost from downtime and annual revenue loss from downtime.

Causes of Downtime
PLANNED DOWNTIME ISSUES

Planned downtime typically accounts for at least 80% of all downtime. Despite being planned, this downtime can cause serious problems for modern organizations. When an enterprise evolves into a 24x7 operation, how can it perform critically important hardware, software, security and data maintenance? Production servers can't be taken down; yet, among other maintenance tasks, backups must be taken and various software upgrades must be implemented to protect the business and keep it competitive.

Planned downtime can be broken down into three primary subsets:

1. Normal IT-infrastructure operations activities performed on a regular basis to maintain system protection and health.
2. Maintenance program and software activities constitute the bulk of this category (program fixes, for example).
3. Unique periodic events like deployments of hardware and software, which usually can be scheduled with substantial lead-time.

Figure 1 It's easy to see why planned downtime outweighs unplanned downtime as the chief concern of IT system managers.

The planned downtime caused by system administration is usually precautionary. Most server architectures can accommodate security and administration settings adjustments without requiring an IPL (Initial Program Load or reboot), but it is easier and safer to keep users off the system while doing administrative work.

Server hardware and software upgrades are another cause of planned downtime. Software upgrades, security patches and bug fixes consume greater manpower and time as the complexity of environments and software grows. These items account for 15% or more of planned downtime. In addition, while the process of upgrading an operating system or applying software is becoming more automated, this tends to restrict the choices available to staff.

Production system testing often requires a significant amount of planned downtime, especially in single-server environments.

Other planned events include items outside the control of IT personnel, such as the scheduled loss of utility power.

Backup processes account for nearly 60% of all planned downtime (see Figure 2), and, what's more, planned downtime in turn often leads to unplanned downtime when a backup takes longer than expected. Traditionally, IT personnel backed up systems and servers overnight when everyone had gone home. But today's server environments are rapidly approaching (or have already met) 24-hour utilization, so the traditional backup window has not only shrunk, it may have disappeared altogether.

Figure 2 Planned downtime activity breakdown.

One other interesting statistic: 25% to 50% of the backups are unreliable.

UNPLANNED DOWNTIME

About 20% of all downtime is unplanned. As can be seen from Figure 3, human error is the single largest reason for unplanned downtime. Disasters are a small contributor to the problem of unplanned downtime, but these high-profile problems draw attention and resources away from the less-sensational planned issues that constitute the bulk of downtime and, in many cases, hit the bottom line harder because of their frequency.

Figure 3 Unplanned downtime causes.

Employee errors, as a percentage of all errors, have been trending upward steadily to the point where industry analysts now attribute as much as 60% of all system failures to human error. Hardware malfunctions, which have been declining steadily for the past 15 years due to increased product reliability and the skill of IT personnel at identifying potential faults prior to their impact, now account for 10% or less of all failures. Software failure has also declined steadily over that time, but the trend has not been nearly as dramatic as for hardware.

The operations-overruns component of Figure 3 is a measure of jobs or events running outside their planned window. This component has grown as a result of the increased demand for online access and the growth in the number of transactions processed.

Examining the Cost of Downtime


According to Dun & Bradstreet, 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime per week. To put this in perspective, assume that an average Fortune 500 company has 10,000 employees who are paid an average of $56 per hour including benefits ($40 per hour salary + $16 per hour in benefits). Just the labor component of downtime costs for such a company would be $896,000 weekly, which translates into more than $46 million per year.

Of course, this assumes that everyone in the company would be forced to stop all work in a downtime scenario, and that may not be so. But, since the operations of many companies are increasingly knit together by their information technology, system downtime now hampers the productivity of almost everyone in the organization, and completely sidelines a significant and growing percentage of them.

Figure 4 charts the combined results of recent research from a number of analyst firms, showing the cost of one hour of service interruption for a variety of businesses. While the downtime cost for catalog sales and airline reservations may fall below $100,000 an hour, the downtime cost for some financial institutions is estimated at between $60,000 and $250,000 a minute! What does this mean for the economy as a whole? Some analysts estimate that U.S. business lost over $9 billion due to downtime in 2003.
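The Fortune 500 labor figure earlier in this section can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the input values come from the example in the text, while the function names and the fraction_idled parameter are hypothetical additions that show how the result shrinks when only part of the workforce is sidelined.

```python
# Inputs taken from the Fortune 500 example in the text.
EMPLOYEES = 10_000
HOURLY_COST = 56               # $40 salary + $16 benefits, in dollars
DOWNTIME_HOURS_PER_WEEK = 1.6
WEEKS_PER_YEAR = 52


def weekly_labor_cost(fraction_idled=1.0):
    """Weekly labor cost of downtime.

    fraction_idled is a hypothetical knob (not in the paper): the share of
    employees who cannot work during an outage. 1.0 matches the text's
    assumption that everyone stops.
    """
    return EMPLOYEES * fraction_idled * HOURLY_COST * DOWNTIME_HOURS_PER_WEEK


def annual_labor_cost(fraction_idled=1.0):
    """Annual labor cost of downtime: the weekly figure times 52 weeks."""
    return weekly_labor_cost(fraction_idled) * WEEKS_PER_YEAR


if __name__ == "__main__":
    print(f"Weekly: ${weekly_labor_cost():,.0f}")   # $896,000
    print(f"Annual: ${annual_labor_cost():,.0f}")   # $46,592,000
    # Same calculation if only half the workforce were idled (assumed value).
    print(f"Annual at 50% idled: ${annual_labor_cost(0.5):,.0f}")
```

Treating the idled share as a parameter keeps the headline number honest: the $46 million figure is an upper bound for this hypothetical company, not a forecast.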

Figure 4 Estimated downtime hourly costs by industry. Not surprisingly, financial transactions are at the greatest risk, but no type of commerce can escape the considerable impact of downtime.

Figure 5, from Gartner, just one of many industry analyst organizations to document the issue, clearly illustrates the many facets of downtime costs. With this list of the primary cost contributors in place, it's now time to turn to some of the math involved in calculating actual downtime costs. Following are some reasonable guidelines that will help readers understand and calculate the costs and consequences of downtime as they may apply to their particular organizations.
Figure 5: Downtime cost contributors. Source: Gartner.

COST ELEMENTS OF DOWNTIME


TANGIBLES

The primary tangible downtime costs include labor costs for idled workers; specific lost revenue; financial penalties, including statutory fines and late fees; costs required to get back to business; lost or spoiled inventory; and equipment replacement, repair and/or rental needed to get back online. Near the end of this paper, we have included an Annual Cost of Downtime Worksheet that will aid in identifying a variety of downtime costs and arriving at a total downtime impact number. But, for now, let us look more closely at two of the biggest tangible costs, downtime labor costs and lost revenue.

Labor

After lost revenue, the most readily apparent downtime cost comes from idled workers. The following is a good, basic formula for estimating annual labor costs from downtime:

Labor Cost = LA x LB x LC x LD

Now let's examine the components of the formula.

1. Determine the number of people impacted. (LA) Instances of past downtime can be used as the basis for seeing who downtime affects. Don't forget to survey every group or department.

2. Determine the extent of impact. (LB) The impact differs from department to department and from worker to worker within a department. Some people will merely have their productivity reduced somewhat during a downtime event, while others will be completely idled. Estimate each group of workers' decline in productivity as a percentage of typical output. Then either review the averages of the different groups and designate a reasonable company-wide average or isolate your study by individual employee groups to attain a somewhat higher degree of accuracy.

3. Determine the average employee cost per hour. (LC) We suggest that you meet with human-resources and financial personnel to agree on the average employee cost per hour in terms of salary, benefits and overhead. Look at each department and job classification; the analysis can be made a little easier by combining work groups that have similar job functions. When the numbers have been compiled, again, you will need to designate a reasonable company-wide average or you may choose to isolate your analysis to an individual employee group. Note that this typically yields a conservative estimate of the labor-related downtime costs, because it represents the cost of employing people, whereas the lost value is the worth that they would have delivered, which, because companies are in business to earn a profit, is higher than the labor costs.

4. Determine the number of annual downtime hours. (LD) This area of your study should be comparatively easy, but it is sometimes surprising how many companies neglect to record the details of serious events or maintain logs of scheduled procedures. Documentation of downtime hours will make the analysis of annual downtime a lot simpler and more accurate.

Revenue

Revenue is so fundamental to business that calculating normal hourly revenue flows should be something that can be done relatively quickly and accurately, although assessing the revenue impact of downtime may not be so cut and dried.
The formula for revenue loss depends on a variable impact factor that colors the final results. A good, basic formula for estimating annual revenue loss from downtime is:

Lost Revenue = (RA / RB) x RC x RD

Now let's examine the components of the formula.

1. Determine gross annual revenue. (RA) The finance department should be able to provide a reasonable estimate of revenue for the current and next year. Be sure to include all revenue, including products and services, because downtime is usually a company-wide hindrance. If your organization is subject to wide revenue swings, use a multi-year average.

2. Determine total annual business hours. (RB) Because business hours are usually a matter of company policy, it is generally easy to come up with a precise value for this factor.

3. Determine downtime impact factor. (RC) Examine the order flow for a given period immediately following the resolution of a previous downtime event. The amount by which sales exceed the typical pattern of orders per hour yields an estimate of come-back business. Calculate this as a percentage of normal orders per hour and subtract it from 100% to yield a first cut at the downtime impact on revenue, i.e., the percentage of sales that were lost during the downtime event and not recovered later. Some previously loyal customers may become so frustrated that they never come back, opting instead to switch allegiances to a competitor. You must consider the defector's lifetime value, which is the net present value of all purchases an individual defector would have made. Defecting customers increase the impact factor. Because of the difficulty of forecasting this value, an educated guess is likely the best you will be able to derive. The longer the downtime event, the greater the propensity for permanently lost customers, leading to impact factors that may substantially exceed 100%.

4. Apply the number of annual downtime hours. (RD) The estimate of annual downtime hours that was derived when assessing downtime labor costs should then be used to calculate the total annual revenue lost due to downtime.

Other Tangibles

Following are other elements to be considered. In some instances, it might be difficult to determine the cost impact of these elements, but each requires serious consideration in any calculation.
Getting back to business: re-entering data; rebuilding files and transactions; recovering manufacturing process; contacting customers and resolving issues left unattended while systems were off-line. Overtime pay may be required to accomplish these tasks.
Equipment repair, purchase and/or rental to get back online.
Marketing campaign to rectify tarnished image (campaign cost is tangible; long-term impact of fallen image is intangible).
Lost or spoiled inventory.
Missed financial-filing deadlines and possible penalties.
Late-delivery penalties or added shipping costs.
Liability exposure relative to service level agreements, safety or health issues; attorney fees.
Compensatory payments relative to breached contracts.
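The labor and revenue formulas from the Tangibles discussion can be sketched in a few lines of code. This is a minimal illustration rather than a definitive model: the function and parameter names are hypothetical, RD is read as annual downtime hours (the paper implies this but does not label it), and every input value in the example is made up for demonstration.

```python
def annual_labor_cost(people_impacted, productivity_loss, cost_per_hour, downtime_hours):
    """Labor Cost = LA x LB x LC x LD.

    productivity_loss (LB) is a fraction between 0 and 1.
    """
    return people_impacted * productivity_loss * cost_per_hour * downtime_hours


def first_cut_impact_factor(comeback_fraction):
    """First-cut RC: 100% minus come-back business, expressed as a fraction.

    Customer defections would push the factor higher; that adjustment is an
    educated guess per the text and is not modeled here.
    """
    return 1.0 - comeback_fraction


def annual_lost_revenue(gross_annual_revenue, annual_business_hours,
                        impact_factor, downtime_hours):
    """Lost Revenue = (RA / RB) x RC x RD, reading RD as annual downtime hours."""
    revenue_per_hour = gross_annual_revenue / annual_business_hours
    return revenue_per_hour * impact_factor * downtime_hours


if __name__ == "__main__":
    # All values below are hypothetical illustration inputs, not from the paper.
    downtime_hours = 40.0
    groups = [
        # (people, productivity loss, cost per hour)
        (200, 1.00, 45.0),   # a fully idled group
        (300, 0.25, 60.0),   # a partially affected group
    ]
    labor = sum(annual_labor_cost(n, loss, cost, downtime_hours)
                for n, loss, cost in groups)
    revenue = annual_lost_revenue(50_000_000, 2_000,
                                  first_cut_impact_factor(0.30), downtime_hours)
    print(f"Labor cost:   ${labor:,.0f}")
    print(f"Lost revenue: ${revenue:,.0f}")
```

Summing the labor formula per employee group, as the example does, corresponds to the paper's option of isolating the study by group instead of using a single company-wide average.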

THE INTANGIBLES

A list of potential intangible downtime costs is almost endless, subject to the characteristics and business models of individual companies. Here, we focus on just two, damaged reputation and employee morale. Other costs might include damage to credit ratings, analyst reaction to share value, negative publicity and competitive disadvantage in the market.

Damaged Reputation

It is an unfortunate fact that customers are much more likely to talk widely about bad customer service experiences than good. So it's not hard to imagine the ill will resulting from a major system outage. The impact could be huge and take years to reverse. Of particular importance, too, is the tarnished image presented to investors and how a downtime event, particularly if it is reported in the media, can affect a company's stock price. Several years ago, when Amazon.com was off-line for a number of hours because of a server failure, the company's stock fell 25% in the next day's trading.

It may be difficult to assess the long-term effect of a damaged reputation without investing in surveys and other research. Even so, stock downturns are quite tangible, as are the marketing man-hours and media dollars required to reestablish and polish an organization's profile. In the absence of surveys and research, absolute revenue levels over the 24 months following a major downtime event as compared with the revenue levels before the event (allowing for expected growth) will serve as a fairly good indicator of the company's success in rectifying its image.

Employee Morale

Employees generally want to do a good job. If the tools necessary to do so are not available, or are unreliable, employees may begin to think that management doesn't care. Frustration may lead to careless behavior that could cut deeply into productivity. What's more, such behavior has a way of spreading throughout an organization, sometimes with alarming speed.

Disgruntled employees, especially the good ones, may leave, resulting in lost output until a replacement is found, not to mention costs to hire, train and gain equivalent productivity from replacement employees, which are often estimated at more than a year's salary for each replacement employee.

FINAL CONSIDERATIONS

Before we look at our Annual Cost of Downtime Worksheet, it is important to review a few other issues.

1. System Reliability. Even if a CPU is 99% reliable, your system will not be 99% reliable. The multiple elements of the system (CPUs, operating systems, power supplies, disk drives, database-management systems, application software, and network devices and connections) all contribute nominal increments of unreliability that collectively may contribute to substantial downtime. For example, a system comprising 10 elements, each with 99% reliability, would have an overall reliability factor of 90.44% (0.99^10) and would therefore be expected to be unavailable 9.56% of the time. In a 24x7, 365-day environment, this is almost 838 hours or 35 days of downtime annually.

2. Downtime Relative to Time of Occurrence. Downtime costs vary with the time of an outage. Increasingly, we are becoming a 24x7 global economy, but not all businesses employ personnel who work during nighttime hours. A system outage at 3 a.m. for 2 minutes may have little impact on such an organization. Even round-the-clock businesses, such as most Web-based eCommerce, have highs and lows in activity throughout a 24-hour period. And, of course, downtime occurring on a weekday may have a dramatically different cost impact than downtime occurring on a weekend or holiday.

3. Concrete History of Planned Downtime. As previously noted, planned downtime constitutes at least 80% of all downtime, and the very fact that it is planned means the past record of maintenance, operations and periodic events is a reasonable basis for forecasting future planned downtime costs. Use 12 to 24 months of history because some planned activities, such as operating-system/software/hardware upgrades and disaster recovery testing, may vary considerably from month to month. Because the duration of some planned activities, such as database backups and reorganizations, varies depending on database sizes, be sure to adjust the number derived in this manner to accommodate organizational growth.

4. Significant Contractual Obligations. Organizations that sign legally binding Service Level Agreements with their customers may suffer penalties as a result of lost availability. These costs are often significant and should be included in any downtime cost calculation.
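The system-reliability arithmetic in item 1 above can be checked with a short sketch. It assumes, as the paper's multiplication implies, that every element is required for the system to run and that elements fail independently; the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # a 24x7, 365-day environment


def system_reliability(element_reliabilities):
    """Overall reliability when every element must be up (a series system).

    Assumes independent failures, so the individual reliabilities multiply.
    """
    return math.prod(element_reliabilities)


def expected_annual_downtime_hours(reliability):
    """Expected hours per year the system is unavailable."""
    return (1.0 - reliability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Ten elements at 99% each, as in the paper's example.
    overall = system_reliability([0.99] * 10)
    hours = expected_annual_downtime_hours(overall)
    print(f"Overall reliability: {overall:.2%}")
    print(f"Expected downtime:   {hours:.0f} hours (~{hours / 24:.0f} days) per year")
```

Passing a list with mixed values, for example one weak element among several strong ones, shows how quickly a single unreliable component drags down the whole system under these assumptions.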

What's Next?
Mitigating downtime can be fairly straightforward or complex, just like your business. Solutions vary widely based on requirements, perceived risk, and calculated ROI. Clearly, a solution that eliminates downtime risk and provides robust data integration for an international, multi-platform pharmaceutical enterprise will not be even remotely similar to the solution for an up-and-coming tool-and-die shop chiefly concerned with disaster recovery. But both concerns are linked by the common understanding that information availability ensures competitive advantage and is an absolute necessity in modern markets.

The reason for calculating potential downtime costs is to allow the organization to make rational decisions on the appropriate level of investment that should be made in enhancing business continuity. How to go about achieving business continuity is the subject of other Vision Solutions white papers and other documents. Please contact your Vision representative for more information.

Annual Cost of Downtime Worksheet


The following worksheet consolidates many of the key tangible and intangible cost components and other information discussed in this white paper, delivering a basic model for calculating a cost of downtime number. Since most decisions on an availability strategy or solution mix are based on a company-wide annual perspective, our worksheet is designed to establish a baseline annual number against which an ROI can be assessed. Readers using the worksheet are encouraged to reference applicable sections in this white paper for clarification, as needed. Results should be considered to be a beginning step in comprehensive downtime analysis.

2006, Vision Solutions, Inc. All rights reserved. BCSS is a trademark of Vision Solutions, Inc. iSeries is a trademark of IBM. All other trademarks are the property of their respective owners. Product specifications subject to change without notice. Vision Solutions is a member of the IDION group of companies. 17911 Von Karman, 5th Floor Irvine, California 92614 USA Tel: +1.949.253.6500 +1.800.683.4667 Fax: +1.949.253.6501 www.visionsolutions.com
