Date of Publication: January 2011
Last Update: December 2010
Website Publication: April 2011
TABLES AND FIGURES

Figure 1: Information Flows In An Enterprise
Figure 2: The Flows We Are Interested In
Figure 3: Example Server Workloads
Figure 4: Modern Computer Servers
Figure 5: Improvements In Server Performance
Figure 6: Schematic Of Modeling Approach
Figure 7: TPC-C Simulated Workflow
Figure 8: SPECweb2005 Simulated Workflow
Figure 9: VMmark Simulated Workload
Figure 10: Three-Tier Web Server Configuration
Figure 11: Core and Edge Computing Model
Figure 12: Estimated Workload Percentages 2004-2008
Figure 13: World Server Information Summary 2008
Figure 14: Contribution to World Server Information 2008

Table 1: World Server Information
Table 2: Data and Information
Table 3: World Server Information by Server Class 2008 in Zettabytes
Table 4: Installed Base, Shipments and Retirements of Servers for the World and U.S., 2000-2005
Table 5: World Server Sales 2004-2008
Table 6: Performance Benchmarks by Server Class by Server Workload
Table 7: Estimated Workload Percentages by Year
Table 8: Server Potential Capacities 2004-2008
Table 9: Contribution to World Server Information 2008
ACKNOWLEDGEMENTS
This report is the product of industry and university collaboration. We are grateful for the support of our industry sponsors and university research partners. Financial support for the HMI? research program and the Global Information Industry Center is gratefully acknowledged.

Our foundation and corporate sponsors are: Alfred P. Sloan Foundation, AT&T, Cisco Systems, IBM, Intel Corporation, LSI, Oracle, and Seagate Technology.

Special thanks for research and technical advice is extended to the following individuals: Richard Clarke, AT&T; Clod Barrera, IBM; Jeffrey Smits and Terry Yoshii, Intel; Dieter Gawlick, Garret Swart and Thomas Oestreich, Oracle; Dave Anderson, Brook Hartzell and Jeff Burke, Seagate; Bruce Herndon, VMware.

The authors bear sole responsibility for the contents and conclusions of the report. Questions about the report may be addressed to the Global Information Industry Center at the School of International Relations and Pacific Studies, UC San Diego:

Roger Bohn, Director, rbohn@ucsd.edu
Jim Short, Research Director, jshort@ucsd.edu
Pepper Lane, Program Coordinator, pelane@ucsd.edu, 858-534-1019

Press inquiries should be directed to Rex Graham, IR/PS Communications Director, ragraham@ucsd.edu, (858) 534-5952.

Center Website: http://hmi.ucsd.edu/howmuchinfo.php

Report Design by Theresa Jackson, Orchard View Color: www.orchardviewcolor.com
Executive Summary
In 2008, the world's servers processed 9.57 zettabytes of information: almost 10^22 bytes, or ten million million gigabytes. This was 12 gigabytes of information daily for the average worker, or about 3 terabytes of information per worker per year. The world's companies on average processed 63 terabytes of information annually.

Our estimates come from an analysis of the total work capacity of the installed base of computer servers in enterprises worldwide. Information through non-computer sources (telephones or physical newspapers, for example) is not included. We define enterprise server information as the flows of data processed by servers as inputs plus the flows delivered by servers as outputs. A single chunk of information, such as an email message, may flow through multiple servers and be counted multiple times.

Two-thirds of the world's total of 9.57 zettabytes was processed by low-end, entry-level servers costing $25,000 or less. The remaining third was processed by midrange and high-end servers, those costing more than $25,000. Transaction processing workloads (issuing an invoice, paying a bill, checking a stock level) amounted to approximately 44% of all the bytes processed. Web services and office applications contributed the other 56%. Servers configured as virtual machines processed about half of all the bytes in Web services and office applications.

We also conducted a separate analysis of improvements in server performance and capital cost. Midrange servers processing Web services and business application workloads doubled their performance per dollar every 1.5 years. Raw performance for this server class doubled approximately every 2 years. High-end servers processing transaction workloads had the longest doubling times: both performance per dollar and raw server performance doubled approximately every 4 years.

This report covers how much information was processed by the installed base of computer servers in companies worldwide in 2008.
It complements an earlier report on information consumption, which estimated 3.6 zettabytes of information was consumed by American households in 2008. Later reports will cover storage systems and enterprise networks.
1 INTRODUCTION
Businesses today are awash with information and the data used to create it. Daily, managers are confronted with growing information volumes far greater than can possibly be consumed.1 Where is all of this data being created? What happens to it? How much information is being processed by computer servers in companies worldwide?
The goal of the How Much Information? Program is to create a census of the world's data and information. How much information is created and consumed annually? What types of information are created? Who consumes it? And what happens to information after it is used? For our purposes, we distinguish between information that is created and used in organizations (work information, used for productive purposes) and consumer information seen or heard by people not at work (information created and used for consumption). We expand on these definitions below.

Last year we reported on consumer information for different media in and outside the home, such as watching television, playing computer games, going to the movies, listening to the radio or talking on a cellular phone.2 Nationwide, we found that Americans spent approximately 11.8 hours viewing or listening to media on an average day. They consumed 3.6 zettabytes of information in 2008, or approximately 34 gigabytes per person per day. Our estimate is many times greater than totals from previous studies. Why? In large part the difference lies in our use of a very inclusive definition of information: we measured the flow of information, not just the fraction of information that is retained. A zettabyte is 10^21 bytes, or 1,000 billion gigabytes. See Appendix: Counting Very Large Numbers.

Our second report conveys our findings for enterprise information. We define enterprise server information as the flows of data processed by computer servers as inputs plus the flows delivered by servers as outputs.3 Note that our definitions of work and consumer information are different and complementary. Work information is data processed for productive use, for example to guide an immediate action, or to use as context for a future action. Consumer information is data processed and delivered for consumptive use: to delight, to entertain, to enjoy.

How much work information is processed by computer servers annually in companies worldwide? And why are we using computer servers to estimate the flow of data processed? Servers are the digital workhorses of the modern firm. Servers host the company's work applications, process the data flows, and manage the data traffic going in and out of the firm's storage systems. Small companies may have tens of servers of varying sizes and capacities; large enterprises may have tens of thousands of servers. We do not include in our definition information that people may see or hear while at work that is not processed by servers.

Of course, the data processed by servers is not all of the information that exists in any company (although in most companies it is likely to be the great majority of it). There is a wealth of paper documents in every organization, and there exist many digital data storage devices, from personal storage media (DVDs, flash drives and the like) to mammoth-capacity enterprise storage systems. Data stored and archived on storage systems is defined as data at rest. Data at rest requires data processing and output to constitute what we define as information. Conversely, paper documents, records, image libraries and the like have long been defined as information in printed or image form, such as that stored in case files in a law library, or customer files archived for long-term storage in an outside storage facility. We will add these sources in the future.

This study covers how much information was processed and delivered by the installed base of computer servers in enterprises worldwide in 2008. Server capacity measures were derived from world sales and shipments data published by analyst firms Gartner and IDC. Server performance was estimated using industry benchmarks published by the Transaction Processing Performance Council (TPC), the Standard Performance Evaluation Corporation (SPEC), and VMware. We used industry standard benchmarks to define a consistent measure of server work performed, and converted it into its byte equivalent. Performance data was taken from server test results submitted to benchmark standards bodies by hardware vendors. We adjusted this data based on the date of availability of the test system and other factors.
A few highlights from our findings (Table 1): 9.57 zettabytes of information was processed by servers in companies worldwide in 2008. That amounts to:

- 3.01 terabytes of information per worker per year, or 12 gigabytes per worker per day (based on the ILO and CIA Factbook's estimate of 3.18 billion people in the world labor force in 2008).4
- 63 terabytes of information per company per year (based on Dun & Bradstreet's 151 million world businesses registered with D&B's D-U-N-S system in 2008).5
- Two thirds of the world total of 9.57 zettabytes of information was processed by low-end, entry-level servers costing less than $25,000 per machine. The remaining third was processed by midrange and high-end servers, costing between $25,000 and $500,000 (for midrange servers) and over $500,000 (for high-end servers).

Our report is divided into five sections. Section 1 introduces our concepts and measurement methods. Section 2 looks at different types of servers and how to count them. Section 3 considers server workloads, charts server performance measured by industry benchmarks, and calculates world server capacity. Section 4 summarizes total annual server information and different contributions to that total. Section 5 discusses some interesting factors in enterprise information growth, server performance and data intensive computing platforms.
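The per-worker and per-company figures follow from simple arithmetic on the world total; the division by roughly 250 working days per year (rather than 365 calendar days) is our inference from the report's 12 GB/day figure, not something the text states:

```python
# Back-of-the-envelope check of the per-worker and per-company figures.
# Assumption (ours, not the report's): the daily figure uses ~250
# working days per year rather than 365 calendar days.
ZETTABYTE = 10**21

total_bytes = 9.57 * ZETTABYTE   # world server information, 2008
workers = 3.18e9                 # ILO/CIA world labor force estimate, 2008
companies = 151e6                # D&B-registered world businesses, 2008

per_worker_tb = total_bytes / workers / 10**12
per_company_tb = total_bytes / companies / 10**12
per_worker_day_gb = total_bytes / workers / 250 / 10**9

print(f"{per_worker_tb:.2f} TB per worker per year")         # ~3.01
print(f"{per_company_tb:.0f} TB per company per year")       # ~63
print(f"{per_worker_day_gb:.0f} GB per worker per workday")  # ~12
```

All three headline numbers are reproduced, which suggests the report's per-day figure is indeed a per-workday figure.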
Data are collections of numbers, characters, images or other outputs from devices that represent physical quantities: artificial signals intended to convey meaning. "Artificial" because data is created by machines such as sensors, barcode readers, or computer keyboards. Digital data has the desirable property that it is easy to capture, create, communicate and store: so easy, in fact, that increasingly we are flooded by it.

Information is a subset of data, data being the lowest level of abstraction from which information and knowledge are derived. In its most restrictive technical meaning, information can be thought of as an ordered sequence of signals.6 Information processing refers to the capacity of computers and other information technology (IT) machinery to process data into information. Unlike data, information has the further property that it must have meaning for its intended use.8 People define that meaning, whether it is the information required for an immediate decision or the collection of background information for a judgment or action to be taken in the future. The amount of human involvement increases as we move from a focus on data to one of information: we store data on computers; we use computers to create and manage information. (Table 2)

Past high-level studies of enterprise information have generally measured data of only two kinds: the data that gets stored on physical storage media, and communications data that is in flow, transmitted over local-area or wide-area networks in the firm.7
Which Bytes?
Our analysis estimates the amount of enterprise information by counting the number of bytes processed and delivered to end users or to applications accessed by end users. Why bytes? And what is the relationship between bytes and information? We utilize a set of benchmarks (TPC, SPEC, VMmark) representative of enterprise workloads. We compute the total number of bytes delivered by deriving how many bytes are processed or delivered by transactions and applications defined within each of the standardized benchmarks. Total bytes are each benchmark's measure of how much work the server has performed. How well this definition of information matches information in a real enterprise environment depends upon how well the selected benchmarks represent the transaction and application work performed in companies.
Our byte total is based on the analysis of performance data from four standard industry workloads:

Online transaction processing (OLTP). The OLTP workload processes clerical data entry and retrieval processes in a real-time transaction processing environment. OLTP workloads require very high server performance, are optimized for a common set of transactions, and in large firms support thousands of concurrent users.

Web server processing (WWW or Web). Web server workloads process documents (in the form of Web pages) to the Web clients requesting them; a typical application would be a user searching for information using a Web browser. Web server performance typically trades off the number of requests the server must process with the number of bytes that the server must transfer (disk I/Os).

Virtual machine processing. Virtualization is an important software technology deployed in most companies today. The basic principle is that virtualization allows a single physical machine to run multiple virtual machines, sharing the resources of the single machine across multiple applications. Different virtual machines can run different operating systems and applications on the same physical computer. We include virtual machine processing of multiple workloads in our analysis.

Application processing. Our analysis includes some, but not all, application processing done on application servers. Examples of application server workloads not directly measured would include customer relationship management (CRM), human resources management (HRM), and business analytics. We will add these sources in the future.

[Figures 1 and 2, omitted here, diagram the information flows we measure: users input data on local devices; edge servers, database servers, and storage devices process the data and deliver information flows back to the edge.]

Our definition of information emphasizes the flow of data processing and data outputs. We count all instances of data processing and every flow delivered as output. Our definition expands on many other definitions of enterprise information.9 An alternative approach, for example, could go to the opposite extreme: only counting data that is stored on some media somewhere in the firm (printed material, digital images or digital video), whether that data is subsequently used or not.10
We divide this into components:

  Total bytes per year = World server capacity (transactions per minute)
                         x Bytes per transaction
                         x Annual load factor (hours of operation x fraction of full load)

where:

  World server capacity (transactions per minute) = $ spent / $ per measured transaction per minute

We focus on transactions as the unit of work performed by servers for both heuristic and practical reasons. Workload transactions are common to all enterprises regardless of company size, industry sector, or technology complement. All companies process orders, make payments, pick from inventory, and deliver products and services to their customers. In support of these activities,
company IT departments run application and email servers, manage file and print servers, and provide Web access and Web services. The diverse mix of transaction work performed by servers has been classified into workloads, shorthand for application-level transactions and data processed by servers. (Figure 3) The most important of these workloads have been simulated in benchmarks designed to test server performance and derive comparative price-performance measures. We use results from three of the most extensively applied industry benchmarks, TPC-C, SPECweb2005, and VMmark. Each benchmark, explained in Section 3, simulates one or more enterprise workloads and computes results for server transaction performance, which we convert into byte equivalents. All told, we analyzed price, performance and capacity data for over 250 servers tested from 2004 to 2009 using one or more of the benchmarks.
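The decomposition above can be sketched in a few lines. Every numeric input below is an illustrative placeholder rather than one of the report's estimates, and we convert hours of operation into minutes so the units match the transactions-per-minute capacity:

```python
# Sketch of the report's capacity decomposition:
#   total bytes/year = capacity (tx/min) x bytes per tx x annual load factor
# All numbers below are illustrative placeholders, not the report's estimates.

def server_capacity_tpm(dollars_spent: float, dollars_per_tpm: float) -> float:
    """World server capacity in transactions per minute, derived from
    sales data and a price-performance ($ per tx/min) benchmark figure."""
    return dollars_spent / dollars_per_tpm

def total_bytes_per_year(capacity_tpm: float, bytes_per_tx: float,
                         hours_of_operation: float, load_fraction: float) -> float:
    """Apply bytes per transaction and the annual load factor."""
    minutes_of_operation = hours_of_operation * 60
    return capacity_tpm * bytes_per_tx * load_fraction * minutes_of_operation

cap = server_capacity_tpm(dollars_spent=29.3e9,   # e.g. 2008 entry-level sales
                          dollars_per_tpm=1.0)    # placeholder $/tpm
total = total_bytes_per_year(cap, bytes_per_tx=100_000,
                             hours_of_operation=8760, load_fraction=0.1)
print(f"{total:.3e} bytes/year")
```

The structure, not the placeholder values, is the point: the report's results come from plugging benchmark-derived price-performance and byte-per-transaction figures into exactly this kind of product.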
[Figure 3, omitted here, charts example server workloads by server type (Web, database, edge, application, and virtual machine servers), spanning Web services and eCommerce, search, email and messaging, firewall and security, file and print, application processing, customer relationship management, financial management, and supply chain management.]

Computer Transactions

In computer programming, a transaction is an activity or request involving a sequence of data exchange and data processing. In a database management system, transactions are sequences of operations that read or write database elements. Orders, purchases, changes, additions and deletions are typical business transactions. An example of an order-entry transaction would be a catalog merchandise order phoned in by a customer and entered into a computer by a telephone sales representative. The order transaction involves checking an inventory database, confirming that the item is available, placing the order, confirming that the order has been successfully placed, and advising the customer of the expected time of shipment. As a rule, the entire sequence is viewed as a single transaction, and all of the steps must be completed before the transaction is successful and the database is updated.

The server market is divided into three price ranges, according to the price (stated as factory revenue) of the manufacturer's entry-level system in each price range.11

Entry-level servers are machines priced less than $25,000. Servers sold in this category in 2008 were dual-core, single- or dual-processor machines with a minimum of frills; typically they would be deployed in non-critical business application areas, configured for low-cost general computing. An example workload would be a file and print server.

Midrange servers are machines costing between $25,000 and $500,000. Server systems in this price range encompass many configurations, including multi-core, multi-processor tower systems, blade servers and small mainframe servers. Midrange servers are housed in server closets, server rooms and company datacenters, and run a diverse mix of workloads, including transaction processing, Web services and email, online transaction processing and virtual machines. More expensive midrange servers would be deployed in medium and critical business application areas and would be managed by professional IT staff.

High-end servers are machines costing over $500,000. These systems are large, complex, multi-core, multi-processor mainframe servers located in mid-tier and corporate datacenters. They are almost always deployed exclusively to business-critical application workloads where very high performance and very high reliability are required. Example workloads are online transaction processing (OLTP) and online analytic processing (OLAP).

Much of our research has gone into estimating the amount of information processed by each server class for the installed base of servers worldwide in 2008. For this measure of information we used bytes: the number of bytes processed as input and bytes delivered as output. When measured in bytes, our results show that servers are processing an enormous quantity of information work in firms today. Entry-level servers (lower performance but far more numerous) processed 6.31 zettabytes of information in 2008, 66 percent of all enterprise information created worldwide (Table 3). Midrange servers processed 2.80 zettabytes, or approximately 29 percent. High-end servers processed 451 exabytes of enterprise information, or approximately 4.7 percent of the total.

Our estimate of 9.57 zettabytes is many times greater than that found in previous studies. A March 2008 study by IDC reported that the total worldwide digital universe in 2007 was approximately 281 exabytes, and would not reach one zettabyte until 2010.12 According to IDC, companies created, captured or replicated about 35% of the total digital universe, with approximately 14 exabytes coming directly from servers in corporate datacenters. Why is there such a huge discrepancy in our numbers? There are many possible factors; in particular, IDC probably did not include throughput data (data processed and output) in its estimates.13 We comment further on how our results compare with other information studies later in this report.
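The order-entry transaction described in the Computer Transactions box above must succeed or fail as a unit. A minimal sketch of that all-or-nothing behavior, using Python's sqlite3 (the table and column names are our own illustration, not from the report):

```python
# Sketch of an order-entry transaction: all steps succeed together or
# the database is left unchanged. Schema names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.commit()

def place_order(item: str, qty: int) -> bool:
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            row = conn.execute(
                "SELECT qty FROM inventory WHERE item = ?", (item,)).fetchone()
            if row is None or row[0] < qty:
                raise ValueError("insufficient stock")  # aborts the transaction
            conn.execute("UPDATE inventory SET qty = qty - ? WHERE item = ?",
                         (qty, item))
            conn.execute("INSERT INTO orders VALUES (?, ?)", (item, qty))
        return True   # order confirmed; database updated as a unit
    except ValueError:
        return False  # transaction rolled back; nothing was written

print(place_order("widget", 3))   # True  -> stock drops to 7
print(place_order("widget", 99))  # False -> stock unchanged
```

The `with conn:` block is what makes the sequence a single transaction: a failure at any step (here, insufficient stock) undoes the partial work, which is exactly the "all of the steps must be completed" property the box describes.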
Table 3: World Server Information by Server Class, 2008 (in zettabytes)

Entry-level   6.31
Midrange      2.80
High-end      0.451
Total         9.57
Server Workloads
A server workload is the amount of work that a server produces or can produce in a specified period of time. But what do we mean by work? Technically, workload refers both to the request stream presented by clients (the work) and to the server's response to the requests (the load). An example would be users submitting requests to a Web server to search and display Web pages showing product information for an eCommerce purchase. How quickly the server responds to the requests defines the load. Typical server workloads would include eCommerce transactions, database transactions, Web server and email server transactions, and file and print. We analyzed server performance data for ten simulated enterprise workloads.
A dedicated server is reserved for a single use, such as hosting a Web site. A shared server may be accessed by multiple users for multiple purposes. Dedicated servers are more common in larger businesses, where computing resources are customized for specific needs. Dedicated servers provide faster data access, allow higher network traffic rates, and can be more closely controlled (for example, for performance, security, or backup).

Servers are configured by the work tasks they perform. For our purposes, we are most interested in the following server types:

Database Server: A computer server dedicated to database storage and retrieval. The database server holds the database management system and accesses the company's data storage devices.
Application Server: Application servers are dedicated to running one or more software applications.
Web Server: Web servers host internal or external Web sites, serving Web pages back and forth to users.
Mail Server: Mail servers host the company's email system.
File Server: File servers house applications that are configured to send and receive files within applications. Think of them as a superset of the server types above. A database server may be part of a file server, for example.

Server benchmark testing is organized by server type and by simulated enterprise workload. Benchmark results are used in industry to compare the performance and price-performance of different servers and workloads. For example, the Transaction Processing Performance Council benchmarks database servers using the TPC-C benchmark. Since 2001, over 250 performance tests of database servers have been conducted. As we will explain, we make use of TPC-C and other server benchmarks to compute the amount of server work performed.
We report in Table 4 data on the installed base of servers published by Koomey (2007) for the years 2000-2005.17 The estimated total installed base of servers worldwide, including shipments and retirements, is shown in Table 4. Entry-level servers dominate the installed base, representing over 90% of the total number of servers worldwide on a unit basis. Midrange servers comprise most of the rest. High-end servers represent only a few tenths of one percent of the total number of servers on a unit basis. Depending on the server class chosen, the U.S. has about 30 to 40 percent of the servers in the world.
Our analysis has relied on annual and quarterly sales data published by IDC and Gartner Group (all data in U.S. dollars). We used IDC's publicly released data for annual worldwide sales and quarterly, year-over-year percentage increases or decreases in sales for three classes of servers (entry-level, midrange, and high-end) as our baseline dataset.18 IDC reported total worldwide server sales were $53.3 billion in 2008 (Table 5). Shipments came in at just over 8.1 million units. Entry-level server sales were $29.3 billion; midrange sales were $11.7 billion, and high-end server sales were $12.3 billion. Reflecting recession effects, the market contracted 14 percent in the final quarter of 2008, to $13.5 billion.19 Worldwide server unit shipments declined 12 percent compared to the same quarter in 2007. Overall, the 2008 market declined approximately 3.3% to $53.3 billion. Unit shipments, however, grew slightly to 8.1 million units. Table 5 presents annual sales in U.S. dollars for all server classes for the years 2004-2008.
Table 4: Installed Base, Shipments and Retirements of Servers for the World and U.S., 2000-2005 (in 000s)

                          World                                  U.S.
Year           Total  Entry-level  Midrange  High-end   Entry-level  Midrange  High-end    Total

Installed Base
2000          14,114       12,240     1,808      65.6         4,927       663      23.0    5,613
2001          17,555       15,596     1,890      69.1         5,907       701      22.5    6,630
2002          18,492       16,750     1,683      59.0         6,768       574      23.1    7,365
2003          20,125       18,523     1,540      62.3         7,578       530      21.4    8,130
2004          24,746       23,441     1,238      66.0         8,658       432      23.3    9,113
2005          27,282       25,959     1,264      59.4         9,897       387      22.2   10,306

Shipments
2000           4,223        3,926       283      13.0         1,659       111       4.8    1,774
2001           4,198        3,981       206      10.4         1,492        66       3.6    1,562
2002           4,397        4,184       204       9.4         1,714        67       3.1    1,784
2003           5,237        5,017       211       8.8         2,069        76       2.9    2,148
2004           6,275        6,083       184       8.6         2,517        53       2.8    2,572
2005           7,017        6,822       187       8.5         2,721        62       2.6    2,786

Retirements
2000           1,905        1,631       264      10.0           300       116       5.0      420
2001             757          626       125       6.9           513        28       4.1      545
2002           3,461        3,030       411      19.6           853       194       2.5    1,049
2003           3,603        3,243       355       5.5         1,259       120       4.6    1,383
2004           1,655        1,165       485       4.9         1,437       151       0.9    1,589
2005           4,481        4,304       161      15.1         1,482       106       3.7    1,592

Source: Koomey (2007) and IDC. Units are in 000s.
Notes: (1) Installed base is measured at the end of the year (December 31). (2) Installed base and shipments include both enterprise and scientific servers. The data does not include server upgrades. (3) Retirements are calculated from the installed base and shipments data; 2000 retirements were calculated using the 1999 installed base and year 2000 shipments. (4) World includes the U.S.
Table 5: World Server Sales 2004-2008 (in billions of U.S. dollars)

Class           2004     2005     2006     2007     2008     Total
Entry-level    $24.4    $27.3    $28.5    $30.8    $29.3    $140.5
Midrange       $12.8    $12.8    $12.2    $12.6    $11.7     $62.3
High-end       $12.2    $11.6    $12.0    $11.6    $12.2     $59.9
Total          $49.5    $51.8    $52.8    $55.1    $53.3    $262.7
The units reported in Table 5 are current dollars spent. Current dollars are appropriate for our purposes because price-performance ratios each year are based on current dollars. Midrange and high-end server revenue was relatively flat over our target years 2004-2008. Entry-level server revenue increased from $24.4 billion in 2004 to $30.8 billion in 2007. Midrange and entry-level server revenue fell in 2008; high-end server revenue increased slightly over the same period.
Source: HMI? 2010. Data compiled from IDC Quarterly Server Tracking Reports, 2004-2009.
Server capacity and performance can be defined both technically and operationally. Technically, it refers to the server's theoretical capacity: what is the server's hardware capacity to input, process and output work, measured in a maximum transaction rate or in total bytes? Operationally, capacity refers to the ability of a server configuration to meet future resource needs. A typical capacity concern of datacenter managers is whether the server, storage and network resources will be in place to handle an increasing number of requests as the number of users and transactions increases. Planning for future increases is an ongoing task in a datacenter manager's life: capacity planning. For our purposes, we are interested in analyzing a snapshot of the installed capacity and the utilized capacity of all world servers in 2008. We define installed capacity as the sum of the maximum performance ratings of all installed servers in
2008. Utilized capacity is defined as the sum of the measured performance of all installed servers that year, adjusted by server load factors and the hours the servers are available for use. Since neither number can be directly measured, we estimate both.

Our capacity model expresses installed capacity as the maximum number of transactions per minute theoretically possible for all servers added together in 2008. What do we mean by this? Imagine for a moment that every server in the world was running at its maximum performance rating, and we had a way to accurately count the number of transactions that every server processed. Installed capacity would be the total number of transactions processed in a year. This sum is a theoretical maximum, not a realistic one. Defining installed capacity in this way is akin to saying that the maximum performance capacity for all automobiles in the world is the sum of their top speeds driving flat-out down the highway. It is an interesting number, but not a realistic one. Instead, we need to adjust our theoretical maximum by taking into account the estimated utilization of server capacity: how many hours are servers actually working? What is their average load factor? What workloads are they processing? Recalling our car example, if we think of a server as a delivery truck delivering bytes, how many hours is a delivery truck operated in a year (hours worked)? What is the truck's average speed compared to its maximum speed (load factor)? How many packages can it carry at a time (bytes)? Figure 6 illustrates our modeling approach.

[Figure 5, omitted here, charts performance improvement over time for five benchmark/server-class series: TPC-C Entry-level, TPC-C Midrange, TPC-C High-end, SPEC Entry-level, and SPEC Midrange.]
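The installed- versus utilized-capacity distinction can be made concrete with a toy calculation; the three servers and all of their numbers below are invented for illustration:

```python
# Toy sketch: installed capacity sums maximum performance ratings, while
# utilized capacity discounts each server by its load factor and the
# hours it is available. All numbers are invented, not the report's.
servers = [
    # (max transactions/minute, load factor, hours available per year)
    (100_000, 0.15, 8760),    # entry-level box, lightly loaded, always on
    (500_000, 0.40, 8760),    # midrange box in a datacenter
    (2_000_000, 0.70, 8000),  # high-end box, heavily loaded
]

installed_tpm = sum(tpm for tpm, _, _ in servers)

utilized_tx_per_year = sum(
    tpm * load * hours * 60 for tpm, load, hours in servers)

# The "every car driving flat-out" number: full rating, every minute.
theoretical_tx_per_year = installed_tpm * 8760 * 60

print(f"installed capacity: {installed_tpm:,} tpm")
print(f"utilized / theoretical: "
      f"{utilized_tx_per_year / theoretical_tx_per_year:.1%}")
```

Even in this toy fleet, utilized capacity is well under the theoretical maximum, which is why the report adjusts its installed-capacity figure by load factors and hours of availability.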
[Figure 6: Schematic of Modeling Approach (elements include server availability, workload allocation, and total bytes across all workload types).]
reported (this workload is intended to simulate the use of a standby server for peak load or back-up processing, as would be typical in an operational environment).26

TPC-C (Online Transaction Processing)

The TPC-C benchmark simulates a large wholesale outlet's inventory management system. The test system is made up of a client system, which simulates users entering and receiving screen-based transaction data; a database server system, which runs the database management system (DBMS); and the storage subsystem, which provides the required disk space for database and processing needs.27 The performance of the test system is measured when it is tasked with processing numerous short business transactions concurrently (Figure 7). The TPC-C workload involves a mix of five concurrent transactions of different types and complexity, either executed on-line or queued for deferred execution:28

New Order: a new order entered into the database (approx. 45%)
Payment: a payment recorded as received from a customer (approx. 43%)
Order Status: an inquiry as to whether an order has been processed (approx. 5%)
Stock Level: an inquiry as to which stocked items have a low inventory (approx. 5%)
Delivery: an item is removed from inventory and the order status is updated (approx. 5%)

TPC-C publishes two sets of results: raw performance, measured in transactions per minute, and price-performance, where the cost of the test system is divided by the transaction rate. We adjusted the published costs to reflect realistic hardware configurations.
SPECweb2005 (Web server)

SPECweb2005 is a benchmark published by the Standard Performance Evaluation Corporation (SPEC) for measuring a system's ability to act as a Web server. The benchmark is designed around three workloads: banking, e-commerce, and support. SPECweb2005 reports a performance score for each of the three workloads, measured as the number of simultaneous user sessions the system is able to support while meeting quality of service (QOS) requirements. An overall, weighted score is also reported.29 (Figure 8) The three workloads are designed to simulate enterprise applications and contain the following tasks:

- SPECweb2005_Banking: The banking load emulates a user session in which the banking site exchanges encrypted and non-encrypted information with simulated users. Typical user requests include log-on/log-off, bank balance inquiry, money transfers, etc.
- SPECweb2005_Ecommerce: The e-commerce load emulates an e-commerce site where customers browse product information and place items in a shopping cart for purchase. Simulated activity includes customers scanning product web pages, viewing specific products, placing orders in a shopping cart, and completing the purchase.
- SPECweb2005_Support: The support workload emulates a vendor support site that provides downloads such as driver updates and documentation. The load simulates customers viewing and downloading product and support documentation.

VMmark (virtual machine workloads)

VMmark, published by VMware, is the first virtual machine benchmark in the industry.30 It is designed to measure the performance of virtualized servers using a collection of sub-tests derived from benchmarks developed by the Standard Performance Evaluation Corporation (SPEC). VMmark test workloads include: database server, mail server, Java server, Web server (using a version of SPECweb2005), file server, and a standby (idle) server. The unit of server work measured is called a
[Figure 7: TPC-C Simulated Workflow (Client System, Database Server, Storage Subsystem; transaction mix includes Payment (43%) and Delivery (5%)). Sources: Transaction Processing Performance Council (TPC); Hewlett Packard, "An overview of the TPC-C benchmark on HP ProLiant servers and server blades," August 2007.]
Tile. Each Tile represents one group of six virtual machines, each machine running one workload (Figure 9):

Mail server: simulates a mail server in a company data center.
Java server: simulates Java performance, important in many multi-tiered enterprise applications.
Web server: simulates Web server performance; a modified version of SPECweb2005 is used.31
Database server: simulates an online transaction workload, similar to a light version of TPC-C.32
File server: simulates the performance of a file server, a computer responsible for the central storage and management of data files so that other devices on the same network can access them.
Standby server: simulates a standby or idle server, used in computing environments to handle new workloads, or workloads with unusual peak load behavior.

VMmark reports a performance metric for each workload, and the total number of Tiles the system is able to run within quality of service (QOS) requirements.33
[Figure 8: SPECweb2005 Simulated Workflow (Prime Client, Web Server, Storage Subsystem; e-commerce request types include index, search, browse, browse product line, product detail, customize, cart, login, shipping, billing, and confirm).]
[Figure 9: VMmark Simulated Workload (Test Server running Mail Server, Web Server, OLTP Database, File Server, and Standby Server virtual machines, with attached Storage Device). Source: VMware. Notes: OLTP = online transaction processing; File Server and Standby Server not included in calculations.]
Figure 11 also illustrates our server measurement points and their corresponding benchmarks. Core OLTP transactions are measured by TPC-C; Web services applications are measured by SPECweb2005; and VM servers, which can be in either environment, are measured by VMmark. We do not measure application servers directly; there is no single benchmark that addresses a representative subset of applications running on general-purpose servers in a typical company.35 While some fraction of Web-based application transactions is measured in VMmark and SPECweb2005, omitted are server transactions that support middleware, packaged software application suites such as PeopleSoft and SAP, and business intelligence programs such as SAS. We would need to know how these programs scale with respect to a database transaction or a Web eCommerce transaction to estimate how their inclusion would affect our capacity and information calculations. We continue to research this area.
$ per measured transaction per minute = price of test server hardware ÷ measured server performance

To derive capacities, we need 1) the dollars spent for a specific server class, 2) benchmark tests that report the price-performance of the server class by year, and 3) the dollar value of servers allocated to a particular benchmark. As workload allocations by companies are not directly measurable, we rely on our own estimates, guided by expert interviews, industry data, and our own judgment. Table 7 presents our workload allocations. We use these percentages in our capacity calculations. We estimate, for example, that just over a third of all server work processed in companies is made up of core database transactions. The remaining two-thirds of server work is processed on Web servers, with a quarter of that work virtualized.36 (Figure 12)
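The derivation above can be sketched numerically. Every input below is hypothetical and invented for illustration (the benchmark price, measured performance, class revenue, and allocation fraction are not the report's data):

```python
# Hypothetical sketch of the capacity-per-dollar derivation.
# All inputs are invented for illustration, not taken from the report.

server_class_revenue = 20e9   # dollars spent on one server class in a year (hypothetical)
tpc_c_allocation = 0.35       # fraction of that spend allocated to OLTP work (hypothetical)
price = 150_000               # test-system hardware price from a benchmark result (hypothetical)
performance = 600_000         # measured transactions per minute, tpmC (hypothetical)

# Price-performance yardstick: dollars per measured transaction per minute.
dollars_per_tpm = price / performance

# Allocated dollars divided by $/tpm gives the class's capacity for that workload.
capacity_tpm = server_class_revenue * tpc_c_allocation / dollars_per_tpm

print(f"${dollars_per_tpm:.2f} per tpm")
print(f"allocated capacity: {capacity_tpm:.3e} transactions per minute")
```

The design point is that the benchmark supplies the conversion rate (dollars per unit of work), so the only unmeasurable input is the allocation percentage, which is why the workload allocations in Table 7 carry so much weight in the calculation.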
[Table 7: Workload Allocations, 2004-2008 — SPECweb2005: 65%, 65%, 63%, 37%, 30%; VMmark (2007-2008): 25%, 30%.]
[Figure 11: Core and Edge Computing Model. Core: Database Servers (TPC-C) and Web Application Servers (SPEC), with processed output; Edge: Edge Servers (VMmark) and User Devices, with output and processed output.]
Midrange servers show different capacity trends. If all midrange servers sold in 2004 had processed only SPECweb2005 workloads, their core capacity was 26.9 billion transaction requests per minute. By 2008, the corresponding capacity for SPECweb2005 workloads was 274 billion transaction requests per minute, better than a ten-fold increase in four years. Several factors could account for the much higher growth rate of midrange server capacity compared with that of entry-level servers: a higher proportion of multiprocessor, multi-core midrange servers was sold (more processors and more cores positively affect benchmark performance); midrange server price-performance improved faster than entry-level server price-performance; and midrange server test configurations may have been able to take greater advantage of the other resources in the test system, positively affecting test performance. Our server capacity assumptions, methodology, and calculations are complex, and we will not attempt
to explain them in this report. For interested readers, we have completed a background technical working paper which explains our key assumptions, describes our methodology in much greater detail, and gives sample calculations.37
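The midrange growth cited above (26.9 billion requests per minute in 2004, 274 billion in 2008) implies a compound annual growth rate that is easy to verify:

```python
# CAGR implied by the midrange SPECweb2005 capacities quoted in the text:
# 26.9 billion requests/minute in 2004, 274 billion in 2008.
start, end, years = 26.9, 274.0, 4

growth_multiple = end / start                 # roughly a ten-fold increase
cagr = (end / start) ** (1 / years) - 1       # compound annual growth rate

print(f"growth multiple: {growth_multiple:.1f}x over {years} years")
print(f"implied CAGR: {cagr:.0%}")
```

The roughly ten-fold rise over four years corresponds to capacity growing at close to 80% per year, compounded.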
[Figure 12: Estimated Workload Percentages — percentage of core and edge measured workloads (TPC, SPEC, VM); labeled values include 25% and 30%.]
Online transaction processing, measured by TPC-C, accounts for almost 45% of all server bytes processed in 2008. Web services and general computing, processed by entry-level and midrange servers and including bytes processed by virtual machines, account for the rest. Midrange and high-end servers process a disproportionate share of total
Transaction capacity by benchmark and server class, 2004-2008 (billions):

                               2004      2005      2006      2007      2008
TPC-C, tpmC (billions)
  Entry-level                 80.80     98.10    191.10    184.00    392.30
  Midrange                    16.90     19.20     35.30     73.30    108.00
  High-end                     4.40      4.30      4.80      7.50     10.10
SPEC, rpmSPEC (billions)
  Entry-level                125.10    148.00    609.50    848.00    956.70
  Midrange                    26.90     53.80     83.90    175.20    274.00
  High-end                      ---       ---       ---       ---       ---
VMmark, apmVM (billions)
  Entry-level                   ---       ---       ---    296.10    523.30
  Midrange                      ---       ---       ---     56.50     76.70
  High-end                      ---       ---       ---       ---       ---

NOTES: Each number is the transaction capacity if all servers in the class were performing a single benchmark. tpmC = transactions per minute (TPC-C); rpmSPEC = requests per minute (SPECweb2005); apmVM = actions per minute (VMmark); --- = workload not allocated to server class, or no benchmark test available.
there are inherent uncertainties in many of our assumptions, and they are subject to change based upon improved methodology and data. Interested readers should consult our background technical working paper for details on key assumptions, our methodology for converting benchmark transactions into bytes, and a complete description of how we derived total 2008 server bytes.
Total Server Information in 2008: 9.57 zettabytes

High-end servers make up only about two-tenths of one percent of all installed servers (0.22%), but process 5% of total annual bytes. Midrange servers, far more numerous, make up approximately 5% of all installed servers and process 29% of total annual bytes. In contrast, the ubiquitous entry-level servers make up over 94% of all installed servers in the world and process two-thirds of all the bytes. (Figure 14) The magnitude of OLTP transaction processing, almost half of all bytes, reflects the growing importance of workload-specific computing in recent years. The model reverses the general-purpose computing model that has dominated enterprise computing for over a decade. We discuss workload-specific computing and other important computing trends in Section 5.
[Figure 14: Server bytes by class and workload, 2008 (zettabytes). Midrange: 1.21 (TPC-C), 0.89 (SPEC), 0.69 (VM), row total 2.8 ZB (29.30%); High-end: 0.45 ZB (4.70%); workload totals: TPC-C 4.24, SPEC 2.55, VM 2.77 ZB (44.40%, 26.70%, 29.00%); grand total 9.57 ZB (100%).]
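The shares quoted in the surrounding text can be cross-checked from the zettabyte totals (small differences from the quoted percentages are rounding):

```python
# Cross-check of the 2008 byte shares, using the totals reported in the text
# (zettabytes processed). Quoted percentages were computed on unrounded data,
# so these recomputed shares differ slightly in the last digit.
total = 9.57                                          # grand total, ZB
by_workload = {"TPC-C": 4.24, "SPEC": 2.55, "VM": 2.77}  # workload totals, ZB
midrange_total, high_end_total = 2.80, 0.45           # class totals, ZB

for name, zb in by_workload.items():
    print(f"{name}: {zb / total:.1%}")
print(f"midrange share: {midrange_total / total:.1%}")
print(f"high-end share: {high_end_total / total:.1%}")
```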
inclusive definition of information. We included estimates of the amount of data processed as input and delivered by servers as output. This is a different emphasis than estimating the amount of stored data (data at rest) or counting the first instance of new information being created (the first airing of a radio program, or the first release of a new television show).
and with all types of work being completed in firms. We will continue to research a more extensive capacity measure, one including additional IT equipment such as storage and network devices.
Grand Total = 9.57 × 10^21 bytes

to be weighted by average server load factors and available hours of use that approximate how an average server works in a company. We estimated server loads in practical terms (how hard the servers get used) based on our own data and judgment, and vetted our estimates with industry experts. Our load estimates do not reduce to a single measure of CPU, memory, or I/O utilization. Rather, they are estimates of average utilization relative to the benchmarks and workloads used in our calculations. Our capacity-per-dollar measure was especially helpful when confronting the practicalities of estimating total server information. Enterprise computing environments are context-specific. Computing and application workloads vary widely across firms, even among those of similar company size and industry sector. To make sense of a complex environment, it was necessary to define a common yardstick that we could use with all servers
vendors are already exploiting solid-state disks (SSDs) and investigating other large-scale memory technologies. For example, the National Science Foundation's (NSF) next supercomputer, called Gordon, is designed to fuse traditional High Performance Computing (HPC) with HPD, or High Performance Data processing. When fully configured and deployed, Gordon will feature 245 teraflops of total compute power (one teraflop, or TF, equals a trillion calculations per second), 64 terabytes (TB) of DRAM (dynamic random access memory), and 256 TB of flash memory, about one-quarter of a petabyte.41 Trends in large memory system deployments and processing are not addressed in our current analysis.

Shared-nothing platforms, which consist of multiple independent nodes in parallel, have been prevalent since the mid-1980s as viable architectures for scalable, highly data-parallel processing. More recently, shared-nothing systems using commodity hardware have proven effective for massive-scale, data-parallel applications, such as web indexing by Google. Systems of this type, with thousands of nodes, are in use at large, Internet-scale businesses. Our current analysis does not address the special cases of Google, Microsoft, and other Internet-scale businesses, either in server counts or in estimating capacities using workload analyses. While their inclusion would almost certainly not change the order of magnitude of our world analysis, an analysis of US enterprise server information would require further investigation. Also, shared-nothing platforms are a key component in cloud computing environments. There is a strong possibility that cloud computing will become an important, if not key, part of the solution for enterprise computing in the future. Therefore, incorporating shared-nothing architectures into our analysis will help address a significant component of enterprise information.

Database Machines.
Another area that is re-emerging is that of computer architectures designed for database-intensive computing. The legacy in this area reaches back almost 30 years, to the field of database machines and other specialized hardware designs built to support specific classes of database applications. A number of factors, ranging from the need to deal efficiently with the data deluge, to the relative flexibility and improved costs of hardware design and fabrication, to the need for energy conservation, are leading to a re-examination of architectures, with the goal of systems optimized for massive data processing. In the past, the area of database machines led to the development of hardware/software systems such as Teradata, and to parallel software systems like IBM DB2 Parallel Edition, a shared-nothing commercial database system.42 Currently, the release of Oracle's Exadata Database Machine points in the direction of specialized hardware design and optimization attuned to extreme database performance.43 In high-performance computing environments, some scientific experiments that expect to generate very large amounts of data are investigating hardware embedding of processing algorithms to deal with the continuous data rates from high-resolution instruments.44 In scientific computing and large-scale data warehouses, it may soon become necessary to think of provisioning datasets with hardware: the dataset becomes the first-order object, with the computing platform dependent on it, rather than the current practice, which is the reverse.
5.3 Back to the Future: Data Discovery, Data Generation, Data Preservation
Data intensive computing is about the data, and it necessarily requires a deep engagement between business users on the one hand (sales managers, supply chain managers, financial analysts, etc.) and IT and technical experts on the other (the firm's IT professionals, technical specialists in vendor companies, etc.). This alignment does not happen without the required investments of time and resources by senior business, line, and IT management. With the vast proliferation of available data, there is increasing need for innovative search techniques that assist users with data discovery. It should be possible, for example, for users to specify the type of data they are looking for and have a system respond with useful results as well as recommendations for guiding the next search step. Current business intelligence and general search applications do not
provide this kind of capability. Another example is the increasing need for integration of very heterogeneous data, given the need to address complex issues. In medicine, something as conceptually simple as a lifetime personal medical chart, or a database of all test results for a family, would be examples. Traditional methods of data integration require significant manual intervention to actually integrate the data (e.g., by creating integrated database views) and do not scale. Novel tools and techniques are needed to facilitate such integration. One approach is referred to as ad hoc data integration, an approach that allows users to control, on-the-fly, which data are to be integrated. However, this approach requires a significant semantic infrastructure to be in place, and that is very rarely the case. Finally, a longer-term issue in both enterprise and research environments is data archiving and digital data preservation. In research settings, preserving scientific data is generally deemed to have intrinsic value. In enterprise settings, business policy and statutory regulations require preservation of data for a number of years after use. Federal agencies including NIH and NSF have recently announced needs for major data plans to address issues of data archiving and data preservation. And long-term data archiving and data preservation is a growing challenge for business organizations, beyond current retention policies, typically seven years. There are many industries (financial services, insurance, exploration and geological sciences, engineering, entertainment) where arbitrary data age limits make little sense. We have not addressed data and information storage in this analysis, but we will do so in the future. The issues are complex, involving technical as well as policy considerations.
Nonetheless, in the future, digital data archiving and preservation will require as much enthusiasm in research and industry settings as we have devoted to data generation and data processing.
APPENDIX
Counting Very Large Numbers
Byte (B)        =  1 byte       =  1                              =  One character of text
Kilobyte (KB)   =  10^3 bytes   =  1,000                          =  One page of text
Megabyte (MB)   =  10^6 bytes   =  1,000,000                      =  One small photo
Gigabyte (GB)   =  10^9 bytes   =  1,000,000,000                  =  One hour of High-Definition video, recorded on a digital video camera at its highest quality setting, is approximately 7 Gigabytes
Terabyte (TB)   =  10^12 bytes  =  1,000,000,000,000              =  The largest consumer hard drive in 2008
Petabyte (PB)   =  10^15 bytes  =  1,000,000,000,000,000          =  AT&T carried about 18.7 Petabytes of data traffic on an average business day in 2008
Exabyte (EB)    =  10^18 bytes  =  1,000,000,000,000,000,000      =  Approximately all of the hard drives in home computers in Minnesota, which has a population of 5.1M
Zettabyte (ZB)  =  10^21 bytes  =  1,000,000,000,000,000,000,000
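The decimal (SI) units in this appendix can be captured in a small helper function; this is an illustrative sketch, not part of the report's methodology:

```python
# Decimal (SI) byte units, as used in the appendix table.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def humanize(n_bytes: float) -> str:
    """Express a byte count in the largest decimal unit listed above."""
    for i, unit in enumerate(UNITS):
        if n_bytes < 1000 ** (i + 1) or unit == UNITS[-1]:
            return f"{n_bytes / 1000 ** i:.2f} {unit}"

print(humanize(9.57e21))   # the report's grand total for 2008
print(humanize(18.7e15))   # AT&T's average business-day traffic in 2008
```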
ENDNOTES
1. In December of 2007, Steven Lohr of The New York Times asked whether information overload was a $650 billion drag on the US economy, citing estimates he found in analyst and industry sources. Steven Lohr, "Is Information Overload a $650 Billion Drag on the Economy?" The New York Times, Bits, December 20, 2007.
Report on American Consumers, Global Information Industry Center (GIIC), University of California San Diego, December 2009.
http://hmi.ucsd.edu/howmuchinfo.php
calculations. By overhead, we refer to the amount of processing resources used by system software, such as the operating system, transaction processing (TP) monitor, or database manager. In communications, overhead refers to data that is not part of the user data but is stored or transmitted with it. Examples include data for error checking, channel separation, or addressing information.
4. The World Factbook, U.S. Central Intelligence Agency, available at:
in most studies of organizational information. Classically, organizational information has been defined by measuring the communications volume in a given firm, or by estimating the amount of stored data in the organization. Lyman and Varian did not separate out enterprise information in their 2000 and 2003 reports. Rather, they estimated the amount of new information stored annually on digital storage devices, and added to that the growth of printed office documents. Their total was 18.4 exabytes. See Lyman and Varian, Section IX B.2. Copies of Information Stored/Published on Hard Drives - Accumulated Stock, in Peter Lyman and Hal Varian, How Much Information? 2003 Report, University of California, Berkeley.
11. IDC's Server Taxonomy reports server sales in three price ranges:
its Worldbase global commercial database in 2008; as of 2010, the number is 160 million. Available at:
Entry-level servers (servers priced less than $25,000), Midrange servers ($25,000 to $499,999), and High-end servers ($500,000 or more). The revenue data is stated as factory revenue for a server system. Factory revenue represents those dollars recognized by multi-user system and server vendors for units sold through direct and indirect channels, and includes the following embedded components: frame or cabinet and all cables, processors, memory, communications boards, operating system software, other bundled software, and initial internal and external disks in the shipped server. Note that IDC publishes server data using alternate price categories. We have used the price ranges defined here to be consistent with previous research published by the EPA's Energy Star Program, Report to Congress on Server and Data Center Energy Efficiency: Public Law 109-431, August 2, 2007. Available at:
http://www.dnb.co.uk/about-dnb.asp http://www.dnb.co.uk/dnb-database.asp
information: information is a subset of data. Data is created by machines, such as sensors, barcode readers, or computer keyboards, and transformed by other machines, such as cable routers (location change), storage devices (time shift), and computers (symbol and meaning change). See Sections 1.1-1.3 in Roger E. Bohn and James E. Short, How Much Information? 2009 Report on American Consumers, Global Information Industry Center, University of California San Diego, December 2009.
7. Researchers have defined data in motion as kinetic data, using
12. John Gantz et al., The Diverse and Exploding Digital Universe: An
Updated Forecast of Worldwide Information Growth Through 2011, IDC White Paper, March 2008.
13. IDC describes its methodology as:
the analogy between kinetic and potential energy. Kinetic data, or data moving around the firm, is accessible for use by many applications and therefore can provide greater value. In contrast, stored data (or data at rest) has potential information value. But it first must be made kinetic and processed in an application to deliver value to users.
8. Drucker defined information as data endowed with relevance and purpose. Peter F. Drucker, "The Coming of the New Organization," Harvard Business Review 66 (January-February 1988), pp. 45-53.
- Develop a forecast for the installed base of devices or applications that could capture or create digital information.
- Estimate how many units of information (files, images, songs, minutes of video, phone calls, packets of information) were created in a year.
- Convert these units to megabytes using assumptions about resolutions, digital conversion rates, and usage patterns.
- Estimate the number of times a unit of information is replicated, either to share or store. The latter can be a small number, for example, the number of spreadsheets shared, or a large number, such as the number of movies written onto DVDs or songs uploaded onto a peer-to-peer network.
A complete presentation is at: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
of information in Chapter 1, Information and Its Discontents: An Introduction, in Thomas H. Davenport, Information Ecology (New York: Oxford University Press, 1997).
company servers. The 1 million machine estimate was published in a Gartner research briefing on 26 June 2007. Gartner Research Brief, "Look Beyond Google's Plan to Become Carbon Neutral," ID Number: G00149834, Publication Date: 26 June 2007. See also "Google: One Million Servers and Counting" at http://www.pandia.com/sew/48115. None of the large, public Internet companies report information on their installed server base or datacenters. However, there is an active blogosphere with estimates popping up frequently. Some may even be right. Examples: http://www.idg.no/bransje/bransjenyheter/article57876.ece; http://www.pandia.com/sew/481-gartner.html
David Cappuccio: "Rising use of social networks, rising energy costs, and a need to understand new technologies such as virtualization and cloud computing are among the top issues IT leaders face in the evolving datacenter... the 650 percent enterprise data growth over the next five years poses a major challenge, in part because 80 percent of the new data will be unstructured." See http://www.infoworld.com/d/data-explosion/datacenter-challenges-include-social-networks-rising-energy-costs-614
16. Both IDC and Gartner track server information. IDC publishes the Worldwide Quarterly Server Tracker, information at http://www.idc.com/getdoc.jsp?containerId=IDC_P348. Gartner releases public data on its quarterly market review through Gartner Press Releases. IDC estimates the installed base of servers using vendor data on shipments and equipment lifetimes derived from vendor reports and market surveys. Equipment lifetimes are based on an analysis of server service contracts and other information. The data includes both enterprise and scientific (HPC, or high-performance) servers, and excludes server upgrades.
in the EPA EnergyStar Server Report as background. We do not use this data in our calculations. See Report to Congress on Server and Data Center Energy Efficiency: Public Law 109-431, U.S. Environmental Protection Agency ENERGY STAR Program, August 2, 2007. Also see Jonathan Koomey, Estimating Total Power Consumption by Servers in the U.S. and the World, Final Report, February 15, 2007.
18. IDC Quarterly Server Tracker, at http://www.idc.com. Note that IDC and Gartner report worldwide server sales slightly differently. IDC reports factory revenue; Gartner adds revenue from the distributor channel to factory revenue. We have used IDC data for consistency.
19. Ibid., IDC Worldwide Quarterly Server Tracker.
20. CNET News, "Google uncloaks once-secret server," April 1, 2009.
Server performance has been analyzed at the CPU level, at the system level, and at the applications level. While the applications level is closest to the tasks that users are actually performing, data at this level are the hardest to measure and the results the hardest to generalize. The following references have examples using different metrics and workloads: Martin Pinzger, "Automated Web Performance Analysis, with a Special Focus on Prediction," Proceedings of iiWAS2008, November 24-26, 2008, Linz, Austria; Gaurav Banga and Peter Druschel, "Measuring the capacity of a web server," USENIX Symposium on Internet Technologies and Systems, pages 61-71, Monterey, CA, December 1997; Alaa R. Alameldeen, Carl J. Mauer, Min Xu, Pacia J. Harper, Milo M.K. Martin, Daniel J. Sorin, Mark D. Hill and David A. Wood, "Evaluating Nondeterministic Multi-threaded Commercial Workloads," Proceedings of the Computer Architecture Evaluation using Commercial Workloads (CAECW-02), February 2, 2002; Paul Barford and Mark Crovella, "Generating Representative Web Workloads for Network and Server Performance Evaluation," SIGMETRICS '96, Madison, WI, USA; Pradeep Padala, Xiaoyun Zhu, Zhikui Wang, Sharad Singhal, and Kang G. Shin, "Performance Evaluation of Virtualization Technologies for Server Consolidation," HP Laboratories Palo Alto, HPL-2007-59, April 11, 2007; David Mosberger and Tai Jin, "httperf: A Tool for Measuring Web Server Performance," HP Research Labs, Hewlett-Packard Co., Palo Alto, CA 94304; and Henning Schulzrinne, Sankaran Narayanan, Jonathan Lennox, and Michael Doyle, "SIPstone: Benchmarking SIP Server Performance," Working Paper, Columbia University, 2002.
24. Jonathan G. Koomey, Christian Belady, Michael Patterson, Anthony Santos, and Klaus-Dieter Lange, Assessing Trends Over Time in Performance, Cost, and Energy Use for Servers, Final Report, August 17, 2009. Section "Data and Methods."
http://news.cnet.com/8301-1001_3-10209580-92.html
21. Nielsenwire, "Twitter's Tweet Smell of Success," March 18, 2009.
25. Our complete doubling time analysis is presented in a background technical working paper that accompanies this report. See James E. Short, Roger E. Bohn and Chaitan Baru, How Much Information? 2010, Report on Enterprise Information, Background Technical Working Paper No. 01, November 2010.
http://blog.nielsen.com/nielsenwire/online_mobile/twitters-tweet-smell-of-success/
22. Jon Brodkin, "Datacenter challenges include social networks, rising energy costs: Researcher warns that projected 650 percent growth in enterprise data over the next five years poses a major challenge to IT leaders," InfoWorld, December 2, 2009. Brodkin quotes Gartner analyst David Cappuccio.
26. It is important to note the differences among the benchmarks in how the actual tests are run. TPC-C tests a single workload, OLTP database transactions, and reports a single performance metric, tpmC. SPECweb2005 tests three workloads (Banking, Ecommerce, and Support); SPEC runs each test consecutively and reports results for each workload, plus a weighted total result. VMmark tests six workloads, all running concurrently; results are published for each workload and for the number of tiles the test system can successfully process while meeting QOS requirements. By definition, VMmark benchmarks running concurrently would not achieve results comparable to running each workload individually on a test system with all processing and other system resources devoted to that single workload.
27. The TPC-C benchmark was approved in July 1992, and the first result was published in September 1992. The test system recorded 54 transactions per minute (tpmC), at a cost per tpmC of $188,562. This compares with a typical cost per tpmC today of under $0.25. Since 1992, the TPC-C test specification has been updated, and for some test years results are not comparable. All test results used in this report are comparable, using TPC-C software Version 5.
and in large companies, there are many thousands of these devices. Progressively, client devices are becoming more and more dependent on the edge environment for computing resources and support: for example, for accessing data and applications, or for accessing email and voice communications. We do not address client devices in this analysis, but will do so in the future.
28. The TPC-C benchmark models a typical online transaction processing (OLTP) environment. The benchmark simulates a large wholesale outlet's inventory management system. TPC-C involves a mix of five concurrent transactions of different types and complexity, either executed on-line or queued for deferred execution. The model consists of a number of warehouses, each with ten (or more) terminals representing point-of-sale or point-of-inquiry stations. Transactions are defined around entering and delivering orders, recording payments, checking the status of orders, and monitoring the stock level at the warehouses. Two transactions model behind-the-scenes warehouse activity: the stocking level inquiry and the delivery transaction. The stocking level inquiry scans a warehouse inventory for items that are out of stock or nearly so. The delivery transaction collects orders and marks those that have been delivered. While the TPC-C benchmark portrays the activity of a wholesale supplier, the benchmark is not limited to the activity of any particular business segment. Instead, according to TPC, it represents any industry that must manage, sell, or distribute a product or service. See Raab, Kohler, and Shah, "Overview of the TPC Benchmark C: The Order-Entry Benchmark," at http://www.tpc.org/tpcc/detail.asp
29. The SPECweb2005 documentation page is available at www.spec.org. A concise description of the benchmark is also available at https://sp.ts.fujitsu.com/dmsp/docs/benchmark_overview_specweb2005.pdf
30. The VMmark documentation and FAQ page are at
35. There is no industry-standard benchmark for application servers. The most widely adopted application benchmark, SPECjAppServer2004, is a multi-tier benchmark for measuring the performance of Java 2 Enterprise Edition application servers. SPECjAppServer2004 has undergone several software revisions since its introduction in 2004, and current results are not compatible with results from previous software releases. As a result, there are an insufficient number of test system results for our purposes. Over time, however, as SPEC stabilizes the test software and the number of tests increases, we may be able to incorporate results from this benchmark. The great majority of performance testing in the applications area, however, is done by the vendors themselves for their customers. All of the major vendors (Oracle, SAP, Teradata) have elaborate configuration benchmarks to size customer systems. Results are customer specific. SAP, for example, publishes Standard Performance Benchmarks for all of their major software products; Oracle and Teradata do the same. SAP's application benchmark page is at: http://www.sap.com/solutions/benchmark/index.epx
36. Comparable benchmark data for all server classes does not exist for all test years between 2004 and 2008. There are gaps, for example, in server models and machine configurations tested by TPC-C and those tested by SPECweb2005. We have made several adjustments to address machine comparability across test years. Second, while virtualization has been deployed in companies for years, VMmark began testing and publishing benchmark results in 2007. Therefore, 2007 is the first year we can include VMmark results in our calculations. Third, while we are not able to include application server measurements in our calculations at this time, some of the application server processing taking place is included in the TPC-C, SPECweb, and VMmark test results, since all of these workloads are applications themselves. But we do not have direct measurements for some workloads that may be important, Decision Support / Business Intelligence for example. We continue to research this area.
37. See background technical working paper. 38. Lyman and Varian did not separate out information delivered at work
SPECweb2005.
greater detail. See James E. Short, Roger E. Bohn and Chaitan Baru, How Much Information? 2010, Report on Enterprise Server Information, Background Technical Working Paper No. 01, November 2010.
33. VMmark can be set up to run in a single tile or in multiple tiles
from household or personal information received out of work. Peter Lyman and Hal R. Varian, How Much Information, 2003. Available at:
http://www.sims.berkeley.edu/how-much-info-2003
the decision is left to the test engineers. As a practical matter, engineers estimate the maximum number of tiles the test system will be able to run successfully, and start the test in that range. They then add one tile at a time until the test run fails. The test is then rerun with the maximum successful number of tiles recorded. VMmark test procedures are described in the benchmark technical discussion and FAQ page available at: http://www.vmmark.com
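The tile-escalation procedure note 33 describes (estimate a starting tile count, add one tile at a time until a run fails, and record the last successful count) can be sketched as follows. Here `run_passes` is a hypothetical stand-in for executing a full VMmark run at a given tile count; an actual run takes hours and is judged against the benchmark's quality-of-service criteria.

```python
from typing import Callable

def find_max_tiles(run_passes: Callable[[int], bool], start: int) -> int:
    """Return the largest tile count at which a test run succeeds.

    Starts from the engineers' estimate, walks down if that estimate
    already fails, then escalates one tile at a time until a run fails.
    Returns 0 if no tile count passes.
    """
    tiles = start
    # Walk down first in case the starting estimate is already too high.
    while tiles >= 1 and not run_passes(tiles):
        tiles -= 1
    if tiles < 1:
        return 0
    best = tiles
    # Escalate one tile at a time until the test run fails.
    while run_passes(best + 1):
        best += 1
    return best

if __name__ == "__main__":
    # Hypothetical system that can sustain at most 7 tiles.
    print(find_max_tiles(lambda n: n <= 7, start=5))  # prints 7
```

In practice the engineers then rerun the benchmark at the recorded maximum, since the published result must come from a complete, compliant run at that tile count.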
34. Client devices sit at the edge of the Edge computing environment.
39. John Gantz et. al., The Diverse and Exploding Digital Universe: An
Updated Forecast of Worldwide Information Growth Through 2011, IDC White Paper, March 2008.
40. John Gantz and David Reinsel, The Digital Universe Decade Are
41. UC San Diego News Center, Whats Next for High Performance
Client devices include all of the digital devices that are in employee hands -mobile phones, notebook computers, storage devices and so on. Of course, much critical work activity takes place on client devices,
Computing: Fusing High-Performance Data with High-Performance Computing Will Speed Research, February 24, 2010. http://ucsdnews.
ucsd.edu/newsrel/supercomputer/02-24NextForHPC.asp
S. Padmanabhan, and W. Wilson, DB2 Parallel Edition, IBM Systems Journal, April 1995.
servers, 14 storage servers, optimized network switches and I/O bandwidth, and user storage capacity of 100 TB per rack. Oracle Datasheet: Oracle Exadata Database Machine X2-8.
http://www.oracle.com/ocom/groups/public/@otn/documents/ webcontent/173705.pdf
44. San Diego Supercomputer Center News, NSF Awards $20 Million to
News%20Items/PR110409_gordon.html
Global Information Industry Center UC San Diego 9500 Gilman Drive, Mail Code 0519 La Jolla, CA 92093-0519 http://hmi.ucsd.edu/howmuchinfo.php