Date of Publication: January 2011
Last Update: December 2010
Website Publication: April 2011
TABLES AND FIGURES

Figure 1: Information Flows In An Enterprise
Figure 2: The Flows We Are Interested In
Figure 3: Example Server Workloads
Figure 4: Modern Computer Servers
Figure 5: Improvements In Server Performance
Figure 6: Schematic Of Modeling Approach
Figure 7: TPC-C Simulated Workflow
Figure 8: SPECweb2005 Simulated Workflow
Figure 9: VMmark Simulated Workload
Figure 10: Three-Tier Web Server Configuration
Figure 11: Core and Edge Computing Model
Figure 12: Estimated Workload Percentages 2004-2008
Figure 13: World Server Information Summary 2008
Figure 14: Contribution to World Server Information 2008

Table 1: World Server Information
Table 2: Data and Information
Table 3: World Server Information by Server Class 2008 in Zettabytes
Table 4: Installed Base, Shipments and Retirements of Servers for the World and U.S., 2000-2005
Table 5: World Server Sales 2004-2008
Table 6: Performance Benchmarks by Server Class by Server Workload
Table 7: Estimated Workload Percentages by Year
Table 8: Server Potential Capacities 2004-2008
Table 9: Contribution to World Server Information 2008
ACKNOWLEDGEMENTS
This report is the product of industry and university collaboration. We are grateful for the support of our industry sponsors and university research partners. Financial support for the HMI? research program and the Global Information Industry Center is gratefully acknowledged.

Our foundation and corporate sponsors are: Alfred P. Sloan Foundation, AT&T, Cisco Systems, IBM, Intel Corporation, LSI, Oracle, and Seagate Technology.

Special thanks for research and technical advice is extended to the following individuals: Richard Clarke, AT&T; Clod Barrera, IBM; Jeffrey Smits and Terry Yoshii, Intel; Dieter Gawlick, Garret Swart and Thomas Oestreich, Oracle; Dave Anderson, Brook Hartzell and Jeff Burke, Seagate; Bruce Herndon, VMware.

The authors bear sole responsibility for the contents and conclusions of the report. Questions about the report may be addressed to the Global Information Industry Center at the School of International Relations and Pacific Studies, UC San Diego:

Roger Bohn, Director, rbohn@ucsd.edu
Jim Short, Research Director, jshort@ucsd.edu
Pepper Lane, Program Coordinator, pelane@ucsd.edu, 858-534-1019

Press inquiries should be directed to Rex Graham, IR/PS Communications Director, ragraham@ucsd.edu, (858) 534-5952.

Center Website: http://hmi.ucsd.edu/howmuchinfo.php

Report Design by Theresa Jackson, Orchard View Color: www.orchardviewcolor.com
Executive Summary
In 2008, the world's servers processed 9.57 zettabytes of information: almost 10^22 bytes, or ten million million gigabytes. This was 12 gigabytes of information daily for the average worker, or about 3 terabytes of information per worker per year. The world's companies on average processed 63 terabytes of information annually.

Our estimates come from an analysis of the total work capacity of the installed base of computer servers in enterprises worldwide. Information through non-computer sources (telephones or physical newspapers, for example) is not included. We define enterprise server information as the flows of data processed by servers as inputs plus the flows delivered by servers as outputs. A single chunk of information, such as an email message, may flow through multiple servers and be counted multiple times.

Two-thirds of the world's total of 9.57 zettabytes was processed by low-end, entry-level servers costing $25,000 or less. The remaining third was processed by midrange and high-end servers, those costing more than $25,000. Transaction processing workloads (issuing an invoice, paying a bill, checking a stock level) amounted to approximately 44% of all the bytes processed. Web services and office applications contributed the other 56%. Servers configured as virtual machines processed about half of all the bytes in Web services and office applications.

We also conducted a separate analysis of improvements in server performance and capital cost. Midrange servers processing Web services and business application workloads doubled their performance per dollar every 1.5 years. Raw performance for this server class doubled approximately every 2 years. High-end servers processing transaction workloads had the longest doubling times: both performance per dollar and raw server performance doubled approximately every 4 years.

This report covers how much information was processed by the installed base of computer servers in companies worldwide in 2008.
It complements an earlier report on information consumption, which estimated 3.6 zettabytes of information was consumed by American households in 2008. Later reports will cover storage systems and enterprise networks.
1 INTRODUCTION
Businesses today are awash with information and the data used to create it. Daily, managers are confronted with growing information volumes far greater than can possibly be consumed.1 Where is all of this data being created? What happens to it? How much information is being processed by computer servers in companies worldwide?
The goal of the How Much Information? Program is to create a census of the world's data and information. How much information is created and consumed annually? What types of information are created? Who consumes it? And what happens to information after it is used? For our purposes, we distinguish between information that is created and used in organizations (work information, used for productive purposes) and consumer information seen or heard by people not at work (information created and used for consumption). We expand on these definitions below.

Last year we reported on consumer information for different media in and outside the home, such as watching television, playing computer games, going to the movies, listening to the radio or talking on a cellular phone.2 Nationwide, we found that Americans spent approximately 11.8 hours viewing or listening to media on an average day. They consumed 3.6 zettabytes of information in 2008, or approximately 34 gigabytes per person per day. Our estimate is many times greater than totals from previous studies. Why? In large part the difference lies in our use of a very inclusive definition of information: we measured the flow of information, not just the fraction of information that is retained. A zettabyte is 10^21 bytes, or 1,000 billion gigabytes. See Appendix: Counting Very Large Numbers.

Our second report conveys our findings for enterprise information. We define enterprise server information as the flows of data processed by computer servers as inputs plus the flows delivered by servers as outputs.3 Note that our definitions of work and consumer information are different and complementary. Work information is data processed for productive use, for example to guide an immediate action, or to use as context for a future action. Consumer information is data processed and delivered for consumptive use: to delight, to entertain, to enjoy.

How much work information is processed by computer servers annually in companies worldwide? And why are we using computer servers to estimate the flow of data processed? Servers are the digital workhorses of the modern firm. Servers host the company's work applications, process the data flows, and manage the data traffic going in and out of the firm's storage systems. Small companies may have tens of servers of varying sizes and capacities; large enterprises may have tens of thousands of servers. We do not include in our definition information that people may see or hear while at work that is not processed by servers.

Of course, the data processed by servers is not all of the information that exists in any company (although in most companies it is likely to be the great majority of it). There is a wealth of paper documents in every organization, and there exist many digital data storage devices, from personal storage media (DVDs, flash drives and the like) to mammoth-capacity enterprise storage systems. Data stored and archived on storage systems is defined as data at rest. Data at rest requires data processing and output to constitute what we define as information. Conversely, paper documents, records, image libraries and the like have long been defined as information in printed or image form, such as that stored in case files in a law library, or customer files archived for long-term storage in an outside storage facility. We will add these sources in the future.

This study covers how much information was processed and delivered by the installed base of computer servers in enterprises worldwide in 2008. Server capacity measures were derived from world sales and shipments data published by analyst firms Gartner and IDC. Server performance was estimated using industry benchmarks published by the Transaction Processing Performance Council (TPC), the Standard Performance Evaluation Corporation (SPEC), and VMware. We used industry standard benchmarks to define a consistent measure of server work performed, and converted it into its byte equivalent. Performance data was taken from server test results submitted to benchmark standards bodies by hardware vendors. We adjusted this data based on the date of availability of the test system and other factors.
A few highlights from our findings (Table 1): 9.57 zettabytes of information was processed by servers in companies worldwide in 2008. That amounts to:

- 3.01 terabytes of information per worker per year, or 12 gigabytes per worker per day (based on the ILO and CIA Factbook's estimate of 3.18 billion people in the world labor force in 2008).4
- 63 terabytes of information per company per year (based on Dun & Bradstreet's 151 million world businesses registered with D&B's D-U-N-S system in 2008).5
- Two thirds of the world total of 9.57 zettabytes of information was processed by low-end, entry-level servers costing less than $25,000 per machine. The remaining third was processed by midrange and high-end servers, costing between $25,000 and $500,000 (for midrange servers) and over $500,000 (for high-end servers).

Our report is divided into five sections. Section 1 introduces our concepts and measurement methods. Section 2 looks at different types of servers and how to count them. Section 3 considers server workloads, charts server performance measured by industry benchmarks, and calculates world server capacity. Section 4 summarizes total annual server information and different contributions to that total. Section 5 discusses some interesting factors in enterprise information growth, server performance and data intensive computing platforms.
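The per-worker and per-company figures follow from simple arithmetic on the world total; the division by roughly 250 working days per year (rather than 365 calendar days) is our inference from the report's 12 GB/day figure, not something the text states:

```python
# Back-of-the-envelope check of the per-worker and per-company figures.
# Assumption (ours, not the report's): the daily figure uses ~250
# working days per year rather than 365 calendar days.
ZETTABYTE = 10**21

total_bytes = 9.57 * ZETTABYTE   # world server information, 2008
workers = 3.18e9                 # ILO/CIA world labor force estimate, 2008
companies = 151e6                # D&B-registered world businesses, 2008

per_worker_tb = total_bytes / workers / 10**12
per_company_tb = total_bytes / companies / 10**12
per_worker_day_gb = total_bytes / workers / 250 / 10**9

print(f"{per_worker_tb:.2f} TB per worker per year")         # ~3.01
print(f"{per_company_tb:.0f} TB per company per year")       # ~63
print(f"{per_worker_day_gb:.0f} GB per worker per workday")  # ~12
```

All three headline numbers are reproduced, which suggests the report's per-day figure is indeed a per-workday figure.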
Data are collections of numbers, characters, images or other outputs from devices that represent physical quantities: artificial signals intended to convey meaning. "Artificial" because data is created by machines such as sensors, barcode readers, or computer keyboards. Digital data has the desirable property that it is easy to capture, create, communicate and store: so easy, in fact, that increasingly we are flooded by it.

Information is a subset of data, data being the lowest level of abstraction from which information and knowledge are derived. In its most restrictive technical meaning, information can be thought of as an ordered sequence of signals.6 Information processing refers to the capacity of computers and other information technology (IT) machinery to process data into information. Unlike data, information has the further property that it must have meaning for its intended use.8 People define that meaning, whether it is the information required for an immediate decision or the collection of background information for a judgment or action to be taken in the future. The amount of human involvement increases as we move from a focus on data to one of information: we store data on computers; we use computers to create and manage information. (Table 2)

Past high-level studies of enterprise information have generally measured data of only two kinds: the data that gets stored on physical storage media, and communications data that is in flow, transmitted over local-area or wide-area networks in the firm.7
Which Bytes?
Our analysis estimates the amount of enterprise information by counting the number of bytes processed and delivered to end users or to applications accessed by end users. Why bytes? And what is the relationship between bytes and information? We utilize a set of benchmarks (TPC, SPEC, VMmark) representative of enterprise workloads. We compute the total number of bytes delivered by deriving how many bytes are processed or delivered by transactions and applications defined within each of the standardized benchmarks. Total bytes are each benchmark's measure of how much work the server has performed. How well this definition of information matches information in a real enterprise environment depends upon how well the selected benchmarks represent the transaction and application work performed in companies.
Our byte total is based on the analysis of performance data from four standard industry workloads:

Online transaction processing (OLTP). The OLTP workload processes clerical data entry and retrieval processes in a real-time transaction processing environment. OLTP workloads require very high server performance, are optimized for a common set of transactions, and in large firms support thousands of concurrent users.

Web server processing (WWW or Web). Web server workloads process documents (in the form of Web pages) to the Web clients requesting them; a typical application would be a user searching for information using a Web browser. Web server performance typically trades off the number of requests the server must process with the number of bytes that the server must transfer (disk I/Os).

Virtual machine processing. Virtualization is an important software technology deployed in most companies today. The basic principle is that virtualization allows a single physical machine to run multiple virtual machines, sharing the resources of the single machine across multiple applications. Different virtual machines can run different operating systems and applications on the same physical computer. We include virtual machine processing of multiple workloads in our analysis.

Application processing. Our analysis includes some, but not all, application processing done on application servers. Examples of application server workloads not directly measured would include customer relationship management (CRM), human resources management (HRM), and business analytics. We will add these sources in the future.

[Figures 1 and 2, omitted here, diagram the information flows we measure: users input data on local devices; edge servers, database servers, and storage devices process the data and deliver information flows back to the edge.]

Our definition of information emphasizes the flow of data processing and data outputs. We count all instances of data processing and every flow delivered as output. Our definition expands on many other definitions of enterprise information.9 An alternative approach, for example, could go to the opposite extreme: only counting data that is stored on some media somewhere in the firm (printed material, digital images or digital video), whether that data is subsequently used or not.10
We divide this into components:

  Total bytes per year = World server capacity (transactions per minute)
                         x Bytes per transaction
                         x Annual load factor (hours of operation x fraction of full load)

where:

  World server capacity (transactions per minute) = $ spent / $ per measured transaction per minute

We focus on transactions as the unit of work performed by servers for both heuristic and practical reasons. Workload transactions are common to all enterprises regardless of company size, industry sector, or technology complement. All companies process orders, make payments, pick from inventory, and deliver products and services to their customers. In support of these activities,
company IT departments run application and email servers, manage file and print servers, and provide Web access and Web services. The diverse mix of transaction work performed by servers has been classified into workloads, shorthand for application-level transactions and data processed by servers. (Figure 3) The most important of these workloads have been simulated in benchmarks designed to test server performance and derive comparative price-performance measures. We use results from three of the most extensively applied industry benchmarks, TPC-C, SPECweb2005, and VMmark. Each benchmark, explained in Section 3, simulates one or more enterprise workloads and computes results for server transaction performance, which we convert into byte equivalents. All told, we analyzed price, performance and capacity data for over 250 servers tested from 2004 to 2009 using one or more of the benchmarks.
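The decomposition above can be sketched in a few lines. Every numeric input below is an illustrative placeholder rather than one of the report's estimates, and we convert hours of operation into minutes so the units match the transactions-per-minute capacity:

```python
# Sketch of the report's capacity decomposition:
#   total bytes/year = capacity (tx/min) x bytes per tx x annual load factor
# All numbers below are illustrative placeholders, not the report's estimates.

def server_capacity_tpm(dollars_spent: float, dollars_per_tpm: float) -> float:
    """World server capacity in transactions per minute, derived from
    sales data and a price-performance ($ per tx/min) benchmark figure."""
    return dollars_spent / dollars_per_tpm

def total_bytes_per_year(capacity_tpm: float, bytes_per_tx: float,
                         hours_of_operation: float, load_fraction: float) -> float:
    """Apply bytes per transaction and the annual load factor."""
    minutes_of_operation = hours_of_operation * 60
    return capacity_tpm * bytes_per_tx * load_fraction * minutes_of_operation

cap = server_capacity_tpm(dollars_spent=29.3e9,   # e.g. 2008 entry-level sales
                          dollars_per_tpm=1.0)    # placeholder $/tpm
total = total_bytes_per_year(cap, bytes_per_tx=100_000,
                             hours_of_operation=8760, load_fraction=0.1)
print(f"{total:.3e} bytes/year")
```

The structure, not the placeholder values, is the point: the report's results come from plugging benchmark-derived price-performance and byte-per-transaction figures into exactly this kind of product.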
[Figure 3, omitted here, charts example server workloads by server type (Web, database, edge, application, and virtual machine servers), spanning Web services and eCommerce, search, email and messaging, firewall and security, file and print, application processing, customer relationship management, financial management, and supply chain management.]

Computer Transactions

In computer programming, a transaction is an activity or request involving a sequence of data exchange and data processing. In a database management system, transactions are sequences of operations that read or write database elements. Orders, purchases, changes, additions and deletions are typical business transactions. An example of an order-entry transaction would be a catalog merchandise order phoned in by a customer and entered into a computer by a telephone sales representative. The order transaction involves checking an inventory database, confirming that the item is available, placing the order, confirming that the order has been successfully placed, and advising the customer of the expected time of shipment. As a rule, the entire sequence is viewed as a single transaction, and all of the steps must be completed before the transaction is successful and the database is updated.

The server market is divided into three price ranges, according to the price (stated as factory revenue) of the manufacturer's entry-level system in each price range.11

Entry-level servers are machines priced less than $25,000. Servers sold in this category in 2008 were dual-core, single- or dual-processor machines with a minimum of frills; typically they would be deployed in non-critical business application areas, configured for low-cost general computing. An example workload would be a file and print server.

Midrange servers are machines costing between $25,000 and $500,000. Server systems in this price range encompass many configurations, including multi-core, multi-processor tower systems, blade servers and small mainframe servers. Midrange servers are housed in server closets, server rooms and company datacenters, and run a diverse mix of workloads, including transaction processing, Web services and email, online transaction processing and virtual machines. More expensive midrange servers would be deployed in medium and critical business application areas and would be managed by professional IT staff.

High-end servers are machines costing over $500,000. These systems are large, complex, multi-core, multi-processor mainframe servers located in mid-tier and corporate datacenters. They are almost always deployed exclusively to business-critical application workloads where very high performance and very high reliability are required. Example workloads are online transaction processing (OLTP) and online analytic processing (OLAP).

Much of our research has gone into estimating the amount of information processed by each server class for the installed base of servers worldwide in 2008. For this measure of information we used bytes: the number of bytes processed as input and bytes delivered as output. When measured in bytes, our results show that servers are processing an enormous quantity of information work in firms today. Entry-level servers (lower performance but far more numerous) processed 6.31 zettabytes of information in 2008, 66 percent of all enterprise information created worldwide (Table 3). Midrange servers processed 2.80 zettabytes, or approximately 29 percent. High-end servers processed 451 exabytes of enterprise information, or approximately 4.7 percent of the total.

Our estimate of 9.57 zettabytes is many times greater than that found in previous studies. A March 2008 study by IDC reported that the total worldwide digital universe in 2007 was approximately 281 exabytes, and would not reach one zettabyte until 2010.12 According to IDC, companies created, captured or replicated about 35% of the total digital universe, with approximately 14 exabytes coming directly from servers in corporate datacenters. Why is there such a huge discrepancy in our numbers? There are many possible factors; in particular, IDC probably did not include throughput data (data processed and output) in its estimates.13 We comment further on how our results compare with other information studies later in this report.
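The order-entry transaction described in the Computer Transactions box above must succeed or fail as a unit. A minimal sketch of that all-or-nothing behavior, using Python's sqlite3 (the table and column names are our own illustration, not from the report):

```python
# Sketch of an order-entry transaction: all steps succeed together or
# the database is left unchanged. Schema names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.commit()

def place_order(item: str, qty: int) -> bool:
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            row = conn.execute(
                "SELECT qty FROM inventory WHERE item = ?", (item,)).fetchone()
            if row is None or row[0] < qty:
                raise ValueError("insufficient stock")  # aborts the transaction
            conn.execute("UPDATE inventory SET qty = qty - ? WHERE item = ?",
                         (qty, item))
            conn.execute("INSERT INTO orders VALUES (?, ?)", (item, qty))
        return True   # order confirmed; database updated as a unit
    except ValueError:
        return False  # transaction rolled back; nothing was written

print(place_order("widget", 3))   # True  -> stock drops to 7
print(place_order("widget", 99))  # False -> stock unchanged
```

The `with conn:` block is what makes the sequence a single transaction: a failure at any step (here, insufficient stock) undoes the partial work, which is exactly the "all of the steps must be completed" property the box describes.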
Table 3: World Server Information by Server Class, 2008 (in zettabytes)

Entry-level   6.31
Midrange      2.80
High-end      0.451
Total         9.57
Server Workloads
A server workload is the amount of work that a server produces or can produce in a specified period of time. But what do we mean by work? Technically, workload refers both to the request stream presented by clients (the work) and to the server's response to the requests (the load). An example would be users submitting requests to a Web server to search and display Web pages showing product information for an eCommerce purchase. How quickly the server responds to the requests defines the load. Typical server workloads would include eCommerce transactions, database transactions, Web server and email server transactions, and file and print. We analyzed server performance data for ten simulated enterprise workloads.
A dedicated server is reserved for a single use, such as hosting a Web site. A shared server may be accessed by multiple users for multiple purposes. Dedicated servers are more common in larger businesses, where computing resources are customized for specific needs. Dedicated servers provide faster data access, allow higher network traffic rates, and can be more closely controlled (for example, for performance, security, or backup).

Servers are configured by the work tasks they perform. For our purposes, we are most interested in the following server types:

Database Server: A computer server dedicated to database storage and retrieval. The database server holds the database management system and accesses the company's data storage devices.
Application Server: Application servers are dedicated to running one or more software applications.
Web Server: Web servers host internal or external Web sites, serving Web pages back and forth to users.
Mail Server: Mail servers host the company's email system.
File Server: File servers house applications that are configured to send and receive files within applications. Think of them as a superset of the server types above. A database server may be part of a file server, for example.

Server benchmark testing is organized by server type and by simulated enterprise workload. Benchmark results are used in industry to compare the performance and price-performance of different servers and workloads. For example, the Transaction Processing Performance Council benchmarks database servers using the TPC-C benchmark. Since 2001, over 250 performance tests of database servers have been conducted. As we will explain, we make use of TPC-C and other server benchmarks to compute the amount of server work performed.
We report in Table 4 data on the installed base of servers published by Koomey (2007) for the years 2000-2005.17 The estimated total installed base of servers worldwide, including shipments and retirements, is shown in Table 4. Entry-level servers dominate the installed base, representing over 90% of the total number of servers worldwide on a unit basis. Midrange servers comprise most of the rest. High-end servers represent only a few tenths of one percent of the total number of servers on a unit basis. Depending on the server class chosen, the U.S. has about 30 to 40 percent of the servers in the world.
Our analysis has relied on annual and quarterly sales data published by IDC and Gartner Group (all data in U.S. dollars). We used IDC's publicly released data for annual worldwide sales and quarterly, year-over-year percentage increases or decreases in sales for three classes of servers (entry-level, midrange, and high-end) as our baseline dataset.18 IDC reported total worldwide server sales were $53.3 billion in 2008 (Table 5). Shipments came in at just over 8.1 million units. Entry-level server sales were $29.3 billion; midrange sales were $11.7 billion, and high-end server sales were $12.3 billion. Reflecting recession effects, the market contracted 14 percent in the final quarter of 2008, to $13.5 billion.19 Worldwide server unit shipments declined 12 percent compared to the same quarter in 2007. Overall, the 2008 market declined approximately 3.3% to $53.3 billion. Unit shipments, however, grew slightly to 8.1 million units. Table 5 presents annual sales in U.S. dollars for all server classes for the years 2004-2008.
Table 4: Installed Base, Shipments and Retirements of Servers for the World and U.S., 2000-2005 (in 000s)

                          World                                  U.S.
Year           Total  Entry-level  Midrange  High-end   Entry-level  Midrange  High-end    Total

Installed Base
2000          14,114       12,240     1,808      65.6         4,927       663      23.0    5,613
2001          17,555       15,596     1,890      69.1         5,907       701      22.5    6,630
2002          18,492       16,750     1,683      59.0         6,768       574      23.1    7,365
2003          20,125       18,523     1,540      62.3         7,578       530      21.4    8,130
2004          24,746       23,441     1,238      66.0         8,658       432      23.3    9,113
2005          27,282       25,959     1,264      59.4         9,897       387      22.2   10,306

Shipments
2000           4,223        3,926       283      13.0         1,659       111       4.8    1,774
2001           4,198        3,981       206      10.4         1,492        66       3.6    1,562
2002           4,397        4,184       204       9.4         1,714        67       3.1    1,784
2003           5,237        5,017       211       8.8         2,069        76       2.9    2,148
2004           6,275        6,083       184       8.6         2,517        53       2.8    2,572
2005           7,017        6,822       187       8.5         2,721        62       2.6    2,786

Retirements
2000           1,905        1,631       264      10.0           300       116       5.0      420
2001             757          626       125       6.9           513        28       4.1      545
2002           3,461        3,030       411      19.6           853       194       2.5    1,049
2003           3,603        3,243       355       5.5         1,259       120       4.6    1,383
2004           1,655        1,165       485       4.9         1,437       151       0.9    1,589
2005           4,481        4,304       161      15.1         1,482       106       3.7    1,592

Source: Koomey (2007) and IDC. Units are in 000s.
Notes: (1) Installed base is measured at the end of the year (December 31). (2) Installed base and shipments include both enterprise and scientific servers. The data does not include server upgrades. (3) Retirements are calculated from the installed base and shipments data; 2000 retirements were calculated using the 1999 installed base and year 2000 shipments. (4) World includes the U.S.
Table 5: World Server Sales 2004-2008 (in billions of U.S. dollars)

Class           2004     2005     2006     2007     2008     Total
Entry-level    $24.4    $27.3    $28.5    $30.8    $29.3    $140.5
Midrange       $12.8    $12.8    $12.2    $12.6    $11.7     $62.3
High-end       $12.2    $11.6    $12.0    $11.6    $12.2     $59.9
Total          $49.5    $51.8    $52.8    $55.1    $53.3    $262.7
The units reported in Table 5 are current dollars spent. Current dollars are appropriate for our purposes because price-performance ratios each year are based on current dollars. Midrange and high-end server revenue was relatively flat over our target years 2004-2008. Entry-level server revenue increased from $24.4 billion in 2004 to $30.8 billion in 2007. Midrange and entry-level server revenue fell in 2008; high-end server revenue increased slightly over the same period.
Source: HMI? 2010. Data compiled from IDC Quarterly Server Tracking Reports, 2004-2009.
Server capacity and performance can be defined both technically and operationally. Technically, it refers to the server's theoretical capacity: what is the server's hardware capacity to input, process and output work, measured in a maximum transaction rate or in total bytes? Operationally, capacity refers to the ability of a server configuration to meet future resource needs. A typical capacity concern of datacenter managers is whether the server, storage and network resources will be in place to handle an increasing number of requests as the number of users and transactions increases. Planning for future increases is an ongoing task in a datacenter manager's life: capacity planning. For our purposes, we are interested in analyzing a snapshot of the installed capacity and the utilized capacity of all world servers in 2008. We define installed capacity as the sum of the maximum performance ratings of all installed servers in
2008. Utilized capacity is defined as the sum of the measured performance of all installed servers that year, adjusted by server load factors and the hours the servers are available for use. Since neither number can be directly measured, we estimate both.

Our capacity model expresses installed capacity as the maximum number of transactions per minute theoretically possible for all servers added together in 2008. What do we mean by this? Imagine for a moment that every server in the world was running at its maximum performance rating, and we had a way to accurately count the number of transactions that every server processed. Installed capacity would be the total number of transactions processed in a year. This sum is a theoretical maximum, not a realistic one. Defining installed capacity in this way is akin to saying that the maximum performance capacity for all automobiles in the world is the sum of their top speeds driving flat-out down the highway. It is an interesting number, but not a realistic one. Instead, we need to adjust our theoretical maximum by taking into account the estimated utilization of server capacity: how many hours are servers actually working? What is their average load factor? What workloads are they processing? Recalling our car example, if we think of a server as a delivery truck delivering bytes, how many hours is a delivery truck operated in a year (hours worked)? What is the truck's average speed compared to its maximum speed (load factor)? How many packages can it carry at a time (bytes)? Figure 6 illustrates our modeling approach.

[Figure 5, omitted here, charts performance improvement over time for five benchmark/server-class series: TPC-C Entry-level, TPC-C Midrange, TPC-C High-end, SPEC Entry-level, and SPEC Midrange.]
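The installed- versus utilized-capacity distinction can be made concrete with a toy calculation; the three servers and all of their numbers below are invented for illustration:

```python
# Toy sketch: installed capacity sums maximum performance ratings, while
# utilized capacity discounts each server by its load factor and the
# hours it is available. All numbers are invented, not the report's.
servers = [
    # (max transactions/minute, load factor, hours available per year)
    (100_000, 0.15, 8760),    # entry-level box, lightly loaded, always on
    (500_000, 0.40, 8760),    # midrange box in a datacenter
    (2_000_000, 0.70, 8000),  # high-end box, heavily loaded
]

installed_tpm = sum(tpm for tpm, _, _ in servers)

utilized_tx_per_year = sum(
    tpm * load * hours * 60 for tpm, load, hours in servers)

# The "every car driving flat-out" number: full rating, every minute.
theoretical_tx_per_year = installed_tpm * 8760 * 60

print(f"installed capacity: {installed_tpm:,} tpm")
print(f"utilized / theoretical: "
      f"{utilized_tx_per_year / theoretical_tx_per_year:.1%}")
```

Even in this toy fleet, utilized capacity is well under the theoretical maximum, which is why the report adjusts its installed-capacity figure by load factors and hours of availability.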
[Figure 6: Schematic of Modeling Approach (elements include server availability, workload allocation, and total bytes across all workload types).]
reported (this workload is intended to simulate the use of a standby server for peak load or back-up processing, as would be typical in an operational environment).26

TPC-C (Online Transaction Processing)

The TPC-C benchmark simulates a large wholesale outlet's inventory management system. The test system is made up of a client system, which simulates users entering and receiving screen-based transaction data; a database server system, which runs the database management system (DBMS); and the storage subsystem, which provides the required disk space for database and processing needs.27 The performance of the test system is measured when it is tasked with processing numerous short business transactions concurrently (Figure 7). The TPC-C workload involves a mix of five concurrent transactions of different types and complexity, either executed on-line or queued for deferred execution:28

New Order: a new order entered into the database (approx. 45%)
Payment: a payment recorded as received from a customer (approx. 43%)
Order Status: an inquiry as to whether an order has been processed (approx. 5%)
Stock Level: an inquiry as to which stocked items have a low inventory (approx. 5%)
Delivery: an item is removed from inventory and the order status is updated (approx. 5%)

TPC-C publishes two sets of results: raw performance, measured in transactions per minute, and price-performance, where the cost of the test system is divided by the transaction rate. We adjusted the published costs to reflect realistic hardware configurations.
SPECweb2005 (Web server)

SPECweb2005 is a benchmark published by the Standard Performance Evaluation Corporation (SPEC) for measuring a system's ability to act as a Web server. The benchmark is designed around three workloads: banking, e-commerce, and support. SPECweb2005 reports a performance score for each of the three workloads, measured as the number of simultaneous user sessions the system is able to support while meeting quality of service (QOS) requirements. An overall, weighted score is also reported.29 (Figure 8) The three workloads are designed to simulate enterprise applications and contain the following tasks:

- SPECweb2005_Banking: The banking load emulates a user session in which the banking site exchanges encrypted and non-encrypted information with simulated users. Typical user requests include log-on/log-off, bank balance inquiry, money transfers, etc.
- SPECweb2005_Ecommerce: The e-commerce load emulates an e-commerce site where customers browse product information and place items in a shopping cart for purchase. Simulated activity includes customers scanning product web pages, viewing specific products, placing orders in a shopping cart, and completing the purchase.
- SPECweb2005_Support: The support workload emulates a vendor support site that provides downloads such as driver updates and documentation. The load simulates customers viewing and downloading product and support documentation.

VMmark (virtual machine workloads)

VMmark, published by VMware, is the first virtual machine benchmark in the industry.30 It is designed to measure the performance of virtualized servers using a collection of sub-tests derived from benchmarks developed by the Standard Performance Evaluation Corporation (SPEC). VMmark test workloads include: database server, mail server, Java server, Web server (using a version of SPECweb2005), file server, and a standby (idle) server. The unit of server work measured is called a
[Figure 7: TPC-C Simulated Workflow (Client System, Database Server, Storage Subsystem; transaction mix includes Payment (43%) and Delivery (5%)). Sources: Transaction Processing Performance Council (TPC); Hewlett Packard, "An overview of the TPC-C benchmark on HP ProLiant servers and server blades," August 2007.]
Tile. Each Tile represents one group of six virtual machines, each machine running one workload (Figure 9):

Mail server: simulates a mail server in a company data center.
Java server: simulates Java performance, important in many multi-tiered enterprise applications.
Web server: simulates Web server performance; a modified version of SPECweb2005 is used.31
Database server: simulates an online transaction workload, similar to a light version of TPC-C.32
File server: simulates the performance of a file server, a computer responsible for the central storage and management of data files so that other devices on the same network can access them.
Standby server: simulates a standby or idle server, used in computing environments to handle new workloads, or workloads with unusual peak load behavior.

VMmark reports a performance metric for each workload, and the total number of Tiles the system is able to run within quality of service (QOS) requirements.33
[Figure 8: SPECweb2005 Simulated Workflow (Prime Client, Web Server, Storage Subsystem; e-commerce request types include index, search, browse, browse product line, product detail, customize, cart, login, shipping, billing, and confirm).]
[Figure 9: VMmark Simulated Workload (Test Server running Mail Server, Web Server, OLTP Database, File Server, and Standby Server virtual machines, with attached Storage Device). Source: VMware. Notes: OLTP = online transaction processing; File Server and Standby Server not included in calculations.]
Figure 11 also illustrates our server measurement points and their corresponding benchmarks. Core OLTP transactions are measured by TPC-C; Web services applications are measured by SPECweb2005; and VM servers, which can be in either environment, are measured by VMmark. We do not measure application servers directly; there is no single benchmark that addresses a representative subset of applications running on general-purpose servers in a typical company.35 While some fraction of Web-based application transactions is measured in VMmark and SPECweb2005, omitted are server transactions that support middleware, packaged software application suites such as PeopleSoft and SAP, and business intelligence programs such as SAS. We would need to know how these programs scale with respect to a database transaction or a Web eCommerce transaction to estimate how their inclusion would affect our capacity and information calculations. We continue to research this area.
$ per measured transaction per minute = price of test server hardware ÷ measured server performance

To derive capacities, we need 1) the dollars spent for a specific server class, 2) benchmark tests that report the price-performance of the server class by year, and 3) the dollar value of servers allocated to a particular benchmark. As workload allocations by companies are not directly measurable, we rely on our own estimates, guided by expert interviews, industry data, and our own judgment. Table 7 presents our workload allocations. We use these percentages in our capacity calculations. We estimate, for example, that just over a third of all server work processed in companies is made up of core database transactions. The remaining two-thirds of server work is processed on Web servers, with a quarter of that work virtualized.36 (Figure 12)
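The derivation above can be sketched numerically. Every input below is hypothetical and invented for illustration (the benchmark price, measured performance, class revenue, and allocation fraction are not the report's data):

```python
# Hypothetical sketch of the capacity-per-dollar derivation.
# All inputs are invented for illustration, not taken from the report.

server_class_revenue = 20e9   # dollars spent on one server class in a year (hypothetical)
tpc_c_allocation = 0.35       # fraction of that spend allocated to OLTP work (hypothetical)
price = 150_000               # test-system hardware price from a benchmark result (hypothetical)
performance = 600_000         # measured transactions per minute, tpmC (hypothetical)

# Price-performance yardstick: dollars per measured transaction per minute.
dollars_per_tpm = price / performance

# Allocated dollars divided by $/tpm gives the class's capacity for that workload.
capacity_tpm = server_class_revenue * tpc_c_allocation / dollars_per_tpm

print(f"${dollars_per_tpm:.2f} per tpm")
print(f"allocated capacity: {capacity_tpm:.3e} transactions per minute")
```

The design point is that the benchmark supplies the conversion rate (dollars per unit of work), so the only unmeasurable input is the allocation percentage, which is why the workload allocations in Table 7 carry so much weight in the calculation.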
[Table 7: Workload Allocations, 2004-2008 — SPECweb2005: 65%, 65%, 63%, 37%, 30%; VMmark (2007-2008): 25%, 30%.]
[Figure 11: Core and Edge Computing Model. Core: Database Servers (TPC-C) and Web Application Servers (SPEC), with processed output; Edge: Edge Servers (VMmark) and User Devices, with output and processed output.]
Midrange servers show different capacity trends. If all midrange servers sold in 2004 had processed only SPECweb2005 workloads, their core capacity was 26.9 billion transaction requests per minute. By 2008, the corresponding capacity for SPECweb2005 workloads was 274 billion transaction requests per minute, better than a ten-fold increase in four years. Several factors could account for the much higher growth rate of midrange server capacity compared with that of entry-level servers: a higher proportion of multiprocessor, multi-core midrange servers was sold (more processors and more cores positively affect benchmark performance); midrange server price-performance improved faster than entry-level server price-performance; and midrange server test configurations may have been able to take greater advantage of the other resources in the test system, positively affecting test performance. Our server capacity assumptions, methodology, and calculations are complex, and we will not attempt
to explain them in this report. For interested readers, we have completed a background technical working paper which explains our key assumptions, describes our methodology in much greater detail, and gives sample calculations.37
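The midrange growth cited above (26.9 billion requests per minute in 2004, 274 billion in 2008) implies a compound annual growth rate that is easy to verify:

```python
# CAGR implied by the midrange SPECweb2005 capacities quoted in the text:
# 26.9 billion requests/minute in 2004, 274 billion in 2008.
start, end, years = 26.9, 274.0, 4

growth_multiple = end / start                 # roughly a ten-fold increase
cagr = (end / start) ** (1 / years) - 1       # compound annual growth rate

print(f"growth multiple: {growth_multiple:.1f}x over {years} years")
print(f"implied CAGR: {cagr:.0%}")
```

The roughly ten-fold rise over four years corresponds to capacity growing at close to 80% per year, compounded.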
[Figure 12: Estimated Workload Percentages — percentage of core and edge measured workloads (TPC, SPEC, VM); labeled values include 25% and 30%.]
Online transaction processing, measured by TPC-C, accounts for almost 45% of all server bytes processed in 2008. Web services and general computing, processed by entry-level and midrange servers and including bytes processed by virtual machines, account for the rest. Midrange and high-end servers process a disproportionate share of total
Transaction capacity by benchmark and server class, 2004-2008 (billions):

                               2004      2005      2006      2007      2008
TPC-C, tpmC (billions)
  Entry-level                 80.80     98.10    191.10    184.00    392.30
  Midrange                    16.90     19.20     35.30     73.30    108.00
  High-end                     4.40      4.30      4.80      7.50     10.10
SPEC, rpmSPEC (billions)
  Entry-level                125.10    148.00    609.50    848.00    956.70
  Midrange                    26.90     53.80     83.90    175.20    274.00
  High-end                      ---       ---       ---       ---       ---
VMmark, apmVM (billions)
  Entry-level                   ---       ---       ---    296.10    523.30
  Midrange                      ---       ---       ---     56.50     76.70
  High-end                      ---       ---       ---       ---       ---

NOTES: Each number is the transaction capacity if all servers in the class were performing a single benchmark. tpmC = transactions per minute (TPC-C); rpmSPEC = requests per minute (SPECweb2005); apmVM = actions per minute (VMmark); --- = workload not allocated to server class, or no benchmark test available.
there are inherent uncertainties in many of our assumptions, and they are subject to change based upon improved methodology and data. Interested readers should consult our background technical working paper for details on key assumptions, our methodology for converting benchmark transactions into bytes, and a complete description of how we derived total 2008 server bytes.
Total Server Information in 2008: 9.57 zettabytes

High-end servers make up only about two-tenths of one percent of all installed servers (0.22%), but process 5% of total annual bytes. Midrange servers, far more numerous, make up approximately 5% of all installed servers and process 29% of total annual bytes. In contrast, the ubiquitous entry-level servers make up over 94% of all installed servers in the world and process two-thirds of all the bytes. (Figure 14) The magnitude of OLTP transaction processing, almost half of all bytes, reflects the growing importance of workload-specific computing in recent years. The model reverses the general-purpose computing model that has dominated enterprise computing for over a decade. We discuss workload-specific computing and other important computing trends in Section 5.
[Figure 14: Server bytes by class and workload, 2008 (zettabytes). Midrange: 1.21 (TPC-C), 0.89 (SPEC), 0.69 (VM), row total 2.8 ZB (29.30%); High-end: 0.45 ZB (4.70%); workload totals: TPC-C 4.24, SPEC 2.55, VM 2.77 ZB (44.40%, 26.70%, 29.00%); grand total 9.57 ZB (100%).]
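The shares quoted in the surrounding text can be cross-checked from the zettabyte totals (small differences from the quoted percentages are rounding):

```python
# Cross-check of the 2008 byte shares, using the totals reported in the text
# (zettabytes processed). Quoted percentages were computed on unrounded data,
# so these recomputed shares differ slightly in the last digit.
total = 9.57                                          # grand total, ZB
by_workload = {"TPC-C": 4.24, "SPEC": 2.55, "VM": 2.77}  # workload totals, ZB
midrange_total, high_end_total = 2.80, 0.45           # class totals, ZB

for name, zb in by_workload.items():
    print(f"{name}: {zb / total:.1%}")
print(f"midrange share: {midrange_total / total:.1%}")
print(f"high-end share: {high_end_total / total:.1%}")
```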
inclusive definition of information. We included estimates of the amount of data processed as input and delivered by servers as output. This is a different emphasis than estimating the amount of stored data (data at rest) or counting the first instance of new information being created (the first airing of a radio program, or the first release of a new television show).
and with all types of work being completed in firms. We will continue to research a more extensive capacity measure, one including additional IT equipment such as storage and network devices.
Grand Total = 9.57 × 10^21 bytes

to be weighted by average server load factors and available hours of use that approximate how an average server works in a company. We estimated server loads in practical terms (how hard the servers get used) based on our own data and judgment, and vetted our estimates with industry experts. Our load estimates do not reduce to a single measure of CPU, memory, or I/O utilization. Rather, they are estimates of average utilization relative to the benchmarks and workloads used in our calculations. Our capacity-per-dollar measure was especially helpful when confronting the practicalities of estimating total server information. Enterprise computing environments are context-specific. Computing and application workloads vary widely across firms, even among those of similar company size and industry sector. To make sense of a complex environment, it was necessary to define a common yardstick that we could use with all servers
vendors are already exploiting solid-state disks (SSDs) and investigating other large-scale memory technologies. For example, the National Science Foundation's (NSF) next supercomputer, called Gordon, is designed to fuse traditional High Performance Computing (HPC) with HPD, or High Performance Data processing. When fully configured and deployed, Gordon will feature 245 teraflops of total compute power (one teraflop, or TF, equals a trillion calculations per second), 64 terabytes (TB) of DRAM (dynamic random access memory), and 256 TB of flash memory, about one-quarter of a petabyte.41 Trends in large memory system deployments and processing are not addressed in our current analysis.

Shared-nothing platforms, which consist of multiple independent nodes in parallel, have been prevalent since the mid-1980s as viable architectures for scalable, highly data-parallel processing. More recently, shared-nothing systems using commodity hardware have proven effective for massive-scale, data-parallel applications, such as web indexing by Google. Systems of this type, with thousands of nodes, are in use at large, Internet-scale businesses. Our current analysis does not address the special cases of Google, Microsoft, and other Internet-scale businesses, either in server counts or in estimating capacities using workload analyses. While their inclusion would almost certainly not change the order of magnitude of our world analysis, an analysis of US enterprise server information would require further investigation. Also, shared-nothing platforms are a key component in cloud computing environments. There is a strong possibility that cloud computing will become an important, if not key, part of the solution for enterprise computing in the future. Therefore, incorporating shared-nothing architectures into our analysis will help address a significant component of enterprise information.

Database Machines.
Another area that is re-emerging is that of computer architectures designed for database-intensive computing. The legacy in this area reaches back almost 30 years, to the field of database machines and other specialized hardware designs built to support specific classes of database applications. A number of factors, ranging from the need to deal efficiently with the data deluge, to the relative flexibility and improved costs of hardware design and fabrication, to the need for energy conservation, are leading to a re-examination of architectures, with the goal of systems optimized for massive data processing. In the past, the area of database machines led to the development of hardware/software systems such as Teradata, and to parallel software systems like IBM DB2 Parallel Edition, a shared-nothing commercial database system.42 Currently, the release of Oracle's Exadata Database Machine points in the direction of specialized hardware design and optimization attuned to extreme database performance.43 In high-performance computing environments, some scientific experiments that expect to generate very large amounts of data are investigating hardware embedding of processing algorithms to deal with the continuous data rates from high-resolution instruments.44 In scientific computing and large-scale data warehouses, it may soon become necessary to think of provisioning datasets with hardware: the dataset becomes the first-order object, with the computing platform dependent on it, rather than the current practice, which is the reverse.
5.3 Back to the Future: Data Discovery, Data Generation, Data Preservation
Data intensive computing is about the data, and it necessarily requires a deep engagement between business users on the one hand (sales managers, supply chain managers, financial analysts, etc.) and IT and technical experts on the other (the firm's IT professionals, technical specialists in vendor companies, etc.). This alignment does not happen without the required investments of time and resources by senior business, line, and IT management. With the vast proliferation of available data, there is increasing need for innovative search techniques that assist users with data discovery. It should be possible, for example, for users to specify the type of data they are looking for and have a system respond with useful results as well as recommendations for guiding the next search step. Current business intelligence and general search applications do not
provide this kind of capability. Another example is the increasing need for integration of very heterogeneous data, given the need to address complex issues. In medicine, something as conceptually simple as a lifetime personal medical chart, or a database of all test results for a family, would be examples. Traditional methods of data integration require significant manual intervention to actually integrate the data (e.g., by creating integrated database views) and do not scale. Novel tools and techniques are needed to facilitate such integration. One approach is referred to as ad hoc data integration, an approach that allows users to control, on-the-fly, which data are to be integrated. However, this approach requires a significant semantic infrastructure to be in place, and that is very rarely the case. Finally, a longer-term issue in both enterprise and research environments is data archiving and digital data preservation. In research settings, preserving scientific data is generally deemed to have intrinsic value. In enterprise settings, business policy and statutory regulations require preservation of data for a number of years after use. Federal agencies including NIH and NSF have recently announced needs for major data plans to address issues of data archiving and data preservation. And long-term data archiving and data preservation is a growing challenge for business organizations, beyond current retention policies, typically seven years. There are many industries (financial services, insurance, exploration and geological sciences, engineering, entertainment) where arbitrary data age limits make little sense. We have not addressed data and information storage in this analysis, but we will do so in the future. The issues are complex, involving technical as well as policy considerations.
Nonetheless, in the future, digital data archiving and preservation will require as much enthusiasm in research and industry settings as we have devoted to data generation and data processing.
APPENDIX
Counting Very Large Numbers
Byte (B)        =  1 byte       =  1                              =  One character of text
Kilobyte (KB)   =  10^3 bytes   =  1,000                          =  One page of text
Megabyte (MB)   =  10^6 bytes   =  1,000,000                      =  One small photo
Gigabyte (GB)   =  10^9 bytes   =  1,000,000,000                  =  One hour of High-Definition video, recorded on a digital video camera at its highest quality setting, is approximately 7 Gigabytes
Terabyte (TB)   =  10^12 bytes  =  1,000,000,000,000              =  The largest consumer hard drive in 2008
Petabyte (PB)   =  10^15 bytes  =  1,000,000,000,000,000          =  AT&T carried about 18.7 Petabytes of data traffic on an average business day in 2008
Exabyte (EB)    =  10^18 bytes  =  1,000,000,000,000,000,000      =  Approximately all of the hard drives in home computers in Minnesota, which has a population of 5.1M
Zettabyte (ZB)  =  10^21 bytes  =  1,000,000,000,000,000,000,000
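The decimal (SI) units in this appendix can be captured in a small helper function; this is an illustrative sketch, not part of the report's methodology:

```python
# Decimal (SI) byte units, as used in the appendix table.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def humanize(n_bytes: float) -> str:
    """Express a byte count in the largest decimal unit listed above."""
    for i, unit in enumerate(UNITS):
        if n_bytes < 1000 ** (i + 1) or unit == UNITS[-1]:
            return f"{n_bytes / 1000 ** i:.2f} {unit}"

print(humanize(9.57e21))   # the report's grand total for 2008
print(humanize(18.7e15))   # AT&T's average business-day traffic in 2008
```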
ENDNOTES
1. In December of 2007, Steven Lohr of The New York Times asked whether information overload was a $650 billion drag on the US economy, citing estimates he found in analyst and industry sources. Steven Lohr, "Is Information Overload a $650 Billion Drag on the Economy?" The New York Times, Bits, December 20, 2007.
Report on American Consumers, Global Information Industry Center (GIIC), University of California San Diego, December 2009.
http://hmi.ucsd.edu/howmuchinfo.php
calculations. By overhead, we refer to the amount of processing resources used by system software, such as the operating system, transaction processing (TP) monitor, or database manager. In communications, overhead refers to data that is not part of the user data but is stored or transmitted with it. Examples include data for error checking, channel separation, or addressing information.
4. The World Factbook, U.S. Central Intelligence Agency, available at:
in most studies of organizational information. Classically, organizational information has been defined by measuring the communications volume in a given firm, or by estimating the amount of stored data in the organization. Lyman and Varian did not separate out enterprise information in their 2000 and 2003 reports. Rather, they estimated the amount of new information stored annually on digital storage devices, and added to that the growth of printed office documents. Their total was 18.4 exabytes. See Lyman and Varian, Section IX B.2. Copies of Information Stored/Published on Hard Drives - Accumulated Stock, in Peter Lyman and Hal Varian, How Much Information? 2003 Report, University of California, Berkeley.
11. IDC's Server Taxonomy reports server sales in three price ranges:
its Worldbase global commercial database in 2008; as of 2010, the number is 160 million. Available at:
Entry-level servers (servers priced less than $25,000), Midrange servers ($25,000 to $499,999), and High-end servers ($500,000 or more). The revenue data is stated as factory revenue for a server system. Factory revenue represents those dollars recognized by multi-user system and server vendors for units sold through direct and indirect channels, and includes the following embedded components: frame or cabinet and all cables, processors, memory, communications boards, operating system software, other bundled software, and initial internal and external disks in the shipped server. Note that IDC publishes server data using alternate price categories. We have used the price ranges defined here to be consistent with previous research published by the EPA's Energy Star Program, Report to Congress on Server and Data Center Energy Efficiency: Public Law 109-431, August 2, 2007. Available at:
http://www.dnb.co.uk/about-dnb.asp http://www.dnb.co.uk/dnb-database.asp
information: information is a subset of data. Data is created by machines, such as sensors, barcode readers, or computer keyboards, and transformed by other machines, such as cable routers (location change), storage devices (time shift), and computers (symbol and meaning change). See Sections 1.1-1.3 in Roger E. Bohn and James E. Short, How Much Information? 2009 Report on American Consumers, Global Information Industry Center, University of California San Diego, December 2009.
7. Researchers have defined data in motion as kinetic data, using
12. John Gantz et al., The Diverse and Exploding Digital Universe: An
Updated Forecast of Worldwide Information Growth Through 2011, IDC White Paper, March 2008.
13. IDC describes its methodology as:
the analogy between kinetic and potential energy. Kinetic data, or data moving around the firm, is accessible for use by many applications and therefore can provide greater value. In contrast, stored data (or data at rest) has potential information value. But it first must be made kinetic and processed in an application to deliver value to users.
8. Drucker defined information as data endowed with relevance and purpose. Peter F. Drucker, "The Coming of the New Organization," Harvard Business Review 66 (January-February 1988), pp. 45-53.
- Develop a forecast for the installed base of devices or applications that could capture or create digital information.
- Estimate how many units of information (files, images, songs, minutes of video, phone calls, packets of information) were created in a year.
- Convert these units to megabytes using assumptions about resolutions, digital conversion rates, and usage patterns.
- Estimate the number of times a unit of information is replicated, either to share or store. The latter can be a small number, for example, the number of spreadsheets shared, or a large number, such as the number of movies written onto DVDs or songs uploaded onto a peer-to-peer network.
A complete presentation is at: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
of information in Chapter 1, Information and Its Discontents: An Introduction, in Thomas H. Davenport, Information Ecology (New York: Oxford University Press, 1997).
company servers. The 1 million machine estimate was published in a Gartner research briefing on 26 June 2007. Gartner Research Brief, "Look Beyond Google's Plan to Become Carbon Neutral," ID Number: G00149834, Publication Date: 26 June 2007. See also "Google: One Million Servers and Counting" at http://www.pandia.com/sew/48115. None of the large, public Internet companies report information on their installed server base or datacenters. However, there is an active blogosphere with estimates popping up frequently. Some may even be right. Examples: http://www.idg.no/bransje/bransjenyheter/article57876.ece; http://www.pandia.com/sew/481-gartner.html
David Cappuccio: "Rising use of social networks, rising energy costs, and a need to understand new technologies such as virtualization and cloud computing are among the top issues IT leaders face in the evolving datacenter... the 650 percent enterprise data growth over the next five years poses a major challenge, in part because 80 percent of the new data will be unstructured." See http://www.infoworld.com/d/data-explosion/datacenter-challenges-include-social-networks-rising-energy-costs-614
16. Both IDC and Gartner track server information. IDC publishes the Worldwide Quarterly Server Tracker, information at http://www.idc.com/getdoc.jsp?containerId=IDC_P348. Gartner releases public data on its quarterly market review through Gartner Press Releases. IDC estimates the installed base of servers using vendor data on shipments and equipment lifetimes derived from vendor reports and market surveys. Equipment lifetimes are based on an analysis of server service contracts and other information. The data includes both enterprise and scientific (HPC, or high-performance) servers, and excludes server upgrades.
in the EPA EnergyStar Server Report as background. We do not use this data in our calculations. See Report to Congress on Server and Data Center Energy Efficiency: Public Law 109-431, U.S. Environmental Protection Agency ENERGY STAR Program, August 2, 2007. Also see Jonathan Koomey, Estimating Total Power Consumption by Servers in the U.S. and the World, Final Report, February 15, 2007.
18. IDC Quarterly Server Tracker, at http://www.idc.com. Note that IDC and Gartner report worldwide server sales slightly differently. IDC reports factory revenue; Gartner adds revenue from the distributor channel to factory revenue. We have used IDC data for consistency.
19. Ibid., IDC Worldwide Quarterly Server Tracker.
20. CNET News, "Google uncloaks once-secret server," April 1, 2009.
Server performance has been analyzed at the CPU level, at the system level, and at the applications level. While the applications level is closest to the tasks that users are actually performing, data at this level are the hardest to measure and the results the hardest to generalize. The following references have examples using different metrics and workloads: Martin Pinzger, "Automated Web Performance Analysis, with a Special Focus on Prediction," Proceedings of iiWAS2008, November 24-26, 2008, Linz, Austria; Gaurav Banga and Peter Druschel, "Measuring the capacity of a web server," USENIX Symposium on Internet Technologies and Systems, pages 61-71, Monterey, CA, December 1997; Alaa R. Alameldeen, Carl J. Mauer, Min Xu, Pacia J. Harper, Milo M.K. Martin, Daniel J. Sorin, Mark D. Hill and David A. Wood, "Evaluating Nondeterministic Multi-threaded Commercial Workloads," Proceedings of the Computer Architecture Evaluation using Commercial Workloads (CAECW-02), February 2, 2002; Paul Barford and Mark Crovella, "Generating Representative Web Workloads for Network and Server Performance Evaluation," SIGMETRICS '96, Madison, WI, USA; Pradeep Padala, Xiaoyun Zhu, Zhikui Wang, Sharad Singhal, and Kang G. Shin, "Performance Evaluation of Virtualization Technologies for Server Consolidation," HP Laboratories Palo Alto, HPL-2007-59, April 11, 2007; David Mosberger and Tai Jin, "httperf: A Tool for Measuring Web Server Performance," HP Research Labs, Hewlett-Packard Co., Palo Alto, CA 94304; and Henning Schulzrinne, Sankaran Narayanan, Jonathan Lennox, and Michael Doyle, "SIPstone: Benchmarking SIP Server Performance," Working Paper, Columbia University, 2002.
24. Jonathan G. Koomey, Christian Belady, Michael Patterson, Anthony Santos, and Klaus-Dieter Lange, Assessing Trends Over Time in Performance, Cost, and Energy Use for Servers, Final Report, August 17, 2009. Section "Data and Methods."
http://news.cnet.com/8301-1001_3-10209580-92.html
21. Nielsenwire, "Twitter's Tweet Smell of Success," March 18, 2009.
25. Our complete doubling time analysis is presented in a background technical working paper that accompanies this report. See James E. Short, Roger E. Bohn and Chaitan Baru, How Much Information? 2010, Report on Enterprise Information, Background Technical Working Paper No. 01, November 2010.
http://blog.nielsen.com/nielsenwire/online_mobile/twitters-tweet-smell-of-success/
22. Jon Brodkin, "Datacenter challenges include social networks, rising energy costs: Researcher warns that projected 650 percent growth in enterprise data over the next five years poses a major challenge to IT leaders," InfoWorld, December 2, 2009. Brodkin quotes Gartner analyst David Cappuccio.
26. It is important to note the differences among the benchmarks in how the actual tests are run. TPC-C tests a single workload, OLTP database transactions, and reports a single performance metric, tpmC. SPECweb2005 tests three workloads (Banking, Ecommerce, and Support); SPEC runs each test consecutively and reports results for each workload, plus a weighted total result. VMmark tests six workloads, all running concurrently; results are published for each workload and for the number of tiles the test system can successfully process while meeting QOS requirements. By definition, VMmark benchmarks running concurrently would not achieve results comparable to running each workload individually on a test system with all processing and other system resources devoted to that single workload.
27. The TPC-C benchmark was approved in July 1992, and the first result was published in September 1992. The test system recorded 54 transactions per minute (tpmC), at a cost per tpmC of $188,562. This compares with a typical cost per tpmC today of under $0.25. Since 1992, the TPC-C test specification has been updated, and for some test years results are not comparable. All test results used in this report are comparable, using TPC-C software Version 5.
and in large companies, there are many thousands of these devices. Progressively, client devices are becoming more and more dependent on the edge environment for computing resources and support: for example, for accessing data and applications, or for accessing email and voice communications. We do not address client devices in this analysis, but will do so in the future.
28. The TPC-C benchmark models a typical online transaction processing (OLTP) environment. The benchmark simulates a large wholesale outlet's inventory management system. TPC-C involves a mix of five concurrent transactions of different types and complexity, either executed on-line or queued for deferred execution. The model consists of a number of warehouses, each with ten (or more) terminals representing point-of-sale or point-of-inquiry stations. Transactions are defined around entering and delivering orders, recording payments, checking the status of orders, and monitoring the stock level at the warehouses. Two transactions model behind-the-scenes warehouse activity: the stocking level inquiry and the delivery transaction. The stocking level inquiry scans a warehouse inventory for items that are out of stock or nearly so. The delivery transaction collects orders and marks those that have been delivered. While the TPC-C benchmark portrays the activity of a wholesale supplier, the benchmark is not limited to the activity of any particular business segment. Instead, according to TPC, it represents any industry that must manage, sell, or distribute a product or service. See Raab, Kohler, and Shah, "Overview of the TPC Benchmark C: The Order-Entry Benchmark," at http://www.tpc.org/tpcc/detail.asp
29. The SPECweb2005 documentation page is available at www.spec.org. A concise description of the benchmark is also available at https://sp.ts.fujitsu.com/dmsp/docs/benchmark_overview_specweb2005.pdf
30. The VMmark documentation and FAQ page are at
35. There is no industry-standard benchmark for application servers. The most widely adopted application benchmark, SPECjAppServer2004, is a multi-tier benchmark for measuring the performance of Java 2 Enterprise Edition application servers. SPECjAppServer2004 has undergone several software revisions since its introduction in 2004, and current results are not compatible with results from previous software releases. As a result, there are an insufficient number of test system results for our purposes. Over time, however, as SPEC stabilizes the test software and the number of tests increases, we may be able to incorporate results from this benchmark. The great majority of performance testing in the applications area, however, is done by the vendors themselves for their customers. All of the major vendors (Oracle, SAP, Teradata) have elaborate configuration benchmarks to size customer systems. Results are customer specific. SAP, for example, publishes Standard Performance Benchmarks for all of their major software products; Oracle and Teradata do the same. SAP's application benchmark page is at: http://www.sap.com/solutions/benchmark/index.epx
36. Comparable benchmark data for all server classes does not exist for all test years between 2004 and 2008. There are gaps, for example, in server models and machine configurations tested by TPC-C and those tested by SPECweb2005. We have made several adjustments to address machine comparability across test years. Second, while virtualization has been deployed in companies for years, VMmark began testing and publishing benchmark results in 2007. Therefore, 2007 is the first year we can include VMmark results in our calculations. Third, while we are not able to include application server measurements in our calculations at this time, some of the application server processing taking place is included in the TPC-C, SPECweb, and VMmark test results, since all of these workloads are applications themselves. But we do not have direct measurements for some workloads that may be important, Decision Support / Business Intelligence for example. We continue to research this area.
37. See background technical working paper. 38. Lyman and Varian did not separate out information delivered at work
SPECweb2005.
greater detail. See James E. Short, Roger E. Bohn and Chaitan Baru, How Much Information? 2010, Report on Enterprise Server Information, Background Technical Working Paper No. 01, November 2010.
33. VMmark can be set up to run in a single tile or in multiple tiles
from household or personal information received out of work. Peter Lyman and Hal R. Varian, How Much Information, 2003. Available at:
http://www.sims.berkeley.edu/how-much-info-2003
the decision is left to the test engineers. As a practical matter, engineers estimate the maximum number of tiles the test system will be able to run successfully, and start the test in that range. They then add one tile at a time until the test run fails. The test is then rerun with the maximum successful number of tiles recorded. VMmark test procedures are described in the benchmark technical discussion and FAQ page available at: http://www.vmmark.com
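The tile-escalation procedure note 33 describes (estimate a starting tile count, add one tile at a time until a run fails, and record the last successful count) can be sketched as follows. Here `run_passes` is a hypothetical stand-in for executing a full VMmark run at a given tile count; an actual run takes hours and is judged against the benchmark's quality-of-service criteria.

```python
from typing import Callable

def find_max_tiles(run_passes: Callable[[int], bool], start: int) -> int:
    """Return the largest tile count at which a test run succeeds.

    Starts from the engineers' estimate, walks down if that estimate
    already fails, then escalates one tile at a time until a run fails.
    Returns 0 if no tile count passes.
    """
    tiles = start
    # Walk down first in case the starting estimate is already too high.
    while tiles >= 1 and not run_passes(tiles):
        tiles -= 1
    if tiles < 1:
        return 0
    best = tiles
    # Escalate one tile at a time until the test run fails.
    while run_passes(best + 1):
        best += 1
    return best

if __name__ == "__main__":
    # Hypothetical system that can sustain at most 7 tiles.
    print(find_max_tiles(lambda n: n <= 7, start=5))  # prints 7
```

In practice the engineers then rerun the benchmark at the recorded maximum, since the published result must come from a complete, compliant run at that tile count.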
34. Client devices sit at the edge of the Edge computing environment.
39. John Gantz et. al., The Diverse and Exploding Digital Universe: An
Updated Forecast of Worldwide Information Growth Through 2011, IDC White Paper, March 2008.
40. John Gantz and David Reinsel, The Digital Universe Decade Are
41. UC San Diego News Center, Whats Next for High Performance
Client devices include all of the digital devices that are in employee hands -mobile phones, notebook computers, storage devices and so on. Of course, much critical work activity takes place on client devices,
Computing: Fusing High-Performance Data with High-Performance Computing Will Speed Research, February 24, 2010. http://ucsdnews.
ucsd.edu/newsrel/supercomputer/02-24NextForHPC.asp
S. Padmanabhan, and W. Wilson, DB2 Parallel Edition, IBM Systems Journal, April 1995.
servers, 14 storage servers, optimized network switches and I/O bandwidth, and user storage capacity of 100 TB per rack. Oracle Datasheet: Oracle Exadata Database Machine X2-8.
http://www.oracle.com/ocom/groups/public/@otn/documents/ webcontent/173705.pdf
44. San Diego Supercomputer Center News, NSF Awards $20 Million to
News%20Items/PR110409_gordon.html
Global Information Industry Center UC San Diego 9500 Gilman Drive, Mail Code 0519 La Jolla, CA 92093-0519 http://hmi.ucsd.edu/howmuchinfo.php