
Storage Basics Part I: An Introduction

by JOSHUA TOWNSEND on DECEMBER 28, 2009 12 COMMENTS in GENERAL IT, STORAGE, STORAGE BASICS, VMWARE

I am increasingly finding that both my SMB and enterprise customers are uneducated on the fundamentals of storage sizing and performance. As a result, storage is often overlooked as a performance bottleneck despite being a vital component to consider in a virtualization implementation. Storage will only increase in importance as hosts get bigger, data volumes increase, and more workloads are virtualized. For some reason, most people can grasp the importance of CPU and memory performance constraints, but storage performance is often overlooked and can be hard to explain to business users or executives.

Case in point: I have recently been called into some environments that were not performing well. These environments happened to be running Microsoft SQL, but could just as well have been running any application or collection of virtual machines. Fingers were being pointed in all directions: at applications, at the virtualization layer, at a lack of memory, and DBAs were insisting that there were too few CPUs. The situation was getting political and emotional when I walked into it. A few minutes with Windows Perfmon was all I needed to identify storage performance as the root cause of the firestorm that had been ignited. Using a bit of data, I was able to turn the discussion from an emotional fight into a simple problem of physics and mathematics (and a bit of simple math could have avoided the problem in the first place). I have seen this play out a few too many times, so I decided to write up this multi-part series on the basics of storage with a focus on storage performance.

That said, a little math and physics is where we will start as we look at the basic building block of a storage environment: a hard disk drive. Wikipedia defines a hard disk drive as a non-volatile storage device that stores digitally encoded data on rapidly rotating platters with magnetic surfaces. Your computer, server, or VMware cluster uses hard drives to read and write data. Wikipedia also covers the history and internal structure of a hard drive pretty well. For our purposes, the takeaway is that hard drives are physical objects, and as such, follow the laws of physics (duh) in the following measurable ways:

1.) Capacity, which is measured in bits or bytes and exponents thereof (MB, GB, TB, PB). This is how much data will fit on your disk, from simple text files to virtual disks, and everything in between. For example, if you have a 500GB SQL database, you darn well better have a hard drive that has a capacity of at least 500GB. This is a pretty simple concept, so I'll leave it there for now.

2.) Performance, which is measured in a couple of ways:
- at the disk itself in Input/Output Operations Per Second (IOPS) - a measure of how many read and write commands a disk can complete in a second
- interface throughput, measured in MBps or Gbps - a measure of the peak rate at which a volume of data can be read from or written to disk
- latency - the amount of time between when you ask a disk (or storage system, if you want to read ahead) to do something and when it actually does it; very closely related to IOPS, as you'll read in a forthcoming article in this series

Each disk, array, and storage system has its own fixed set of measurements given a specific configuration. Knowing the physical capabilities of your storage system as measured in the above ways, along with your system's storage requirements, will go a long way towards a successful design and implementation of your storage environment. The remaining parts of this series will look at these performance characteristics a bit more in depth and explain what happens as you introduce factors like RAID, cache, data reduction techniques such as snapshots and deduplication, and varying workloads.

Please keep in mind that while I have designed and implemented a variety of DAS, NAS, and SAN technologies from a host of vendors including Dell, EMC, IBM, and NetApp, I am by no means a storage expert. The information I will provide is generalized, over-simplified, and does not consider varying approaches from different storage vendors. Nonetheless, I hope you find this information useful if you are designing a solution, troubleshooting a performance issue, or preparing to make a storage purchase.

Storage Basics Part II: IOPS


by JOSHUA TOWNSEND on DECEMBER 29, 2009 13 COMMENTS in GENERAL IT, STORAGE, STORAGE BASICS, VMWARE, VMWARE HOW TO

In Part I of this series, I discussed the importance of storage performance in a virtual environment (really any environment, virtual or not, where you want acceptable performance), and introduced some of the basic measures of a storage environment. In Part II, we will look more closely at what may be the most important storage design consideration in VMware server-consolidation environments, many SQL environments, and VDI environments, to name a few: IOPS.

If we stick with a single-disk-centric approach as we did in Part I, IOPS is quite simply a measure of how many read and write commands a disk can complete in a second. IOPS is an important measure of performance in a shared storage environment (such as VMware) and in high-transaction-rate workloads like SQL. Because hard drives are forced to abide by the laws of physics, the IOPS capabilities of a disk are consistent and predictable given a specific configuration. The formula for calculating IOPS for a given disk is pretty straightforward (please show your work):

IOPS = 1000 / (Seek Latency + Rotational Latency)

with both latencies expressed in milliseconds. Exact latencies vary by disk type, quality, number of platters, etc. You can look up the tech specs for most drives on the market. As an example, I have randomly chosen the technical specifications of the Seagate Cheetah 15k.7 SAS drive. This particular drive has the following performance characteristics:

- Average (rotational) latency: 2.0ms
- Average read seek (latency): 3.4ms
- Average write seek (latency): 3.9ms

Using the read latency number, the math works out like this:

1000 / (2.0 + 3.4) = 185 maximum read IOPS

The maximum write IOPS will be a bit less (~169 IOPS) because of the higher write seek latency. Writing is more expensive than reading and therefore slower.
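If you would rather script this than do it by hand, here is a minimal Python sketch of the same arithmetic. The latencies are the Cheetah 15k.7 figures quoted above; everything else is just illustration.

# Estimate theoretical per-disk IOPS from average latencies (in milliseconds).
def disk_iops(seek_ms, rotational_ms):
    return 1000 / (seek_ms + rotational_ms)

# Seagate Cheetah 15k.7 figures from the spec sheet quoted above.
read_iops = disk_iops(seek_ms=3.4, rotational_ms=2.0)    # ~185 IOPS
write_iops = disk_iops(seek_ms=3.9, rotational_ms=2.0)   # ~169 IOPS

print(f"Max read IOPS:  {read_iops:.0f}")
print(f"Max write IOPS: {write_iops:.0f}")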

Fortunately, there are some widely accepted working numbers, so you do not have to use this formula for each and every disk you might consider using. Because rotational latency is based on the rotational speed, we can use the published Rotations Per Minute (RPM) rating of the drive to guess-timate the IOPS capabilities. Typical spindle speeds (measured in RPM) and their equivalent IOPS are in the table below.

RPM      IOPS
7,200    80
10,000   130
15,000   180
SSD      2,500 - 6,000

While not a traditional spinning disk, I have also included Solid State Disks (SSDs) for reference, as SSDs are starting to see increased market adoption. I have seen a wide range of IOPS sizing numbers for SSDs depending on the technology and type (SLC, MLC, etc.). Check out http://en.wikipedia.org/wiki/Solid-state_drive for an introduction, and ask your vendors for more in-depth technical information.

If you are brand new to this (and you are still reading - congrats!), you can see how many IOPS your Windows computer is asking for by opening Performance Monitor and looking at the Disk Transfers/sec counter under Physical Disk. This counter is the sum of the Disk Reads/sec and Disk Writes/sec counters, as you can see in the screenshot below:

If you are after some stats for your VMware ESX environment, check out esxtop and look for CMDS/s in the output. I published a couple of articles on using esxtop here and here. The numbers from Perfmon and esxtop get you pretty close, but they can be skewed by a few things we'll discuss in later posts.

Now that was fun and all, but let's get real: single-disk configurations are uncommon in servers. As such, we'll part ways with our Simple Jack single-disk approach to storage and begin to look at more real-world, multi-disk, enterprise-class storage configurations. A discussion of IOPS in a multi-disk array is a great way to start. From a very elementary perspective, you can combine multiple hard drives together to aggregate their performance capabilities. For example, two 15k RPM disks working together to serve a workload could provide a theoretical 360 IOPS (180 + 180). This also scales out, so ten 15k RPM disks could provide 1,800 IOPS, and 100 15k RPM disks could provide 18,000 IOPS.

Designing your environment so that your storage can deliver sufficient IOPS to the requesting workload is of utmost importance. If you are working on a storage design, arm yourself with data from perfmon, top, iostat, esxtop, and vscsiStats. I typically gather at least 24 hours of performance data from systems under normal conditions (a few days to a week may be better if you have varying business cycles) and take the 95th percentile as a starting point. So, from a very simple approach, if your data and calculations show an 1,800 IOPS demand at the 95th percentile, you ought to have at least ten 15k RPM disks (or twenty-three 7.2k RPM SATA disks) to achieve your performance goals. It's amazing how some simple data and a pretty little Excel spreadsheet can help you understand and justify the right hardware for the job. (A small code sketch of this sizing exercise follows at the end of this post.)

Now, before you go and start filling out that PO form for a nice new storage system based on these numbers, there are a few more things we ought to discuss. RAID, cache, and advanced storage technologies will skew these numbers and need to be understood. Stay tuned to future articles in this series for more on those topics and more.

Finally, there has been a bunch of activity in the VMware ecosystem of vendors, bloggers, and twittering-type-folks around storage performance. As this here post sat in my drafts folder, Duncan Epping posted this gem of an article that pretty much includes all of the content of this article, as well as future ones in my series: http://www.yellow-bricks.com/2009/12/23/iops/. Do yourself a favor and read his post and the comments from his readers - both are filled with a ton of great information, including some vendor-specific implementations. I was led to Duncan's article by a post by Chad Sakac on his blog: http://virtualgeek.typepad.com/virtual_geek/2009/12/whats-what-in-vmware-view-and-vdiland.html. This is also a great read that covers some of the same information with a focus on VMware View/VDI and is also worth a few minutes of your time. Also check out http://vpivot.com/2009/09/18/storage-is-the-problem/ for a rubber-meets-the-road post from Scott Drummonds on the importance of storage performance vis-a-vis IOPS in a VMware-virtualized SQL environment.
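As promised above, here is a minimal Python sketch of that sizing exercise. The counter samples are made up for illustration, the per-disk IOPS values are the rule-of-thumb figures from the table earlier in this post, and the math deliberately ignores RAID, cache, and everything else still to come in this series.

import math

# Rule-of-thumb per-disk IOPS by spindle speed, from the table in this post.
PER_DISK_IOPS = {"7.2k": 80, "10k": 130, "15k": 180}

def percentile_95(samples):
    """95th percentile of Disk Transfers/sec samples (nearest-rank method)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

def disks_needed(target_iops, spindle="15k"):
    """Minimum disk count to meet the demand (ignores RAID penalties, cache, etc.)."""
    return math.ceil(target_iops / PER_DISK_IOPS[spindle])

# Pretend these came from 24 hours of perfmon Disk Transfers/sec data.
samples = [950, 1200, 1800, 1500, 1750, 900, 1600, 1820, 1400, 1100]

demand = percentile_95(samples)
print(f"95th percentile demand: {demand} IOPS")
print(f"15k disks needed:  {disks_needed(demand, '15k')}")
print(f"7.2k disks needed: {disks_needed(demand, '7.2k')}")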

Storage Basics Part III: RAID


by JOSHUA TOWNSEND on JANUARY 6, 2010 15 COMMENTS in STORAGE, STORAGE BASICS, VMWARE

This is the third in a multi-part series on storage basics. I've had some good feedback from folks in the SMB space saying that the first couple of posts in this series have been beneficial, so we'll be sticking with some basic concepts for another post or two before we dive into some nitty-gritty details and practical applications of these concepts in a VMware environment.

In the second post of this series I introduced the concept of IOPS and explained how the physical characteristics of a hard disk drive determine the theoretical IOPS capability of a disk. I then noted that you can aggregate disks to achieve a greater number of IOPS for a particular storage environment. Today, we will look at just how you combine multiple disks and the performance impact of doing so. Remember that we are keeping this simple; the concepts I present here may not apply to that fancy new SAN you just purchased with your end-of-year money, or to the cheap little SATA controller on your desktop's motherboard (not that there's anything wrong with it) - we're more in the middle ground of direct attached storage (DAS) as we firm up concepts.

Enterprise servers and storage systems have the ability to combine multiple disks into a group using Redundant Array of Independent Disks (RAID) technology. We'll assume a hardware RAID controller is responsible for configuring and driving storage I/O to the connected disks. RAID controllers typically have battery-backed cache (we'll talk cache in a future post), an interconnect where the drives plug in, such as SCSI or SAS (we'll talk about these too in a future post), and hold the configuration of the RAID set, including stripe size and RAID level. The controller also does the basic work of reading and writing on the RAID set - mirroring, striping, and parity calculations. There are several different RAID levels - rather than rehash the details of them, read the Wikipedia entry on RAID and then come back here.

Ok, great. So you now know that RAID is implemented to increase performance through the aggregation of multiple disks, and to increase reliability through mirroring and parity. Now let's consider the performance implications of some basic RAID levels. As with many things in the IT industry, there are trade-offs: security vs. usability, brains vs. brawn, and now performance vs. reliability. As we increase reliability in a RAID array through mirroring and parity, performance can be impacted. This is where the "more disks = more IOPS" bit starts to fall apart. The exact impact depends on the RAID type. Here are some examples of how RAID impacts the maximum theoretical IOPS using the most common RAID levels, where:

I = Total IOPS for the array (note that I show Read and Write separately)
i = IOPS per disk in the array (based on the spindle speed averages from Part II: IOPS)
n = Number of disks in the array
r = Percentage of read IOPS (calculated as the average Disk Reads/sec divided by the total average Disk Transfers/sec in Windows Perfmon)
w = Percentage of write IOPS (calculated as the average Disk Writes/sec divided by the total average Disk Transfers/sec in Windows Perfmon)

RAID 0 (striping, no redundancy)

This is basic aggregation with no redundancy. A single drive error/failure could render your data useless, and as such it is not recommended for production use. It does allow for some simple math:

I = n*i

Because there is no mirroring or parity overhead, the theoretical maximum Read and Write IOPS are the same.

RAID 1 & RAID 10 (mirroring technologies)

Because data is mirrored to multiple disks:

Read I = n*i

For example, if we have six 15k disks in a RAID 10 config, we should expect the theoretical maximum number of read IOPS for our array to be 6*180 = 1080 IOPS.

Write I = (n*i)/2

RAID 5 (striping with parity, one disk's worth of capacity used for parity)

Read I = (n-1)*i

Example: Five 15k disks in a RAID 5 (4 + 1) will yield a maximum of (5-1)*180 = 720 READ IOPS. We subtract 1 because the equivalent of one disk holds parity bits, not data.

Write I = (n*i)/4

Example: Five disks in a RAID 5 (4 + 1) will yield a maximum of (5*180)/4 = 225 WRITE IOPS.

Again, these formulas are very basic and have little practical value on their own. Furthermore, it is seldom that you will find a system that is doing only reads or only writes. More often, as is the case with typical VMware environments, reads and writes are mixed. An understanding of your workload is key to accurately sizing your storage environment for performance. One of the workload characteristics (we'll explore some more in the future) that you should consider in your sizing is the percentage of read IOPS vs. the percentage of write IOPS. A formula like this gets you close if you want to do the math for a mixed read/write environment on a RAID 5 set:

I = (n*i)/(r + 4*w)

Example: a 60% read / 40% write workload with five 15k disks in a RAID 5 would provide (5*180)/(0.6 + 4*0.4) = 409 IOPS.
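For those who would rather script it than build the Excel sheet, here is a minimal Python sketch of the mixed read/write formula above. The write penalties (RAID 0 = 1, RAID 1/10 = 2, RAID 5 = 4) and the per-disk IOPS are the same rule-of-thumb numbers used in the examples.

# Theoretical maximum array IOPS for a mixed read/write workload,
# using the rule-of-thumb write penalties from this post.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4}

def array_iops(n_disks, iops_per_disk, read_pct, raid="raid5"):
    penalty = WRITE_PENALTY[raid]
    write_pct = 1.0 - read_pct
    return (n_disks * iops_per_disk) / (read_pct + penalty * write_pct)

# The example from above: five 15k disks, 60% read / 40% write, RAID 5.
print(round(array_iops(5, 180, 0.60, "raid5")))   # ~409 IOPS
# Six 15k disks in RAID 10 with the same workload, for comparison.
print(round(array_iops(6, 180, 0.60, "raid10")))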

The previous examples have all been from the perspective of the storage system. If we take a look at this from the server/OS/application side, something interesting shows up. Let's say you fired up Windows perfmon and collected Physical Disk Transfers/sec counters every 15 seconds for 24 hours, then analyzed the data in Excel to find the 95th percentile of total average IOPS (this is a pretty standard exercise if you are buying an enterprise storage array or SAN). Let's say you find that the server in question was asking for 1000 IOPS at the 95th percentile (and let's stick with our 60% read / 40% write workload). And finally, let's say we put this workload on a RAID 5 array. That's saying a lot of stuff, but what does it all mean?

Because RAID 5 has a write penalty factor of 4 (again, Duncan Epping posted a great article here, which I referenced in Part II, that describes this in a slightly different way), we can tweak the previous formula to show the I/Os issued to the backend array for a given workload, where:

I = Target workload IOPS
f = I/O penalty
r = % Read
w = % Write

IO = (I * r) + (I * w) * f

Our example then looks like this (remember, work inside the parentheses first, and then My Dear Aunt Sally):

(1000 * 0.6) + ((1000 * 0.4) * 4) = 2200

Simply stated, this means that for every 1000 IOPS that our workload requests from our storage system, the backing array must perform 2200 I/Os - and it had better do so quickly, or you will start to see latency and queuing (we call this performance degradation, boys and girls!). Again, this is a very simplistic approach, neglecting factors like cache, randomness of the workload, stripe size, I/O size, and partition alignment, which can all impact requirements on the backend. I'll cover some of those later. (A short code sketch of this backend I/O math appears at the end of this post, after the links below.)

As you can hopefully see, the laws of physics combined with some simple math can provide some pretty useful numbers. A basic understanding of your array configuration against your workload requirements can go a long way in preventing storage bottlenecks. You may also find, as you consider the cost per disk against various spindle speeds, capacities, and RAID levels, that you are better off buying smaller, faster, fewer, more, or slower disks, depending on your requirements. The geekier amongst us could even take these formulas and some costs per disk and hit up Excel Goal Seek to find the optimal level, but that's more than this little blog can do for you today.

Before I wrap up this post, I want to leave you with a few more links that I have bookmarked around the topics of IOPS and RAID over the past several years:

DB sizing for Microsoft Operations Manager - includes a nice chart with formulas similar to the ones I provided in this article: http://blogs.technet.com/jonathanalmquist/archive/2009/04/06/how-can-i-gauge-operations-manager-database-performance.aspx

An Experts Exchange post with some good info in the last entry on the page (subscription required): http://www.experts-exchange.com/Storage/Storage_Technology/Q_22669077.html

A Microsoft TechNet article with storage sizing for Exchange - a bit dated but still applicable: http://technet.microsoft.com/en-us/library/aa997052(EXCHG.65).aspx

A simple whitepaper from Dell on their MD1000 DAS array - easy language to help the less technical along: http://support.dell.com/support/edocs/systems/md1120/multlang/whitepaper/SAS%20MD1xxx.pdf

A great post that uses some math to show the performance and cost trade-offs of RAID level, disk type, and spindle speed: http://www.yonahruss.com/architecture/raid-10-vs-raid-5-performance-cost-space-and-ha.html

Another nifty post that looks at cost vs. performance vs. capacity of various disk speeds in an array: http://blogs.zdnet.com/Ou/?p=322
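And, as mentioned earlier, here is a small sketch of the backend I/O math from this post. The 1000 IOPS, 60% read / 40% write workload is the same example used above, and the penalty factors are the usual rule-of-thumb values.

# Backend I/Os generated by a workload, given a RAID write penalty.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4}

def backend_ios(workload_iops, read_pct, raid="raid5"):
    write_pct = 1.0 - read_pct
    f = WRITE_PENALTY[raid]
    return workload_iops * read_pct + workload_iops * write_pct * f

# The example from this post: 1000 IOPS at 60% read / 40% write on RAID 5.
print(backend_ios(1000, 0.60, "raid5"))   # 2200.0 backend I/Os
print(backend_ios(1000, 0.60, "raid10"))  # same workload on RAID 10, for comparison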

Storage Basics Part IV: Interface


by JOSHUA TOWNSEND on JANUARY 26, 2010 5 COMMENTS in STORAGE, STORAGE BASICS, VMWARE

In parts I, II, and III of the Storage Basics series we looked at the basic building blocks of modern storage systems: hard disk drives. Specifically, we looked at the performance characteristics of disks in terms of IOPS and the impact of combining disks into RAID sets to improve performance and resiliency. Today we will have a quick look at another piece of the puzzle that impacts storage performance: the interface.

The interface, for lack of a better term, can describe several things in a storage conversation. It can be a confusing term, so let me break it down for you (remember, we're keeping it simple here). At the most basic level (assume a direct-attached setup), "interface" can be used to describe the physical connections required to connect a hard drive to a system (motherboard/controller/array). The interface extends beyond the disk itself, and includes the controller, cabling, and disk electronics necessary to facilitate communications between the processing unit and the storage device. Perhaps a better term for this would be "intra-connect," as this is all relative to the storage bus. Common interfaces include IDE, SATA, SCSI, SAS, and FC.

Before data reaches the disk platter (where it is bound by IOPS), it must pass through the interface. The standards bodies that define these interfaces go beyond the simple physical form factor; they also define the speed and capabilities of the interface, and this is where we find another measure of storage performance: throughput. The speed of the interface is the maximum sustained throughput (transfer speed) of the interface and is often measured in Gbps or MBps. Here are the interface speeds for the most common storage interfaces:

Interface   Speed
IDE         100MBps or 133MBps
SATA        1.5Gbps or 3.0Gbps (6.0Gbps is coming)
SCSI        160MBps (Ultra-160) and 320MBps (Ultra-320)
SAS         1.5Gbps or 3.0Gbps (6.0Gbps is coming)
FC          1Gb, 2Gb, 4Gb, or 8Gb (duplex throughput rates are 200MBps, 400MBps, 800MBps, and 1600MBps respectively)

If we take these speeds at face value, we see that 320MBps SCSI and 2Gbps FC are not too different. If you dig a bit deeper, you will soon find that simple speed ratings are not the end of the story. For example, FC throughput can be impacted by the length and type of cable (Fibre Channel can use twisted-pair copper in addition to fiber optic cables). Also, topologies can limit speeds - serial-connected topologies are more efficient than parallel on the SCSI side, and arbitrated loops can incur a penalty on the FC side. The specifications of each interface type also define capabilities such as the protocol that can be used, the number of devices allowed on a bus, and the command set that can be used in communications on a storage system. For example, SATA native command queuing (NCQ) can offer a performance increase over parallel ATA's tagged command queuing, with other factors held constant. Because of this, you might also see some performance implications of connecting a SATA drive to a SAS backplane, as the SAS backplane translates SAS commands to SATA.

If we move away from the direct-connect model and into a shared storage environment that you might use in a VMware-virtualized environment, the interface takes on an additional meaning. You certainly still have the bus interface that connects your disks to a backplane. Modern arrays typically use SAS or FC backplanes. If you have multiple disk enclosures, you also have an interface that connects each disk shelf to the controller/head/storage processor, or to an adjacent tray of disks. For example, EMC CLARiiONs use a copper Fibre Channel cable in a switched fabric to connect disk enclosures to the backend of the storage processors.

If we move to the front-end of the storage system, "interface" describes the medium and protocol used by initiating systems (servers) when connecting to the target SAN. Typical front-end interface mediums on a SAN are Fibre Channel (FC) and Ethernet. Front-end FC interfaces come in the standard 2Gb, 4Gb, or 8Gb speeds, while Ethernet is 1Gbps or 10Gbps. Many storage arrays support multiple front-end ports, which can be aggregated for increased bandwidth, or targeted by connecting systems using multi-pathing software for increased concurrency and failover. Various protocols can be sent over these mediums. VMware currently supports Fibre Channel Protocol (FCP) on FC, and iSCSI and NFS on Ethernet. FC and iSCSI are block-based protocols that utilize encapsulated SCSI commands. NFS is a NAS protocol. Fibre Channel over Ethernet (FCoE) is also available on several storage arrays, sending FCP packets across Ethernet.

Determining which interface to use on both the front-end and back-end of your storage environment requires an understanding of your workload and your desired performance levels. A post on workload characterization is coming in this series, so I won't get too deep now. I will, however, provide a few rules of thumb. First, capture performance statistics: using Windows Perfmon, look at Physical Disk | Disk Read Bytes/sec and Disk Write Bytes/sec, or check out the stats in your vSphere Client if you are already virtualized.

- If you require low latency, use Fibre Channel.
- If your throughput is regularly over 60MBps, you should consider Fibre Channel-connected hosts.
- iSCSI or NFS are often a good fit for general VMware deployments.
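To make that 60MBps rule of thumb concrete, here is a minimal Python sketch that converts perfmon byte counters into MBps and applies the threshold. The sample values are invented, and the threshold is only the rough guideline above, not a hard rule.

# Convert perfmon "Disk Read Bytes/sec" + "Disk Write Bytes/sec" samples to MBps
# and apply the simple 60MBps rule of thumb from this post.
FC_THRESHOLD_MBPS = 60

def to_mbps(read_bytes_sec, write_bytes_sec):
    return (read_bytes_sec + write_bytes_sec) / (1024 * 1024)

# Pretend these pairs came from a perfmon capture (bytes per second).
samples = [(40_000_000, 25_000_000), (55_000_000, 30_000_000), (20_000_000, 10_000_000)]

for read_bps, write_bps in samples:
    mbps = to_mbps(read_bps, write_bps)
    verdict = "consider FC-connected hosts" if mbps > FC_THRESHOLD_MBPS else "iSCSI/NFS likely fine"
    print(f"{mbps:6.1f} MBps -> {verdict}")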

There is a ton of guidance and performance numbers available when it comes to choosing the right interconnect for a VMware deployment, and a ton of variables that impact performance. Start with this whitepaper from VMware: http://www.vmware.com/resources/techresources/10034. For follow-up reading, check out Duncan Epping's post with a link to a NetApp comparison of FC, iSCSI, and NFS: http://www.yellow-bricks.com/2010/01/07/fc-vs-nfs-vs-iscsi/. If you are going through a SAN purchase process, ask your vendor to assist you in collecting statistics for proper sizing of your environment. Storage vendors (and their resellers) have a few cool tools for collecting and analyzing statistics - don't be afraid to ask questions about how they use those tools to recommend a configuration for you.

I've kept this series fairly simple. Next up in this series is a look at cache, controllers, and coalescing. With the next post we'll start to get a bit more complex and more specific to VMware and Tier 1 workloads, both virtual and physical. Thanks for reading!

Storage Basics Part VI: Storage Workload Characterization


by JOSHUA TOWNSEND on APRIL 8, 2010 6 COMMENTS in STORAGE, STORAGE BASICS, VMWARE

Most of what I covered in Storage Basics Parts 1 through 5 was at a very elementary level. The math I used to do IOPS calculations, for example, is only true under very certain conditions. RAID controllers implement caching and other techniques that skew the simple math that I provided. I mentioned that the type of interface you ought to use on your storage array should not be randomly chosen. In fact, choosing the right array with the appropriate components and characteristics can only be done when you inform your decision with a characterization of the workloads it will be running.

The character of your storage workload can be broken down into several traits: random vs. sequential I/O, large vs. small I/O request size, read vs. write ratio, and degree of parallelism. The traits of your particular workload dictate how it interacts with the components of your storage system and ultimately determine the performance of your environment under a given configuration. There is an excellent whitepaper available from VMware entitled "Easy and Efficient Disk I/O Workload Characterization in VMware ESX Server" that is authoritative on this subject. If you want to get down and dirty with the topic, it's a good read. I'm aiming for something a bit less academic. With that said, let's break down workload characterization a bit so as to better understand how it will impact your real-world systems.

Random vs. Sequential Access

In Part II of this series we looked at the formula for calculating the IOPS capability of a single disk. That formula goes something like this:

IOPS = 1000/(Seek Latency + Rotational Latency)

You'll recall that we divide into 1000 to remove milliseconds from the equation, leaving (Seek Latency + Rotational Latency) as the important part of the equation. Rotational latency is based on the spindle speed of the disk - 7.2k, 10k, or 15k RPM for standard server or SAN disks. If we consider the same Seagate Cheetah 15k drive from Part II, we see that rotational latency is 2.0ms. The only way to change rotational latency is to buy faster (or slower) disks. This essentially leaves seek latency as the only variable that we can adjust. You'll also recall that seek latency was the larger of the latencies (3.4ms for read seeks, and 3.9ms for write seeks) and counts more against IOPS capability than does rotational latency. Seeking is the most expensive operation in terms of performance. It is next to impossible to adjust seek latency on a disk because it is determined by the speed of the servos that move the heads across the platter. We can, however, send workloads with different degrees of randomness to the platter. The more sequential a workload is, the less time that will be spent in seek operations. A high degree of sequentiality ultimately leads to faster disk response and higher throughput rates. Sequential workloads may be candidates for slower disks or RAID levels. Conversely, workloads that are highly randomized ought to be placed on fast spindles in fast RAID configurations.

You'll notice that I said it was next to impossible to adjust seek latency on a disk. While not common, some storage administrators employ a method known as short stroking when configuring storage. Short stroking uses less than the full capacity of the disk by placing data at the beginning of the disk, where access is faster, and not placing data at the end of the disk, where seek times are greater. This results in a smaller area on the disk platter for the heads to travel over, effectively reducing seek time at the expense of capacity.

While not applicable to all workloads, storage arrays, or file systems, fragmentation can cause higher degrees of randomness, leading to degraded performance. This is the prime reason some vendors recommend that you regularly defragment your file system. It should be noted that a VMware VMFS file system is resilient against the forces of fragmentation. Whereas a Windows NTFS partition may hold hundreds, thousands, or tens of thousands of files of different sizes, accessed randomly throughout the system's cycle of operations, a VMFS datastore typically holds no more than a couple hundred files. Additionally, most of the files on a VMFS datastore are created contiguously if you are using thick-provisioned virtual disks (VMDKs). Thin-provisioned VMDKs are slightly more susceptible to fragmentation, but do not typically suffer a high enough degree of fragmentation to register a performance impact. See this VMware whitepaper for more on VMFS fragmentation: Performance Study of VMware vStorage Thin Provisioning.
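To get a rough feel for why randomness matters, here is a simplified back-of-the-envelope model in Python (my own simplification, not something from the VMware paper): it assumes a given fraction of I/Os pay the full average seek, while the rest are effectively sequential and pay only rotational latency.

# Rough model: estimate effective per-disk IOPS as the seek-incurring fraction varies.
# Assumes I/Os either pay the full average seek or no seek at all - a simplification.
def effective_iops(seek_ms, rotational_ms, random_fraction):
    avg_service_ms = rotational_ms + seek_ms * random_fraction
    return 1000 / avg_service_ms

SEEK_MS, ROT_MS = 3.4, 2.0  # Cheetah 15k.7 read figures from Part II

for frac in (1.0, 0.5, 0.1):
    print(f"{int(frac * 100):3d}% random -> ~{effective_iops(SEEK_MS, ROT_MS, frac):.0f} IOPS")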

Examples of sequential workloads include backup-to-disk operations and the writing of SQL transaction log files. Random workloads may include collective reads from Exchange Information Stores or OLTP database access. Workloads are often a mix of random and sequential access, as is the case with most VMware vSphere implementations. The degree to which they are random or sequential dictates the type of tuning you should perform to obtain the best possible performance for your environment.

I/O Request Size

I/O request size is another important factor in workload characterization. Generally speaking, larger reads/writes are more efficient than smaller I/O, up to a certain point. The use of larger I/O requests (64KB instead of 2KB, for example) can result in faster throughput and reduced processor time. Most workloads do not allow you to adjust your I/O request size. However, knowing your I/O request size can help with the appropriate configuration of certain parameters such as array stripe size and file system cluster size. Check with your storage vendor for more information as it pertains to your specific configuration. If you are in a Windows shop, you can use perfmon counters such as Avg. Disk Bytes/Read to determine average I/O size. If you are running a VMware-virtualized workload, you can take advantage of a great tool - vscsiStats - to identify your I/O request size. More on vscsiStats later in this article.

Read vs. Write

Every workload will display a differing amount of read and write activity. Sometimes a specific workload, say Microsoft Exchange, can be broken down into sub-workloads for logging (write-heavy) and reading the database (read-heavy). Understanding the read-to-write ratio may help with designing the underlying storage system. For example, a write-heavy workload may perform better on a RAID10 LUN than on a RAID5 array due to the write penalty associated with RAID5. The read:write ratio may also dictate caching strategies. The read:write ratio, when combined with a measure of the degree of randomness, can be quite useful in architecting your storage strategy for a given application or workload.

Parallelism/Outstanding I/Os

Some workloads are capable of performing multi-threaded I/O. These types of workloads can place a higher amount of stress on the storage system and should be understood when designing storage, both in terms of IOPS and throughput. Multipathing may help with multi-threaded I/O workloads. A typical VMware vSphere environment is a good example of a workload capable of queuing up outstanding I/O.

Measuring the Characteristics of Your Workload

So how do we actually characterize storage workloads? Start with the application vendor - many have published studies that can shed light on specific storage workloads in a standard implementation. If you are interested in measuring your own, for planning/architecture reasons or performance troubleshooting reasons, read on. There are several tools to measure storage characteristics, depending on your operating system and storage environment. Standard OS performance counters, such as Windows Performance Monitor (perfmon), can reveal some of the characteristics. Array-based tools, such as NaviAnalyzer on EMC gear, can also reveal statistics on the storage end of the equation.

One of the most exciting tools for storage workload characterization comes from VMware in the form of vscsiStats. vscsiStats is a tool that has been included in VMware ESX Server since version 3.5. Because all I/O commands pass through the Virtual Machine Monitor (VMM), the hypervisor can inspect and report on the I/O characteristics of a particular workload, down to a unique VM running on an ESX host. There is a ton of great information on using vscsiStats, so I won't re-hash it all here. I recommend starting with Using vscsiStats for Storage Performance Analysis, as it contains an overview and usage instructions. If you want to dig a bit deeper into vscsiStats, read both Storage Workload Characterization and Consolidation in Virtualized Environments and vscsiStats: Fast and Easy Disk Workload Characterization on VMware ESX Server. vscsiStats can generate an enormous amount of data, which is best viewed as a histogram. If you're a glutton for punishment, the data can be reviewed manually on the COS. To extract vscsiStats output data, use the -c option to export to a .csv file. From there you can analyze the data and create histograms using Excel. Paul Dunn has a nifty Excel macro for analyzing and reporting on vscsiStats output here. Gabrie van Zanten has more detailed instructions for using Paul's macro here. Here are a couple of histogram examples that I just generated from a test VM.

vscsiStats is only included with ESX, not ESXi. However, Scott Drummonds was kind enough to post a download of vscsiStats for ESXi on his Virtual Pivot blog: http://vpivot.com/2009/10/21/vscsistats-for-esxi/. Using vscsiStats on ESXi requires dropping into Tech Support Mode (unsupported) and enabling scp on ESXi to transfer the binary to the ESXi server. VMware esxtop can display some information but is limited in scope and does not currently support NFS. A community-supported Python script called nfstop can parse vscsiStats data and display esxtop-like data per VM on screen.

Experiment

If you are interested in generating workloads with various characteristics, check out Iometer and Bonnie++. These tools will allow you to generate I/O that you can monitor with the tools I covered in this article.

Put it to Use

If you are provisioning a new workload or expanding an existing one, invest some time in understanding your storage workload characteristics and convey those characteristics to your storage team. A request for storage that includes the workload characteristics I discussed here, as well as expected IOPS requirements, will go much further in ensuring performance for your applications - physical or virtual - than simply asking for a certain capacity of disk.
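As one way to pull these traits together before handing a request to your storage team, here is a minimal Python sketch of a workload profile. The field names and sample values are purely illustrative; the throughput figure simply multiplies IOPS by average I/O size.

# A simple workload characterization summary: the traits discussed in this post,
# plus the throughput they imply (IOPS x average I/O size).
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    name: str
    peak_iops_95th: int      # from perfmon / esxtop / vscsiStats
    read_pct: float          # read share of total I/O
    avg_io_size_kb: int      # e.g. from Avg. Disk Bytes/Read or vscsiStats
    random_pct: float        # rough degree of randomness
    outstanding_ios: int     # typical queue depth / parallelism

    def throughput_mbps(self):
        return self.peak_iops_95th * self.avg_io_size_kb / 1024

# Illustrative example only - not measurements from a real system.
sql = WorkloadProfile("SQL OLTP", peak_iops_95th=1800, read_pct=0.6,
                      avg_io_size_kb=8, random_pct=0.9, outstanding_ios=32)

print(f"{sql.name}: {sql.peak_iops_95th} IOPS, "
      f"{sql.read_pct:.0%} read, ~{sql.throughput_mbps():.0f} MBps")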
