
Introduction to Storage Technologies

Worldwide Field Readiness


Overview

This document is an introduction to disk storage technologies and their terminology. It discusses different types of storage, focuses on the ubiquitous storage method, the hard disk drive, and then discusses how this type of storage has evolved into a fault-tolerant system of networked storage. Key terminology is described in some detail, such as Redundant Array of Independent Disks (RAID), Network-attached Storage (NAS) and Storage Area Networks (SANs), along with their different infrastructure needs.

Target Audience
This document has been written for information technology (IT) specialists who are responsible for planning and designing infrastructures that include an existing storage infrastructure and therefore want to quickly understand the salient concepts and terminology. These specialists include consultants, internal IT architects, and others who may need this information.

Table of Contents
Overview
Target Audience
General
  Types of Storage
  Hard Disk
  Host Controller Card
  Types of Disk Interfaces
Abstraction & Storage Networks
  Fault Tolerance and RAID
  Directly Attached Storage (DAS)
  Storage Networking Protocols
  Fibre Channel Infrastructure
  Fibre Channel Host Bus Adapters
  iSCSI over Gigabit Ethernet
Storage Architectures
  DAS, NAS & SAN
  Hybrid
  Tiered storage
  Storage Admission Tier (SAT)
  File Area Networks (FAN)
Network Storage Fault-tolerance
  SAN Multipath I/O
  Storage Replication
  RPO & RTO
  Snapshots
Terminology
  Thin Provisioning & Over-Allocation
  LUN Masking
  Data De-duplication

General
Types of Storage
There are many types of storage media:
Flash, which has become a cheap form of fast storage, especially in consumer products.
Optical storage, which comes in the form of CDs, DVDs, Blu-ray and so on. These are slow for data access, but still very useful for archives and movies.
Magnetic tape backup systems, which are still in use in corporate IT centers, but are slow and aren't good for random access. Random access refers to the ability to (effectively) access any piece of data by its address (e.g. block number on a hard disk; see below) instantly.
Magnetic or hard disks, which are discussed at length below, are a ubiquitous form of high-volume, high-speed, random-access storage.

A solid-state drive (SSD) is a data storage device that uses solid-state memory to store persistent data. Unlike flash-based memory cards, an SSD emulates a hard disk drive, and so can easily replace it in most applications. An SSD using SRAM or DRAM (instead of flash memory) is often called a RAM drive. The advantage over (magnetic) disk drives is speed, but the cost per gigabyte is 4 to 5 times that of disk drives, and at the moment the amount of storage per unit is much less. These types of storage can be either fixed or removable, thanks to the ubiquity of USB and FireWire. In this paper we'll be talking exclusively about magnetic disk (hard-drive) storage.

Hard Disk

Hard disks are composed of:
Multiple spinning magnetic platters that contain the magnetically encoded data. The platters spin around an axle called a spindle, and a single drive is itself sometimes referred to as a spindle.
Read/write heads that float above the surface of the platters, usually two (top and bottom) for each platter. The heads all move in unison.
A controller board that drives the heads and converts I/O requests (commands) into head movement and read/write operations.
An interface that connects to a host adapter board which, for an internal (to a computer) drive, is a separate unit.

The drive will need power which, for an internal drive, is usually supplied by the computer's own power supply.

The disk drive is covered and hermetically sealed, since heads are designed to float above the disk platter with less than a micron of space.

Each surface of the disk platter is divided into areas. In the context of data, the word block can have many different meanings, but here a block is the smallest unit of data that is read and written. A block is also called (more formally) a track sector, or simply a sector. A series of sectors makes up a track, and there are several tracks on each surface. At the lowest level, when a block is being read or written, several things identify where that block goes: the surface (identified by the head number), the track that the head should move to, and the sector that should be read or written. This lowest level of data storage is called block-level storage, and implies that the data is simply a series of bits, with the drive having no notion of format or of what the data belongs to. In the operating system (OS), there are device drivers, file systems and applications that impose meaning on, and keep tabs on, the individual blocks of data. This higher level of data storage is called file-level storage.

Seek time is one of the three delays associated with reading or writing data on a disk drive. The others are the rotational delay of the disk and the transfer time; their sum is the access time. In order to read or write data in a particular sector, the head of the disk needs to be physically moved to the correct place. This process is known as seeking, and the time it takes for the head to move to the right place is the seek time. Seek time for a given disk varies depending on how far the head's destination is from its origin at the time of each read or write; usually one discusses a disk's average seek time.
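
As a worked illustration of the access-time relationship just described, the short sketch below adds up seek time, rotational delay and transfer time for a hypothetical 7,200 RPM drive; the seek and transfer figures are assumed values chosen purely for illustration, not specifications of any real product.

```python
# Rough estimate of average disk access time:
#   access time = average seek time + average rotational delay + transfer time
# The figures below are illustrative assumptions, not real drive specs.

rpm = 7200                      # spindle speed
avg_seek_ms = 9.0               # assumed average seek time (ms)
transfer_rate_mb_s = 100.0      # assumed sustained transfer rate (MB/s)
block_kb = 4                    # size of the block being read (KB)

# On average the platter must rotate half a revolution before the
# requested sector passes under the head.
rotational_delay_ms = 0.5 * (60_000 / rpm)              # about 4.17 ms at 7,200 RPM

transfer_ms = (block_kb / 1024) / transfer_rate_mb_s * 1000

access_time_ms = avg_seek_ms + rotational_delay_ms + transfer_ms
print(f"average access time is roughly {access_time_ms:.2f} ms")
```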

Host Controller Card

For disk drives that are to be installed internally in a computer (such as a server), the interface from the disk will be cabled to a Host Controller Card (or simply controller). Depending on the type of controller, this can usually accommodate multiple disk drives. You may also be able to plug external devices into an external-facing interface. The controller will usually plug straight into a slot on the computer motherboard and draw its power from there. This also lets the CPU talk to the host adapter and disks through the system bus. Sometimes you may find the controller integrated directly into the computer's motherboard.

Types of Disk Interfaces

The predominant interfaces for disk drives are:
Advanced Technology Attachment (ATA): More common in home computing, ATA can support different kinds of devices, such as hard drives and DVD burners, with a restriction of two devices per cable. It is also known as parallel ATA (PATA), since there is a wire for each bit of information in the interface cable, making the cable very wide. The disk drives attached to an ATA host adapter are usually called IDE (Integrated Drive Electronics) drives.
Serial ATA (SATA): An enhancement to ATA that allows for changing drives without shutting down (hot swap), faster transfer speeds, and thinner cabling.

Generally, disks that attach through either ATA or SATA have their platters spinning at a constant speed of 7,200 revolutions per minute (RPM). Remember that the disk spin speed is one important factor in the disk's access time. The other common interfaces are:
Small Computer System Interface (SCSI): An interface standard that is not compatible with ATA or IDE drives. Modern versions of SCSI afford up to 16 devices per cable, including the host adapter. Although the layout looks like ATA, none of the components are interchangeable.
Serial Attached SCSI (SAS): A point-to-point serial protocol that replaces the parallel SCSI bus technology mentioned above. It uses the standard SCSI command set, but is currently not faster than parallel SCSI. In the future, speeds are expected to double, and there will also be the ability to use certain (slower) SATA drives on a SAS bus.

SCSI disks usually spin at 10,000 or 15,000 RPM. Because of this, and the more complicated electronics, SCSI components are much more expensive than ATA or SATA. However, SCSI disks are renowned for their speed of access and data transfer.

Abstraction & Storage Networks


Fault Tolerance and RAID

Because disk drives are sophisticated mechanical devices, when they fail they tend to take all the data with them. RAID defines several types of redundancy and efficiency enhancements by clustering commonly available disks. For example:
RAID 0: Striped set, no parity. Striping is where each successive block of information is written to alternate disks in the array. RAID 0 still suffers from a single disk failure in the array, but is often used for the increased read speed. The increase in read speed comes from being able to simultaneously move the read/write heads of the different drives containing the sequential blocks to be read. Write speeds may also improve, since sequential blocks can be written at the same time to the different disks in the array.
RAID 1: Mirroring, no parity. Mirroring is where each block is duplicated across all disks in the array. Here, any one disk failure will not impact data integrity. Better read speeds are achieved by using the drive whose read/write head is closest to the track containing the block to be read. There is generally no improvement in write speeds.
RAID 5: Striped set with distributed parity. The advantage here is that the data from one drive can be rebuilt from the parity information contained on the other drives. RAID 5 can tolerate only one drive failure.

These are the most common RAID levels, but there are other RAID levels, and indeed combinations of levels, that can be configured (http://en.wikipedia.org/wiki/RAID). RAID can be implemented in the host controller, or built into the operating system. Either way, with RAID we are beginning to see an abstraction of a physical disk into a logical one. For example, with RAID 1, if we decided to use two identical 100GB disks for mirroring, this would ultimately end up as a 100GB (not 200GB) logical disk to the OS. So:
Traditionally, disk drives are called either Physical Volumes (PV) or Logical Volumes (LV), depending on where in the infrastructure you are talking about them. A PV can be split up into partitions, where each partition can also look, to the operating system, like an individual PV.
A LUN (logical unit number) comes from the SCSI protocol, but in storage terminology it more generally refers to an LV.
On some systems, Physical Volumes can be pooled into Volume Groups (VG), from which Logical Volumes can be created. In this case a Logical Volume may stretch across many different sizes and types of physical disks, and take advantage of RAID. On a Linux system, this software management of disk storage is called the Logical Volume Manager (LVM).
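
To make the parity idea behind RAID 5 concrete, the following minimal sketch shows how a lost block can be rebuilt by XOR-ing the surviving blocks with the parity block. It is a toy illustration of the principle, not how any particular controller implements it.

```python
# Toy illustration of RAID 5 style parity: the parity block is the
# byte-wise XOR of the data blocks, so any single missing block can be
# rebuilt by XOR-ing everything that survives. (Real RAID 5 also rotates
# the parity block across the member disks; that detail is omitted here.)

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# Three data blocks striped across three disks, parity on a fourth.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# Simulate losing the disk that held d1 and rebuild it from the rest.
rebuilt_d1 = xor_blocks([d0, d2, parity])
assert rebuilt_d1 == d1
print("rebuilt block:", rebuilt_d1)
```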

Directly Attached Storage (DAS)

It didn't take long to see the appearance of disk cabinets, or Disk Arrays, connected to servers via an external SCSI cable and managed separately. In some cases these storage cabinets could be connected to multiple servers, so they could share the storage (perhaps for fault tolerance). Also, being able to hot swap failed disks and have the unit rebuild that disk from parity on the other disks was an expected feature. This led to the acronym DAS, or Directly Attached Storage (actually the acronym was coined more recently to distinguish it from other technologies). The main technology used with DAS is SCSI, with specialized Host Bus Adapters (HBAs) installed in the servers (more on HBAs later). A DAS afforded multiple-server access (up to 4) for clustering, but the main disadvantage was that DAS ended up yielding an island of information.

Storage Networking Protocols

Since the length of a SCSI cable is very limited, there came a need for low-level access to storage over networks; in effect, the equivalent of stretching the permissible distance of the SCSI cable to much greater distances. This led to advancements in storage networking protocols. These protocols carry the same block-level SCSI commands that go over the interface cables of a disk, and have no knowledge of how clusters of blocks are aggregated (or used) by the OS to provide a file system. This gives us a network of disk appliances, where each appliance is a fault-tolerant disk array with its own management interface. The two predominant networking protocols used for storage networks are the Fibre Channel Protocol (FCP) and iSCSI (over Gigabit Ethernet). In these cases, both the Fibre Channel and the Gigabit Ethernet infrastructures are used to carry SCSI commands over the network. iSCSI uses TCP/IP, whereas FCP has its own 5-layer stack definition.

Gigabit Interface Converter (GBIC)

Often, in the physical implementation, port connections are made through a Gigabit Interface Converter (GBIC). A GBIC is a standard for transceivers, commonly used with Gigabit Ethernet and Fibre Channel (explained below). By offering a standard, hot swappable electrical interface, a one gigabit Ethernet port, for example, can support a wide range of physical media, from copper to long-wave single-mode optical fiber, at lengths of hundreds of kilometers.


Fibre Channel
Fibre or Fiber?
Fibre Channel was originally designed to support fiber optic cabling only. When copper support was added, the Fibre Channel committee decided to keep the name in principle, but to use the UK English spelling (Fibre) when referring to the standard. Fibre Channel can use either optical fiber (for distance) or copper cable links (for short distances at low cost). However, fiber-optic cables enjoy a major advantage in noise immunity.

Fibre Channel Infrastructure

Fibre Channel, or FC, is a gigabit-speed network technology primarily used for storage networking, over optical fiber or copper links. There are three topologies that can be used:
Point-to-Point (FC-P2P): Two devices are connected back to back. This is the simplest topology, with limited connectivity.
Arbitrated loop (FC-AL): All devices are in a loop or ring, similar to token ring networking. Adding or removing a device from the loop causes all activity on the loop to be interrupted, and the failure of one device causes a break in the ring. Fibre Channel hubs exist to connect multiple devices together and may bypass failed ports. A loop may also be made by cabling each port to the next in a ring. A minimal loop containing only two ports, while appearing to be similar to FC-P2P, differs considerably in terms of the protocol.
Switched fabric (FC-SW): All devices or loops of devices are connected to Fibre Channel switches, similar conceptually to modern Ethernet implementations. The switches manage the state of the fabric, providing optimized interconnections.

FC-SW is the most flexible topology, enabling all servers and storage devices to communicate with each other. It also provides for failover architecture if a server or disk array fails. FC-SW involves one or more intelligent switches, each providing multiple ports for nodes. Unlike FC-AL, FC-SW bandwidth is fully scalable, i.e. there can be any number of 8Gbps (gigabits per second) transfers operating simultaneously through the switch. In fact, if using full duplex, each connection between a node and a switch port can use 16Gbps of bandwidth. Because switches can be cascaded and interwoven, the resultant connection cloud has been called the fabric. http://en.wikipedia.org/wiki/Fibre_Channel

Fibre Channel Host Bus Adapters

Fibre Channel HBAs are available for all major open systems, computer architectures, and buses. Some are OS dependent. Each HBA has a unique Worldwide Name (WWN, or WWID for Worldwide Identifier), which is similar to an Ethernet MAC address in that it uses an Organizationally Unique Identifier (OUI) assigned by the IEEE. However, WWNs are longer (8 bytes). There are two types of WWNs on an HBA: a node WWN (WWNN), which is shared by all ports on a host bus adapter, and a port WWN (WWPN), which is unique to each port. Some Fibre Channel HBA manufacturers are Emulex, LSI, QLogic and ATTO Technology.
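
The sketch below illustrates the structure of a WWN as described above. It assumes a port WWN in the common NAA format 1 layout (the kind that begins 10:00:), where the IEEE OUI occupies the third through fifth bytes; other NAA formats place the OUI differently. The WWPN value itself is invented for illustration.

```python
# Minimal sketch: pull the 8 bytes of a WWPN apart and show where the
# IEEE OUI sits. Assumes the common NAA format 1 layout (WWNs beginning
# with 10:00:); the WWPN below is a made-up example.

def parse_wwpn(wwpn: str) -> dict:
    octets = bytes(int(part, 16) for part in wwpn.split(":"))
    if len(octets) != 8:
        raise ValueError("a WWN is 8 bytes long")
    return {
        "naa_format": octets[0] >> 4,            # high nibble of the first byte
        "oui": octets[2:5].hex(":"),             # vendor OUI (format 1 assumption)
        "vendor_assigned": octets[5:].hex(":"),  # vendor-specific portion
    }

print(parse_wwpn("10:00:00:00:c9:12:34:56"))
# {'naa_format': 1, 'oui': '00:00:c9', 'vendor_assigned': '12:34:56'}
```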

Fibre Ports
The basic building block of Fibre Channel is the port:
N_Port: A node port that is not loop capable. It is used to connect an equipment port to the fabric.
NL_Port: A node port that is loop capable. It is used to connect an equipment port to the fabric in a loop configuration through an L_Port or FL_Port.
FL_Port: A fabric port that is loop capable. It is used to connect an NL_Port to the switch in a public loop configuration.
L_Port: A loop-capable node or switch port.
E_Port: An expansion port. A port is designated an E_Port when it is used as an inter-switch link (ISL) to connect to the E_Port of another switch, to enlarge the switch fabric.
F_Port: A fabric port that is not loop capable. It is used to connect an N_Port point-to-point to a switch.
G_Port: A generic port that can operate as either an E_Port or an F_Port. A port is defined as a G_Port after it is connected but has not received a response to loop initialization or has not yet completed the link initialization procedure with the adjacent Fibre Channel device.
U_Port: A universal port, a more generic switch port than a G_Port. It can operate as an E_Port, F_Port, or FL_Port. A port is defined as a U_Port when it is not connected or has not yet assumed a specific function in the fabric.
MTx_Port: CNT port used as a mirror for viewing the transmit stream of the port to be diagnosed.
MRx_Port: CNT port used as a mirror for viewing the receive stream of the port to be diagnosed.
SD_Port: Cisco SPAN port used for mirroring another port for diagnostic purposes.

Fibre Channel Zoning


Zoning allows for finer segmentation of the Fibre Channel fabric. Zoning can be used to establish a barrier between different environments: only the members of the same zone can communicate within that zone, and all other attempts from outside are rejected. Zoning could be used for:
Separating LUNs between Windows and other operating systems to avoid data corruption
Security
Test and maintenance
Managing different user groups and objectives
Zoning can be implemented in one of two ways:
Hardware: Hardware zoning is based on the physical fabric port number. The members of a zone are physical ports on the fabric switch. It can be implemented in one-to-one, one-to-many and many-to-many configurations.
Software: Software zoning is implemented by the fabric operating systems within the fabric switches. It is almost always implemented by a combination of the name server and the Fibre Channel Protocol: when a port contacts the name server, the name server replies only with information about ports in the same zone as the requesting port. A soft zone, or software zone, is not enforced by hardware (i.e. hardware zoning). Usually, the zoning software also allows you to create symbolic names for the zone members and for the zones themselves; dealing with a symbolic name or alias for a device is often easier than trying to use the WWN address.
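
As a rough illustration of soft zoning (with invented WWPNs and zone names, not any switch vendor's interface), the sketch below models zones as sets of port WWNs and answers the only question zoning really asks: may these two ports see each other?

```python
# Toy model of soft zoning: a zone is just a named set of WWPNs, and two
# ports may communicate only if they share at least one zone.
# All zone names and WWPNs below are made up for illustration.

zones = {
    "windows_zone": {"10:00:00:00:c9:11:11:11", "50:06:01:60:44:44:44:44"},
    "unix_zone":    {"10:00:00:00:c9:22:22:22", "50:06:01:60:44:44:44:44"},
}

def can_communicate(wwpn_a: str, wwpn_b: str) -> bool:
    """True if both ports are members of at least one common zone."""
    return any(wwpn_a in members and wwpn_b in members
               for members in zones.values())

# The Windows host sees the shared array port, but not the UNIX host.
print(can_communicate("10:00:00:00:c9:11:11:11", "50:06:01:60:44:44:44:44"))  # True
print(can_communicate("10:00:00:00:c9:11:11:11", "10:00:00:00:c9:22:22:22"))  # False
```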

iSCSI
iSCSI over Gigabit Ethernet
Ethernet has evolved into the most widely implemented physical and link layer protocol today. Fast Ethernet increased speed from 10 to 100 megabits per second (Mbit/s), and Gigabit Ethernet was the next iteration, increasing the speed to 1000 Mbit/s. In the marketplace, full duplex with switches is the norm. There are several physical layer standards for Gigabit Ethernet: optical fiber (1000BASE-X), twisted pair cable (1000BASE-T) and balanced copper cable (1000BASE-CX).

iSCSI (RFC 3720) is a mapping of the regular SCSI protocol over TCP/IP, most commonly over Gigabit Ethernet. Unlike Fibre Channel, which requires special-purpose cabling, iSCSI can be run over long distances using an existing network infrastructure. TCP/IP uses a client/server model, but iSCSI uses the terms initiator (for the data consumer) and target (for the LUN). A software initiator uses code to implement iSCSI, typically as a device driver. A hardware initiator mitigates the overhead of iSCSI, TCP processing and Ethernet interrupts, and therefore may improve the performance of servers that use iSCSI. An iSCSI host bus adapter (HBA) implements a hardware initiator and is typically packaged as a combination of a Gigabit Ethernet NIC, some kind of TCP/IP offload engine (TOE) and a SCSI bus adapter (controller), which is how it appears to the operating system.

iSCSI Naming & Addressing


Each initiator or target is known by an iSCSI Name, which is independent of the location of the initiator and target. iSCSI Names are used to provide:
An initiator identifier for configurations that provide multiple initiators behind a single IP address.
A target identifier for configurations that present multiple targets behind a single IP address and port.
A method to recognize multiple paths to the same device on different IP addresses and ports.
An identifier for source and destination targets for use in third-party commands.
An identifier for initiators and targets to enable them to recognize each other regardless of IP address and port mapping on intermediary firewalls.
The initiator presents both its iSCSI Initiator Name and the iSCSI Target Name to which it wishes to connect in the first login request of a new session. The only exception is if a discovery session is to be established; the iSCSI Initiator Name is still required, but the iSCSI Target Name may be ignored. The default name "iSCSI" is reserved and is not used as an individual initiator or target name. iSCSI Names do not require special handling within the iSCSI layer; they are opaque and case-sensitive for purposes of comparison. iSCSI provides three name formats:
iSCSI Qualified Name (IQN), format: iqn.yyyy-mm.{reversed domain name}, for example:
  iqn.2001-04.com.acme:storage.tape.sys1.xyz
  iqn.1998-03.com.disk-vendor.diskarrays.sn.45678
  iqn.2000-01.com.gateways.yourtargets.24
  iqn.1987-06.com.os-vendor.plan9.cdrom.12345
  iqn.2001-03.com.service-provider.users.customer235.host90
Extended Unique Identifier (EUI), format: eui.{EUI-64 bit address}, for example eui.02004567A425678D
T11 Network Address Authority (NAA), format: naa.{NAA 64 or 128 bit identifier}, for example naa.52004567BA64678D
IQN format addresses occur most commonly, and are qualified by a date (yyyy-mm) because domain names can expire or be acquired by another entity. iSCSI nodes (i.e. the machines that contain the LUN targets) also have addresses. An iSCSI address specifies a single path to an iSCSI node and has the following format: <domain-name>[:<port>], where <domain-name> can be either an IP address in dotted decimal notation or a Fully Qualified Domain Name (FQDN, or host name). If the <port> is not specified, the default port 3260 is assumed.
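
As a small illustration of the naming and addressing rules just described (the address and IQN values below are made-up examples), this sketch splits an iSCSI address into host and port, defaulting to 3260, and pulls the date and naming authority out of an IQN.

```python
# Minimal sketch of the iSCSI naming rules described above.
# The address and IQN used here are invented examples.

DEFAULT_ISCSI_PORT = 3260

def parse_iscsi_address(address: str):
    """Split '<domain-name>[:<port>]' into (host, port), defaulting to 3260."""
    host, sep, port = address.partition(":")
    return host, int(port) if sep else DEFAULT_ISCSI_PORT

def parse_iqn(name: str) -> dict:
    """Break an IQN into its date and reversed-domain naming authority."""
    if not name.startswith("iqn."):
        raise ValueError("not an IQN-format name")
    body = name[len("iqn."):]
    date, _, rest = body.partition(".")          # yyyy-mm
    authority, _, suffix = rest.partition(":")   # reversed domain [: optional string]
    return {"date": date, "naming_authority": authority, "suffix": suffix}

print(parse_iscsi_address("storage1.example.com"))            # ('storage1.example.com', 3260)
print(parse_iqn("iqn.2001-04.com.acme:storage.tape.sys1.xyz"))
```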

iSCSI Security
To ensure that only valid initiators connect to storage arrays, administrators most commonly run iSCSI only over logically isolated backchannel networks. For authentication, iSCSI initiators and targets prove their identity to each other using the CHAP protocol, which includes a mechanism to prevent cleartext passwords from appearing on the wire. Additionally, as with all IP-based protocols, IPsec can operate at the network layer. Though the iSCSI negotiation protocol is designed to accommodate other authentication schemes, interoperability issues limit their deployment. An initiator authenticates not to the storage array, but to the specific storage asset (target) it intends to use. For authorization, iSCSI deployments require strategies to prevent unrelated initiators from accessing storage resources. Typically, iSCSI storage arrays explicitly map initiators to specific target LUNs.
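
To show why CHAP keeps the cleartext secret off the wire, here is a minimal sketch of the challenge/response exchange (per RFC 1994, the response is the MD5 hash of the identifier, the shared secret and the challenge). The secret and challenge values are invented for illustration.

```python
# Minimal sketch of a CHAP exchange as used by iSCSI (RFC 1994):
# the target sends a random challenge, the initiator returns
# MD5(identifier + shared secret + challenge), and the target verifies
# by computing the same hash. The secret itself never crosses the wire.
import hashlib
import os

shared_secret = b"example-chap-secret"      # provisioned on both sides (made up)

# Target side: issue an identifier and a random challenge.
chap_id = 1
challenge = os.urandom(16)

# Initiator side: compute the response.
response = hashlib.md5(bytes([chap_id]) + shared_secret + challenge).digest()

# Target side: verify by recomputing the expected value.
expected = hashlib.md5(bytes([chap_id]) + shared_secret + challenge).digest()
print("authenticated:", response == expected)
```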

iSCSI Zoning
Though there really isn't a zoning protocol associated with iSCSI, VLANs can be leveraged to accomplish the segregation needed. http://en.wikipedia.org/wiki/Vlan

Storage Architectures
DAS, NAS & SAN

The emergence of these Storage Networking Protocols has led to the development of different types of storage architectures, depending on the needs:


We talked earlier about Directly Attached Storage (DAS). The other two architectures are:
Network Attached Storage (NAS): File-sharing protocols, first conceived by Novell but more commonly seen as MS LAN Manager (CIFS) and NFS (predominant in the UNIX/Linux worlds), all serve up file shares. These days it is more common to see a NAS appliance, which is essentially a self-contained computer connected to a network, with the sole purpose of supplying file-based data storage services to other devices on the network. Due to its multiprotocol nature, and the reduced CPU and OS layer, a NAS appliance as such has its limitations compared to the FC/GbE systems. This is known as file-level storage.
Storage Area Network (SAN): An architecture to attach remote storage devices (such as disk arrays, tape libraries and optical jukeboxes) to servers in such a way that, to the OS, the devices appear as locally attached. That is, the storage acts to the OS as if it were attached with an interface cable to a locally installed host adapter. This is known as block-level storage.

Interestingly, Auspex Systems was one of the first to develop a dedicated NFS appliance for use in the UNIX market. A group of Auspex engineers split away in the early 1990s to create the integrated NetApp filer, which supported both CIFS for Windows and NFS for UNIX, and had superior scalability and ease of deployment. This started the market for proprietary NAS devices.

Hybrid
What if the NAS uses the SAN for storage? A NAS head refers to a NAS which does not have any on-board storage, but instead connects to a SAN. In effect, it acts as a translator between the file-level NAS protocols (NFS, CIFS, etc.) and the block-level SAN protocols (Fibre Channel Protocol, iSCSI). Thus it can combine the advantages of both technologies.

Tiered storage
Tiered storage is a data storage environment consisting of two or more kinds of storage delineated by differences in at least one of four attributes: price, performance, capacity and function. In mature implementations, the storage architecture is split into different tiers. Each tier differs in the:
Type of hardware used
Performance of the hardware
Scale factor of that tier (amount of storage available)
Availability of the tier and policies at that tier
A very common model is to have a primary tier with expensive, high-performance and limited storage. Secondary tiers typically comprise less expensive storage media and disks, and can either host data migrated (or staged) from the primary tier by lifecycle management software, or host data saved directly to the secondary tier by application servers and workstations whose storage clients did not warrant primary tier access. Both tiers are typically serviced by a backup tier where data is copied into long-term and offsite storage. In this context, you may hear two terms:
ILM: Information Lifecycle Management refers to a wide-ranging set of strategies for administering storage systems on computing devices. http://en.wikipedia.org/wiki/Information_Lifecycle_Management
HSM: Hierarchical Storage Management is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. http://en.wikipedia.org/wiki/Hierarchical_storage_management
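
The following minimal sketch illustrates the kind of policy an HSM system applies: files not read for a given number of days become candidates for migration from the primary tier to a cheaper secondary tier. The mount point and the 90-day threshold are arbitrary illustrative choices, not any real product's defaults.

```python
# Toy HSM-style policy: list files on the primary tier that have not been
# accessed recently and are therefore candidates for migration to a
# cheaper secondary tier. Paths and threshold are illustrative only.
import os
import time

PRIMARY_TIER = "/mnt/primary"        # assumed mount point of the fast tier
AGE_THRESHOLD_DAYS = 90              # assumed policy: untouched for 90 days

def migration_candidates(root: str, max_age_days: int):
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:   # last access time
                yield path

for path in migration_candidates(PRIMARY_TIER, AGE_THRESHOLD_DAYS):
    print("candidate for secondary tier:", path)
```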


Storage Admission Tier (SAT)


The goal of storage virtualization is to turn multiple disk arrays, made by different vendors and scattered over the network, into a single monolithic storage device which can be managed uniformly. The Storage Admission Tier (SAT) is a tier put in front of the primary tier, as the way into the storage. This affords a way to manage access and policies in a way that can virtualize the storage. A SAT should conform to the Virtualize, Optimize & Manage (VOM) paradigm:
Virtualize: At the SAN layer, the amalgamation of multiple storage devices into one single storage unit greatly simplifies management of storage hardware resource allocation. At the NAS layer, the same degree of virtualization is needed to make multiple heterogeneous file server shares appear at a more logical level, abstracting the NAS implementations from the application tier.
Optimize: Can include things like compression, data de-duplication (http://en.wikipedia.org/wiki/Data_deduplication) and organizational decisions about data placement (in which tier should the data be placed?).
Manage: Control policies, security and access control (including rights management) from the entry and exit point of the data to and from the storage network.

File Area Networks (FAN)


The combination of the Storage Admission Tier (SAT), the tiered storage model and NAS/SAN is known as a File Area Network (FAN). As of this writing, the FAN concept cannot be seen in any mainstream products, but it is introduced here for completeness.

Network Storage Fault-tolerance


SAN Multipath I/O

Multipath I/O is a fault-tolerance and performance enhancement technique whereby there is more than one physical path between a computer system and its mass storage devices through the buses, controllers, switches, and bridge devices connecting them.


In a well-designed SAN, it is likely that you will want a device to be accessed by the host application over more than one path, in order to obtain better performance and to facilitate recovery in the case of adapter, cable, switch, or GBIC failure. Should one controller, port or switch fail, the server's OS can route I/O through the remaining controller transparently to the application, with no changes visible to the applications other than perhaps incremental latency. However, the same logical volume within a storage device (LUN) may be presented many times to the server through each of the possible paths to that LUN. To avoid this, make the device easier to administer and eliminate confusion, multipathing software is needed. This is responsible for making each LUN visible only once from the application and OS point of view. In addition, the multipathing software is also responsible for failover recovery and load balancing:
Failover recovery: In the case of a malfunction of a component involved in making the LUN connection, the multipathing software redirects all the data traffic onto other available paths.
Load balancing: The multipathing software is able to balance the data traffic equitably over the available paths from the hosts to the LUNs.

There are different kinds of multipathing software available from different vendors.
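
The sketch below illustrates the two responsibilities just listed, round-robin load balancing and failover, using an invented list of path names; it is a conceptual model, not the behavior of any specific multipathing product.

```python
# Conceptual model of multipath I/O: spread requests round-robin over the
# healthy paths to a LUN, and fail over transparently when a path dies.
# Path names are invented for illustration.
from itertools import cycle

class MultipathLUN:
    def __init__(self, paths):
        self.healthy = list(paths)
        self._rr = cycle(self.healthy)

    def fail_path(self, path):
        """Mark a path as failed (e.g. cable, switch or HBA failure)."""
        self.healthy.remove(path)
        self._rr = cycle(self.healthy)          # rebuild the rotation

    def next_path(self):
        if not self.healthy:
            raise IOError("all paths to the LUN have failed")
        return next(self._rr)

lun = MultipathLUN(["hba0:switchA", "hba1:switchB"])
print([lun.next_path() for _ in range(4)])      # alternates over both paths
lun.fail_path("hba0:switchA")                   # simulate a failure
print([lun.next_path() for _ in range(2)])      # traffic continues on the survivor
```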

Storage Replication

Depending on the details of how a particular replication works, the application layer may or may not be involved. If blocks are replicated without the knowledge of the file systems or applications built on top of them, then when recovering from those blocks, the file system may be in an inconsistent state. A restartable recovery implies that the application layer has full knowledge of the replication, and so the replicated blocks that represent the applications are in a consistent state. This means that the application layer (and possibly the OS) had a chance to quiesce before the replication cycle. A recoverable recovery implies that some extra work needs to be done to the replicated data before it can be useful in a recovery situation.

RPO & RTO


For replication planning, there are two important numbers to consider:
Recovery Point Objective (RPO) describes the acceptable amount of data loss measured in time. For example, assume that the RPO is 2 hours. If there is a complete replication at 10:00am and the system dies at 11:59am without a new replication, the data written between 10:00am and 11:59am cannot be recovered from the replica. This loss has been deemed acceptable because of the 2-hour RPO. This is the case even if it takes an additional 3 hours to get the site back into production: production will continue from the point in time of 10:00am, and all data in between will have to be manually recovered through other means.
Recovery Time Objective (RTO) is the duration of time, and a service level, within which a business process must be restored after a disaster in order to avoid unacceptable consequences associated with a break in business continuity. The RTO attaches to the business process and not to the resources required to support the process.

Snapshots

Even though snapshots were discussed in the context of replication, snapshots have their uses on local systems as well. Typically a snapshot is not a copy, since that would take too long; instead, it freezes all the blocks in a LUN, making them read-only at that point in time. Any logical block that subsequently needs to be updated is allocated a new physical block, thus preserving the original snapshot blocks as a backup. These new blocks are what take up additional space, and are allocated for the writes that happen after the snapshot took place. Allocating space in this manner can take substantially less space than making a whole copy. Deleting a snapshot can be done in the background, essentially freeing any blocks that have been updated since the snapshot. Snapshotting can be implemented in the management tools of the storage array, or built into the OS (such as Microsoft's Volume Shadow Copy Service, VSS, http://en.wikipedia.org/wiki/Volume_Shadow_Copy_Service). As with RAID, the advantage of building this functionality at the block level is that it can be abstracted from the file systems that are built on top of the blocks. Being at this low level also has a drawback, in that when the snapshot is taken, the file systems (and hence applications) may not be in a consistent state. There is usually a need to quiesce the running machine (virtual or otherwise) before a snapshot is made. This implies that all levels (up to the application) should be aware that they reside on a snapshot-capable system.
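
The following toy sketch illustrates the copy-on-write behavior described above: after the snapshot, a write to a logical block goes to a newly allocated block while the frozen snapshot data is preserved. It is a conceptual model only, not how any particular array or VSS implements snapshots.

```python
# Toy copy-on-write snapshot of a small "LUN": writes after the snapshot
# go to newly allocated blocks, so the frozen snapshot data is preserved
# and only changed blocks consume extra space.

class SnapshotLUN:
    def __init__(self, blocks):
        self.current = dict(enumerate(blocks))   # logical block -> data
        self.snapshot = None

    def take_snapshot(self):
        self.snapshot = dict(self.current)       # freeze the current block map

    def write(self, block_no, data):
        # The snapshot map still points at the old data; only the live
        # map is updated, which is where the extra space goes.
        self.current[block_no] = data

lun = SnapshotLUN([b"alpha", b"beta", b"gamma"])
lun.take_snapshot()
lun.write(1, b"BETA-v2")

print(lun.current[1])    # b'BETA-v2'  (live view)
print(lun.snapshot[1])   # b'beta'     (frozen view, usable for recovery)
```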


Terminology
Thin Provisioning & Over-Allocation
Thin provisioning is sometimes called sparse volumes. In a storage consolidation environment, where many applications share access to the same storage array, thin provisioning allows administrators to maintain a single free-space buffer pool to service the data growth requirements of all applications. This avoids the poor utilization rates, often as low as 10%, that occur on traditional storage arrays where large pools of storage capacity are allocated to individual applications but remain unused (i.e. not written to). This traditional model is often called fat provisioning. In turn, over-allocation or over-subscription is a mechanism that allows server applications to be allocated more storage capacity than has been physically reserved on the storage array itself. This allows flexibility in growth and shrinkage of application storage volumes, without having to predict accurately how much a volume will grow or contract. Physical storage capacity on the array is only dedicated when data is actually written by the application, not when the storage volume is initially allocated.
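
This small sketch models the over-allocation idea: volumes advertise large logical sizes, but physical blocks from a shared pool are consumed only when data is actually written. All sizes and names are invented for illustration.

```python
# Toy model of thin provisioning / over-allocation: logical volumes can be
# "created" far larger than the physical pool, and physical capacity is
# consumed only on write. All sizes here are arbitrary illustrations.

class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.used_gb = 0
        self.volumes = {}                 # name -> advertised logical size

    def create_volume(self, name, logical_gb):
        # No physical space is reserved at creation time.
        self.volumes[name] = logical_gb

    def write(self, name, gb):
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("physical pool exhausted - add capacity")
        self.used_gb += gb

pool = ThinPool(physical_gb=1000)
pool.create_volume("app1", logical_gb=2000)   # over-subscribed on purpose
pool.create_volume("app2", logical_gb=2000)
pool.write("app1", 300)                       # only now is capacity consumed
print(f"provisioned {sum(pool.volumes.values())} GB logically, "
      f"{pool.used_gb}/{pool.physical_gb} GB physically used")
```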

LUN Masking
Logical Unit Number masking, or LUN masking, is an authorization process that makes a Logical Unit Number available to some hosts and unavailable to others. The security benefits are limited, in that with many HBAs it is possible to forge source addresses (WWNs/MACs/IPs). It is therefore mainly implemented not as a security measure per se, but rather as protection against misbehaving servers corrupting disks belonging to other servers. For example, Windows servers attached to a SAN will, under some conditions, corrupt non-Windows (Unix, Linux, NetWare) volumes on the SAN by attempting to write Windows volume labels to them. By hiding the other LUNs from the Windows server, this can be prevented, since the Windows server does not even realize the other LUNs exist. (http://en.wikipedia.org/wiki/LUN_masking)
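
As a conceptual illustration (with invented WWPNs and LUN numbers, not any array vendor's interface), the sketch below shows the kind of mapping a storage array keeps for LUN masking: each initiator is only ever shown the LUNs it has been granted.

```python
# Toy view of LUN masking on a storage array: the array keeps a per-initiator
# mask, and a host is only shown the LUNs it has been granted.
# WWPNs and LUN numbers below are made up.

lun_masks = {
    "10:00:00:00:c9:aa:aa:aa": {0, 1},     # Windows host sees LUNs 0 and 1
    "10:00:00:00:c9:bb:bb:bb": {2, 3, 4},  # Linux host sees LUNs 2-4
}

def visible_luns(initiator_wwpn: str) -> set:
    """LUNs presented to a given initiator; unknown initiators see nothing."""
    return lun_masks.get(initiator_wwpn, set())

print(visible_luns("10:00:00:00:c9:aa:aa:aa"))   # {0, 1}
print(visible_luns("10:00:00:00:c9:cc:cc:cc"))   # set() - masked out entirely
```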

Data De-duplication
This is an advanced form of data compression. Data de-duplication software, whether delivered as a standalone appliance or as a feature in another storage product, provides file-, block-, or sub-block-level elimination of duplicate data by storing pointers to a single copy of the data item. This concept is sometimes referred to as data redundancy elimination or single-instance storage. The effects of de-duplication primarily involve the improved cost structure of disk-based solutions. As a result, businesses may be able to use disks for more of their backup operations and retain data on disks for longer periods of time, enabling restoration from disk.
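
The sketch below shows the core block-level de-duplication idea: each unique block is stored once, keyed by a content hash, and duplicates become pointers to the existing copy. It is a minimal illustration, not a production design (real systems must also handle hash collisions, reference counting and deletion).

```python
# Minimal block-level de-duplication: store each unique block once, keyed
# by its content hash, and keep only pointers for duplicates.
import hashlib

block_store = {}       # hash -> block data (single instance store)

def store_blocks(data: bytes, block_size: int = 4) -> list:
    """Split data into blocks and return the list of block pointers (hashes)."""
    pointers = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        block_store.setdefault(digest, block)   # stored once, however often it is seen
        pointers.append(digest)
    return pointers

file_a = store_blocks(b"AAAABBBBAAAACCCC")
file_b = store_blocks(b"AAAACCCC")
print("logical blocks:", len(file_a) + len(file_b))    # 6
print("unique blocks stored:", len(block_store))       # 3
```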


Notice The information in this publication is subject to change without notice. THIS PUBLICATION IS PROVIDED AS IS WITHOUT WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT. CITRIX SYSTEMS, INC. (CITRIX), SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR DIRECT, INCIDENTAL, CONSEQUENTIAL OR ANY OTHER DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS PUBLICATION, EVEN IF CITRIX HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES IN ADVANCE. This publication contains information protected by copyright. Except for internal distribution, no part of this publication may be photocopied or reproduced in any form without prior written consent from Citrix. The exclusive warranty for Citrix products, if any, is stated in the product documentation accompanying such products. Citrix does not warrant products other than its own. Product names mentioned herein may be trademarks and/or registered trademarks of their respective companies. Copyright 2008 Citrix Systems, Inc., 851 West Cypress Creek Road, Ft. Lauderdale, Florida 33309-2009 U.S.A. All rights reserved.

Version History
Author: Olivier Withoff, Principal Technical Readiness Engineer, Worldwide Field Readiness and Productivity
Version: 1.0
Change Log: Initial Document
Date: August 27th, 2008

851 West Cypress Creek Road

Fort Lauderdale, FL 33309

954-267-3000

http://www.citrix.com

Copyright 2008 Citrix Systems, Inc. All rights reserved. Citrix, the Citrix logo, Citrix ICA, Citrix MetaFrame, and other Citrix product names are trademarks of Citrix Systems, Inc. All other product names, company names, marks, logos, and symbols are trademarks of their respective owners.
