Introduction to Storage

DAS - Direct Attached Storage
Products used include devices such as vanilla SCSI hard drives and on-board RAID arrays. DAS connects directly to a single server; clients on the network must have access to this server to use the storage device.

The server handles storage and retrieval of data files, as well as applications such as email or databases. DAS uses a lot of CPU power and requires even more CPU resources for sharing with other machines. Purchasing too much storage in advance leads to an imbalance of storage between servers and a waste of storage resources. DAS also lacks features like snapshots, replication, and management.

SAN - Storage Area Network
A network whose primary purpose is to transfer data between computer systems and storage elements, as well as between storage elements. A SAN consists of a communication infrastructure that provides physical connections and a management layer that organizes the connections, storage elements, and computer systems so that data transfer is secure and robust.

The term SAN is usually (but not necessarily) identified with block I/O services rather than file access services.


DAS characteristics:
- Difficult to manage
- Limited functionality
- Poor asset utilization
- Trapped or captive storage (DAS is local to the server)
- Limited scalability

Dell EqualLogic advantage:
- Intelligent storage platform
- Storage virtualization and load balancing
- Multi-server access
- Enterprise data services all included
- The financial, operational, and management benefits of a SAN


How a NAS Operates
NAS devices identify data by file name and byte offset, and transfer file data or metadata: the file's owner, permissions, creation date, etc. NAS devices provide file sharing. File systems are managed by the NAS processor. Performance can be affected by file assembly/disassembly from block I/O operations. Not all applications are supported, such as some backup and anti-virus agents. Booting off NAS requires complex software and is not supported by many operating systems, including Microsoft Windows.

Characteristics:
- Suitable for applications like file sharing (NFS/CIFS)
- Limited support for applications (e.g., databases)
- Can be susceptible to viruses/attacks
- Performance issues: file access vs. block access
- Requires specialized backup software support (NDMP)

Major players: Network Appliance, EMC (Celerra), BlueArc

Dell EqualLogic advantage: The PS Series array is a SAN that allows any flavor of NAS (if desired) by utilizing low-cost NAS heads/gateways.

A SAN is a high-performance network dedicated to storage. It provides any-to-any connectivity for the resources on the SAN: any server can potentially talk to any storage device, and communication between storage and SAN devices (switches, hubs, routers, bridges) is enabled. SANs employ fiber optic and copper connections to create dedicated networks for servers and their storage systems. Due to its fast performance, high reliability, and the long reach of fiber optics, a SAN makes it practical to locate the storage systems away from the servers. This opens the door to storage clustering, data sharing, and disaster planning applications, while the user's storage systems are managed centrally.

The formal SAN definition from the Storage Networking Industry Association (SNIA): "A network whose primary purpose is the transfer of data between computer systems and storage elements and among storage elements. A SAN consists of a communication infrastructure which provides physical connections and a management layer which organizes the connections, storage elements, and computer systems so that data transfer is secure and robust." --SNIA Technical Dictionary, copyright Storage Networking Industry Association, 2000

SANs address data by LUNs and Logical Block Addresses (LBAs). While SAN devices can be shared by many client servers, the data is typically not shared except within clusters. File systems are managed by the client servers, not by the SAN devices. Performance is optimized for block I/O operations. Applications run on the client servers and therefore are all supported. Servers can boot off a SAN and can be free of any local disks.
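As a rough sketch of what LBA addressing means in practice, the Python snippet below reads one logical block from a raw block device by seeking to LBA times block size. The device path and the 512-byte block size are illustrative assumptions; to the host, a LUN presented by a SAN appears as exactly this kind of block device.

```python
import os

def read_block(device_path, lba, block_size=512):
    """Read one logical block from a raw block device.

    The device path and 512-byte block size are illustrative
    assumptions; a SAN LUN appears to the host as a block
    device addressed the same way.
    """
    fd = os.open(device_path, os.O_RDONLY)
    try:
        os.lseek(fd, lba * block_size, os.SEEK_SET)  # byte offset = LBA * block size
        return os.read(fd, block_size)
    finally:
        os.close(fd)

# Example (requires read permission on the device):
# print(read_block("/dev/sdb", lba=2048).hex())
```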


SAN and NAS are complementary solutions, not really competing solutions.

SAN uses block I/O between server and storage:
- Data is stored and retrieved on disk and tape devices.
- Blocks are the atomic unit of data recognition and protection: represented in binary 1s and 0s, and fixed in size.
- A unit of application data that is transferred within a single sequence is a set of data frames with a common sequence_ID, corresponding to one message element, block, or information unit.

NAS uses file I/O between server and storage.


File I/O works within a file system. A file system refers to the structures and software used to organize and manage data and programs on the hard disk, and it is operating system dependent. Files are made up of:
- An ordered sequence of data bytes
- A symbolic name identifier
- A set of properties, e.g. ownership and access permissions

An application must read or write the file in order for operations to occur. Files may be created and deleted, and most file systems expand or contract during their lifetimes.
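To contrast with the block example above, a minimal file I/O sketch: the application refers to the file by its symbolic name, and the file system maintains its byte contents and properties. The file name below is hypothetical.

```python
import os

path = "example.txt"  # symbolic name identifier (hypothetical file)

# The application reads/writes through the file system, not by LBA.
with open(path, "w") as f:
    f.write("an ordered sequence of data bytes")

# The file system maintains the file's set of properties.
info = os.stat(path)
print(info.st_size, oct(info.st_mode), info.st_uid)  # size, permissions, owner

os.remove(path)  # files may be created and deleted
```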

Most customers that currently have DAS are now considering moving to a SAN. The arguments for moving from DAS to SAN are that SANs:
- Facilitate easy and dynamic storage growth with little or no disruption to users and applications
- Significantly reduce the total cost of ownership (TCO)
- Better utilize storage and allow for storage consolidation
- Provide better management and control
- Provide the ability to utilize new technologies such as:
  - Clustering
  - Better and more efficient backup utilities
  - Replication and snapshots


Fibre Channel (FC) SANs
Fibre Channel solves the cabling problems associated with parallel SCSI disk arrays. It has an installation base in high-end data centers and large enterprise networks, but it is an expensive technology, and expensive skills are required to design, implement, and manage it. Interoperability among FC solutions is an issue.

Characteristics:
- Costly to deploy; often requires professional services
- Complex to set up, configure, and troubleshoot
- Connectivity to servers is very expensive (up to 10 times Gigabit Ethernet)
- Requires support of a separate (fiber-based) network
- Interoperability issues

Major players: EMC/Dell, HP, IBM

iSCSI SANs
An iSCSI SAN utilizes iSCSI, a standards-based, globally adopted IP protocol. It is simple to use, manage, and deploy, and it is cost effective with excellent bandwidth.
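As an illustration of that simplicity, the sketch below drives the standard open-iscsi command-line tool (iscsiadm) from Python to discover and log in to a target on a Linux host. The portal address and target IQN are placeholders, not values from this course, and the open-iscsi package is assumed to be installed.

```python
import subprocess

PORTAL = "192.168.1.50"  # hypothetical group/portal IP address

# Discover targets the portal advertises (open-iscsi's iscsiadm).
discovered = subprocess.run(
    ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL],
    capture_output=True, text=True, check=True,
).stdout
print(discovered)

# Log in to one discovered target; the IQN below is a placeholder.
subprocess.run(
    ["iscsiadm", "-m", "node",
     "-T", "iqn.2001-05.com.equallogic:example-volume",
     "-p", PORTAL, "--login"],
    check=True,
)
```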


Fibre Channel factors to consider:
- Costly: added cost per port; added costs for multi-pathing software (e.g., EMC's PowerPath at $2,500/host); redundancy for HA requires multiple fabrics
- Disruptive to changes
- Requires professional services to manage
- New technology requires huge amounts of retraining

iSCSI factors to consider:
- Low cost: lower cost per port (GigE switches are cheaper than FC); lower cost for HBAs if used; zero cost for the software initiator in most OSs
- Easy to manage; no professional services required
- Highly redundant and highly available at low cost
- Easy to scale, with no disruptions
- Security built in with TCP/IP: CHAP, RADIUS, LDAP, etc. (see the CHAP sketch after this list)
- Native and scalable over long distances without added boxes
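Of the security options above, CHAP is the one built into the iSCSI login itself. Per RFC 1994, the initiator proves knowledge of the shared secret without transmitting it: it returns an MD5 digest over the identifier octet, the secret, and the target's random challenge. A minimal sketch of the response computation; the identifier, secret, and challenge values are illustrative.

```python
import hashlib
import os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response per RFC 1994: MD5 over identifier octet + secret + challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

# Target side: issue a random challenge with an identifier.
identifier, challenge = 1, os.urandom(16)

# Initiator side: compute the response from the shared secret.
response = chap_response(identifier, b"shared-secret", challenge)

# Target side: recompute and compare; the secret never crossed the wire.
assert response == chap_response(identifier, b"shared-secret", challenge)
```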


SAN technology is finding favor with users for a number of reasons. SANs can be readily and cost-effectively expanded to support more users, more raw storage, more storage devices, more parallel data paths, and more widely distributed user populations. Though SAN connectivity is not unlimited, the 126 nodes on a SAN loop are a good start, and on a SAN fabric the address space allows 2^24 nodes, over 16 million. This connectivity enables storage for many servers to be consolidated on a small number of shared storage devices, which reduces costs and eases management of capital assets. A SAN provides higher utilization of storage compared to DAS.

By providing a separate channel for data, a SAN can offload backup traffic from the LAN. Beyond LAN-free backup, there is serverless backup, where the data is moved across the SAN from one storage unit directly to another. Since the SAN can allow any server to access any storage unit, it provides a foundation for server clustering and eventually data sharing; the application and server operating system, however, must also support the arrangement. Finally, many SANs are being employed in disaster recovery plans, where the long reach of the SAN can be used to deliver data to a remote mirror.


SAN users can add storage facilities as required without disrupting ongoing operations. For instance, if increased disk capacity causes backup times to exceed the time window available, then additional backup units can be added to the SAN and become immediately available to all servers and the backup application. Once server clustering is established, the number of servers devoted to a given application can be increased or decreased dynamically to match demand, either manually or under control of a transaction monitor. The storage system's performance can be measured and tuned for optimum throughput, balancing the load among servers and storage devices. Bandwidth can also be temporarily applied to high-priority bulk data transfers typical of data warehouse and data mining applications.


Emphasis on highly available, if not entirely nonstop, applications has drawn attention to storage as a potential point of failure. Even with fully redundant storage solutions (such as RAID mirroring), as long as storage is accessible only through the server, the server itself must also be made fully redundant. In a SAN, the storage system is independent of the application server and can be readily switched from one server to another. In principle, any server can provide failover capability for any other, resulting in more protected servers at lower cost than in traditional 2-4 server cluster arrangements.

A central pool of storage can be managed cost-effectively using SAN tools. Central management makes it easier to measure and predict demands for service, and it leverages management tools and training across a broader base of systems. Many support operations can proceed independent of the load on application servers; the most obvious of these are backup and archiving, but the list also includes expansion and reallocation of storage resources and the operation of support services such as location and name resolution.


A SAN is composed of three major areas:
- Hosts/servers: each having one or more iSCSI Host Bus Adapters (HBAs) or Network Interface Cards (NICs); applications, databases, file servers
- Network: Gigabit Ethernet; Gigabit Ethernet switches; Cat 5E or Cat 6 cabling; WAN capable
- Storage: iSCSI-based disk arrays and tape systems


SAN components:
- Initiator/server:
  - HBA: provides the physical connection and access point between the SAN storage elements and the computer systems
  - NIC: server-based Network Interface Card; provides physical connectivity to the SAN
- Target/storage: disk, tape
- Traffic between initiator and target: usually identified with block I/O services rather than file access services

Small Computer System Interface (SCSI)

SCSI defines I/O buses primarily intended for connecting storage subsystems or devices to hosts through HBAs. Originally intended primarily for use with small (desktop and desk-side workstation) computers, SCSI has been extended to serve most computing needs and is, arguably, the most widely implemented I/O bus in use today.
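As an illustration of the block-oriented command set SCSI defines, the sketch below packs a READ(10) command descriptor block (CDB): opcode 0x28, a 32-bit big-endian starting LBA, and a 16-bit transfer length in blocks. This shows only the 10-byte wire format; actually issuing the command would go through an HBA driver or an iSCSI initiator.

```python
import struct

def read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Pack a SCSI READ(10) command descriptor block.

    Layout: opcode 0x28, flags, 32-bit big-endian starting LBA,
    group number, 16-bit big-endian transfer length (in blocks), control.
    """
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, num_blocks, 0)

cdb = read10_cdb(lba=2048, num_blocks=8)
assert len(cdb) == 10
print(cdb.hex())  # 28 00 00000800 00 0008 00
```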

Servers can be of any variety, although UNIX and NT systems are most common. HBAs are needed to connect the server to the SAN. Disk storage systems can be RAID or JBOD, and SANs often include tape systems or optical backup devices. SAN connections are most often provided by fiber optic cabling through hubs and switches, although small-scale SANs can be implemented using copper connections. SAN management software allows the user to control the SAN and the storage systems it supports.


Every device connected to a SAN requires an interface or adapter board. Some units have built-in interfaces, while others rely on HBAs or server-based NIC cards designed for PCI. Adapter boards have either integrated (soldered) components or plug-in modules. The module types include:

- SFPs: modules or cartridges that plug into external slots on system adapter boards and into switches. They are available in copper and fiber versions. Some adapters are capable of full-duplex operation for performance up to 200 MB/sec.

Adapters also commonly offer SNMP and MIB support for ease of remote management.


RAID (Redundant Array of Inexpensive Disks) is a disk clustering technology that has been available on larger systems for many years. Depending on how you configure the array, you can have the data mirrored (duplicate copies on separate drives), striped (interleaved across several drives), or parity-protected (extra data written to identify errors). You can use these techniques in combination to deliver the balance of performance and reliability that the user requires. Because of the high capacity (and cost) of RAID storage systems, they are good candidates for sharing across a SAN. Although you can certainly have a SAN without RAID, the two technologies are often used hand in hand.

The four most common RAID levels are 0, 1, 3, and 5:
- Level 0: Provides data striping (spreading out blocks of each file across multiple disks) but no redundancy. Improves performance but does not deliver fault tolerance.
- Level 1: Provides disk mirroring; data is written to two duplicate disks simultaneously.
- Level 3: Same as Level 0, but also reserves one dedicated disk for error correction data. Provides good performance and some level of fault tolerance.
- Level 5: Provides data striping at the block level and distributes error correction (parity) information across the drives. Results in excellent performance and good fault tolerance.


RAID 0 is not a fault-tolerant RAID solution; if one drive fails, all data within the entire array is lost. It is used where raw speed is the only (or major) objective, and it provides the highest storage efficiency of all array types. It is made by grouping two or more physical disks together to create a virtual disk, which appears as one physical disk to the host. Each physical drive's storage space is partitioned into stripes.

RAID 1 provides complete protection and is used in applications containing mission-critical data. It uses paired disks, where one physical disk is partnered with a second physical disk. Each physical disk contains the exact same data, forming a single virtual drive.

RAID 5 uses parity information interspersed across the drive array and requires a minimum of 3 drives. One drive can fail without affecting the availability of data; in the event of a failure, the controller regenerates the lost data of the failed drive from the other surviving drives (see the parity sketch below).

RAID 6 is an extension of RAID 5 that allows for additional fault tolerance by using a second independent distributed parity scheme (dual parity). Data is striped on a block level across a set of drives, just as in RAID 5, and a second set of parity is calculated and written across all the drives. RAID 6 provides extremely high data fault tolerance and can sustain multiple simultaneous drive failures. It requires N+2 drives to store the data of N drives.
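The RAID 5 regeneration described above works because parity is the XOR of the data blocks in each stripe: XORing the surviving blocks with the parity block yields the missing data. A single-stripe sketch with illustrative data:

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together (how RAID 5 parity is formed)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# One stripe across a three-data-disk RAID 5 set (illustrative data).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Disk holding data[1] fails: rebuild its block from survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```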

RAID 10 consists of multiple sets of mirrored drives; these mirror sets are then striped together to create the final virtual drive.

Pros:
- High levels of reliability
- Can handle multiple disk failures
- Provides the highest performance with data protection
- Striping multiple mirror sets can create larger virtual drives

Cons:
- Like RAID 1, writes the information twice and thus incurs a minor performance penalty compared to writing to a single disk
- Requires an additional disk to make up each mirror set

By implementing RAID 10, the result is an extremely scalable mirrored array capable of performing reads and writes significantly faster, since the disk operations are spread over more drive heads (an address-mapping sketch follows this section).

RAID 50 combines the block striping and parity of RAID 5 with the straight block striping of RAID 0: RAID 50 is a RAID 0 array striped across RAID 5 elements. It improves on the performance of RAID 5 through the addition of RAID 0, particularly during writes.

Pros:
- Ensures that if one of the disks in any parity group fails, its contents can be extracted using the information on the remaining functioning disks in its parity group
- Offers better data redundancy than the simple RAID types (i.e., RAID 1 and 5)
- Can improve the throughput of read operations by allowing reads to be performed concurrently on multiple disks in the set

Cons:
- Slower performance than RAID 10
- Slower write than read performance (write penalty)
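To make the striping-over-mirrors idea concrete, the sketch below maps a logical block number to a (mirror set, offset) pair for a RAID 10 layout like the one described above; either disk in the chosen mirror pair can serve a read. The rotation scheme, stripe size, and disk counts are illustrative assumptions, not a specific controller's layout.

```python
def raid10_map(lba: int, num_mirror_sets: int, blocks_per_stripe: int):
    """Map a logical block to (mirror set index, block offset on that set).

    Illustrative layout: whole stripes rotate across the mirror pairs;
    either disk in the chosen pair can satisfy a read.
    """
    stripe = lba // blocks_per_stripe
    mirror_set = stripe % num_mirror_sets
    offset = (stripe // num_mirror_sets) * blocks_per_stripe + lba % blocks_per_stripe
    return mirror_set, offset

# Example: 4 mirror pairs, 128-block stripes.
print(raid10_map(lba=1000, num_mirror_sets=4, blocks_per_stripe=128))  # -> (3, 232)
```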


