Bertrand Dufrasne, Giacomo Chiapparini, Attila Grosz, Mark Kremkus, Lisa Martinez, Markus Oscheka, Guenter Rebmann, Christopher Sansone
ibm.com/redbooks
International Technical Support Organization

IBM XIV Storage System: Concepts, Architecture, and Usage

January 2009
SG24-7659-00
Note: Before using this information and the product it supports, read the information in "Notices" on page ix.
First Edition (January 2009)

This edition applies to the XIV Storage System (2810-A14) with Version 10.0.0 of the XIV Storage System software (5639-XXA) and the XIV Storage System GUI and Extended Command Line Interface (XCLI) Version 2.2.43.
© Copyright International Business Machines Corporation 2009. All rights reserved.

Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Notices . . . ix
Trademarks . . . x

Preface . . . xi
The team that wrote this book . . . xi
Become a published author . . . xiv
Comments welcome . . . xiv

Chapter 1. IBM XIV Storage System overview . . . 1
1.1 Overview . . . 2
  1.1.1 System components . . . 2
  1.1.2 Key design points . . . 3
  1.1.3 The XIV Storage System Software . . . 4
Chapter 2. XIV logical architecture and concepts . . . 7
2.1 Architectural overview . . . 8
2.2 Massive parallelism . . . 10
  2.2.1 Grid architecture over monolithic architecture . . . 10
  2.2.2 Logical parallelism . . . 14
2.3 Full storage virtualization . . . 14
  2.3.1 Logical system concepts . . . 16
  2.3.2 System usable capacity . . . 20
  2.3.3 Storage Pool concepts . . . 22
  2.3.4 Capacity allocation and thin provisioning . . . 24
2.4 Reliability, availability, and serviceability . . . 33
  2.4.1 Resilient architecture . . . 33
  2.4.2 Rebuild and redistribution . . . 37
  2.4.3 Minimized exposure . . . 42

Chapter 3. XIV physical architecture and components . . . 45
3.1 IBM XIV Storage System Model A14 . . . 46
  3.1.1 Hardware characteristics . . . 46
3.2 IBM XIV hardware components . . . 47
  3.2.1 The rack and the UPS modules . . . 48
  3.2.2 Data and Interface Modules . . . 50
  3.2.3 SATA disk drives . . . 57
  3.2.4 The patch panel . . . 59
  3.2.5 Interconnection and switches . . . 60
  3.2.6 Hardware needed by support and IBM SSR . . . 61
3.3 Redundant hardware . . . 61
  3.3.1 Power redundancy . . . 62
  3.3.2 Switch/interconnect redundancy . . . 62
3.4 Hardware parallelism . . . 62

Chapter 4. Physical planning and installation . . . 63
4.1 Overview . . . 64
4.2 Ordering IBM XIV hardware . . . 64
  4.2.1 Feature codes and hardware configuration . . . 64
  4.2.2 2810-A14 Capacity on Demand ordering options . . . 66
4.3 Physical planning . . . 67
  4.3.1 Site requirements . . . 67
4.4 Basic configuration planning . . . 69
4.5 Network connection considerations . . . 70
  4.5.1 Fibre Channel connections . . . 70
  4.5.2 iSCSI connections . . . 73
  4.5.3 Mixed iSCSI and Fibre Channel host access . . . 74
  4.5.4 Management connectivity . . . 74
  4.5.5 Mobile computer ports . . . 75
  4.5.6 Remote access . . . 75
4.6 Remote Copy connectivity . . . 76
  4.6.1 Remote Copy links . . . 76
  4.6.2 Remote Target Connectivity . . . 76
4.7 Planning for growth . . . 77
  4.7.1 Future requirements . . . 77
4.8 IBM XIV installation . . . 77
  4.8.1 Physical installation . . . 77
  4.8.2 Basic configuration . . . 78
  4.8.3 Complete the physical installation . . . 78
Chapter 5. Configuration . . . 79
5.1 IBM XIV Storage Management software . . . 80
  5.1.1 XIV Storage Management software installation . . . 81
5.2 Managing the XIV Storage System . . . 84
  5.2.1 Launching the Management Software GUI . . . 86
  5.2.2 Log on to the system with XCLI . . . 90
5.3 Storage Pools . . . 93
  5.3.1 Managing Storage Pools with XIV GUI . . . 94
  5.3.2 Manage Storage Pools with XCLI . . . 101
5.4 Volumes . . . 103
  5.4.1 Managing volumes with the XIV GUI . . . 104
  5.4.2 Managing volumes with XCLI . . . 112
5.5 Host definition and mappings . . . 113
  5.5.1 Managing hosts and mappings with XIV GUI . . . 113
  5.5.2 Managing hosts and mappings with XCLI . . . 121
5.6 Scripts . . . 122

Chapter 6. Security . . . 125
6.1 Physical access security . . . 126
6.2 User access security . . . 126
  6.2.1 Role Based Access Control . . . 126
  6.2.2 Manage user rights with the GUI . . . 128
  6.2.3 Managing users with XCLI . . . 134
  6.2.4 LDAP and Active Directory . . . 138
6.3 Password management . . . 139
6.4 Managing multiple systems . . . 140
6.5 Event logging . . . 141
  6.5.1 Viewing events in the XIV GUI . . . 142
  6.5.2 Viewing events in the XCLI . . . 143
  6.5.3 Define notification rules . . . 145
Chapter 7. Host connectivity . . . 147
7.1 Connectivity overview . . . 148
  7.1.1 Module, patch panel, and host connectivity . . . 149
  7.1.2 FC and iSCSI simplified access . . . 150
  7.1.3 Remote Mirroring connectivity . . . 151
7.2 Fibre Channel (FC) connectivity . . . 152
  7.2.1 Preparation steps . . . 152
  7.2.2 FC configurations . . . 153
  7.2.3 Zoning and VSAN . . . 155
  7.2.4 Identification of FC ports (initiator/target) . . . 156
  7.2.5 IBM XIV logical FC maximum values . . . 159
7.3 iSCSI connectivity . . . 159
  7.3.1 iSCSI configurations . . . 160
  7.3.2 Link aggregation . . . 162
  7.3.3 IBM XIV Storage System iSCSI setup . . . 162
  7.3.4 Identifying iSCSI ports . . . 164
  7.3.5 Using iSCSI hardware or software initiator (recommendation) . . . 167
  7.3.6 IBM XIV logical iSCSI maximum values . . . 168
  7.3.7 Boot from iSCSI target . . . 169
7.4 Logical configuration for host connectivity . . . 170
  7.4.1 Required generic information and preparation . . . 170
  7.4.2 Prepare for a new host: XIV GUI . . . 173
  7.4.3 Prepare for a new host: XCLI . . . 177

Chapter 8. OS-specific considerations for host connectivity . . . 179
8.1 Attaching Microsoft Windows host to XIV . . . 180
  8.1.1 Windows host FC configuration . . . 181
  8.1.2 Windows host iSCSI configuration . . . 185
  8.1.3 Management volume LUN 0 . . . 192
8.2 Attaching AIX hosts to XIV . . . 193
  8.2.1 AIX host FC configuration . . . 194
  8.2.2 AIX host iSCSI configuration . . . 199
  8.2.3 Management volume LUN 0 . . . 203
8.3 Linux . . . 204
  8.3.1 Support issues that distinguish Linux from other operating systems . . . 204
  8.3.2 FC and multi-pathing for Linux using PROCFS . . . 204
  8.3.3 FC and multi-pathing for Linux using SYSFS . . . 210
  8.3.4 Linux iSCSI configuration . . . 216
8.4 Sun Solaris . . . 218
  8.4.1 FC and multi-pathing configuration for Solaris . . . 218
  8.4.2 iSCSI configuration for Solaris . . . 221
8.5 VMware . . . 223
  8.5.1 FC and multi-pathing for VMware ESX . . . 223
  8.5.2 ESX Server iSCSI configuration . . . 224

Chapter 9. Performance characteristics . . . 235
9.1 Performance concepts . . . 236
  9.1.1 Full disk resource utilization . . . 236
  9.1.2 Caching considerations . . . 236
  9.1.3 Data mirroring . . . 237
  9.1.4 SATA drives compared to FC drives . . . 238
  9.1.5 Snapshot performance . . . 238
  9.1.6 Remote Mirroring performance . . . 238
9.2 Best practices . . . 239
  9.2.1 Distribution of connectivity . . . 239
  9.2.2 Host configuration considerations . . . 239
  9.2.3 XIV sizing validation . . . 240
9.3 Performance statistics gathering with XIV . . . 240
  9.3.1 Using the GUI . . . 240
  9.3.2 Using the XCLI . . . 246

Chapter 10. Monitoring . . . 249
10.1 System monitoring . . . 250
  10.1.1 Monitoring with the GUI . . . 250
  10.1.2 Monitoring with XCLI . . . 255
  10.1.3 SNMP-based monitoring . . . 260
  10.1.4 XIV SNMP setup . . . 262
  10.1.5 Using IBM Director . . . 264
10.2 Call Home and remote support . . . 273
  10.2.1 Setting up Call Home . . . 273
  10.2.2 Remote support . . . 281
  10.2.3 Repair flow . . . 282

Chapter 11. Copy functions . . . 285
11.1 Snapshots . . . 286
  11.1.1 Architecture of snapshots . . . 286
  11.1.2 Volume snapshots . . . 288
  11.1.3 Consistency Groups . . . 300
  11.1.4 Snapshot with Remote Mirror . . . 310
  11.1.5 Windows Server 2003 Volume Shadow Copy Service . . . 311
  11.1.6 MySQL database backup . . . 312
11.2 Volume Copy . . . 317
  11.2.1 Architecture . . . 318
  11.2.2 Performing a Volume Copy . . . 318
  11.2.3 Creating an OS image with Volume Copy . . . 319

Chapter 12. Remote Mirror . . . 323
12.1 Remote Mirror . . . 324
  12.1.1 Remote Mirror overview . . . 324
  12.1.2 Boundaries . . . 325
  12.1.3 Initial setup . . . 325
  12.1.4 Coupling . . . 331
  12.1.5 Synchronization . . . 332
  12.1.6 Disaster Recovery . . . 333
  12.1.7 Reads and writes in a Remote Mirror . . . 333
  12.1.8 Role switchover . . . 333
  12.1.9 Remote Mirror step-by-step illustration . . . 334
  12.1.10 Recovering from a failure . . . 351

Chapter 13. Data migration . . . 353
13.1 Overview . . . 354
13.2 Handling I/O requests . . . 355
13.3 Data migration stages . . . 356
  13.3.1 Initial configuration . . . 357
  13.3.2 Testing the configuration . . . 359
  13.3.3 Activate . . . 359
  13.3.4 Migration process . . . 359
  13.3.5 Synchronization complete . . . 360
  13.3.6 Delete the data migration . . . 360
Related publications
IBM Redbooks publications
Other publications
Online resources
How to get IBM Redbooks publications
Help from IBM
Index . . . 363
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
AIX 5L, AIX, Alerts, BladeCenter, DB2 Universal Database, DB2, DS4000, DS6000, DS8000, i5/OS, IBM, NetView, POWER, Redbooks, Redbooks (logo), System Storage, System x, System z, Tivoli, TotalStorage, XIV
The following terms are trademarks of other companies:

Disk Magic and the IntelliMagic logo are trademarks of IntelliMagic BV in the United States, other countries, or both.

Snapshot and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other countries.

SUSE, the Novell logo, and the N logo are registered trademarks of Novell, Inc. in the United States and other countries.

Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or its affiliates.

QLogic and the QLogic logo are registered trademarks of QLogic Corporation. SANblade is a registered trademark in the United States.

VMware and the VMware "boxes" logo and design are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions.

Java, MySQL, RSM, Solaris, Sun, Sun StorEdge, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Active Directory, ESP, Microsoft, MS, SQL Server, Windows Server, Windows Vista, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel Xeon, the Intel logo, the Intel Inside logo, and the Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Mozilla and Firefox, as well as the Firefox logo, are owned exclusively by the Mozilla Foundation; all rights in the names, trademarks, and logos of the Mozilla Foundation are reserved.

Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM Redbooks publication describes the concepts, architecture, and implementation of the IBM XIV Storage System (2810-A14). The XIV Storage System is designed as a scalable enterprise storage system based on a grid array of hardware components. It can attach to both Fibre Channel Protocol (FCP) and IP network Small Computer System Interface (iSCSI) capable hosts. The system is a good fit for clients who want to grow capacity without managing multiple tiers of storage in order to maximize performance and minimize cost.

The XIV Storage System is well suited for mixed or random access workloads, such as transaction processing, video clips, images, and e-mail; for industries such as telecommunications, media and entertainment, finance, and pharmaceuticals; and for new and emerging workload areas, such as Web 2.0.

The first chapters of this book provide details about several of the unique and powerful concepts that form the basis of the XIV Storage System logical and physical architecture. We explain how the system was designed to eliminate direct dependencies between the hardware elements and the software that governs the system.

In subsequent chapters, we explain the planning and preparation tasks required to deploy the system in your environment, followed by a step-by-step procedure describing how to configure and administer the system. We illustrate how to perform those tasks by using the intuitive, yet powerful XIV Storage Manager GUI or the Extended Command Line Interface (XCLI). We also explore and illustrate the use of snapshots and Remote Copy functions, and we outline the requirements and summarize the procedures for attaching the system to various host platforms.

This publication is intended for readers who want an understanding of the XIV Storage System, as well as those who need detailed advice about how to configure and use the system.
Attila Grosz is a Field Technical Sales Specialist at the IBM Systems and Technology Group in Budapest, Hungary. He is a member of the CEMAAS STG Systems Architect team. He is responsible for System Storage presales technical support within STG. He has 10 years of experience with storage in Open Systems environments, including AIX, Linux, and Windows. Attila has worked at IBM since 1999, in various divisions. He holds a Communication-Technical Engineering degree from the University of Godollo, Hungary. Mark Kremkus is a Senior Accredited I/T Specialist based in Austin, Texas. He has seven years of experience providing consultative sales support for the full spectrum of IBM Storage products. His current area of focus involves creating and presenting Disk Magic studies for the full family of DS4000, DS6000, DS8000, and SAN Volume Controller (SVC) products across a broad range of open and mainframe environments. He holds a Bachelor of Science degree in Electrical Engineering from Texas A&M University, and graduated with honors as an Undergraduate Research Fellow in MRI technology. Lisa Martinez is a Senior Software Engineer working in the DS8000 System Test Architecture in Tucson, Arizona. She has nine years of experience in Enterprise Disk Test. She holds a Bachelor of Science degree in Electrical Engineering from the University of New Mexico and a Computer Science degree from New Mexico Highlands University. Her areas of expertise include Open Systems and IBM System Storage DS8000 including Copy Services, with recent experience in System z. Markus Oscheka is an IT Specialist for Proof of Concepts and Benchmarks in the Enterprise Disk High End Solution Europe team in Mainz, Germany. His areas of expertise include setup and demonstration of IBM System Storage and TotalStorage solutions in various environments, such as AIX, Linux, Windows, Hewlett-Packard UNIX (HP-UX), and Solaris. He has worked at IBM for seven years. 
He has performed many Proofs of Concept with Copy Services on DS6000/DS8000, as well as performance benchmarks with DS4000/DS6000/DS8000. He has written extensively in various IBM Redbooks publications and has acted as co-project lead for IBM Redbooks publications, including DS6000/DS8000 Architecture and Implementation and DS6000/DS8000 Copy Services. He holds a degree in Electrical Engineering from the Technical University in Darmstadt. Guenter Rebmann is an IBM Certified Specialist for High End Disk Solutions, working for the EMEA DASD Hardware Support Center in Mainz, Germany. Guenter has more than 20 years of experience in large system environments and storage hardware. Currently, he provides support for the EMEA Regional FrontEnd Support Centers with High End Disk Subsystems, such as the ESS, DS8000, and previous High End Disk products. Since 2004, he has been a member of the Virtual EMEA Team (VET) Support Team. Christopher Sansone is a performance analyst located in IBM Tucson. He currently works with DS8000, DS6000, and XIV storage products to generate marketing material and assist with performance issues related to these products. Prior to working in performance, he worked in several development organizations writing C code for Fibre Channel storage devices, including DS8000, ESS 800, TS7740, and Virtual Tape Server (VTS). Christopher holds a Masters degree in Electrical Engineering from NTU and a Bachelors degree in Computer Engineering from Virginia Tech.
Figure 1 The team: Lisa, Attila (back), Christopher, Bert, Markus, Guenter, Mark, and Giacomo
Special thanks to:
John Bynum, Worldwide Technical Support Management, IBM US, San Jose

For their technical advice and support, many thanks to:
Rami Elron
Aviad Offer

Thanks to the following people for their contributions to this project:
Barbara Reed, Darlene Ross, Helen Burton, Juan Yanes, John Cherbini, Richard Heffel, Jim Sedgwick, Brian Sherman, Dan Braden, Rosemary McCutchen, Kip Wagner, Maxim Kooser, Izhar Sharon, Melvin Farris, Dietmar Dausner
Comments welcome
Your comments are important to us. We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:

Use the online Contact us review IBM Redbooks publications form found at:
ibm.com/redbooks

Send your comments in an e-mail to:
redbooks@us.ibm.com

Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Chapter 1. IBM XIV Storage System overview
1.1 Overview
The XIV Storage System architecture is designed to deliver performance, scalability, and ease of management while harnessing the high capacity and cost benefits of Serial Advanced Technology Attachment (SATA) drives. The system employs off-the-shelf components rather than the more expensive, proprietary designs found in traditional offerings.
Figure 1-1 IBM XIV Storage System 2810-A14 components: Front and rear view
All of the modules in the system are linked through an internal redundant Gigabit Ethernet network, which enables maximum bandwidth utilization and is resilient to at least any single component failure. The system and all of its components come pre-assembled and wired in a lockable rack.
Massive parallelism
The system architecture ensures full exploitation of all system components. Any I/O activity involving a specific logical volume in the system is always inherently handled by all spindles. The system harnesses all storage capacity and all internal bandwidth, and it takes advantage of all available processing power, which is as true for host-initiated I/O activity as it is for system-initiated activity, such as rebuild processes and snapshot generation. All disks, CPUs, switches, and other components of the system contribute at all times.
Workload balancing
The workload is evenly distributed over all hardware components at all times. All disks and modules are utilized equally, regardless of access patterns. Although applications might access certain volumes, or certain parts of a volume, more frequently than others, the load on the disks and modules remains balanced. Pseudo-random distribution ensures consistent load balancing even after adding, deleting, or resizing volumes, as well as after adding or removing hardware. This balancing of all data across all system components eliminates the possibility of a hot spot being created.
Self-healing
Protection against double disk failure is provided by an efficient rebuild process that brings the system back to full redundancy in minutes. In addition, the XIV Storage System extends the self-healing concept, resuming redundancy even after failures in components other than disks.
True virtualization
Unlike other system architectures, storage virtualization is inherent to the basic principles of the XIV Storage System design. Physical drives and their locations are completely hidden from the user, which dramatically simplifies storage configuration, letting the system lay out the user's volume in the optimal way. The automatic layout maximizes the system's performance by leveraging system resources for each volume, regardless of the user's access patterns.
Thin provisioning
The system enables thin provisioning, which is the capability to allocate storage to applications on a just-in-time and as needed basis, allowing significant cost savings compared to traditional provisioning techniques. The savings are achieved by defining a logical capacity that is larger than the physical capacity. This capability allows users to improve storage utilization rates, thereby significantly reducing capital and operational expenses by allocating capacity based on total space consumed, rather than total space allocated.
We discuss these key design points and underlying architectural concepts in detail in Chapter 2, XIV logical architecture and concepts on page 7.
Chapter 2. XIV logical architecture and concepts
Hardware elements
In order to convey the conceptual principles that comprise the XIV Storage System architecture, it is useful first to provide a glimpse of the physical infrastructure. The primary components of the XIV Storage System are known as modules. Modules provide processing, cache, and host interfaces and are based on standard Intel and Linux systems. They are redundantly connected to one another through an internal switched Ethernet fabric. All of the modules work together concurrently as elements of a grid architecture, and therefore, the system harnesses the powerful parallelism inherent to a distributed computing environment, as shown in Figure 2-1. We discuss the grid architecture in 2.2, Massive parallelism on page 10.
Although externally similar in appearance, Data and Interface/Data Modules differ in functions, interfaces, and in how they are interconnected.
Figure 2-1 IBM XIV Storage System major hardware elements
Data Modules
At a conceptual level, the Data Modules function as the elementary building blocks of the system, providing physical capacity, processing power, and caching, in addition to advanced system-managed services that comprise the system's internal operating environment. The equivalence of hardware across Data Modules and the modules' ability to share and manage system software and services are key elements of the physical architecture, as depicted in Figure 2-2 on page 9.
Interface Modules
Fundamentally, Interface Modules are equivalent to Data Modules in all aspects, with the following exceptions:

In addition to disk, cache, and processing resources, Interface Modules are designed to include both Fibre Channel and iSCSI interfaces for host system connectivity as well as Remote Mirroring. Figure 2-2 conceptually illustrates the placement of Interface Modules within the topology of the IBM XIV Storage System architecture.

The system services and software functionality associated with managing external I/O reside exclusively on the Interface Modules.
Ethernet switches
The XIV Storage System contains a redundant switched Ethernet fabric that conducts both data and metadata traffic between the modules. Traffic can flow in the following ways:

Between two Interface Modules
Between an Interface Module and a Data Module
Between two Data Modules

Note: It is important to realize that Data Modules and Interface Modules are not connected to the Ethernet switches in the same way. For further details about the hardware components, refer to Chapter 3, XIV physical architecture and components on page 45.
Interface and Data Modules are connected to each other through an internal IP switched network. Figure 2-2 Architectural overview
Note: Figure 2-2 depicts the conceptual architecture of the system only; do not interpret the number of connections shown as a precise hardware layout.
Monolithic subsystems
Conventional storage subsystems utilize proprietary, custom-designed hardware components (rather than generally available hardware components) and interconnects that are specifically engineered to be integrated together to achieve target design performance and reliability objectives. The complex, high-performance architecture of redundant monolithic systems generally leads to one or more of the following limitations:

Openness: Components that need to be replaced due to a failure or a hardware upgrade are generally manufacturer-specific due to the custom design inherent to the system. The system cannot easily leverage newer hardware designs or components introduced to the market.

Performance: Even in an N+1 clustered system, the loss of a clustered component not only might have a significant impact on the way that the system functions, but might also impact the performance experienced by hosts and host applications.

Upgradability and scalability: Though the system might remain operational while resources are scaled up, the process of upgrading system resources has the potential to impact performance and availability for the duration of the upgrade procedure.
Upgrades generally require careful and potentially time-consuming planning and administration, and might even require a degree of outage under certain circumstances. Although a specific layer of the vertically integrated monolithic storage subsystem hierarchy can be enhanced during an upgrade, it is possible that:

The upgrade will result in an imbalance by skewing the ratio of resources, such as cache, processors, disks, and buses, thus precluding the full benefit of the upgrade by allowing certain resources, or portions thereof, to go unused.

Architectural limitations of the monolithic system might prevent a necessary complementary resource from scaling. For example, a disk subsystem might accommodate an upgrade to the number of drives, but not the processors, resulting in a limitation of the performance potential of the overall system.
Generally, monolithic systems cannot be scaled out by adding computing resources. The major disadvantage of monolithic architectures is their proprietary nature, which impedes the adoption of new technologies, even partially. Monolithic architectures are harder to extend through external products or technologies, even though they typically contain all of the necessary ingredients for functioning. At a certain point, it is necessary to simply migrate data to a newer subsystem, because the upgradeability of the current system has been exhausted, resulting in:

The need for a large initial acquisition or hardware refresh.

The necessity of potentially time-consuming data migration planning and administration.
[Figure: monolithic subsystem building blocks: interfaces, controllers, disks, cache, and interconnects]
Design principles
The XIV Storage System grid architecture, by virtue of its distributed topology and standard Intel and Linux building-block components, ensures that the following design principles are realized:

Performance: The relative effect of the loss of a given computing resource, or module, is minimized. All modules are able to participate equally in handling the total workload. This design principle holds regardless of access patterns. The system architecture enables excellent load balancing, even if certain applications access certain volumes, or certain parts within a volume, more frequently.

Openness: Modules consist of standard, ready-to-use components. Because components are not specifically engineered for the subsystem, the resources and time required to adopt newer hardware technologies are minimized. This benefit, coupled with the efficient integration of computing resources into the grid architecture, enables the subsystem to realize the rapid adoption of the newest hardware technologies available without the need to deploy a whole new subsystem.

Upgradability and scalability: Computing resources can be dynamically changed: scaled out by adding new modules to accommodate both new capacity and new performance demands, or by tying together groups of modules; or scaled up by upgrading modules.

Important: While the grid architecture of the XIV Storage System enables the potential for great flexibility, the current supported hardware configuration contains a fixed number of modules.
Figure 2-4 on page 13 depicts a conceptual view of the XIV Storage System grid architecture and its design principles.
Design principles: massive parallelism; granular distribution; coupled disk, RAM, and CPU; off-the-shelf components; user simplicity.
Figure 2-4 IBM XIV Storage System scalable conceptual grid architecture
Important: Figure 2-4 is a conceptual depiction of the XIV Storage System grid architecture, and therefore, is not intended to accurately represent numbers of modules, module hardware, switches, and so on.
Proportional scalability
Within the XIV Storage System, each module is a discrete computing (and capacity) resource containing all of the pertinent hardware elements that are necessary for a grid topology (processing, caching, and storage). All modules are connected through a scalable network. This aspect of the grid infrastructure enables the relative proportions of cache, processors, disk, and interconnect bandwidth to remain optimal even when modules are added or removed:

Linear cache growth: The total system cache size and cache bandwidth increase linearly with disk capacity, because every module is a self-contained computing resource that houses its own cache. Note that the cache bandwidth scales linearly in terms of both host-to-cache and cache-to-disk throughput, and the close proximity of cache, processor, and disk is maintained.

Proportional interface growth: Interface Modules house iSCSI and Fibre Channel host interfaces and are able to access not only the local resources within the module, but the entire system. Adding modules to the system proportionally scales both the number of host interfaces and the bandwidth to the internal resources.

Constant switching capacity: The internal switching capacity is designed to scale proportionally as the system grows, preventing bottlenecks regardless of the number of modules. This capability ensures that internal throughput scales proportionally to capacity.

Embedded processing power: Because each module incorporates its own processing power in conjunction with cache and disk components, the ability of the system to perform processor-intensive tasks, such as aggressive prefetch caching, sophisticated cache updates, snapshot management, and data distribution, is always maintained regardless of the system capacity.
Pseudo-random algorithm
The spreading of data occurs in a pseudo-random fashion. While a discussion of random algorithms is beyond the scope of this book, the term pseudo-random is intended to describe the uniform but random spreading of data across all available disk hardware resources while maintaining redundancy. Figure 2-5 on page 17 provides a conceptual representation of the pseudo-random distribution of data within the XIV Storage System. For more details about the topic of data distribution and storage virtualization, refer to 2.3.1, Logical system concepts on page 16. Note: The XIV Storage System exploits mass parallelism at both the hardware and software levels.
Consistent performance and scalability: Hardware resources are always utilized equitably, because all logical volumes always span all physical resources and are therefore able to reap the performance potential of the full subsystem. Virtualization algorithms automatically redistribute the logical volume's data and workload when new hardware is added, thereby maintaining the system balance while preserving transparency to the attached hosts. Conversely, equilibrium and transparency are maintained during the phase-out of old or defective hardware resources.
There are no pockets of capacity, orphaned spaces, or resources that are inaccessible due to array mapping constraints or data placement.

Maximized availability and data integrity: The full virtualization scheme enables the IBM XIV Storage System to manage and maintain data redundancy as hardware changes:

In the event of a hardware failure or when hardware is phased out, data is automatically, efficiently, and rapidly rebuilt across all the drives and modules in the system, thereby preserving host transparency, equilibrium, and data redundancy at all times while virtually eliminating any performance penalty associated with conventional RAID rebuild activities.

When new hardware is added to the system, data is transparently redistributed across all resources to restore equilibrium to the system.

Flexible snapshots: Full storage virtualization incorporates snapshots that are differential in nature; only updated data consumes physical capacity. Many concurrent snapshots are possible (up to 16 000 volumes and snapshots can be defined), because a snapshot uses physical space only after a change has occurred on the source.
Multiple snapshots of a single master volume can exist independently of each other. Snapshots can be cascaded, in effect, creating snapshots of snapshots.
Snapshot creation and deletion do not require data to be copied and hence occur immediately. While updates occur to master volumes, the system's virtualized logical structure enables it to elegantly and efficiently preserve the original point-in-time data associated with any and all dependent snapshots by simply redirecting the update to a new physical location on disk. This process, which is referred to as redirect on write, occurs transparently from the host perspective by virtue of the virtualized remapping of the updated data and minimizes any performance impact associated with preserving snapshots, regardless of the number of snapshots defined for a given master volume.

Note: The XIV snapshot process uses redirect on write, which is more efficient than the copy on write that is used by other storage subsystems. Because the process uses redirect on write and does not necessitate data movement, the size of a snapshot is independent of the source volume size.

Data migration efficiency: XIV supports thin provisioning. When migrating from a system that only supports regular (or thick) provisioning, XIV allows thick-to-thin provisioning of capacity. Thin-provisioned capacity is discussed in 2.3.4, Capacity allocation and thin provisioning on page 24. Due to the XIV pseudo-random and uniform distribution of data, the performance impact of data migration on production activity is minimized, because the load is spread evenly over all resources.
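The redirect-on-write behavior described above can be modeled with a short, purely illustrative sketch (the class and names below are our own, not XIV code): a volume is a mapping from partition numbers to physical blocks, a snapshot copies only the mapping, and a later write to the master is redirected to a fresh block, so no data is moved at snapshot time.

```python
class Volume:
    """Toy redirect-on-write model (illustrative only, not XIV internals)."""

    def __init__(self, n_partitions):
        # Initially, partition i lives in physical block i.
        self.map = {i: ("blk", i) for i in range(n_partitions)}
        self.next_free = n_partitions

    def snapshot(self):
        # A snapshot is just a copy of the metadata mapping:
        # O(metadata) work, no data movement, hence "immediate".
        return dict(self.map)

    def write(self, partition_no):
        # Redirect on write: the update goes to a newly allocated block;
        # any snapshot keeps pointing at the original, untouched block.
        self.map[partition_no] = ("blk", self.next_free)
        self.next_free += 1

vol = Volume(4)
snap = vol.snapshot()
vol.write(2)
assert snap[2] == ("blk", 2)      # snapshot preserves the point-in-time data
assert vol.map[2] == ("blk", 4)   # master now points at the redirected block
```

Because only the mapping is duplicated, the cost of a snapshot is independent of the source volume size, which mirrors the note above.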
Logical constructs
The XIV Storage System logical architecture incorporates constructs that underlie the storage virtualization and distribution of data, which are integral to its design. The logical structure of the subsystem ensures that there is optimum granularity in the mapping of logical elements to both modules and individual physical disks, thereby guaranteeing an ideal distribution of data across all physical resources.
Partitions
The fundamental building block of logical volumes is known as a partition. Partitions have the following characteristics:

All partitions are 1 MB (1024 KB) in size.

A partition contains either a primary copy or a secondary copy of data.

Each partition is mapped to a single physical disk. This mapping is dynamically managed by the system through a proprietary pseudo-random distribution algorithm in order to preserve data redundancy and equilibrium. For more information about the topic of data distribution, refer to Logical volume layout on physical disks on page 19.
The storage administrator has no control or knowledge of the specific mapping of partitions to drives.
Secondary partitions are always placed onto a physical disk that does not contain the primary partition. In addition, secondary partitions are also in a module that does not contain its corresponding primary partition. Important: In the context of the XIV Storage System logical architecture, a partition consists of 1 MB (1024 KB) of data. Do not confuse this definition with other definitions of the term partition.
The diagram illustrates that data is uniformly and randomly distributed over all disks. Each 1 MB of data is duplicated in a primary and a secondary partition. For the same data, the system ensures that the primary partition and its corresponding secondary partition are not located on the same disk and are also not within the same module.
Figure 2-5 Pseudo-random data distribution
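The placement rules just illustrated (uniform spread across all 180 disks, with the secondary copy never in the same module as the primary) can be sketched with a toy hash-based distributor. This is an assumption-laden stand-in for XIV's proprietary algorithm: the constants, the hash choice, and the `place_partition` function are all illustrative.

```python
import hashlib
from collections import Counter

MODULES = 15           # assumption: a full rack of 15 modules
DISKS_PER_MODULE = 12  # 15 x 12 = 180 disks in total

def place_partition(volume_id, partition_no):
    """Toy pseudo-random placement: returns (primary, secondary) as
    (module, disk) pairs, with the secondary never in the primary's module."""
    key = f"{volume_id}:{partition_no}".encode()
    h = int(hashlib.sha256(key).hexdigest(), 16)

    p_module = h % MODULES
    p_disk = (h // MODULES) % DISKS_PER_MODULE

    # Choose the secondary module from the other MODULES - 1 modules only.
    s_module = (p_module + 1 + (h // 1000) % (MODULES - 1)) % MODULES
    s_disk = (h // 7) % DISKS_PER_MODULE
    return (p_module, p_disk), (s_module, s_disk)

# One 17 GB volume is 17 000 partitions; the hash spreads its primary and
# secondary copies roughly evenly over all 180 disks.
load = Counter()
for part in range(17_000):
    (pm, pd), (sm, sd) = place_partition(42, part)
    assert pm != sm               # copies never share a module
    load[(pm, pd)] += 1
    load[(sm, sd)] += 1
```

Because the offset added to the primary module is always between 1 and MODULES - 1, the secondary module can never equal the primary module, which is the redundancy constraint the figure describes.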
Logical volumes
The XIV Storage System presents logical volumes to hosts in the same manner as conventional subsystems; however, both the granularity of logical volumes and the mapping of logical volumes to physical disks fundamentally differ:

As discussed previously, every logical volume comprises 1 MB (1024 KB) pieces of data known as partitions.

The physical capacity associated with a logical volume is always a multiple of 17 GB (decimal).
Therefore, while it is possible to present a block-designated (refer to Creating volumes on page 107) logical volume to a host that is not a multiple of 17 GB, the maximum physical space that is allocated for the volume will always be the sum of the minimum number of 17 GB increments needed to meet the block-designated capacity. Note that the initial physical capacity actually allocated by the system upon volume creation can be less than this amount, as discussed in Hard and soft volume sizes on page 25.

The maximum number of volumes that can be concurrently defined on the system is limited by:

The logical address space limit: The logical address range of the system permits up to 16 377 volumes, although this constraint is purely logical and is not normally a practical consideration. Note that the same address space is used for both volumes and snapshots.
The limit imposed by the logical and physical topology of the system for the minimum volume size: The physical capacity of the system, based on 180 drives with 1 TB of capacity per drive and assuming the minimum volume size of 17 GB, limits the maximum volume count to 4 605 volumes. Again, a system with active snapshots can have more than 4 605 addresses assigned collectively to both volumes and snapshots, because volumes and snapshots share the same address space.

Important: The logical address limit is ordinarily not a practical consideration during planning, because under most conditions, this limit will not be reached; it is intended to exceed the number of volumes needed under all conceivable circumstances.

Logical volumes are administratively managed within the context of Storage Pools, discussed in 2.3.3, Storage Pool concepts on page 22. Storage Pools are not part of the logical hierarchy inherent to the system's operational environment, because the concept of Storage Pools is administrative in nature.
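The 17 GB rounding rule described above is easy to express numerically. The helper below is illustrative (it is not an XIV API); it simply computes the smallest multiple of 17 GB that covers a requested block-designated size.

```python
import math

INCREMENT_GB = 17  # capacity is always reserved in 17 GB (decimal) increments

def allocated_capacity_gb(requested_gb):
    """Physical capacity reserved for a volume: the smallest multiple of
    17 GB that covers the requested size. Illustrative helper only."""
    return INCREMENT_GB * math.ceil(requested_gb / INCREMENT_GB)

print(allocated_capacity_gb(50))   # 51  (3 increments cover 50 GB)
print(allocated_capacity_gb(17))   # 17  (exactly 1 increment)
print(allocated_capacity_gb(100))  # 102 (6 increments cover 100 GB)
```

So a host-visible 50 GB volume reserves 51 GB of physical space, although, as noted above, the capacity actually allocated at creation time can be less with thin provisioning.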
Storage Pools

Storage Pools are purely logical entities that enable storage administrators to manage relationships between volumes and snapshots and to define separate capacity provisioning and snapshot requirements for separate applications and departments. Storage Pools are not tied in any way to specific physical resources, nor are they part of the data distribution scheme. We discuss Storage Pools and their associated concepts in 2.3.3, Storage Pool concepts on page 22.
Snapshots

A snapshot represents a point-in-time copy of a volume. Snapshots are governed by almost all of the principles that apply to volumes. Unlike volumes, snapshots incorporate dependent relationships with their source volumes, which can be either logical volumes or other snapshots. Because they are not independent entities, a given snapshot does not necessarily wholly consist of partitions that are unique to that snapshot. Conversely, a snapshot image will not share all of its partitions with its source volume if updates to the source occur after the snapshot was created. Chapter 11, Copy functions on page 285 examines snapshot concepts and practical considerations, including locking behavior and implementation.
Partition table
Mapping between a logical partition number and the physical location on disk is maintained in a partition table. The partition table maintains the relationship between the partitions that comprise a logical volume and its physical location on disk. Note: Both the distribution table and the partition table are redundantly maintained among the modules.
Volume layout
At a high level, the data distribution scheme is an amalgam of mirroring and striping. While it is tempting to think of this scheme in the context of RAID 1+0 (10) or 0+1, the low-level virtualization implementation precludes the usage of traditional RAID algorithms in the architecture. Conventional RAID implementations cannot incorporate dynamic, intelligent, and automatic management of data placement based on knowledge of the volume layout, nor is it feasible for a traditional RAID system to span all drives in a subsystem, due to the vastly unacceptable rebuild times that can result.

As discussed previously, the XIV Storage System architecture divides logical volumes into 1 MB partitions. This granularity and the mapping strategy are integral elements of the logical design that enable the system to realize the following features and benefits:

Partitions are distributed on all disks using what is defined as a pseudo-random distribution function, which was introduced in 2.2.2, Logical parallelism on page 14. The distribution algorithms seek to preserve the statistical equality of access among all physical disks under all conceivable real-world aggregate workload conditions and associated volume access patterns. Essentially, while not truly random in nature, the distribution algorithms in combination with the system architecture preclude the occurrence of the phenomenon traditionally known as hot spots.

The XIV Storage System contains 180 disks, and each volume is allocated at least 17 GB (decimal) of capacity that is distributed evenly across all disks.

Each logically adjacent partition on a volume is distributed across a different disk; partitions are not combined into groups before they are spread across the disks. The pseudo-random distribution ensures that logically adjacent partitions are never striped sequentially across physically adjacent disks. Refer to 2.2.2, Logical parallelism on page 14 for a further overview of the partition mapping topology.
Each disk has its data mirrored across all other disks, excluding the disks in the same module.
Each disk holds approximately one percent of the data of any other disk in other modules.

Disks have an equal probability of being accessed from a statistical standpoint, regardless of aggregate workload access patterns.

Note: When the number of disks or modules changes, the system defines a new data layout that preserves redundancy and equilibrium. This target data distribution is called the goal distribution and is discussed in Goal distribution on page 37.

As discussed previously in IBM XIV Storage System virtualization benefits on page 15:

The storage system administrator does not plan the layout of volumes on the modules. Provided there is space available, volumes can always be added or resized instantly with negligible impact on performance. There are no unusable pockets of capacity known as orphaned spaces.

When the system is scaled out through the addition of modules, a new goal distribution is created whereby only a minimal number of partitions are moved to the newly allocated capacity to arrive at the new distribution table. The new capacity is fully utilized within several hours, with no need for any administrative intervention. Thus, the system automatically returns to a state of equilibrium among all resources.

Upon the failure or phase-out of a drive or a module, a new goal distribution is created whereby data in non-redundant partitions is copied and redistributed across the remaining modules and drives. The system rapidly returns to a state in which all partitions are again redundant, because all disks and modules participate in the enforcement of the new goal distribution.
Net usable capacity

The calculation of the net usable capacity of the system consists of the total disk count, less the disk space reserved for sparing (the equivalent of one module plus three more disks), multiplied by the fraction of capacity on each disk that is dedicated to data (96%), and finally reduced by a factor of 50% to account for data mirroring.

Note: The calculation of the usable space is:

Usable capacity = [drive capacity x (fraction utilized for data) x (total drives - hot spare reserve)] / 2
Usable capacity = [1000 GB x 0.96 x (180 - (12 + 3))] / 2 = 79 113 GB (decimal)
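The calculation above can be checked with a few lines of Python. Note that with the rounded inputs (exactly 96%, exactly 1000 GB per drive), the formula yields 79 200 GB; the 79 113 GB figure quoted in the text presumably reflects a data fraction slightly below 96% in the actual implementation, so the code below is a sketch of the formula rather than a reproduction of the exact published number.

```python
def usable_capacity_gb(n_drives=180, drive_gb=1000, data_fraction=0.96,
                       spare_drives=12 + 3):
    """Net usable capacity: subtract the sparing reserve (one module of
    12 disks plus 3 disks), keep the fraction of each disk dedicated to
    data, and halve the result to account for mirroring."""
    return drive_gb * data_fraction * (n_drives - spare_drives) / 2

# 1000 GB x 0.96 x (180 - 15) / 2
print(usable_capacity_gb())  # 79200.0, close to the quoted 79 113 GB
```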
Consistency Groups
A Consistency Group is a group of volumes of which a snapshot can be made at the same point in time, thus ensuring a consistent image of all volumes within the group at that time. The concept of a Consistency Group is ubiquitous among storage subsystems, because there are many circumstances in which it is necessary to perform operations collectively across a set of volumes so that the result preserves the consistency among them. For example, effective storage management activities for applications that span multiple volumes, or the creation of point-in-time backups, are not possible without first employing Consistency Groups.

A notable practical scenario necessitating Consistency Groups arises when a consistent, instantaneous image of a database application (spanning both the database and the transaction log) is required. Taking snapshots of the volumes serially will result in an incongruent relationship among the volumes if the application issues writes to any of the application volumes while the snapshots are occurring. Consistency between the volumes in the group is paramount to maintaining data integrity from the application perspective.

By first grouping the application volumes into a Consistency Group, it is possible to later capture a consistent state of all volumes within that group at a given point in time using a special snapshot command for Consistency Groups. Issuing this type of command results in the following process:
1. Complete and destage writes across the constituent volumes.
2. Instantaneously suspend I/O activity simultaneously across all volumes in the Consistency Group.
3. Create the snapshots.
4. Finally, resume normal I/O activity.

The XIV Storage System manages these suspend and resume activities for all volumes within the Consistency Group. Note that additional mechanisms or techniques, such as those provided by the Microsoft Volume Shadow Copy Service (VSS) framework, might still be required to maintain full application consistency.
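The four-step sequence can be modeled abstractly as follows. This is purely illustrative; the class and method names are invented for the sketch and are not XIV or VSS APIs:

```python
import time

class ConsistencyGroup:
    """Toy model of the Consistency Group snapshot sequence described above.
    All names are illustrative, not actual XIV commands."""
    def __init__(self, volumes):
        self.volumes = volumes          # volume name -> data dict
        self.io_suspended = False

    def snapshot_all(self):
        # 1. Complete and destage pending writes (modeled as a no-op here).
        # 2. Suspend I/O simultaneously across all volumes in the group.
        self.io_suspended = True
        # 3. Create all snapshots at a single point in time.
        stamp = time.time()
        snaps = {name: (stamp, dict(data))
                 for name, data in self.volumes.items()}
        # 4. Resume normal I/O activity.
        self.io_suspended = False
        return snaps

cg = ConsistencyGroup({"db": {"row": 1}, "log": {"seq": 10}})
snaps = cg.snapshot_all()
# Every snapshot in the group carries the same timestamp: one point in time.
assert len({ts for ts, _ in snaps.values()}) == 1
```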
Snapshot reserve capacity is defined within each regular Storage Pool (not thinly provisioned Storage Pools) and is effectively maintained separately from logical, or master, volume capacity. The same principles apply for thinly provisioned Storage Pools, which are discussed in Storage Pool-level thin provisioning on page 26, with the exception that space is not guaranteed to be available for snapshots due to the potential for hard space depletion, which is discussed in Depletion of hard capacity on page 32.

Snapshots are structured in the same manner as logical volumes (also known as master volumes); however, a Storage Pool's snapshot reserve capacity is granular at the partition level (1 MB). In effect, snapshots collectively can be thought of as being thinly provisioned within each increment of 17 GB of capacity defined in the snapshot reserve space.

Note: The snapshot reserve needs to be a minimum of 34 GB.

The system preemptively deletes snapshots if the snapshots fully consume the available allocated space. As discussed in the previous example, snapshots are automatically deleted only when there is inadequate physical capacity available, within the context of each Storage Pool independently. This process is managed by a snapshot deletion priority scheme, which is discussed in 11.1, Snapshots on page 286. Therefore, when a Storage Pool's snapshot space is exhausted, only the snapshots that reside in the affected Storage Pool are deleted.
Chapter 2. XIV logical architecture and concepts
The space allocated for a Storage Pool can be dynamically changed by the storage administrator:
- The Storage Pool can always be increased in size. It is limited only by the unallocated space on the system.
- The Storage Pool can always be decreased in size. It is limited only by the space that is consumed by the volumes and snapshots that are defined within that Storage Pool.
- The designation of a Storage Pool as a regular pool or a thinly provisioned pool can be dynamically changed, even for existing Storage Pools. Thin provisioning is discussed in depth in 2.3.4, Capacity allocation and thin provisioning on page 24.

The storage administrator can relocate logical volumes between Storage Pools without any limitations, provided there is sufficient free space in the target Storage Pool:
- If necessary, the target Storage Pool capacity can be dynamically increased prior to volume relocation, assuming there is sufficient unallocated capacity available in the system.
- When a logical volume is relocated to a target Storage Pool, sufficient space must be available for all of its snapshots to reside in the target Storage Pool as well.

Note: When moving a volume into a Storage Pool, the size of the Storage Pool is not automatically increased by the size of the volume. Likewise, when removing a volume from a Storage Pool, the size of the Storage Pool does not decrease by the size of the volume.
Note: The system defines capacity using decimal metrics. Do not confuse decimal and binary units: 1 GB (decimal) is 1 000 000 000 bytes, which is only approximately 0.93 GB (binary), whereas 1 GB (binary) is 1 073 741 824 bytes.
Capacity acquisition and deployment can be more effectively deferred until actual application and business needs demand additional space, in effect facilitating an on-demand infrastructure.
Soft volume size
The soft volume size is the size of the logical volume that is observed by the host, as defined upon volume creation or as a result of a resizing command. The storage administrator specifies the soft volume size in the same manner regardless of whether the Storage Pool itself will be thinly provisioned. The soft volume size is specified in one of two ways, depending on units:
- In terms of GB: The system allocates the soft volume size as the minimum number of discrete 17 GB increments needed to meet the requested volume size.
- In terms of blocks: The capacity is indicated as a discrete number of 512-byte blocks. The system still allocates the soft volume size consumed within the Storage Pool as the minimum number of discrete 17 GB increments needed to meet the requested size (specified in 512-byte blocks); however, the size that is reported to hosts is equivalent to the precise number of blocks defined.

Incidentally, the snapshot reserve capacity associated with each Storage Pool is a soft capacity limit, and it is specified by the storage administrator, though it effectively limits the hard capacity consumed collectively by snapshots as well.

Note: Defining logical volumes in terms of blocks is useful when you must precisely match the size of an existing logical volume residing on another system.
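The 17 GB rounding can be sketched as follows. This is illustrative only; the helper names are invented, and decimal units are assumed throughout:

```python
import math

ALLOCATION_GB = 17     # volume capacity is allocated in 17 GB increments
BLOCK_BYTES = 512      # block-defined volumes use 512-byte blocks
GB = 10**9             # the system uses decimal units

def allocated_soft_gb(requested_gb):
    """Soft size accounted in the pool: round the request up to 17 GB steps."""
    return ALLOCATION_GB * math.ceil(requested_gb / ALLOCATION_GB)

def allocated_soft_gb_from_blocks(blocks):
    """Block-defined volume: hosts see exactly `blocks` blocks, but the pool
    still accounts for the next 17 GB boundary."""
    return allocated_soft_gb(blocks * BLOCK_BYTES / GB)

print(allocated_soft_gb(50))                     # -> 51 (three increments)
print(allocated_soft_gb_from_blocks(2 * 10**8))  # 102.4 GB of blocks -> 119
```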
Hard volume size
The volume allocated hard space reflects the physical space allocated to the volume following host writes to the volume and is discretely and dynamically provisioned by the system (not the storage administrator). The upper limit of this provisioning is determined by the soft size assigned to the volume.

The volume consumed hard space is not necessarily equal to the allocated hard capacity, because hard space allocation occurs in increments of 17 GB, while actual space is consumed at the granularity of the 1 MB partitions. Therefore, the actual physical space consumed by a volume within a Storage Pool is transient, because a volume's consumed hard space reflects the total amount of data that has been previously written by host applications. Hard capacity is allocated to volumes by the system in increments of 17 GB due to the underlying logical and physical architecture; there is no greater degree of granularity than 17 GB, even if only a few partitions are initially written beyond each 17 GB boundary. For more details, refer to 2.3.1, Logical system concepts on page 16.
Application write access patterns determine the rate at which the allocated hard volume capacity is consumed and subsequently the rate at which the system allocates additional increments of 17 GB up to the limit defined by the soft size for the volume. As a result, the storage administrator has no direct control over the hard capacity allocated to the volume by the system at any given point in time. During volume creation, or when a volume has been formatted, there is zero physical capacity assigned to the volume. As application writes accumulate to new areas of the volume, the physical capacity allocated to the volume will grow in increments of 17 GB and can ultimately reach the full soft volume size. Increasing the soft volume size does not affect the hard volume size.
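The distinction between consumed and allocated hard space can be illustrated with a toy model. All names here are invented, and partition tracking is simplified to a set of 1 MB partition indices:

```python
import math

PARTITION_MB = 1          # space is consumed at 1 MB partition granularity
INCREMENT_MB = 17_000     # hard space is allocated in 17 GB (decimal) steps

class Volume:
    """Toy model: hard space is consumed per 1 MB partition written,
    but allocated to the volume in whole 17 GB increments."""
    def __init__(self, soft_gb):
        self.soft_mb = soft_gb * 1000
        self.written = set()          # indices of 1 MB partitions written

    @property
    def consumed_mb(self):
        return len(self.written) * PARTITION_MB

    @property
    def allocated_mb(self):
        # Hard allocation grows in 17 GB steps, capped by the soft size.
        if not self.written:
            return 0
        return min(self.soft_mb,
                   INCREMENT_MB * math.ceil(self.consumed_mb / INCREMENT_MB))

vol = Volume(soft_gb=51)
vol.written.update(range(10))   # host writes 10 MB of new data
print(vol.consumed_mb)          # -> 10
print(vol.allocated_mb)         # -> 17000: one full 17 GB increment
```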
Storage Pool must be defined independently of the physical, or hard, space allocated within the system for that pool. Thus, the Storage Pool hard size that is defined by the storage administrator limits the physical capacity that is available collectively to volumes and snapshots within a thinly provisioned Storage Pool, whereas the aggregate space that is assignable to host operating systems is specified by the Storage Pool's soft size, which is described in Soft pool size on page 27.

Important: Do not confuse the hard space associated with volumes with that associated with Storage Pools. The hard space associated with volumes derives from the physical space written by hosts, whereas the hard space associated with Storage Pools represents the physical space allocated for the Storage Pool within the system, which is independent of host writes.

Whereas regular Storage Pools effectively segregate the hard space reserved for volumes from the hard space consumed by snapshots by limiting the soft space allocated to volumes, thinly provisioned Storage Pools permit the totality of the hard space to be consumed by volumes, with no guarantee of preserving any hard space for snapshots. Logical volumes take precedence over snapshots and might be allowed to overwrite snapshots if necessary as hard space is consumed. The hard space allocated to the Storage Pool that is unused (in other words, the incremental difference between the aggregate soft and hard volume sizes) can, however, be used by snapshots in the same Storage Pool. Careful management is critical to prevent hard space for both logical volumes and snapshots from being exhausted. Ideally, hard capacity utilization must be maintained under a certain threshold by increasing the pool hard size as needed in advance.

Note: As discussed in Storage Pool relationships on page 23, Storage Pools control when and which snapshots are deleted when there is insufficient space assigned within the pool for snapshots.
Note: The soft snapshot reserve capacity and the hard space allocated to the Storage Pool are consumed only as changes occur to the master volumes or the snapshots themselves, not as snapshots are created.
The designation of a Storage Pool as a regular pool or a thinly provisioned pool can be dynamically changed by the storage administrator:
- When a regular pool needs to be converted to a thinly provisioned pool, the soft pool size parameter needs to be explicitly set in addition to the hard pool size, which remains unchanged unless updated.
- When a thinly provisioned pool needs to be converted to a regular pool, the soft pool size is automatically reduced to match the current hard pool size. If the combined allocation of soft capacity for existing volumes in the pool exceeds the pool hard size, the Storage Pool cannot be converted. This situation can be resolved if individual volumes are selectively resized or deleted to reduce the soft space consumed.
Note: Unlike volumes, a thinly provisioned Storage Pools hard size and soft size are fully configured by the storage administrator.
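These conversion rules can be sketched with a toy model. The class, fields, and methods below are invented for illustration and do not correspond to the XCLI interface:

```python
class StoragePool:
    """Toy model of the regular/thin pool designation rules described above."""
    def __init__(self, hard_gb, soft_gb=None):
        self.hard_gb = hard_gb
        # A regular pool's soft size equals its hard size.
        self.soft_gb = soft_gb if soft_gb is not None else hard_gb
        self.volume_soft_gb = 0   # combined soft size of volumes in the pool

    def to_thin(self, new_soft_gb):
        # Regular -> thin: the soft size must be set explicitly;
        # the hard size remains unchanged unless updated.
        self.soft_gb = new_soft_gb

    def to_regular(self):
        # Thin -> regular: the soft size is reduced to match the hard size,
        # but only if existing volumes fit within the hard size.
        if self.volume_soft_gb > self.hard_gb:
            raise ValueError("resize or delete volumes first")
        self.soft_gb = self.hard_gb

pool = StoragePool(hard_gb=170)
pool.to_thin(new_soft_gb=340)
pool.volume_soft_gb = 255         # volumes oversubscribe the hard size
try:
    pool.to_regular()             # fails: 255 GB of volumes > 170 GB hard
except ValueError:
    pass
```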
Note: If the Storage Pools within the system are thinly provisioned, but the soft system size does not exceed the hard system size, the total system hard capacity cannot be filled unless all Storage Pools are regularly provisioned. Therefore, we recommend that you define all Storage Pools in a non-thinly provisioned system as regular Storage Pools.

The soft system size is a purely logical limit; however, you must exercise care when the soft system size is set to a value greater than the maximum potential hard system size. It must remain possible to upgrade the system's hard size to equal the soft size, so defining an unreasonably high system soft size can result in full capacity depletion. It is for this reason that defining the soft system size is not within the scope of the storage administrator role. There are conditions that might temporarily reduce the system's soft limit. For further details, refer to 2.4.2, Rebuild and redistribution on page 37.
Figure: Thin provisioning, showing system hard and soft size with Storage Pools in logical and physical views. The figure's callouts make the following points:
- The system allocates the amount of space requested by the administrator in increments of 17 GB.
- For a Regular Storage Pool, the system allocates an amount of hard space that is equivalent to the size defined for the pool by the administrator.
- For a Thin Storage Pool, the system allocates the amount of soft space requested by the administrator independently from the hard space, and allocates only the amount of hard space requested. This hard space is consumed as hosts issue writes to new areas of the constituent volumes, and may require dynamic expansion to reach the soft space allocated to one or more of the volumes.
The final reserved space within the regular Storage Pool shown in Figure 2-7 is dedicated to snapshot usage. The diagram illustrates that the specified snapshot reserve capacity of 34 GB is effectively deducted from both the hard and soft space defined for the regular Storage Pool, thus guaranteeing that this space will be available for consumption collectively by the snapshots associated with the pool. Although snapshots consume space granularly at the partition level, as discussed in Storage Pool relationships on page 23, the snapshot reserve capacity is still defined in increments of 17 GB. The remaining 17 GB within the regular Storage Pool has not been allocated to either volumes or snapshots. Note that all soft capacity remaining in the pool is backed by hard capacity; the remaining unused soft capacity will always be less than or equal to the remaining unused hard capacity.
The figure's callouts make the following points about this regular provisioning example:
- For a Regular Storage Pool, the soft size and hard size are equal.
- A block definition allows hosts to see a precise number of blocks; even for block-defined volumes, the system allocates logical capacity in increments of 17 GB.
- The consumed hard space grows as host writes accumulate to new areas of a volume.
- In a Regular Storage Pool, the maximum hard space available to be consumed by a volume is guaranteed to be equal to the soft size that was allocated.
Figure 2-7 Volumes and snapshot reserve space within a regular Storage Pool
require up to an additional 17 GB of hard capacity to become fully provisioned, and therefore, at least 34 GB of additional hard capacity must be allocated to this pool in anticipation of this requirement. Finally, consider the 34 GB of snapshot reserve space depicted in Figure 2-8. If a new volume is defined in the unused 17 GB of soft space in the pool, or if either Volume 3 or Volume 4 requires additional capacity, the system will sacrifice the snapshot reserve space in order to give priority to the volume requirements. Normally, this scenario does not occur, because additional hard space must be allocated to the Storage Pool as the hard capacity utilization crosses certain thresholds.
The figure's callouts make the following points about this thin provisioning example:
- For a Thin Storage Pool, the pool soft size (136 GB in this example) is greater than the pool hard size.
- The snapshot reserve limits the maximum hard space that can be consumed by snapshots, but for a Thin Storage Pool it does not guarantee that hard space will be available. Because snapshots are differential at the partition level, multiple snapshots can potentially exist within a single 17 GB increment of capacity.
- The consumed hard space grows as host writes accumulate to new areas of a volume, and the system must allocate new 17 GB increments to the volume as space is consumed.
- In a Thin Storage Pool, the maximum hard space consumed by a volume is not guaranteed to be equal to the size that was allocated, because it is possible for the volumes in the pool to collectively exhaust all hard space allocated to the pool. This causes the pool to be locked.
Figure 2-8 Volumes and snapshot reserve space within a thinly provisioned Storage Pool
Snapshot deletion
As mentioned previously, snapshots in regular Storage Pools can be automatically deleted by the system in order to provide space for newer snapshots or, in the case of thinly provisioned pools, to free physical space for volumes. For example, suppose you created a Storage Pool with a soft size of 350 GB, a hard size of 250 GB, and a snapshot reserve of 200 GB. When the volume hard size exceeds 250 GB - 200 GB = 50 GB, space for the volumes is taken from the snapshot reserve. If that space is already consumed by snapshots, one or more snapshots are deleted. The snapshot deletion order is based on the deletion priority and creation time, as explained in 11.1, Snapshots on page 286.
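The arithmetic in this example can be expressed directly:

```python
def volume_hard_space_before_reserve(pool_hard_gb, snapshot_reserve_gb):
    """Hard space volumes can consume before eating into the snapshot reserve."""
    return pool_hard_gb - snapshot_reserve_gb

# Example from the text: 250 GB pool hard size, 200 GB snapshot reserve.
threshold = volume_hard_space_before_reserve(250, 200)
print(threshold)   # -> 50: beyond this, volume growth comes out of the
                   # snapshot reserve, and snapshots may be deleted
```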
Volume locking
If more hard capacity is still required after all the snapshots in a thinly provisioned Storage Pool have been deleted, all the volumes in the Storage Pool are locked (you can specify one of two possible behaviors for a locked volume: either no I/O at all, or read only), thereby preventing any additional consumption of hard capacity.
Important: Volume locking prevents writes to all volumes in the Storage Pool.
It is very important to note that the thin provisioning implementation in the XIV Storage System manages space allocation within each Storage Pool, so that hard capacity depletion in one Storage Pool will never affect the hard capacity available to another Storage Pool. There are both advantages and disadvantages:
- Because Storage Pools are independent, thin provisioning volume locking in one Storage Pool never cascades into another Storage Pool.
- Hard capacity cannot be reused across Storage Pools, even if a certain Storage Pool has free hard capacity available. This can lead to a situation where volumes are locked due to the depletion of hard capacity in one Storage Pool while there is available capacity in another Storage Pool. Of course, it is still possible for the storage administrator to intervene in order to redistribute hard capacity.
software elements, empower the XIV Storage System to realize unprecedented resiliency. The resiliency of the architecture encompasses not only high availability, but also excellent maintainability, serviceability, and performance under non-ideal conditions resulting from planned or unplanned changes to the internal hardware infrastructure, such as the loss of a module.
Availability
The XIV Storage System maximizes operational availability and minimizes the degradation of performance associated with nondisruptive planned and unplanned events, while providing for the capability to preserve the data to the fullest extent possible in the event of a disaster.
High reliability
The XIV Storage System not only withstands individual component failures by quickly and efficiently reinstating full data redundancy, but also automatically monitors and phases out individual components before data redundancy is compromised. We discuss this topic in detail in Proactive phase-out and self-healing mechanisms on page 43. The collective high reliability provisions incorporated within the system constitute multiple layers of protection from unplanned outages and minimize the possibility of related service actions.
Maintenance freedom
While the potential for unplanned outages and associated corrective service actions is mitigated by the reliability attributes inherent to the system design, the XIV Storage System's autonomic features also minimize the need for storage administrators to conduct non-preventative maintenance activities that are purely reactive in nature, by adapting to potential issues before they are manifested as a component failure. The continually restored redundancy, in conjunction with the self-healing attributes of the system, effectively enables maintenance activities to be decoupled from the instigating event (such as a component failure or malfunction) and safely carried out according to a predefined schedule. In addition to the system's diagnostic monitoring and autonomic maintenance, the proactive and systematic, rather than purely reactive, approach to maintenance is augmented because the entirety of the logical topology is continually preserved, optimized, and balanced according to the physical state of the system. The modular system design also expedites the installation of any replacement or upgraded components, while the automatic, transparent data redistribution across all resources eliminates the downtime associated with these critical activities, even in the context of individual volumes.
High availability
The rapid restoration of redundant data across all available drives and modules in the system during hardware failures, and the equilibrium resulting from the automatic redistribution of data across all newly installed hardware, are fundamental characteristics of the XIV Storage System architecture that minimize exposure to cascading failures and the associated loss of access to data.
Consistent performance
The XIV Storage System is capable of adapting to the loss of an individual drive or module efficiently and with relatively minor impact compared to monolithic architectures. While traditional monolithic systems employ an N+1 hardware redundancy scheme, the XIV Storage System harnesses the resiliency of the grid topology, not only in terms of the ability to sustain a component failure, but also by maximizing consistency and transparency from the perspective of attached hosts. The potential impact of a component failure is vastly reduced, because each module in the system is responsible for a relatively small percentage of the system's operation. Simply put, a controller failure in a typical N+1 system likely results
in a dramatic (up to 50%) reduction of available cache, processing power, and internal bandwidth, whereas the loss of a module in the XIV Storage System translates to only 1/15th of the system resources and does not compromise performance nearly as much as the same failure in a typical architecture. Additionally, the XIV Storage System incorporates innovative provisions to mitigate isolated disk-level performance anomalies through redundancy-supported reaction, which is discussed in Redundancy-supported reaction on page 44, and flexible handling of dirty data, which is discussed in Flexible handling of dirty data on page 44.
Disaster recovery
Enterprise class environments must account for the possibility of the loss of both the system and all of the data as a result of a disaster. The XIV Storage System includes the provision for Remote Mirror functionality as a fundamental component of the overall disaster recovery strategy. Refer to Chapter 12, Remote Mirror on page 323.
both primary and secondary copies of the same data.

Write cache protection
Each module in the XIV Storage System contains a local, independent space reserved for caching operations within its system memory. Each module contains 8 GB of high-speed volatile memory (a total of 120 GB across the system), of which 5.5 GB (82.5 GB overall) is dedicated to caching data.
Note: The system does not contain non-volatile memory space that is reserved for write operations. However, the close proximity of the cache and the drives, in conjunction with the enforcement of an upper limit for dirty, or non-destaged, data on a per-drive basis, ensures that the full destage will occur while operating under battery power.
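The cache figures quoted above follow directly from the module count:

```python
MODULES = 15                 # modules in a fully configured rack
MEMORY_GB_PER_MODULE = 8     # volatile memory per module
CACHE_GB_PER_MODULE = 5.5    # portion of each module's memory used for caching

total_memory_gb = MODULES * MEMORY_GB_PER_MODULE   # -> 120
total_cache_gb = MODULES * CACHE_GB_PER_MODULE     # -> 82.5
print(total_memory_gb, total_cache_gb)
```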
Power on sequence
Upon startup, the system will verify that the battery charge levels in all universal power supplies exceed the threshold necessary to guarantee that a graceful shutdown can occur. If the charge level is inadequate, the system will not begin servicing host I/O until the charge level has exceeded the minimum required threshold.
Goal distribution
The process of achieving a new goal distribution while simultaneously restoring data redundancy due to the loss of a disk or module is known as a rebuild. Because a rebuild occurs as a result of a component failure that compromises full data redundancy, there is a period during which the non-redundant data is both restored to full redundancy and homogeneously redistributed over the remaining disks.
The process of achieving a new goal distribution (which occurs only when full redundancy exists) is known as a redistribution, during which all data in the system (including both primary and secondary copies) is redistributed. A redistribution results from:
- The replacement of a failed disk or module following a rebuild, also known as a phase-in.
- The addition of one or more modules to the system, known as a scale-out upgrade. While the XIV Storage System does not currently support the addition of new racks, the system's inherent virtualization capabilities naturally apply in this context as well.

Following any of these occurrences, the XIV Storage System immediately initiates the following sequence of events:
1. The XIV Storage System distribution algorithms calculate which partitions must be relocated and copied, based on the pseudo-random distribution that is described in 2.2.2, Logical parallelism on page 14. The resultant distribution table is known as the goal distribution.
2. The Data Modules and Interface Modules begin concurrently redistributing and copying (in the case of a rebuild) the partitions according to the goal distribution:
- This process occurs in a parallel, any-to-any fashion concurrently among all modules and drives in the background, with complete host transparency.
- The priority associated with achieving the new goal distribution is internally determined by the system and cannot be adjusted by the storage administrator:
  - Rebuilds have the highest priority; however, the transactional load is homogeneously distributed over all the remaining disks in the system, resulting in a very low density of system-generated transactions.
  - Phase-outs (caused by the XIV technician removing and replacing a failed module) have lower priority than rebuilds, because at least two copies of all data exist at all times during the phase-out.
  - Redistributions have the lowest priority, because there is neither a lack of data redundancy nor has the system detected the potential for an impending failure.
3. The system resumes steady-state operation after the goal distribution has been met. Following the completion of a goal distribution resulting from a rebuild or phase-out, a subsequent redistribution must occur when the system hardware is fully restored through a phase-in.

Note: The goal distribution is transparent to storage administrators and cannot be changed. In addition, the goal distribution has many determinants depending on the precise state of the system.
Important: Never perform a phase-in to replace a failed disk or module until after the rebuild process has completed. In any case, these operations must be performed by the IBM XIV technician.
XIV Storage System. The proactive phase-out of non-optimal hardware through autonomic monitoring, and the modules' cognizance of the virtualization between the logical volumes and physical disks, yield unprecedented efficiency, transparency, and reliability of data preservation actions, encompassing both rebuilds and phase-outs:
- The rebuild of data is many times faster than conventional RAID array rebuilds and can complete in a short period of time for a fully provisioned system, because the redistribution workload spans all drives in the system, resulting in very low transactional density:
  - Statistically, the chance of exposure to data loss or a cascading hardware failure (which occurs when corrective actions in response to the original failure result in a subsequent failure) is minimized due to both the brevity of the rebuild action and the low density of access on any given disk. Rebuilding conventional RAID arrays can take many hours to complete, depending on the type of the array, the number of drives, and the ongoing host-generated transactions to the array.
  - The rebuild process can complete 25% to 50% more quickly for systems that are not fully provisioned, which equates to a rebuild completion in as little as 15 minutes.
- The system relocates only real data, as opposed to rebuilding the entire array, which consists of complete disk images that often include unused space, vastly reducing the potential number of transactions that must occur. Conventional RAID array rebuilds can place many times the normal transactional load on the disks and substantially reduce effective host performance.
- The number of drives participating in the rebuild is about 20 times greater than in most average-sized conventional RAID arrays, and by comparison, the array rebuild workload is greatly dissipated, greatly reducing the relative impact on host performance.
- Whereas the standard dedicated spare disks utilized during a conventional RAID array rebuild might not be globally accessible to all arrays in the system, the XIV Storage System maintains universally accessible reserve space on all disks in the system, as discussed in Global spare capacity on page 21.
- Because the system maintains access density equilibrium, hot spots are statistically eliminated, which reduces the chance of isolated workload-induced failures. The system-wide goal distribution alleviates localized drive stress and associated heat soak, which can significantly increase the probability of a double drive failure during the rebuild of a RAID array in conventional subsystems.
- Modules intelligently send information to each other directly. There is no need for a centralized supervising controller to read information from one disk module and write to another disk module.
- All disks are monitored for errors, poor performance, or other signs that might indicate that a full or partial failure is impending. Dedicated spare disks in conventional RAID arrays are inactive, and therefore unproven and unmonitored, increasing the potential for a second failure during an array rebuild.
Rebuild examples
When the full redundancy of data is compromised due to a module failure, as depicted in Figure 2-10 on page 40, the system immediately identifies the non-redundant partitions and begins the rebuild process. Because none of the disks within a given module contain the secondary copies of data residing on any of the disks in the module, the secondary copies are read from the remaining modules in the system. Therefore, during a rebuild resulting from a module failure, there will be concurrently 168 disks (180 disks in the system minus 12 disks in a module) reading, and 168 disks writing, as is conceptually illustrated in Figure 2-10 on page 40.
Figure 2-11 on page 41 depicts a denser population of redundant partitions for both volumes A and B, thus representing the completion of a new goal distribution, as compared to Figure 2-10, which contains the same number of redundant partitions for both volumes distributed less densely over the original number of modules and drives.

Finally, consider the case of a single disk failure occurring in an otherwise healthy system (no existing phased-out or failed hardware). During the subsequent rebuild, there will be only 168 disks reading, because there is no non-redundant data residing on the other disks within the same module as the failed disk. Concurrently, there will be 179 disks writing in order to preserve full data distribution.

Note: Figure 2-10 and Figure 2-11 conceptually illustrate the rebuild process resulting from a failed module. The diagrams are not intended to depict in any way the specific placement of partitions within a real system, nor do they literally depict the number of modules in a real system.
assigned to any existing volumes or consumed by snapshots, as measured before the failure, is unallocated hard capacity. For details about Storage Pool sizes, refer to Storage Pool-level thin provisioning on page 26. Do not confuse this unallocated Storage Pool hard capacity with unconsumed capacity, which is unwritten hard space allocated to volumes.

3. Reserve spare capacity: As discussed previously, the system reserves enough capacity to sustain the consecutive, non-concurrent failure of three drives and an entire module before replacement hardware must be phased in, ensuring that data redundancy can be restored during subsequent hardware failures. When sufficient unallocated hard capacity is available, the system refrains from allocating the reserve spare space to complete the rebuild or phase-out process, preserving that reserve as additional protection. As a result, it is possible for the system to report a maximum soft size that is temporarily less than the allocated soft capacity. The soft and hard system sizes will not revert to the original values until a replacement disk or module is phased in and the resulting redistribution completes.
Important: While it is possible to resize or create volumes, snapshots, or Storage Pools while a rebuild is underway, we strongly discourage these activities until the system has completed the rebuild process and restored full data redundancy.
Redistribution
The XIV Storage System homogeneously redistributes all data across all disks whenever new disks or modules are introduced or phased in to the system. This redistribution process is not equivalent to the striping of volumes across all disks employed in traditional systems: both conventional RAID striping and the XIV data distribution fully incorporate all spindles while the hardware configuration remains static; however, when new capacity is added and new volumes are allocated, ordinary RAID striping algorithms do not intelligently redistribute data to preserve equilibrium for all volumes through the pseudo-random distribution of data, which is described in 2.2.2, Logical parallelism on page 14. Thus, the XIV Storage System employs dynamic volume-level virtualization, obviating the need for ongoing manual volume layout planning.

The redistribution process is triggered by the phase-in of a new drive or module and differs from a rebuild or phase-out in two ways:

The system does not need to create secondary copies of data to reinstate or preserve full data redundancy.

The distribution density, or the concentration of data on each physical disk, decreases instead of increasing.

The redistribution of data also performs differently, because the concentration of write activity on the new hardware resource is the bottleneck:

When a replacement module is phased in, there will be concurrently 168 disks reading and 12 disks writing; thus, the time to completion is limited by the throughput of the replacement module. Also, the read access density on the existing disks will be extremely low, guaranteeing very low impact on host performance during the process.

When a replacement disk is phased in, there will be concurrently 179 disks reading and only one disk writing. In this case, the replacement drive limits the achievable throughput of the redistribution. Again, the impact on host transactions is extremely small.
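A simple model illustrates why the phased-in hardware is the bottleneck. The per-disk write rate and data volume below are invented assumptions for illustration, not XIV specifications; the interesting property is that the completion time is the same whether 1 or 12 disks are being filled, because write throughput scales with the number of writers.

```python
# Rough redistribution-time model: the phase-in target limits throughput.
WRITE_MBPS_PER_DISK = 60.0   # assumed sustained write rate per disk
DATA_PER_DISK_GB = 500.0     # assumed data to place on each new disk

def redistribution_hours(new_disks):
    """Hours to fill the phased-in hardware, limited by its write side."""
    total_gb = DATA_PER_DISK_GB * new_disks
    rate_gb_per_s = WRITE_MBPS_PER_DISK * new_disks / 1024.0
    return total_gb / rate_gb_per_s / 3600.0

# A whole module (12 writers) and a single disk (1 writer) finish in
# the same time under this model: aggregate rate grows with writers.
module_h = redistribution_hours(12)
disk_h = redistribution_hours(1)
print(round(module_h, 2), round(disk_h, 2))
```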
Disaster recovery
All high availability SAN implementations must account for the contingency of data recovery and business continuance following a disaster, as defined by the organization's recovery point and recovery time objectives. The provision within the XIV Storage System to efficiently and flexibly create nearly unlimited snapshots, coupled with the ability to define Consistency Groups of logical volumes, constitutes an integral element of the data preservation strategy. In addition, the XIV Storage System's synchronous data mirroring functionality facilitates excellent potential recovery point and recovery time objectives as a central element of the full disaster recovery plan. Refer to 11.1, Snapshots on page 286 and 12.1, Remote Mirror on page 324.
Disk scrubbing
The XIV Storage System maintains a series of scrubbing algorithms that run as background processes, concurrently and independently scanning multiple media locations within the system in order to maintain the integrity of the redundantly stored data. This continuous checking enables the early detection of possible data corruption, alerting the system to take corrective action to restore data integrity before errors can manifest themselves from the host perspective. Thus, redundancy is not only implemented as part of the basic architecture of the system, but it is also continually monitored and restored as required. In summary, the data scrubbing process has the following attributes:

Verifies the integrity and redundancy of stored data

Enables early detection of errors and early recovery of redundancy

Runs as a set of background processes on all disks in parallel

Checks whether data can be read from partitions and verifies data integrity by employing checksums

Examines a single disk partition every two seconds (note that here the term partition does not refer to a logical partition)
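The checksum-verification step can be sketched in a few lines. This is a minimal illustration of the principle only — the data layout and the choice of CRC-32 as the checksum are assumptions, not details of the XIV implementation.

```python
# Minimal scrub sketch: walk partitions, recompute a checksum, and
# compare it with the stored one; a mismatch would trigger repair
# from the redundant secondary copy.
import zlib

def scrub(partitions):
    """partitions: list of (data_bytes, stored_checksum).
    Returns the indices whose contents no longer match their checksum."""
    bad = []
    for i, (data, stored) in enumerate(partitions):
        if zlib.crc32(data) != stored:
            bad.append(i)  # corrupt: restore from the secondary copy
    return bad

good = (b"payload", zlib.crc32(b"payload"))
corrupt = (b"payl0ad", zlib.crc32(b"payload"))  # silent bit damage
print(scrub([good, corrupt]))  # [1]
```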
However, as implemented in the XIV Storage System, the SMART diagnostic tools, coupled with intelligent analysis and low tolerance thresholds, provide an even greater level of refinement of disk behavior diagnostics and of the performance- and reliability-driven reaction. For instance, the XIV Storage System measures the specific values of parameters including, but not limited to:
Reallocated sector count: If the disk encounters a read or write verification error, it designates the affected sector as reallocated and relocates the data to a reserved area of spare space on the disk. Note that this spare space is a parameter of the drive itself and is not related in any way to the system reserve spare capacity that is described in Global spare capacity on page 21. The XIV Storage System initiates phase-out at a much lower count than the manufacturer recommends.

Disk temperature: The disk temperature is a critical factor that contributes to premature drive failure and is constantly monitored by the system.

Raw read error count: The raw read error count provides an indication of the condition of the magnetic surface of the disk platters and is carefully monitored by the system to ensure the integrity of the magnetic media itself.

Spin-up time: The spin-up time is a measure of the average time required for a spindle to accelerate from zero to 7,200 rpm. The XIV Storage System recognizes abnormal spin-up time as a potential indicator of an impending mechanical failure.

Likewise, for additional early warning signs, the XIV Storage System continually monitors other aspects of disk-initiated behavior, such as spontaneous resets or unusually long latencies. The system intelligently analyzes this information in order to reach crucial decisions concerning disk deactivation and phase-out. The parameters involved in these decisions allow for a very sensitive analysis of disk health and performance.
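The "low tolerance threshold" idea can be sketched as a simple rule check. The attribute names mirror the parameters listed above, but every numeric threshold here is an invented assumption for illustration — not an XIV or drive-vendor figure.

```python
# Illustrative SMART-style health check: phase a disk out well before
# the manufacturer's own limits are reached. All thresholds assumed.
CONSERVATIVE_LIMITS = {
    "reallocated_sectors": 10,   # vendor limits are typically far higher
    "temperature_c": 55,
    "raw_read_errors": 100,
    "spin_up_time_ms": 9000,
}

def should_phase_out(smart):
    """Return the attributes that exceed the conservative limits."""
    return [k for k, limit in CONSERVATIVE_LIMITS.items()
            if smart.get(k, 0) > limit]

healthy = {"reallocated_sectors": 2, "temperature_c": 40,
           "raw_read_errors": 5, "spin_up_time_ms": 7000}
suspect = dict(healthy, reallocated_sectors=25)
print(should_phase_out(healthy))  # []
print(should_phase_out(suspect))  # ['reallocated_sectors']
```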
Redundancy-supported reaction
The XIV Storage System incorporates redundancy-supported reaction: the provision to exploit the distributed redundant data scheme by intelligently redirecting reads to the secondary copies of data, thereby extending the system's tolerance of above-average disk service time when accessing primary data locations. The system reinstates reads from the primary data copy when the transient degradation of the disk service time has subsided. Of course, redundancy-supported reaction itself might be triggered by an underlying potential disk error that will ultimately be managed autonomically by the system, according to the severity of the exposure as determined by ongoing disk monitoring.
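The decision logic amounts to a latency-based copy selector. This is a bare sketch of the concept; the latency threshold is an assumption and the real system's criteria are certainly more sophisticated.

```python
# Redundancy-supported read sketch: if the primary copy's service time
# is transiently degraded, redirect the read to the secondary copy.
SLOW_MS = 50.0  # assumed service-time threshold

def choose_copy(primary_latency_ms, secondary_latency_ms):
    """Pick which copy to read from, preferring the primary copy."""
    if primary_latency_ms > SLOW_MS:
        return "secondary"  # tolerate the slow primary via redundancy
    return "primary"

print(choose_copy(5.0, 6.0))    # 'primary'
print(choose_copy(120.0, 6.0))  # 'secondary'
```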
Chapter 3.
Figure 3-1 IBM XIV Storage System front and rear views
Rack layout, top to bottom:
Module 15 (Data)
Module 14 (Data)
Module 13 (Data)
Module 12 (Data)
Module 11 (Data)
Module 10 (Data)
Module 9 (Data + Interface)
Module 8 (Data + Interface)
Module 7 (Data + Interface)
Ethernet Switch, Maintenance Module
Module 6 (Data + Interface)
Module 5 (Data + Interface)
Module 4 (Data + Interface)
Module 3 (Data)
Module 2 (Data)
Module 1 (Data)
UPS 3
UPS 2
UPS 1
Raw capacity: 180 TB
Usable capacity: approximately 79 TB
System memory: 120 GB per rack (8 GB per module)
1 Maintenance Module (1U)
Redundant power supplies
2 Ethernet switches (48-port, 1 Gbps)
3 UPS systems
Two 48-port 1 Gbps Ethernet switches form the basis of an internal redundant Gigabit Ethernet network that links all the modules in the system. The switches are installed in the middle of the rack between the Interface Modules. The connections between the modules and switches, as well as all internal power connections in the rack, are realized by a redundant set of cables. For power connections, standard power cables and plugs are used; standard Ethernet cables are used for the interconnections between the modules and switches. All 15 modules (six Interface Modules and nine Data Modules) have redundant connections through the two 48-port 1 Gbps Ethernet switches. This grid network ensures communication between all modules even if one of the switches or a cable connection fails. Furthermore, this grid network provides the capabilities for parallelism and the execution of the data distribution algorithm that contribute to the excellent performance of the XIV Storage System.
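The dual-switch grid's fault tolerance follows directly from every module having a link to both switches. A tiny reachability check demonstrates this; the module and switch counts come from the text, and the code itself is only an illustration.

```python
# Connectivity sketch for the two-switch grid: each of the 15 modules
# links to both switches, so losing either switch leaves every module
# reachable over the surviving switch.
MODULES = [f"module{i}" for i in range(1, 16)]
SWITCHES = ["sw1", "sw2"]

def reachable_modules(failed_switch=None):
    """Modules that still have at least one live switch link."""
    live = [s for s in SWITCHES if s != failed_switch]
    # every module connects to every switch, so any live switch suffices
    return [m for m in MODULES if live]

print(len(reachable_modules()))       # 15
print(len(reachable_modules("sw1")))  # 15, still fully connected
```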
Note: For the same reason that the system is not dependent on specially developed parts, there might be differences in the hardware components that are used in your particular system compared with those components described next.
The rack
The IBM XIV hardware components are installed in a 482.6 mm (19 inch) NetShelter SX 42U rack (APC AR3100) from APC. The rack is 1070 mm (42 inches) deep to accommodate deeper modules and to provide more space for cables and connectors. Adequate space is provided to house all components and to properly route all cables. The rack door and side panels are locked with a key to prevent unauthorized access to the installed components. For detailed dimensions and the weight of the rack and its components, refer to 4.3, Physical planning on page 67.
The Uninterruptible Power Supply (UPS) module complex consists of three UPS units. Each unit maintains an internal power supply in the event of a temporary failure of the external power supply. In case of an extended external power failure or outage, the UPS module complex maintains battery power long enough to allow a safe and ordered shutdown of the XIV Storage System. The complex can sustain the failure of one UPS unit while still protecting against external power disturbances. Figure 3-4 on page 49 shows the UPS.
The three UPS modules are located at the bottom of the rack. Each of the modules has an output of 6 kVA to supply power to all other components in the rack and is 3U in height. The design allows proactive detection of temporary power problems and can correct them before the system goes down. In the case of a complete power outage, integrated batteries continue to supply power to the entire system. Depending on the load of the IBM XIV, the batteries are designed to continue system operation from 3.3 minutes to 11.9 minutes, which gives you enough time to gracefully power off the system.
In case of power problems or a failing UPS, the ATS reorganizes the power load balance between the power components. The operational components take over the load from the failing power source or power supply. This rearrangement of the internal power load is performed by the ATS in a seamless way, and system operation continues without any application impact. Note that if you do not have the two 60 amp power feeds normally required and instead use four 30 amp power feeds (feature code (FC) 9899), two of the lines go to the ATS, which is then connected only to UPS unit 2. One of the other two lines goes to UPS unit 1, and the other goes to UPS unit 3, as seen in Figure 3-5 on page 49.
[Figure: 30 amp power feed cabling — four 30 A-rated services with pigtail connections; two services feed the ATS, which supplies UPS unit 2 (3U), and the other two services feed UPS units 1 and 3 (3U each)]
Data Module
The fully populated rack hosts nine Data Modules (Modules 1-3 and Modules 10-15). There is no difference in the hardware between Data Modules and Interface Modules (refer to Interface Module on page 54) except for the additional host adapters and GigE adapters in the Interface Modules. In addition to the 12 disk drives, the main components of the module shown in Figure 3-7 are:

System planar
Processor
Memory/cache
Enclosure Management Card
Cooling devices (fans)
Memory Flash Card
Redundant power supplies

In addition, each Data Module contains four redundant Gigabit Ethernet ports. These ports, together with the two switches, form the internal network, which is the communication path for data and metadata between all modules. One dual GigE adapter is integrated in the system planar (ports 1 and 2). The remaining two ports (3 and 4) are on an additional dual GigE adapter installed in a PCIe slot, as seen in Figure 3-8 on page 52.
[Figure 3-8: Data Module rear view — two on-board GigE ports, serial port, dual-port GigE adapter, four USB ports, and connections to switches N1 and N2]
System planar
The system planar used in the Data Modules and the Interface Modules is a standard ATX board from Intel. This high-performance server board with a built-in serial-attached SCSI (SAS) adapter supports:

A 64-bit quad-core Intel Xeon processor to improve performance and headroom and to provide scalability and system redundancy with multiple virtual applications

Eight fully buffered 533/667 MHz dual inline memory modules (DIMMs) to increase capacity and performance

Dual Gb Ethernet with Intel I/O Acceleration Technology to improve application and network responsiveness by moving data to and from applications faster

Four PCI Express slots to provide the I/O bandwidth needed by servers

A SAS adapter
Processor
The processor is a quad-core Intel Xeon. This 64-bit processor has the following characteristics:

2.33 GHz clock
12 MB cache
1.33 GHz front-side bus
Memory/Cache
Every module has 8 GB of memory installed (8 x 1 GB FBDIMM). Fully Buffered DIMM memory technology increases reliability, speed, and density of memory for use with Xeon Quad Core Processor platforms. This processor memory configuration can provide three times higher memory throughput, enable increased capacity and speed to balance
capabilities of quad core processors, perform reads and writes simultaneously, and eliminate the previous read to write blocking latency. Part of the memory is used as module system memory, while the rest is used as cache memory for caching data previously read, pre-fetching of data from disk, and for delayed destaging of previously written data. For a description of the cache algorithm, refer to Write cache protection on page 36.
Cooling devices
To provide enough cooling for the disks, processor, and board, the system includes 10 fans located between the disk drives and the board. Cool air is drawn in from the front of the module through the disk drives. An air duct leads the air around the processor before it leaves the module through the back. The air flow and the alignment of the fans assure proper cooling of the entire module, even if a fan fails.
Compact Flash Card

This card is the boot device of the module and contains the software and module configuration files.

Important: Due to the configuration files, the Compact Flash Card is not interchangeable between modules.
Power supplies
Figure 3-10 on page 54 shows the redundant power supplies.
The modules are powered by an Astec redundant Power Supply Unit (PSU) cage with a dual 850W PSU assembly as seen in Figure 3-10. These power supplies are redundant and can be individually replaced. Consequently, a power supply failure will not cause an outage, and also, there is no need to stop the system to replace it. The power supply is a field-replaceable unit (FRU).
Interface Module
Figure 3-11 shows an Interface Module with iSCSI ports.
The Interface Module is similar to the Data Module. The only differences are:

Each Interface Module contains iSCSI and Fibre Channel ports, through which hosts can attach to the XIV Storage System. These ports can also be used to establish Remote Mirror links with another remote XIV Storage System.

Two 4-port GigE PCIe adapters are installed for additional internal network connections and for the iSCSI ports. Refer to Figure 3-11 on page 54 and Figure 3-12.
[Figure 3-12: Interface Module without iSCSI ports — quad-port GigE adapter, two on-board GigE ports, serial port, four USB ports, and connections to switches N1 and N2]
All Fibre Channel ports, iSCSI ports, and Ethernet ports used for external connections are internally connected to a patch panel where the external cables are actually hooked up. Refer to 3.2.4, The patch panel on page 59. There are six Interface Modules (modules 4-9) available in the rack.
This Fibre Channel host bus adapter (HBA) is LSI's FC949E controller and features full-duplex capable FC ports that automatically detect connection speed and can independently operate at 1, 2, or 4 Gbps. The ability to operate at slower speeds ensures that these adapters remain fully compatible with existing equipment. This adapter also supports end-to-end error detection through a cyclic redundancy check (CRC) for improved data integrity during reads and writes.
iSCSI connectivity
There are six iSCSI service ports (two ports per Interface Module) available for iSCSI over IP/Ethernet services. These ports are available in Interface Modules 7, 8, and 9, supporting 1 Gbps Ethernet host connections (refer to Figure 3-12 on page 55). These ports connect through the patch panel to the user's IP network and provide connectivity to the iSCSI hosts. You can operate iSCSI connections for various functions:

As an iSCSI target that server hosts access through the iSCSI protocol
As an iSCSI initiator for Remote Mirroring when connected to another iSCSI port
As an iSCSI initiator for data migration when connected to a third-party iSCSI storage system
For XCLI and GUI access over the iSCSI ports

iSCSI ports can be defined for various uses:

Each iSCSI port can be defined as an IP interface.

Groups of Ethernet iSCSI ports on the same module can be defined as a single link aggregation group (IEEE standard 802.3ad). Ports defined as a link aggregation group must be connected to the same Ethernet switch, and a parallel link aggregation group must be defined on that Ethernet switch. Although a single port is defined as a link aggregation group of one, IBM XIV support can override this configuration if this setup cannot operate with the client's Ethernet switches.

For each iSCSI IP interface, you can define these configuration options:

IP address (mandatory)
Network mask (mandatory)
Default gateway (optional)
MTU (optional; default: 1536, maximum: 8192)
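The per-interface configuration rules above (mandatory address and mask, optional gateway, MTU defaulting to 1536 with a ceiling of 8192) can be expressed as a small validation helper. The function and field names here are hypothetical illustrations, not part of any XIV API.

```python
# Hypothetical helper that validates an iSCSI IP-interface definition
# against the rules stated in the text.
DEFAULT_MTU = 1536
MAX_MTU = 8192

def define_iscsi_interface(ip, netmask, gateway=None, mtu=None):
    if not ip or not netmask:
        raise ValueError("IP address and network mask are mandatory")
    mtu = DEFAULT_MTU if mtu is None else mtu
    if mtu > MAX_MTU:
        raise ValueError(f"MTU {mtu} exceeds maximum {MAX_MTU}")
    return {"ip": ip, "netmask": netmask, "gateway": gateway, "mtu": mtu}

iface = define_iscsi_interface("10.0.0.5", "255.255.255.0")
print(iface["mtu"])  # 1536, the default
```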
The IBM XIV was engineered with substantial protection against data corruption and data loss, thus not just relying on the sophisticated distribution and reconstruction methods that were described in Chapter 2, XIV logical architecture and concepts on page 7. Several features and functions implemented in the disk drive also increase reliability and performance. We describe the highlights next.
Fibre Channel connections to the six Interface Modules. Each Interface Module has two Fibre Channel adapters with two ports. Thus, four FC ports per Interface Module are available at the patch panel.
iSCSI connections to Interface Modules 7, 8, and 9. There are two iSCSI connections for each module.
Connections to the client network for system management with the GUI or Extended Command Line Interface (XCLI)

Ports for VPN connections, connected to the client network

Service ports for the IBM service support representative (SSR), for connection to the maintenance console

Reserved ports
The Dell PowerConnect 6248 is a Gigabit Ethernet Layer 3 switch with 48 copper ports and four combo ports (small form-factor pluggable (SFP) or 10/100/1000), robust stacking, and 10 Gigabit Ethernet uplink capability. The switches are powered by Dell RPS-600 redundant power supplies to eliminate the switch power supply as a single point of failure.
Modem
The modem installed in the rack is used for remote support. It enables the IBM XIV Support Center specialists and, if necessary, a higher level of support to connect to the XIV Storage System. Problem analysis and repair actions without a remote connection are complicated and time-consuming.
Maintenance module
A 1U remote support server is also required for the full functionality and supportability of the IBM XIV. This device has fairly generic requirements, because it is only used to gain remote access to the device through VPN or a modem for the support personnel. The current choice for this device is a SuperMicro 1U server with an average commodity level configuration.
logical architecture that is described in Chapter 2, XIV logical architecture and concepts on page 7 makes the XIV Storage System extremely resilient to outages.
Chapter 4. Physical planning and installation
4.1 Overview
For a smooth and efficient installation of the XIV Storage System Model A14, planning and preparation tasks must take place well before the system is scheduled for delivery and installation in the data center. There are four major areas involved in installation and installation planning:

Ordering the IBM XIV hardware: selecting the required features

Physical site planning: space requirements, dimensions, and weight; raised floor requirements; power requirements, cooling, cabling, and additional equipment

Configuration planning: basic configurations, network connections, management connections, and Remote Mirroring configuration

Installation: physical installation and basic configuration
Table 4-1 Feature code overview

Feature code  Description
9000          Remote Support Server
9101          Modem
9800          Single phase power
9801          US/Canada/LA/AP line cord
9802          US Chicago line cord
9803          EMEA line cord
9804          Israel line cord
9820          Top Exit Cables
9821          Bottom Exit Cables
9899          30 amp line cord

Notes: Line cords are 250V/60A-rated. Each cord has two poles and three wires. The conductor size for non-EMEA and Chicago line cords is 6 AWG. The conductor size for EMEA and Israel line cords is 10 mm (0.3937 inches). The two line cord plugs are IEC 309 compliant. The Chicago line cord extends 1.8 m (6 ft) when exiting the frame from the bottom and 1.6 m (5 ft 4 inches) when exiting from the top. All other cords extend 4.3 m (14 ft) when exiting the frame from the bottom and 4.1 m (13 ft 4 inches) when exiting from the top. All installations require wall circuit breakers rated 50A to 60A. Do not exceed the facility wiring rating.
Feature codes
Specific details about these feature codes include: FC 9000 and FC 9101 These features are required to enable IBM remote support personnel to connect to your XIV Storage System for monitoring and repair. For more details about the system maintenance, refer to Chapter 10, Monitoring on page 249. FC 98nn Depending on where the system will be geographically located, be sure to select the correct feature code to get the appropriate power cables and suitable connectors for your region. FC 9820 and FC 9821: These feature codes specify whether the exit for all external cables is on the top or on the bottom of the IBM XIV rack (APC 482.6 mm (19 inches) NetShelter SX 42U rack). FC 9820 is best suited for data centers without a raised floor and a cable routing above the machines in a cable duct. FC 9821 is for installing the machine in data centers with raised floors, which are the most common types of data centers.
FC 9899 30 amp power line cord. This feature is required if you cannot provide 60 amp power feeds at your site (refer to Figure 3-6 on page 50).
The system will be delivered with the same hardware configuration as a non-CoD system. Feature code 1119 represents the amount of usable storage per IBM XIV Data Module. The initial order must include a minimum quantity of four FC 1119. After the initial order, additional FC 1119 can be ordered in any increment up to the maximum of 15. There will not be any restriction on the amount of usable storage. The system will be shipped with 79 TB of potentially usable storage, and the IBM XIV will use the Call Home feature (e-mail) to provide reports to IBM about the actual allocated storage. If the licensed capacity is exceeded, IBM will notify you either to reduce the amount of space used or to buy additional FC 1119. This Call Home-based process is the reason why FC9000 (RSM) and FC9101 (modem) are required for CoD configurations.
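The Capacity on Demand arithmetic can be illustrated as follows. The per-feature capacity is simply derived here as 79 TB divided by 15 features; that derivation and the exact figure are assumptions for illustration, not official licensing numbers.

```python
# CoD arithmetic sketch: FC 1119 licenses a share of the usable
# capacity; the machine ships with all ~79 TB physically present.
TOTAL_USABLE_TB = 79.0
MAX_FEATURES = 15
MIN_FEATURES = 4   # minimum quantity on the initial order

def licensed_tb(features):
    if not MIN_FEATURES <= features <= MAX_FEATURES:
        raise ValueError("FC 1119 quantity must be between 4 and 15")
    return TOTAL_USABLE_TB * features / MAX_FEATURES

print(round(licensed_tb(MIN_FEATURES), 1))  # minimum initial order
print(round(licensed_tb(MAX_FEATURES), 1))  # fully licensed: 79.0
```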
Rack dimensions: 100 cm, 120 cm, and 60 cm

Sides: not closer than 45 cm to a wall, but adjacent racks are allowed.
For detailed information and further requirements, refer to the IBM XIV Installation Planning Guide.
Power requirements:
Two 220 V, 60 amp power feeds (or four 30 amp power feeds, which require FC 9899)
7.5 kW (9 kW peak)
Correct power connector (FC 980x) according to your local requirements

Cooling requirements:
More than 24,000 BTU/hr
Adequate airflow at the front and back of the box

Delivery requirements:
A clear and level path to bring the box into the computer room
Clearance for an upright rack from the truck to the building, through doors and elevators
Fill in all of this information to prevent further inquiries and delays during the installation (refer to 4.8, IBM XIV installation on page 77):

Interface Modules: Three Interface Modules each need an IP address, netmask, and gateway. These addresses are needed to manage and monitor the IBM XIV with either the GUI or the Extended Command Line Interface (XCLI). Each Interface Module needs a separate IP address so that management access remains available if a module fails.

DNS server: If Domain Name System (DNS) is used in your environment, the IBM XIV needs the IP address, netmask, and gateway of the primary DNS server and, if available, also of the secondary server.

SMTP gateway: The Simple Mail Transfer Protocol (SMTP) gateway is needed for event notification through e-mail. IBM XIV can initiate an e-mail notification, which is sent out through the configured SMTP gateway (IP address or server name, netmask, and gateway).
NTP (time server): IBM XIV can use a Network Time Protocol (NTP) time server to synchronize the system time with other systems. To use a time server, its IP address or server name, together with the netmask and gateway, must be configured.

Time zone: Usually, the time zone depends on the location where the system is installed, but exceptions can occur for remote locations where the time zone matches that of the host system location.

E-mail sender address: This is the e-mail address shown as the sender in e-mail notifications.

Remote access/virtual private network (VPN): The modem number or an external IP address must be configured for remote support. The IBM XIV support center needs to connect to the machine in case of problems. Refer to 10.2, Call Home and remote support on page 273.

This basic configuration data will be entered into the system by the IBM SSR following the physical installation. Refer to 4.8.2, Basic configuration on page 78. Other configuration tasks, such as defining Storage Pools, volumes, and hosts, are the responsibility of the user and are described in Chapter 5, Configuration on page 79.
High-availability configuration
To configure the Fibre Channel connections (SAN) for high availability, refer to the configuration illustrated in Figure 4-4. This configuration is highly recommended for all production systems to maintain system access and operations following a single hardware element or SAN component failure. Note that the connections depicted show only an example, and all Interface Modules can be used in this configuration.
[Figure 4-4: High-availability SAN configuration — Interface Modules (for example, IM 7 through IM 9) each connected to both Switch 1 and Switch 2]
For a high-availability Fibre Channel configuration, use the following guidelines:

Each XIV Storage System Interface Module is connected to two Fibre Channel switches, using two ports of the module.

Each host is connected to two switches using two host bus adapters or a host bus adapter (HBA) with two ports.

This configuration assures full connectivity and no single point of failure:

Switch failure: Each host remains connected to all modules through the second switch.
Module failure: Each host remains connected to the other modules.
Cable failure: Each host remains connected through the second physical link.
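A quick way to reason about these guidelines is to count surviving host-to-storage paths under each failure. The sketch below assumes one module port per fabric visible to the host and a dual-port HBA; the actual path count depends on site-specific zoning, so treat the numbers as illustrative.

```python
# Path-count sketch for the high-availability layout: two fabrics,
# six Interface Modules, one module port per fabric, two HBA ports.
def host_paths(interface_modules=6, module_ports_per_fabric=1,
               hba_ports=2):
    # one HBA port per fabric; each fabric sees every module once here
    return hba_ports * interface_modules * module_ports_per_fabric

def paths_after_failure(kind):
    if kind == "switch":   # one fabric lost, the other keeps serving
        return host_paths(hba_ports=1)
    if kind == "module":   # one module lost on both fabrics
        return host_paths(interface_modules=5)
    return host_paths()    # healthy system

print(paths_after_failure(None))      # 12
print(paths_after_failure("switch"))  # 6
print(paths_after_failure("module"))  # 10
```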
[Figure: single-switch configuration — Interface Modules (for example, IM 7 through IM 9) all connected to one switch]
This configuration is resilient to failures of a single Interface Module, host bus adapter, or cable. However, in this configuration, the switch represents a single point of failure; if the switch goes down due to a hardware failure or simply because of a software update, the connected hosts lose access.
IP configuration
The configuration of the XIV Storage System iSCSI connection is highly dependent on your network. In the high-availability configuration, the two client-provided Ethernet switches used for redundancy can be configured as either two IP subnets or as part of the same subnet. The XIV Storage System iSCSI configuration must match the client's network. You must provide the following configuration information for each Ethernet port:

Whether to configure the two ports of each module as one logical Ethernet port (link aggregation) or as two independent Ethernet ports. This decision affects both the system's configuration and the switches' configuration.

IP address

Net mask

MTU (optional): Maximum Transmission Unit (MTU) configuration is required if your network supports an MTU that is larger than the standard one. The largest possible MTU must be specified
(we advise you to use up to 9,000 bytes, if supported by the switches and routers). If the iSCSI hosts reside on a different subnet than the XIV Storage System, a default IP gateway per port must be specified.

Default gateway (optional): Because the XIV Storage System always acts as a TCP server for iSCSI connections, packets are always routed through the Ethernet port from which the iSCSI connection was initiated. Default gateways are required only if the hosts do not reside on the same layer-2 subnet as the XIV Storage System.

The IP network configuration must be ready to ensure connectivity between the XIV Storage System and the host prior to the physical system installation:

If required, Ethernet switches that connect to two ports on the same module must have their ports configured as a link aggregation group (with a parallel configuration on the XIV Storage System).

Ethernet virtual local area networks (VLANs), if required, must be configured correctly to enable access between hosts and the XIV Storage System.

IP routers (if present) must be configured correctly to enable access between hosts and the XIV Storage System.
Management IP configurations
For each of the three management ports, you must provide the following configuration information to the IBM SSR upon installation (refer also to 4.4, Basic configuration planning on page 69): IP address of the port Net mask Default IP gateway
The following system-level IP information should be provided (it is not port-specific):

IP address of the primary and secondary DNS servers
IP address or DNS name of the SNMP manager, if required
IP address or DNS names of the Simple Mail Transfer Protocol (SMTP) servers
Protocols
The XIV Storage System is managed through dedicated management ports running TCP/IP over Ethernet. Management is carried out through the following protocols (consider this design when configuring firewalls and other security measures):

Proprietary IBM XIV protocols are used to manage the XIV Storage System from the GUI and the XCLI. This management communication is performed over TCP port 7778, where the GUI/XCLI, as the client, always initiates the connection and the XIV Storage System acts as the server.

The IBM XIV Storage System responds to SNMP management packets.

The IBM XIV Storage System initiates SNMP packets when sending traps to SNMP managers.

The IBM XIV Storage System initiates SMTP traffic when sending e-mails (for either event notification through e-mail or for e-mail-to-SMS gateways).
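For firewall planning, it helps to tabulate which side initiates each flow. Only TCP port 7778 is stated in the text as XIV-specific; the SNMP (161/162) and SMTP (25) port numbers below are the standard well-known ports, assumed here for illustration.

```python
# Firewall-planning sketch derived from the protocol list above:
# (description, protocol, port, direction relative to the XIV system).
MGMT_FLOWS = [
    ("GUI/XCLI -> XIV", "tcp", 7778, "inbound"),
    ("SNMP manager -> XIV", "udp", 161, "inbound"),
    ("XIV -> SNMP manager (traps)", "udp", 162, "outbound"),
    ("XIV -> SMTP server", "tcp", 25, "outbound"),
]

def inbound_ports():
    """Ports the firewall must allow toward the system."""
    return sorted(p for _, _, p, d in MGMT_FLOWS if d == "inbound")

print(inbound_ports())  # [161, 7778]
```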
Remote target connectivity defines the communication topology between a local storage system and a remote storage system in order to enable Remote Mirroring and data migration capabilities.

Important: When defining mirroring of a volume on a remote system, the local system must be defined as a remote target on the remote system. This definition is required because mirroring roles can be switched and, therefore, all definitions must be symmetric.
Remote Target:
Determine the protocol that will be used for remote connectivity, either iSCSI or Fibre Channel (FC). Each remote target is available through only one of these protocols. To change the protocol, the target definition must be deleted and then redefined. If the remote target is an IBM XIV system, both remote mirroring and data migration are supported; otherwise, only data migration can be used with the remote target.
Connectivity:
Connectivity between the source and target storage systems is defined between specific physical modules on the source storage system and a set of ports on the target system. The system automatically uses the required port of the local module on the local storage system.
To get more detailed information about Remote Target Connectivity, refer to Chapter 12, Remote Mirror on page 323.
Chapter 5.
Configuration
This chapter discusses the tasks performed by the storage administrator to configure the XIV Storage System using the XIV Management Software. We provide step-by-step instructions covering the following topics, in this order:
- Install and customize the XIV Management Software
- Connect to and manage XIV using the graphical and command line interfaces
- Organize system capacity by Storage Pools
- Create and manage volumes in the system
- Create and maintain hosts and clusters
- Allocate logical unit numbers (LUNs) to hosts or clusters
- Create integrated scripts
At the time of writing, XIV Storage Manager Version 2.2.42 was available; later GUI releases might differ slightly in appearance. Perform the following steps to install the XIV Storage Management software:
1. Locate the XIV Storage Manager installation file (either on the installation CD or a copy that you downloaded from the Internet). Running the installation file first shows the welcome window displayed in Figure 5-1. Click Next.
2. A Setup dialog window is displayed (Figure 5-2) where you can specify the installation directory. Keep the default installation folder or change it according to your needs. When done, click Next.
3. The next installation dialog is displayed. You can choose between a FULL installation and a command line interface-only installation. We recommend that you choose the FULL installation as shown in Figure 5-3; in this case, both the Graphical User Interface and the Command Line Interface are installed. Click Next.
4. The next step is to specify the Start Menu Folder as shown in Figure 5-4. When done, click Next.
5. The dialog shown in Figure 5-5 is displayed. Select the desktop icon placement and click Next.
6. The dialog window shown in Figure 5-6 is displayed. The XIV Storage Manager requires the Java Runtime Environment Version 6, which will be installed during the setup if needed. Click Finish.
If the computer on which the XIV GUI is installed is connected to the Internet, a window might appear to inform you that a new software upgrade is available. Click OK to download and install the new upgrade, which normally only requires a few minutes and will not interfere with your current settings or events.
(Figure: structure of the XIV Storage Management software — basic configuration through the GUI (XGUI); advanced configuration, data migration, security, and XCLI scripting through the XCLI.)
After the installation and customization of the XIV Management Software on a Windows, Linux, or Mac OS management workstation, a physical Ethernet connection must be established to the XIV Storage System itself. The management workstation is used to:
- Execute commands through the XCLI interface
- Control the XIV Storage System through the GUI
- Send e-mail notification messages and Simple Network Management Protocol (SNMP) traps upon occurrence of specific events or alerts

To ensure management redundancy in case of an Interface Module failure, the XIV Storage System management functionality is accessible from three IP addresses, each linked to a different (hardware) Interface Module. The various IP addresses are transparent to the user, and management functions can be performed through any of them. These addresses can also be used simultaneously by multiple management clients. Users only need to configure the GUI or XCLI with the set of IP addresses that are defined for the specific system.

Note: All management IP interfaces must be connected to the same subnet and use the same:
- Network mask
- Gateway
- Maximum Transmission Unit (MTU)
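The redundancy scheme described above — three management IP addresses, any one of which is sufficient — means a management client can simply try each address in turn until one answers. The following Python sketch illustrates that failover pattern; it is not XIV code, and the connect_fn callback and IP addresses are hypothetical stand-ins for whatever transport the management client uses.

```python
# Sketch: try each management IP sequentially until one connection succeeds.
# connect_fn is a hypothetical callback that raises ConnectionError on failure.

def connect_with_failover(ip_addresses, connect_fn):
    """Return the first successful connection result, trying IPs in order."""
    last_error = None
    for ip in ip_addresses:
        try:
            return connect_fn(ip)
        except ConnectionError as err:
            last_error = err  # remember the failure and try the next address
    raise ConnectionError(f"all management IPs failed: {last_error}")

# Usage with a stub transport: the first two Interface Modules are down.
reachable = {"9.0.0.3"}

def stub_connect(ip):
    if ip not in reachable:
        raise ConnectionError(f"{ip} unreachable")
    return f"connected to {ip}"

print(connect_with_failover(["9.0.0.1", "9.0.0.2", "9.0.0.3"], stub_connect))
```

Because any of the three addresses is equivalent, the order in which they are tried does not matter to the user.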
The XIV Storage System can be managed from both the XCLI and GUI interfaces, and both can be configured to manage the system through any of its management IP interfaces. Both XCLI and GUI management run over TCP port 7778, with all traffic encrypted through the Secure Sockets Layer (SSLv3) or Transport Layer Security (TLS 1.0) protocol.
To connect to an XIV Storage System, you must initially add the system to make it visible in the GUI by specifying its IP addresses. To add the system:
1. Make sure that the management workstation is set up to have access to the LAN subnet where the XIV Storage System resides. Verify the connection by pinging the IP address of the XIV Storage System. If this is the first time that you start the GUI on this management workstation and no XIV Storage System has been previously defined to the GUI, the Add System Management dialog window is automatically displayed: If the default IP address of the XIV Storage System was not changed, check Use Predefined IP, which populates the IP/DNS Address1 field with the default IP address. Click Add to effectively add the system to the GUI. Refer to Figure 5-9 on page 87.
If the default IP address had already been changed to a client-specified IP address (or set of IP addresses, for redundancy), you must enter those addresses in the IP/DNS Address fields. Click Add to effectively add the system to the GUI. Refer to Figure 5-10.
2. You are now returned to the main XIV Management window. Wait until the system is displayed and shows as enabled. Under normal circumstances, the system will show a status of Full Redundancy displayed in a green label box. 3. Move the mouse cursor over the image of the XIV Storage System and click to open the XIV Storage System Management main window as shown in Figure 5-11 on page 88.
The XIV Storage Management GUI is mostly self-explanatory with a well-organized structure and simple navigation.
(Figure 5-11: the XIV Storage System Management main window, with the menu bar and toolbar at the top, the user indicator, the function icons on the left, and the main display in the center.)
The main window is divided into the following areas:
- Function icons: Located on the left side of the main window is a set of vertically stacked icons that are used to navigate between the functions of the GUI, according to the icon selected. Moving the mouse cursor over an icon brings up a corresponding option menu. The various menu options available from the function icons are presented in Figure 5-12 on page 89.
- Main display: It occupies the major part of the window and provides a graphical representation of the XIV Storage System. Moving the mouse cursor over the graphical representation of a specific hardware component (module, disk, or Uninterruptible Power Supply (UPS) unit) brings up a status callout. When a specific function is selected, the main display shows a tabular representation of that function.
- Menu bar: It is used for configuring the system and as an alternative to the function icons for accessing the various functions of the XIV Storage System.
- Toolbar: It is used to access a range of specific actions linked to the individual functions of the system.
- Status bar indicators: Located at the bottom of the window, this area indicates the overall operational levels of the XIV Storage System:
  - The first indicator on the left shows the amount of soft or hard storage capacity currently allocated to Storage Pools and provides alerts when certain capacity thresholds are reached. As the physical, or hard, capacity consumed by volumes within a Storage Pool passes certain thresholds, the color of this meter indicates that additional hard capacity might need to be added to one or more Storage Pools.
  - The second indicator (in the middle) displays the number of I/O operations per second (IOPS).
  - The third indicator on the far right shows the general system status and will, for example, indicate when a redistribution is underway. Additionally, an Uncleared Event indicator is visible when events occur for which a repetitive notification was defined that has not yet been cleared in the GUI (these notifications are called Alerting Events).
The function icon menus include:
- Monitor: define general system connectivity and monitor overall system activity
- Pools: create and manage Storage Pools
- Hosts: configure the features provided by the XIV Storage System for hosts and their connectivity
- Remote: define the communication topology between a local and a remote storage system
- Access: an access control system that uses defined user roles to control access

Figure 5-12 Menu items in XIV Storage Management software
Tip: The configuration information regarding the connected systems and the GUI itself is stored in various files under the user's home directory. As a useful and convenient feature, all the commands issued from the GUI are saved in a log in XCLI syntax. The default location is in the Documents and Settings folder of the current Microsoft Windows user, for example: C:\Documents and Settings\<YOUR LOGGED IN USER>\Application Data\XIV\GUI10\logs\nextraCommands_*.log
The XCLI utility supports several types of invocation:
- Invoking the XCLI to define configurations: In these invocations, the XCLI utility is used to define configurations. A configuration is a mapping between a user-defined name and a list of three IP addresses. This configuration can be referenced later in order to execute a command without having to specify the system IP addresses (refer to the next method in this list). These configurations are stored on the local host running the XCLI utility and must be defined again for each host.
- Invoking the XCLI to execute a command: This method is the most basic and important type of invocation. Whenever invoking an XCLI command, you must also provide either the system's IP addresses or a configuration name.
- Invoking the XCLI for general purpose functions: These invocations can be used to get the XCLI's software version or to print the XCLI's help text.

The command to execute is generally specified along with parameters and their values. A script can be defined to specify the name and path to the commands file (lists of commands will be executed in User Mode only). For complete and detailed documentation of the IBM XIV Storage Manager XCLI, refer to the XCLI Reference Guide, GC27-2213-00.
2. Create a safe working directory for the XCLI (for example, create an xcli directory under Documents and Settings).
3. Customize the xcli_admin icon:
   a. Right-click the icon and select Properties.
   b. On the Shortcut tab, set:
      Target: %SystemRoot%\system32\cmd.exe /k cd C:\Documents and Settings\Administrator\My Documents\xcli && setup
      Start in: c:\Program Files\XIV\GUI10
   c. On the Options tab, check QuickEdit mode.
   d. On the Layout tab, set:
      Screen buffer size Width: 160
      Window size Width: 120, Height: 40
   e. Click Apply and click OK.

Note that setup (highlighted in bold in step 3b) represents a batch program. This batch program is described later in this section and is used to store relevant environment variables. Refer to Example 5-2 on page 92.

As part of XIV's high-availability features, each system is assigned three IP addresses. When executing a command, the XCLI utility is provided with these three IP addresses and tries each of them sequentially until communication with one of the IP addresses is successful. You must pass the IP addresses (IP1, IP2, and IP3) with each command. To avoid excessive typing and having to remember IP addresses, you can instead use a predefined configuration name.

Note: When executing a command, you must specify either a configuration or IP addresses, but not both.

To issue a command against a specific XIV Storage System, you also need to supply the user name and the password for it. The default user is admin and the default password is adminadmin, which can be used with the following parameters:
- -u user or -user sets the user name that will be used to execute the command.
- -p password or -password is the XCLI password that must be specified in order to execute a command in the system.
- -m IP1 [-m IP2 [-m IP3]] defines the IP addresses of the system.

Example 5-1 illustrates a common command execution syntax on a given XIV Storage System.
Example 5-1 Simple XCLI command
xcli -u admin -p adminadmin -m 149.168.100.101 user_list

Managing the XIV Storage System by using the XCLI always requires that you specify these same parameters. To avoid repetitive typing, you can instead define and use specific environment variables. We recommend that you create a batch file in which you set the values for those environment variables, as shown in Example 5-2 on page 92.
Example 5-2 Batch file setting the XCLI environment variables

@echo off
set XCLI_CONFIG_FILE=C:\Documents and Settings\hu02230\My Documents\xiv\xcliconfigs.xml
set XIV_XCLIUSER=admin
set XIV_XCLIPASSWORD=adminadmin
xcli -L
The XCLI utility requires user and password options. If the user and password are not specified, the default environment variables XIV_XCLIUSER and XIV_XCLIPASSWORD are used. If neither command options nor environment variables are specified, commands are run with the user defined by config_set default_user=XXXX. This allows a smooth migration to the IBM XIV Software System for clients that do not have defined users. The configurations are stored in a file under the user's home directory. A different file can be specified with the -f or --file switch (applicable to configuration creation, configuration deletion, listing configurations, and command execution). Alternatively, the environment variable XCLI_CONFIG_FILE, if defined, determines the file's name and path. We recommend that you create a configuration file as shown in Example 5-3.
Example 5-3 Create a configuration file
Create an empty file:
C:\Documents and Settings\Administrator\My Documents\xcli\xcliconfigs.xml

Create the XCLI configuration:
xcli -f "C:\Documents and Settings\Administrator\My Documents\xcli\xcliconfigs.xml" -a Redbook -m <IP1> [-m <IP2> [-m <IP3>]]

After this specification, the shortened command syntax works as shown in Example 5-4.
Example 5-4 Short command syntax
xcli -c Redbook user_list

The default IP address for the XIV Storage System is 14.10.202.250.
xcli
xcli -c Redbook help
xcli -c Redbook help command=help format=full

The first command prints out the usage of xcli. The second one prints all the commands that can be used by the user in that particular system. The third one shows the usage of the help command itself with all the parameters. As mentioned in the output of a simple XCLI command, there are different parameters to get the result of a command in a predefined format. The default is the user-readable format.
Specify the -s parameter to get the output in a comma-separated format, or specify the -x parameter to obtain XML format.

Note: The XML format contains all the fields of a particular command. The user-readable and the comma-separated formats provide just the default fields as a result. To specify the required fields of a command, use the -t parameter as shown here:

xcli -c Redbook -t name,fields help command=user_list
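The -s (comma-separated) format is the natural choice for scripting against the XCLI. The following Python sketch parses the kind of CSV output that a command such as user_list might produce when run with -s; the sample rows and column names here are invented for illustration, not actual XCLI output.

```python
import csv
import io

# Hypothetical comma-separated output, as the -s switch might produce.
sample_output = """Name,Category,Group
admin,storageadmin,
technician,technician,
"""

def parse_xcli_csv(text):
    """Parse comma-separated XCLI output into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

rows = parse_xcli_csv(sample_output)
for row in rows:
    print(row["Name"], row["Category"])
```

In a real script, the text would come from capturing the command's standard output rather than a literal string.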
Improved regulation of storage space: Automatic snapshot deletion occurs when the storage capacity limit is reached for each Storage Pool independently. Therefore, when a Storage Pool's size is exhausted, only the snapshots that reside in the affected Storage Pool are deleted.
The size of Storage Pools and the associations between volumes and Storage Pools are constrained by the following rules:
- The size of a Storage Pool can range from as small as possible (17.1 GB) to as large as possible (the entire system) without any limitation.
- The size of a Storage Pool can always be increased, limited only by the free space on the system.
- The size of a Storage Pool can always be decreased, limited only by the space already consumed by the volumes and snapshots in that Storage Pool.
- Volumes can be moved between Storage Pools without any limitations, as long as there is enough free space in the target Storage Pool.

Important: All of these operations are handled by the system at the metadata level, and they do not cause any data movement (copying) from one disk drive to another. Hence, they are completed almost instantly and can be done at any time without impacting the applications.
Thin provisioned pools

Thin provisioning is the practice of allocating storage on a just-in-time, as-needed basis by defining a logical, or soft, capacity that is larger than the physical, or hard, capacity. Thin provisioning enables XIV Storage System administrators to manage capacity based on the total space actually consumed rather than just the space allocated. Thin provisioning can be specified at the Storage Pool level. Each thinly provisioned pool has its own hard capacity (which limits the actual disk space that can be effectively consumed) and soft capacity (which limits the total logical size of the volumes defined).
Hard pool size: The hard pool size represents the physical storage capacity allocated to volumes and snapshots in the Storage Pool. The hard size of the Storage Pool limits the total of the hard volume sizes of all volumes in the Storage Pool plus the total of all storage consumed by snapshots.

Soft pool size: This size is the limit on the total soft sizes of all the volumes in the Storage Pool. The soft pool size has no effect on snapshots.

For more detailed information about the concept of XIV thin provisioning and a detailed discussion of hard and soft sizes for Storage Pools and volumes, refer to 2.3.4, Capacity allocation and thin provisioning on page 24. When using the GUI, you specify the desired type of pool (Regular Pool or Thin Provisioned Pool) when creating the pool. Refer to Creating Storage Pools on page 96. When using the XCLI, you create a thinly provisioned pool by setting the soft size to a greater value than its hard size. If requirements change, the pool's type can be changed (non-disruptively) later.

Tip: Thin provisioning is managed individually for each Storage Pool; running out of space in one pool does not impact other pools.
To view overall information about the Storage Pools, select Pools from the Pools menu shown in Figure 5-13 to display the Storage Pool window seen in Figure 5-14 on page 95.
The Storage Pools GUI window displays a table of all the pools in the system combined with a series of gauges for each pool. This view gives the administrator a quick grasp and general overview of essential information about the system pools. The capacity consumption by volumes and snapshots within a given Storage Pool is indicated by different colors:
- Green indicates consumed capacity below 80%.
- Yellow represents capacity consumption above 80%.
- Orange indicates capacity consumption of over 90%.
- Red indicates a Storage Pool with depleted hard capacity.

The name, the size, and the separate segments are labeled accordingly. Figure 5-15 shows the meaning of the various numbers.
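The gauge coloring follows fixed utilization thresholds. A small Python sketch of that mapping (the function name is ours, and the treatment of the exact 80% boundary is an assumption; the document only states "below 80%" and "above 80%"):

```python
def pool_gauge_color(consumed_gb, hard_size_gb):
    """Map a pool's hard-capacity utilization to the gauge color."""
    if consumed_gb >= hard_size_gb:
        return "red"       # hard capacity depleted
    utilization = 100.0 * consumed_gb / hard_size_gb
    if utilization > 90:
        return "orange"
    if utilization >= 80:  # boundary choice is ours; the text says "above 80%"
        return "yellow"
    return "green"

print(pool_gauge_color(500, 1000))   # well below 80%: green
print(pool_gauge_color(950, 1000))   # over 90%: orange
```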
2. In the Select Type drop-down list box, choose Regular or Thin Provisioned according to your needs. For thinly provisioned pools, two new fields appear:
   - Soft Size: Specify the upper limit of soft capacity.
   - Lock Behavior: Specify the behavior in case of depleted capacity. This value specifies whether the Storage Pool is locked for write or whether it is disabled for both read and write when running out of storage space. The default value is read only.
3. In the Pool Size field, specify the required size of the Storage Pool.
4. In the Snapshots Size field, enter the required size of the reserved snapshot area.

Note: Although it is possible to create a pool with identical snapshot and pool sizes, you cannot create a new volume in this type of pool afterward without resizing it first.

5. In the Pool Name field, enter the desired name (it must be unique across the Storage System) for the Storage Pool.
6. Click Add to add this Storage Pool.
The resize operation can also be used to change the type of a Storage Pool from thin provisioned to regular or from regular to thin provisioned. Just change the type of the pool in the Resize Pool window Select Type list box. Refer to Figure 5-18 on page 99:
- When a regular pool is converted to a thin provisioned pool, you have to specify an additional soft size parameter besides the existing hard size. Obviously, the soft size must be greater than the hard pool size.
- When a thin provisioned pool is changed to a regular pool, the soft pool size parameter disappears from the window; in fact, its value becomes equal to the hard pool size. If the space consumed by existing volumes exceeds the pool's actual hard size, the pool cannot be changed to a regular pool. In this case, you have to specify a minimum pool hard size equal to the total capacity consumed by all the volumes within this pool.
The remaining soft capacity is displayed in red characters and is calculated by the system in the following manner:

Remaining Soft Capacity = [Current Storage Pool Soft Size + Remaining System Soft Size] - Current Storage Pool Hard Size
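Applied literally, the calculation above is a one-line function. The numbers below are invented purely to exercise the formula.

```python
def remaining_soft_capacity(pool_soft, system_soft_remaining, pool_hard):
    """Remaining soft capacity, per the formula above (all sizes in GB)."""
    return (pool_soft + system_soft_remaining) - pool_hard

# Example: a pool with a 2000 GB soft size and a 1000 GB hard size,
# with 500 GB of soft capacity remaining at the system level.
print(remaining_soft_capacity(2000, 500, 1000))  # 1500
```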
A volume that belongs to a Consistency Group cannot be moved without moving the entire Consistency Group. As shown in Figure 5-19, in the Volumes by Pools report, simply right-click the appropriate volume and initiate a Move to Pool operation to change the location of the volume.
In the pop-up window, select the appropriate Storage Pool as shown in Figure 5-20 and click OK to move the volume into it.
Name                Description
cg_move             Moves a Consistency Group, all its volumes, and all their snapshots and snapshot sets from one Storage Pool to another.
pool_change_config  Changes the Storage Pool snapshot limitation policy.
pool_create         Creates a Storage Pool.
pool_delete         Deletes a Storage Pool.
pool_list           Lists all Storage Pools or the specified one.
pool_rename         Renames a specified Storage Pool.
pool_resize         Resizes a Storage Pool.
vol_move            Moves a volume and all its snapshots from one Storage Pool to another.

To list the existing Storage Pools in a system, use the following command:

xcli -c Redbook pool_list

A sample result of this command is illustrated in Figure 5-21.
(Figure 5-21: sample pool_list output with columns Name, Hard Size (GB), Empty Space (GB), Used by Volumes (GB), Used by Snapshots (GB), and Locked.)
For the purpose of new pool creation, enter the following command:

xcli -c Redbook pool_create pool=DBPool size=1000 snapshot_size=0

The size of the Storage Pool is specified as an integer multiple of 10^9 bytes, but the actual size of the created Storage Pool is rounded up to the nearest integer multiple of 16 x 2^30 bytes. The snapshot_size parameter specifies the size of the snapshot area within the pool. It is a mandatory parameter, and you must specify a positive integer value for it.

The following command shows how to resize one of the existing pools:

xcli -c Redbook pool_resize pool=DBPool size=1300

With this command, you can increase or decrease the pool size. The pool_create and pool_resize commands are also used to manage the size of the snapshot area within a Storage Pool.

To rename an existing pool, issue this command:

xcli -c Redbook pool_rename pool=DBPool new_name=DataPool

To delete a pool, type:

xcli -c Redbook pool_delete pool=DBPool

Use the following command to move the volume named log_vol to the Storage Pool DBPool:

xcli -c Redbook vol_move vol=log_vol pool=DBPool

The command only succeeds if the destination Storage Pool has enough free storage capacity to accommodate the volume and its snapshots. The vol_move command moves a particular volume and its snapshots from one Storage Pool to another, but if the volume is part of a Consistency Group, the entire group must be moved. In this case, the cg_move command is the correct solution:

xcli -c Redbook cg_move cg=DBGroup pool=DBPool

All volumes in the Consistency Group are moved, all snapshot groups of this Consistency Group are moved, and all snapshots of the volumes are moved.
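The pool_create rounding rule — input in multiples of 10^9 bytes, actual size rounded up to a multiple of 16 x 2^30 bytes — can be checked with a short Python sketch (the function name is ours):

```python
import math

GB_DECIMAL = 10**9          # unit of the size= parameter
CHUNK = 16 * 2**30          # allocation granularity: 16 x 2^30 bytes (16 GiB)

def actual_pool_size_bytes(size_gb):
    """Round the requested pool size up to the nearest 16 GiB multiple."""
    requested = size_gb * GB_DECIMAL
    return math.ceil(requested / CHUNK) * CHUNK

# A size=1000 request (1000 x 10^9 bytes) lands on the next 16 GiB boundary,
# so the pool ends up slightly larger than the 1000 GB requested.
print(actual_pool_size_bytes(1000) / GB_DECIMAL)
```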
To specify the behavior in case of depleted capacity reserves in a thin provisioned pool, use the following command:

xcli -c Redbook pool_change_config pool=DBPool lock_behavior=read_only

This command specifies whether the Storage Pool is locked for write or whether it is disabled for both read and write when running out of storage space.

Note: The lock_behavior parameter can be specified for non-thin provisioned pools, but it has no effect.
5.4 Volumes
After defining Storage Pools, the next milestone in the XIV Storage System configuration is volume management. The XIV Storage System offers logical volumes as the basic data storage element for allocating usable storage space to attached hosts. This logical unit concept is well known and widely used by other storage subsystems and vendors. However, neither the volume segmentation nor its distribution over the physical disks is conventional in the XIV Storage System.

Traditionally, logical volumes are defined within various RAID arrays, where their segmentation and distribution are manually specified. The result is often a suboptimal distribution within and across modules (expansion units) and is significantly dependent upon the administrator's knowledge and expertise.

As explained in 2.3, Full storage virtualization on page 14, the XIV Storage System uses true virtualization as one of the basic principles of its unique design. With XIV, each volume is divided into tiny 1 MB partitions, and these partitions are distributed randomly and evenly, and duplicated for protection. The result is optimal distribution in and across all modules, which means that for any volume, the physical drive location and data placement are invisible to the user. This method dramatically simplifies storage provisioning, letting the system lay out the user's volume in an optimal way. It offers complete virtualization, without requiring preliminary volume layout planning or detailed and accurate stripe or block size pre-calculation by the administrator. All disks are equally used to maximize the I/O performance and exploit all the processing power and all the bandwidth available in the storage system.

XIV Storage System virtualization incorporates an advanced snapshot mechanism with unique capabilities, which enables creating a virtually unlimited number of point-in-time copies of any volume, without incurring any performance penalties.
The concept of snapshots is discussed in detail in 11.1, Snapshots on page 286. Volumes can also be grouped into larger sets called Consistency Groups and Storage Pools. Refer to 5.3, Storage Pools on page 93 and 11.1.3, Consistency Groups on page 300.

Important: The basic hierarchy is (refer to Figure 5-22 on page 104):
- A volume can have multiple snapshots.
- A volume can be part of one and only one Consistency Group.
- A volume is always a part of one and only one Storage Pool.
- All volumes of a Consistency Group must belong to the same Storage Pool.
(Figure 5-22: hierarchy of a Storage Pool containing a Consistency Group with volumes such as DbVol2, a standalone volume TestVol, snapshots from the Consistency Group, and the snapshot reserve.)
The Volumes & Snapshots menu item is used to list all the volumes and snapshots that have been defined in this particular XIV Storage System. An example of the resulting window can be seen in Figure 5-24 on page 105.
Volumes are listed in a tabular format. If a volume has snapshots, a + or - icon appears on the left. Snapshots are listed under their master volumes, and the list can be expanded or collapsed at the volume level by clicking the + or - icon, respectively. Snapshots are listed as a sub-branch of the volume of which they are a replica, and their row is indented and highlighted in off-white. The Master column of a snapshot shows the name of the volume of which it is a replica. If this column is empty, the volume is the master.

Tip: To customize the columns in the lists, just click one of the column headings and make the required selection of attributes. The default column set does not contain the Master column.

Table 5-1 on page 106 shows the columns of this view with their descriptions.
Table 5-1 Columns in the Volumes and Snapshots view

Column              Description                                                      Default
Qty.                Indicates the number of snapshots belonging to a volume          N
Name                Name of a volume or snapshot                                     Y
Size (GB)           Volume or snapshot size (value is zero if specified in blocks)   Y
Used (GB)           Used capacity in a volume                                        Y
Size (Blocks)       Volume size in blocks                                            N
Size (Consumed)     Consumed capacity                                                N
Master              Snapshot master's name                                           N
Consistency Group   Consistency Group name                                           Y
Pool                Storage Pool name                                                Y
(lock icon)         Indicates the locking status of a volume or snapshot             Y
(modified icon)     Shows whether the snapshot was unlocked or modified              Y
Deletion Priority   Indicates the deletion priority (a number) for snapshots         N
Created             Shows the creation time of a snapshot                            Y
Creator             Volume or snapshot creator name                                  N
Serial Number       Volume or snapshot serial number                                 N
Sync Type           Shows the mirroring type status                                  N
Most of the volume-related and snapshot-related actions can be selected by right-clicking any row in the table to display a drop-down menu of options. The options in the menu differ slightly for volumes and snapshots.
The available actions include:
- Removing from a Consistency Group
- Moving volumes between Storage Pools; refer to Moving volumes between Storage Pools on page 99
- Creating a snapshot
- Creating a snapshot (Advanced)
- Overwriting a snapshot
- Copying a volume or snapshot
- Locking/unlocking a volume or snapshot
- Mappings
- Displaying properties of a volume or snapshot
- Changing a snapshot's deletion priority
- Duplicating a snapshot or duplicating a snapshot (Advanced)
- Restoring from a snapshot
Creating volumes
When you create a volume in a traditional or regular Storage Pool, the entire volume storage capacity is reserved (static allocation). In other words, you cannot define more space for volumes in a regular Storage Pool than the actual hard capacity of the pool, which guarantees the functionality and integrity of the volume.

If you create volumes in a Thin Provisioned Pool, the capacity of the volume is not reserved immediately; instead, a basic 17.1 GB piece, taken out of the Storage Pool hard capacity, is allocated at the first I/O operation. In a Thin Provisioned Pool, you can define more space for volumes than the actual hard capacity of the pool, up to the soft size of the pool.

The volume size is the actual net storage space, as seen by the host applications, not including any mirroring or other data protection overhead. The free space consumed by the volume will be the smallest multiple of 17 GB that is greater than the specified size. For example, if we request an 18 GB volume to be created, the system rounds this volume size up to 34 GB. For a 16 GB volume size request, it is rounded up to 17 GB.

Figure 5-25 on page 108 gives several basic examples of volume definition and planning in a thinly provisioned pool. It depicts volumes with the minimum amount of capacity, but the principle applies to larger volumes as well. As shown in this figure, we recommend that you carefully plan the number of volumes or the hard size of the thinly provisioned pool because of the minimum hard capacity that is consumed by one volume. If you create more volumes in a thinly provisioned pool than the hard capacity can cover, I/O operations against the volumes will fail at the first I/O attempt.

Note: We recommend that you plan the volumes in a Thin Provisioned Pool in accordance with this formula: Pool Hard Size >= 17 GB x (number of volumes in the pool)
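The rounding and planning rules above can be sketched in Python. We use the nominal 17 GB granularity from the examples (the document also quotes it more precisely as 17.1 GB); the function names are ours.

```python
import math

VOLUME_CHUNK_GB = 17  # nominal allocation unit used in the examples above

def rounded_volume_size(requested_gb):
    """Free space consumed: smallest multiple of 17 GB >= requested size."""
    return math.ceil(requested_gb / VOLUME_CHUNK_GB) * VOLUME_CHUNK_GB

def pool_hard_size_sufficient(pool_hard_gb, num_volumes):
    """Check the planning rule: Pool Hard Size >= 17 GB x number of volumes."""
    return pool_hard_gb >= VOLUME_CHUNK_GB * num_volumes

print(rounded_volume_size(18))             # 34, as in the example above
print(rounded_volume_size(16))             # 17
print(pool_hard_size_sufficient(51, 3))    # True: 51 >= 17 x 3
```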
Chapter 5. Configuration
Figure 5-25 (volume layouts in a thinly provisioned pool): a pool holding Volume I (17 GB) and Volume II (17 GB); a pool holding Volumes I, II, and III (17 GB each); and a pool holding Volume I (17 GB) and Volume II (34 GB).
The size of a volume can be specified either in gigabytes (GB) or in blocks (where each block is 512 bytes). If the size is specified in blocks, the volume is created in the exact size specified, and the size is not rounded up. The volume presents that exact block size and capacity to the hosts but nevertheless consumes 17 GB inside the XIV Storage System. This capability is relevant and useful in migration scenarios.

If the size is specified in gigabytes, the actual volume size is rounded up to the nearest 17.1 GB multiple (making the actual size identical to the free space consumed by the volume, as just described). This rounding up prevents a situation where storage space is not fully utilized because of a gap between the free space used and the space available to the application.

The volume is logically formatted at creation time, which means that any read operation returns all zeros as a response.

To create volumes with the XIV Storage Management GUI:
1. Click the Add Volumes icon in the Volume and Snapshots view (Figure 5-24 on page 105) or right-click in the body of the window (not on a volume or snapshot) and select Add Volumes. The window shown in Figure 5-26 on page 109 is displayed.
2. From the Select Pool field, select the Pool in which this volume is stored. Refer to 5.3, "Storage Pools" on page 93 for a description of how to define Storage Pools. The storage size and allocation of the selected Storage Pool is shown textually and graphically in a color-coded bar:
 Green indicates the space already allocated in this Storage Pool.
 Yellow indicates the space that will be allocated to this volume (or volumes) after it is created.
 Gray indicates the space that remains free after this volume (or volumes) is allocated.
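The difference between block-based and GB-based sizing can be sketched as follows (our illustration, assuming the 512-byte block and 17 GB allocation unit stated in the text; the helper name is ours):

```python
BLOCK_BYTES = 512        # XCLI block unit, per the text
ALLOC_UNIT_GB = 17       # internal allocation unit, per the text

def block_volume(blocks: int):
    """Hypothetical helper: a volume defined in 512-byte blocks presents
    exactly blocks * 512 bytes to the host, but still consumes whole
    17 GB units inside the XIV Storage System."""
    host_bytes = blocks * BLOCK_BYTES
    consumed_gb = max(1, -(-host_bytes // (ALLOC_UNIT_GB * 10**9))) * ALLOC_UNIT_GB
    return host_bytes, consumed_gb

# A migration-sized volume of exactly 20,000,000,000 bytes (39,062,500 blocks):
host_bytes, consumed = block_volume(39_062_500)
print(host_bytes, consumed)  # exact byte count to the host; 34 GB consumed internally
```

This is why block sizing matters for migrations: the source LUN's exact geometry is preserved for the host even though the XIV internally allocates in 17 GB units.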
3. In the Number of Volumes field, specify the required number of volumes.
4. In the Volume Size field, specify the size of each volume to define. The size can also be modified by dragging the yellow part of the size indicator.
Note: When multiple volumes are created, they all have the same size as specified in the Volume Size field.
5. In the Volume Name field, specify the name of the volume to define. The name of the volume must be unique in the system. If you specified that more than one volume be defined, they are successively named by appending an incrementing number to the end of the specified name.
6. Click Create to create the volumes and add them to the Storage Pool.

After a volume is successfully added, its state is unlocked, meaning that write, format, and resize operations are permitted. The creation time of the volume is set to the current time and is never changed.
Resizing volumes
Resizing a volume is an operation very similar to creating one. Only an unlocked volume can be resized. When you resize a volume, its size is specified as an integer multiple of 10^9 bytes, but the actual new size of the volume is rounded up to the nearest valid size, which is an integer multiple of 17 GB.

Note: The size of a volume can be decreased. However, to avoid possible data loss, you must contact your IBM XIV support personnel if you need to decrease a volume size. (A mapped volume's size cannot be decreased.)
The volume address space is extended (at the end of the existing volume) to reflect the increased size, and the additional capacity is logically formatted (that is, zeroes are returned for all read commands). When resizing a regular volume (not a writable snapshot), all storage space that is required to support the additional volume capacity is reserved (static allocation), which guarantees the functionality and integrity of the volume, regardless of the resource levels of the Storage Pool containing that volume. Resizing a master volume does not change the size of its associated snapshots. These snapshots can still be used to restore their individual master volumes at their initial sizes.
To resize volumes with the XIV Storage Management GUI:
1. Right-click the row of the volume to be resized and select Resize. The total amount of storage is presented both textually and graphically. The amount that is already allocated by the other existing volumes is shown in green. The amount that is free is shown in gray. The current size of the volume is displayed in yellow, to the left of a red vertical bar. This red bar provides a constant indication of the original size of the volume as you resize it. Place the mouse cursor over the red bar to display the volume's initial size.
2. In the New Size field, use the arrows to set the new size or type the new value.
3. Click Update to resize the volume.
Deleting volumes
With the GUI, the deletion of a volume is as easy as creating one. Important: After you delete a volume or a snapshot, all data stored on the volume is lost and cannot be restored.
All the storage space that was allocated (or reserved) for the volume or snapshot is freed and returned to its Storage Pool. The volume or snapshot is then removed from all the logical unit number (LUN) Maps that contain a mapping of this volume.

Deleting a volume deletes all the snapshots associated with it, even snapshots that are part of Snapshot Groups; the latter can only happen when the volume was in a Consistency Group and was removed from it. You can delete a volume regardless of the volume's lock state, but you cannot delete a volume that is part of a Consistency Group.

To delete a volume or a snapshot:
1. Right-click the row of the volume to be deleted and select Delete.
2. Confirm the deletion when prompted; the volume is deleted.
Maintaining volumes
There are several other operations that can be issued on a volume. Refer to "Menu option actions" on page 106. The usage of these operations is obvious, and you can initiate an operation with a right-mouse click. These operations are:
 Format a volume: A formatted volume returns zeros as a response to any read command. The formatting of the volume is done logically, and no data is actually written to the physical storage space allocated for the volume. Consequently, the formatting action is performed instantly.
 Rename a volume: A volume can be renamed to a unique name in the system. A locked volume can also be renamed.
 Lock/Unlock a volume: You can lock a volume so that hosts cannot write to it. A volume that is locked is write-protected, so that hosts can read the data stored on it, but they cannot change it. The volume then appears with a lock icon. In addition, a locked volume cannot be formatted or resized. In general, locking a volume prevents any operation (other than deletion) that changes the volume's image.
Note: Master volumes are set to unlocked when they are created. Snapshots are set to locked when they are created.
 Consistency Groups: The XIV Storage System enables a higher level of volume management by grouping volumes and snapshots into sets called Consistency Groups. This kind of grouping is especially useful for cluster-specific volumes. Refer to 11.1.3, "Consistency Groups" on page 300 for a detailed description.
 Copy a volume: You can copy a source volume onto a target volume. Obviously, all the data that was previously stored on the target volume is lost and cannot be restored. Refer to 11.2, "Volume Copy" on page 317 for a detailed description.
 Snapshot functions: The XIV Storage System's advanced snapshot feature has unique capabilities that enable the creation of a virtually unlimited number of copies of any volume, with no performance penalties. Refer to 11.1, "Snapshots" on page 286.
Map a volume: While the storage system sees volumes and snapshots at the time of their creation, the volumes and snapshots are visible to the hosts only after the mapping procedure. To get more information about mapping, refer to 5.5, Host definition and mappings on page 113.
Category   Name             Description
volume     vol_by_id        Prints the volume name according to its specified SCSI serial number.
volume     vol_clear_keys   Clears all SCSI reservations and registrations.
volume     vol_copy         Copies a source volume onto a target volume.
volume     vol_create       Creates a new volume.
volume     vol_delete       Deletes a volume.
volume     vol_format       Formats a volume.
volume     vol_list         Lists all volumes or a specific one.
volume     vol_lock         Locks a volume so that it is read-only.
volume     vol_rename       Renames a volume.
volume     vol_resize       Resizes a volume.
volume     vol_unlock       Unlocks a volume, so that it is no longer read-only and can be written to.
To list the existing volumes in a system, use the following command: xcli -c Redbook vol_list The result of this command is similar to the illustration given in Figure 5-28.
Figure 5-28 (vol_list output): columns include Name, Size (GB), Master Name, Consistency Group, and Used Capacity (GB); the example shows volumes of 34 GB and 51 GB, each with 0 GB used capacity.
To find and list a specific volume by its SCSI ID, issue the following command:
xcli -c Redbook vol_by_id=12
To create a new volume, enter the following command:
xcli -c Redbook vol_create vol=DBVolume size=2000 pool=DBPool
The size can be specified either in gigabytes or in blocks (where each block is 512 bytes). If the size is specified in blocks, volumes are created in the exact size specified. If the size is specified in gigabytes, the actual volume size is rounded up to the nearest 17 GB multiple (making the actual size identical to the free space consumed by the volume, as described above). This rounding up prevents a situation where storage space is not fully utilized because of a gap between the free space used and the space available to the application.

Note: If pools are already created in the system, the specification of the Storage Pool name is mandatory.

The volume is logically formatted at creation time, which means that any read operation returns all zeros as a response. To format a volume, use the following command:
xcli -c Redbook vol_format vol=DBVolume
Note that all data stored on the volume will be lost and unrecoverable. If you want to bypass the warning message, put -y right after the XCLI command.

The following example shows how to resize one of the existing volumes:
xcli -c Redbook vol_resize vol=DBVolume size=2100
With this command, you can increase or decrease the volume size. The size of a volume can be decreased; however, to avoid data loss, contact XIV Storage System support personnel if you need to decrease the size of a volume.

To rename an existing volume, issue this command:
xcli -c Redbook vol_rename vol=DBVolume new_name=DataVol
To delete an existing volume, enter:
xcli -c Redbook vol_delete vol=DataVol
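When driving this volume lifecycle from a script, it helps to assemble the XCLI invocations programmatically. A minimal Python sketch (the helper function is our illustration, not part of the product tooling; only the xcli flags and commands shown in the text are used):

```python
def xcli(config: str, command: str, auto_confirm: bool = False, **params) -> list:
    """Assemble an XCLI argument vector, for example:
    xcli -c Redbook -y vol_delete vol=DataVol"""
    argv = ["xcli", "-c", config]
    if auto_confirm:
        argv.append("-y")                      # -y bypasses the warning prompt
    argv.append(command)
    argv += [f"{k}={v}" for k, v in params.items()]
    return argv

# The volume lifecycle from the text, as argument vectors:
print(xcli("Redbook", "vol_create", vol="DBVolume", size=2000, pool="DBPool"))
print(xcli("Redbook", "vol_resize", vol="DBVolume", size=2100))
print(xcli("Redbook", "vol_rename", vol="DBVolume", new_name="DataVol"))
print(xcli("Redbook", "vol_delete", auto_confirm=True, vol="DataVol"))
# On a workstation with the XCLI installed, each vector could be passed
# to subprocess.run(argv).
```

Building argument vectors rather than concatenating strings avoids quoting problems when volume or pool names contain unusual characters.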
Clicking the icon brings up the Hosts and LUNs menu. Select Hosts from this menu to get the Hosts main view displayed in Figure 5-30. The Hosts view enables you to perform a range of activities for managing hosts, including defining, editing, deleting, renaming, and linking the host servers used in the XIV Storage System. The main Hosts window lists all the hosts that have been defined in the XIV Storage System.
Table 5-2 describes the columns displayed in the Hosts window. The hidden columns can be revealed by right-clicking the heading line of the view.
Table 5-2  Columns in the Hosts view

Column    Description                                                 Default
Name      Host name                                                   Y
Type      FC: Fibre Channel; iSCSI: iSCSI (Internet Small             Y
          Computer System Interface)
...       The creator of the host                                     N
...       Cluster name to which this host belongs                     Y
...       LUN Map to which this host is linked                        N
...       LUN Map identification number                               N
Access    ...                                                         Y
By expanding the hosts, you can see the port worldwide names (WWPNs) for Fibre Channel (FC) hosts and the Internet Small Computer System Interface (iSCSI) initiator names for iSCSI hosts.
Creating a host
When trying to define a new host (Figure 5-31), remember that the name of the host must be unique in the system.
To create hosts with the XIV Storage Management GUI:
1. Click the Add Host icon on the toolbar in the Hosts view or right-click in the body of the window (not on a host or a port) and select Add Host. A Define Host panel is displayed as shown in Figure 5-31.
2. Enter the desired name for the host.
3. Select the cluster name if the host is going to be part of a cluster; otherwise, select None.
4. Click Create to define the new host.
The host just created is reflected in the Hosts view, but without any ports yet. The name of the host can later be modified by selecting the Rename option in the menu.
To add a port to a host:
1. Right-click the predefined host and select Add Port. The Add Port panel appears as shown in Figure 5-32.
2. Select the Port Type according to the host connection type (FC or iSCSI).
3. Specify the Port Name. The FC port address or iSCSI initiator (port) name assigned to the host must be unique in the XIV Storage System:
a. For the FC port specification, you can choose one of the existing port names, which are already seen by the XIV Storage System, from a list box, or just type a new port name. The FC port name must be exactly 16 characters long, in hexadecimal form. Only the following alphanumeric characters are valid: 0-9, A-F, and a-f. In addition to the 16 characters, colons (:) can be used as separators in the 16 character port name. The port naming convention for XIV Storage ports is:
WWPN: 5000001738XXXXYZ
001738 = Registered identifier for XIV
XXXX = Serial number in hex
Y = (hex) Interface Module number
Z = (hex) FC port number within the Interface Module
b. For the iSCSI port selection, the iSCSI port (initiator) name must not exceed 253 characters and must not contain any blank spaces.
4. Click Add to add the new port to the host.

The names of ports cannot be modified later, only deleted. If you need to change a name, a new port must be allocated to the host.
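The validity rules and the naming convention above can be decoded programmatically; the sketch below is our illustration (the helper name and the example WWPN are made up; the field layout and character rules come from the text):

```python
def decode_xiv_wwpn(wwpn: str) -> dict:
    """Decode an XIV port WWPN of the form 5000001738XXXXYZ
    (16 hex characters, optional ':' separators)."""
    w = wwpn.replace(":", "").lower()
    if len(w) != 16 or any(c not in "0123456789abcdef" for c in w):
        raise ValueError("an FC port name is exactly 16 hex characters")
    if not w.startswith("5000001738"):
        raise ValueError("registered identifier 001738 for XIV not found")
    return {
        "serial": int(w[10:14], 16),  # XXXX: serial number in hex
        "module": int(w[14], 16),     # Y: Interface Module number
        "port":   int(w[15], 16),     # Z: FC port number within the module
    }

# A made-up example WWPN with serial 0x002A (42), module 6, port 1:
print(decode_xiv_wwpn("50:00:00:17:38:00:2A:61"))
```

Such a decoder is handy when matching switch-side WWPNs against XIV modules during fabric zoning.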
Volumes can be mapped to the logical unit numbers (assigned to the right side of the window) in one of these ways:
 Mapping volumes automatically, by selecting volumes and clicking Map
 Mapping volumes semi-automatically, by selecting volumes, selecting a free LUN row, and clicking Map
 Mapping volumes manually, by dragging and dropping

To place volumes automatically:
1. Click the volumes required from the Volumes table while holding down Ctrl if more than one volume is required. The volumes are automatically allocated places in the LUNs table in sequential order, according to the free locations. The LUN rows are temporarily highlighted in light yellow, and the volume names and sizes are displayed, as shown in Figure 5-33.
2. Click Map to accept this mapping configuration.
To place volumes semi-automatically:
1. Click the required volumes from the Volumes table while holding down Ctrl if more than one volume is required. The volumes are automatically allocated places in the LUNs table in sequential order, according to the free locations. The LUN rows are temporarily highlighted in light yellow, and the volume names and sizes are displayed showing the initial, automatic placement.
2. Click a starting point on the LUNs table where you want the volumes to be copied. The volumes automatically cascade downward to the next available free locations, with the first volume marked in a darker yellow, as shown in Figure 5-34.
3. Click Map to accept this mapping configuration.

You can manually map volumes to LUNs by selecting the same number of rows in the LUNs table as in the Volumes table, which helps you select LUNs that are not in sequential order. Another option is to change mapped LUNs by selecting them on the right and dragging and dropping them to a different free location in the right table.

Volumes mapped to LUNs (assigned on the right side of the window) can also be unassigned or unmapped. Unmapping LUNs from a host is simple: just right-click the LUN to be unmapped and select Unmap.
Cluster
In many cases, you might need to define identical mappings for a set of hosts. To implement this configuration, it is necessary to define a cluster as an entity that groups several hosts together and assigns the same mapping to all of the hosts. The mapping of volumes to LUN identifiers is defined per cluster and applies concurrently to all the hosts in the cluster. There is no way to define different mappings for different hosts belonging to the same cluster.
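The per-cluster mapping semantics can be modeled in a few lines (a hypothetical Python illustration of the behavior described above, not product code): the LUN map is stored once per cluster, and every member host resolves LUNs through that single shared map.

```python
class Cluster:
    """Model of XIV cluster mapping: one LUN map shared by all member hosts."""
    def __init__(self, name: str):
        self.name = name
        self.hosts = set()
        self.lun_map = {}          # LUN id -> volume name

    def map_vol(self, vol: str, lun: int):
        self.lun_map[lun] = vol    # applies concurrently to every member host

    def mapping_for(self, host: str) -> dict:
        if host not in self.hosts:
            raise KeyError(f"{host} is not a member of {self.name}")
        return self.lun_map        # identical for each host; no per-host override

c = Cluster("Mscs_Cluster1")
c.hosts.update({"Server1", "Server2"})
c.map_vol("DBvol1", lun=1)
print(c.mapping_for("Server1") == c.mapping_for("Server2"))  # True: same mapping
```

The absence of any per-host structure in the model mirrors the rule that different hosts in the same cluster cannot be given different mappings.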
Creating a cluster
The cluster creation in XIV Storage Management software is similar to the host creation (Figure 5-35).
To create a cluster with the XIV Storage Management GUI:
1. Click the Add Cluster icon on the toolbar in the Hosts view or right-click in the body of the window (not on a host or a port) and select Add Cluster. A Create Cluster panel appears as shown in Figure 5-35.
2. Enter the name, which must be unique in the system, of the new cluster.
3. Click OK to define the new cluster.
The mapping definitions do not revert to the host's original mapping before it was added to the cluster. After removing the host from the cluster, the administrator can change the host's mapping. A set of cluster-specific volumes can all be used simultaneously as a group, and a synchronized snapshot of them can be created, by using a Consistency Group. The volumes in a Consistency Group are grouped into a single Volume Set. To get more information about Consistency Groups, refer to 11.1.3, "Consistency Groups" on page 300.
Category   Name                   Description
host       cluster_add_host       Adds a host to a cluster.
host       cluster_create         Creates a new cluster.
host       cluster_delete         Deletes a cluster.
host       cluster_list           Lists a specific cluster or all of them.
host       cluster_remove_host    Removes a host from a cluster.
host       cluster_rename         Renames a cluster.
host       host_add_port          Adds a port address to a host.
host       host_define            Defines a new host to connect to the XIV Storage System.
host       host_delete            Deletes a host.
host       host_list              Lists a specific host or all hosts.
host       host_remove_port       Removes a port from a host.
host       host_rename            Renames a host.
host       map_vol                Maps a volume to a host or a cluster.
host       mapping_list           Lists the mapping of volumes to a specified host or cluster.
host       special_type_set       Sets the special type of a host or a cluster.
host       unmap_vol              Unmaps a volume from a host or a cluster.
host       vol_mapping_list       Lists all the hosts and clusters to which a volume is mapped.
This section shows the most common ways to manage hosts and clusters by using the XCLI tool. To create a host or a cluster with the XCLI, use these commands:
xcli -c Redbook host_define host=Windows_Server1
xcli -c Redbook cluster_create cluster=Mscs_Cluster1
These commands create a new host or cluster. The newly created cluster does not yet contain any hosts, and neither the host nor the cluster has a mapping at the time of definition. Adding members to a cluster can be done in one of two ways: at the creation of a host, you can specify the additional parameter cluster=, or you can use the cluster_add_host command:
xcli -c Redbook cluster_add_host cluster=Mscs_Cluster1 host=Server1 map=cluster
To list the existing hosts or clusters in the XIV Storage System, use the following two commands:
xcli -c Redbook host_list
xcli -c Redbook cluster_list

You need to allocate ports to the newly defined hosts. There are two port types in the XIV Storage System: FC (Fibre Channel) and iSCSI (Internet Small Computer System Interface) ports. The FC port address or iSCSI initiator (port) name assigned to the host must be unique in the XIV Storage System. The FC port name must be exactly 16 characters long and in hexadecimal form. Only the following alphanumeric characters are valid: 0-9, A-F, and a-f. In addition to the 16 characters, colons (:) can be used as separators in the 16 character port name. The iSCSI initiator name must not exceed 253 characters and must not contain any blank spaces.

The port naming convention for XIV Storage System ports is:
WWPN: 5000001738XXXXYZ
001738 = Registered identifier for XIV
XXXX = Serial number in hex
Y = (hex) Interface Module number
Z = (hex) FC port number within the Interface Module

The following example shows a port definition:
xcli -c Redbook host_add_port host=Server1 fcaddress=10000000C92CFD36

You can get more information about the FC and iSCSI connectivity by using these commands:
xcli -c Redbook fc_connectivity_list
xcli -c Redbook fc_port_list
xcli -c Redbook host_connectivity_list
Using these commands, you can discover the network or list the existing connections.

After the port definition, you can map existing volumes to one of the hosts or clusters:
xcli -c Redbook map_vol host=Server1 vol=DBvol1 lun=1 override=yes
The command fails if another volume is mapped to the same LUN for this cluster/host, unless override is specified. If the override option is specified, the host's existing mapping is replaced by the newly specified mapping, which enables the host (or all the hosts in the cluster) to see a continuous mapping of volumes to this LUN, although the volume's content, and possibly its size, might change.
5.6 Scripts
IBM XIV Storage Manager software XCLI commands can be used in scripts or batch programs for repetitive or complex operations. The XCLI can be used in a shell environment to interactively configure the system, or as part of a script to perform specific tasks. Example 5-9 on page 123 shows a Windows XP batch program.
rem ---------------------------------------------------------------------------
rem This batch program erases all volumes and snapshots in a specified
rem Storage Pool within an IBM XIV Storage System
rem Prerequisite: existing configuration with user, password, and IP address
rem Limitation: it will not delete volumes with a mirror relationship
rem Operating system: Windows XP
rem Tested xcli version: 2.2.43
rem ---------------------------------------------------------------------------
@echo off
cls
rem Set the variables
rem ---------------------------------------------------------------------------
set POOLNAME=
set ANSWER=N
set SYSNAME=Redbook
rem ---------------------------------------------------------------------------
:POOLIST
rem Lists the pools and sorts them
rem ---------------------------------------------------------------------------
echo ***These are the storage pools in - %SYSNAME% - Storage System***
xcli -t name -c %SYSNAME% pool_list

:POOLINPUT
rem Prompt for user input of the Storage Pool name and check it
echo ----------------------------------------------------------------------------
@set /p POOLNAME=Type the Storage Pool name where you want to delete ALL the volumes: 
xcli -s -t name -c %SYSNAME% pool_list | findstr \<%POOLNAME%\>
@if errorlevel 1 (
  echo There is no such Storage Pool in the system called: %POOLNAME%
  goto quit
)

:VOLIST
rem Lists the volumes in the requested pool
echo ----------------------------------------------------------------------------
echo ***Listing volumes in - %POOLNAME% - Storage Pool***
for /F "usebackq skip=1 delims=, tokens=1,3,5" %%i in (`xcli -s -c %SYSNAME% vol_list`) do (
  if %%~k EQU %POOLNAME% (
    if %%j EQU "" (
      echo Volume: %%~i
    ) else (
      echo Snapshot: %%~i - Master: %%~j
    )
  )
)

:VOLLEY
rem Ask for confirmation to delete and, in case of Yes, delete the volumes
echo ----------------------------------------------------------------------------
echo Please type Y for Yes or N for No - default is No
set /p ANSWER=Do you really want to delete all the volumes and snapshots in %POOLNAME% now ? Y/N: 
if /i %ANSWER:~,1% EQU N (
  echo - Action is cancelled
  goto quit
)
if /i %ANSWER:~,1% EQU Y (
  for /F "usebackq skip=1 delims=, tokens=1,3,5" %%i in (`xcli -s -c %SYSNAME% vol_list`) do (
    if %%~k EQU %POOLNAME% (
      if %%j EQU "" (
        rem ---------------------------------------------------------------------
        rem Uncomment the next commands if you want to remove the volumes from a CG
        rem ---------------------------------------------------------------------
        rem echo Removing %%~i from CG ...
        rem xcli -y -c %SYSNAME% cg_remove_vol vol=%%~i
        rem ---------------------------------------------------------------------
        rem Uncomment the next commands if you want to unmap mapped volumes
        rem ---------------------------------------------------------------------
        rem echo Unmapping %%~i ...
        rem for /F "usebackq skip=1 delims=, tokens=1" %%h in (`xcli -s -c %SYSNAME% vol_mapping_list vol^=%%~i`) do (
        rem   xcli -y -c %SYSNAME% unmap_vol vol=%%~i host=%%~h
        rem )
        rem ---------------------------------------------------------------------
        rem This command deletes the volume
        rem ---------------------------------------------------------------------
        echo Erasing %%~i ...
        xcli -y -c %SYSNAME% vol_delete vol=%%~i
      )
    )
  )
  goto quit
)
goto VOLLEY

:quit
echo ----------------------------------------------------------------------------
echo End of Program
Chapter 6.
Security
This chapter discusses the XIV Storage System security features from different perspectives. More specifically, it covers the following topics: System physical access security User access and authorizations Password management Managing multiple machines Enhanced access security
Important: Protect your XIV Storage System by locking the rack doors and monitoring physical access to the rack.
Table 6-1  Default users and their categories

Predefined user    Default password    Category
admin              adminadmin          storageadmin
technician         technician          technician
N/A                N/A                 applicationadmin
N/A                N/A                 readonly
xiv_development    N/A                 xiv_development
xiv_maintenance    N/A                 xiv_maintenance
Both GUI and XCLI use the same user and role definitions.
User groups
A user group is a group of application administrators who share the same set of snapshot creation limitations. The limitations are enforced by associating the user groups with hosts or clusters and, therefore, with the snapshots of volumes that are mapped to these hosts or clusters. After a user (application administrator) belongs to a user group that is associated with a host, the user can manage snapshots of the volumes mapped to that host. The concept of user groups allows a simple update, through a single command, of the limitations for all the users in the user group.

User groups have these rules:
 Only users who are defined as application administrators can be assigned to a group.
 A user can belong to only a single user group.
 A user group can contain up to eight users.

Important: A user group only applies to users with the application administrator role. Storage Administrators create the user groups and control the various application administrator permissions.

Rules:
 A maximum of 32 users can be created.
 A maximum of eight user groups can be created.
 A maximum of eight users can be attached to a user group.
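The membership rules above can be expressed as a small validation sketch (our Python illustration of the stated constraints, not product code; the function and data-structure names are ours):

```python
MAX_USERS, MAX_GROUPS, MAX_GROUP_SIZE = 32, 8, 8  # limits stated in the rules

def can_add_to_group(user_roles: dict, group_members: dict, user: str, group: str) -> bool:
    """Check the user-group rules before adding a user.
    user_roles: {user: role}; group_members: {group: set of users}."""
    if user_roles.get(user) != "applicationadmin":
        return False                      # only application administrators
    if any(user in members for members in group_members.values()):
        return False                      # a user belongs to a single group
    if len(group_members.get(group, set())) >= MAX_GROUP_SIZE:
        return False                      # at most eight users per group
    return True

roles = {"adm_mike02": "applicationadmin", "admin": "storageadmin"}
groups = {"EXCHANGE CLUSTER 01": set()}
print(can_add_to_group(roles, groups, "adm_mike02", "EXCHANGE CLUSTER 01"))  # True
print(can_add_to_group(roles, groups, "admin", "EXCHANGE CLUSTER 01"))       # False
```

The second call fails because admin holds the storageadmin role, which manages groups but cannot belong to one.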
2. Users are defined per system. If you manage multiple systems and they have been added to the GUI, select the particular system with which you want to work.
3. In the main Storage Manager GUI window, move the mouse pointer over the padlock icon to display the Access menu. All user access operations can be performed from the Access menu (refer to Figure 6-2). There are three choices:
 Users: Define or change single users.
 Users Groups: Define or change user groups, and assign application administrator users to groups.
 Access Control: Define or change user groups, and assign application administrator users or user groups to hosts.
4. Move the mouse over the Users menu item (it is now highlighted in yellow) and click (Figure 6-2).
5. The Users window is displayed. If the storage system is accessed for the first time, the window displays the predefined users (refer to Figure 6-3 on page 130 for an example). The default columns are Name, Role, Group, E-mail, and Phone. An additional column called Full Access can also be displayed (this indication only applies to users with a role of application administrator). To add the Full Access column, right-click the blue heading bar to display the Customize Columns dialog.
c. We recommend that you change the default passwords for the predefined users, which can be accomplished by right-clicking the user name and selecting Change Password from the context menu, as illustrated in Figure 6-4. Repeat the operation for each of the four predefined users.
6. To add a new user, you can either click the Add icon in the menu bar or right-click the empty space to get the context menu. Both options are visible in Figure 6-5. Click Add User.
7. The Define User dialog is displayed. A user is defined by a unique name and a password (refer to Figure 6-6). The default role (denoted as Category in the dialog panel) is storageadmin and must be changed. Optionally, enter the e-mail address and phone number for the user. Click Define to define the user and return to the Users window.
8. If you need to test the user that you just defined, click the current user name shown in the upper right corner of the IBM XIV Storage Manager window (Figure 6-7), which allows you to log in as a new user.
2. The Users Groups window displays. To add a new user group, either click the Add User Group icon (shown in Figure 6-9) in the menu bar, or right-click in an empty area of the Users Groups table and select Add User Group from the context menu as shown in Figure 6-9.
3. The Create User Group dialog displays. Enter a meaningful group name and click OK (refer to Figure 6-10).
4. At this stage, the user group EXCHANGE CLUSTER 01 is still empty. Next, we add a host to the user group. Select Access Control from the Access menu as shown in Figure 6-11. This Access Control window appears.
5. Right-click the name of the user group that you have created to bring up a context menu and select Updating Access Control as shown in Figure 6-12 on page 133.
6. The Access Control Definitions dialog that is shown in Figure 6-13 is displayed. The panel contains the names of all the hosts or clusters defined to the XIV Storage System. The left pane displays the list of Unauthorized Hosts/Clusters for this particular user group and the right pane shows the list of hosts that have already been associated to the user group. You can add or remove hosts from either list by selecting a host and clicking the appropriate arrow. Finally, click Update to save the changes.
7. After a host (or multiple hosts) have been associated to a user group, you can add users to the user group (remember that a user must have the application administrator role to be added to a user group). Go to the Users window and right-click the user name to display the context menu. From the context menu (refer to Figure 6-14), select Add to Group to add this user to a group.
8. The Select User Group dialog is displayed. Select the desired group from the pull-down list and click OK (refer to Figure 6-15).
9. The user adm_mike02 has been assigned to the user group EXCHANGE CLUSTER 01 in this example. You can verify this assignment in the Users panel as shown in Figure 6-16.
10.The user adm_mike02 is an applicationadmin with the Full Access right set to no. This user can now perform snapshots of the EXCHANGE CLUSTER 01 volumes. Because the exchange cluster is the only host in the group, adm_mike02 is only allowed to map those snapshots to the EXCHANGE CLUSTER 01. However, you can add another host, such as a test or backup host, to allow adm_mike02 to map a snapshot volume to a test server.
Table 6-2  XCLI access control commands

Command                  Description                                    Role required to use command
access_define            Defines an association between a user         storageadmin
                         group and a host.
access_delete            Deletes an access control definition.         storageadmin
access_list              Lists access control definitions.             storageadmin, readonly, and applicationadmin
user_define              Defines a new user.                           storageadmin
user_delete              Deletes a user.                               storageadmin
user_group_add_user      Adds a user to a user group.                  storageadmin
user_group_create        Creates a user group.                         storageadmin
user_group_delete        Deletes a user group.                         storageadmin
user_group_list          Lists all user groups or a specific one.      storageadmin, readonly, and applicationadmin
user_group_remove_user   Removes a user from a user group.             storageadmin
user_group_rename        Renames a user group.                         storageadmin
user_list                Lists all users or a specific user.           storageadmin, readonly, and applicationadmin
user_rename              Renames a user.                               storageadmin
user_update              Updates a user. You can update the            technician, storageadmin, and applicationadmin
                         password, Access_all or Full_access,
                         e-mail, area code, or phone number.
2. In Example 6-2, we check the current state of the particular system with which we want to work (note that if the system name contains blanks, it must be enclosed in quotation marks). The default user admin is used.
Example 6-2 XCLI state_list

C:\>xcli -c 1300203 -u admin -p adminadmin state_list
Command completed successfully
system_state   off_type=off   safe_mode=no   shutdown_reason=No Shutdown   system_state=on   target_state=on

3. XCLI commands are grouped into categories. The help command can be used to get a list of all commands related to the category accesscontrol. Refer to Example 6-3.
Example 6-3 XCLI help
C:\>xcli -c 1300203 -u admin -p adminadmin help category=accesscontrol
Category       Name                    Description
accesscontrol  access_define           Associate a user group and a host.
accesscontrol  access_delete           Deletes an access control definition.
accesscontrol  access_list             Lists access control definitions.
accesscontrol  user_define             Defines a new user.
accesscontrol  user_delete             Deletes a user.
accesscontrol  user_group_add_user     Adds a user to a user group.
accesscontrol  user_group_create       Creates a user group.
accesscontrol  user_group_delete       Deletes a user group.
accesscontrol  user_group_list         Lists all user groups or a specific one.
accesscontrol  user_group_remove_user  Removes a user from a user group.
accesscontrol  user_group_rename       Renames a user group.
accesscontrol  user_list               Lists all users or a specific user.
accesscontrol  user_rename             Renames a user.
accesscontrol  user_update             Updates a user.

4. Use the user_list command to obtain the list of predefined users and roles (categories) as shown in Example 6-4. This example assumes that no users, other than the default users, have been added to the system.
Example 6-4 XCLI user_list
C:\>xcli -c 1300203 -u admin -p adminadmin user_list
Name             Category         Group/EmailAddress/AreaCode/Phone/AccessAll
xiv_development  xiv_development
xiv_maintenance  xiv_maintenance
admin            storageadmin
technician       technician

5. If this is a new system, you must change the default passwords for obvious security reasons. Use the user_update command as shown in Example 6-5 for the user technician.
Example 6-5 XCLI user_update
C:\>xcli -c 1300203 -u admin -p adminadmin user_update user=technician password=d0ItNOW password_verify=d0ItNOW
Command completed successfully

6. Adding a new user is straightforward, as shown in Example 6-6. A user is defined by a unique name, password, and role (designated here as category).
Example 6-6 XCLI user_define

C:\>xcli -c 1300203 -u admin -p adminadmin user_define user=adm_itso02 password=wr1teFASTER password_verify=wr1teFASTER category=storageadmin
Command completed successfully

7. Example 6-7 shows a quick test to verify that the new user can log on.
Example 6-7 XCLI user_list
C:\>xcli -c 1300203 -u adm_itso02 -p wr1teFASTER user_list
Name             Category
xiv_development  xiv_development
xiv_maintenance  xiv_maintenance
admin            storageadmin
technician       technician
adm_itso02       storageadmin
Example 6-8 XCLI user_group_create

C:\>xcli -c 1300203 -u adm_itso02 -p wr1teFASTER user_group_create user_group=EXCHANGE_CLUSTER_01
Command completed successfully

Note: Avoid spaces in user group names. If spaces are required, the group name must be enclosed in quotation marks, such as 'name with spaces'.

2. The user group EXCHANGE_CLUSTER_01 is empty and has no associated host. The next step is to associate a host or cluster. In Example 6-9, user group EXCHANGE_CLUSTER_01 is associated with EXCHANGE_CLUSTER_MAINZ.
Example 6-9 XCLI access_define
C:\>xcli -c 1300203 -u adm_itso02 -p wr1teFASTER access_define user_group="EXCHANGE_CLUSTER_01" cluster="EXCHANGE_CLUSTER_MAINZ"
Command completed successfully

3. A host has been assigned to the user group, but the user group still does not include any users. In Example 6-10 on page 137, we add the first user.
Example 6-10 XCLI user_group_add_user
C:\>xcli -c 1300203 -u adm_itso02 -p wr1teFASTER user_group_add_user user_group="EXCHANGE_CLUSTER_01" user="adm_mike02"
Command completed successfully

4. The user adm_mike02 has been assigned to the user group EXCHANGE_CLUSTER_01. You can verify the assignment by using the XCLI user_list command as shown in Example 6-11.
C:\>xcli -c 1300203 -u adm_itso02 -p wr1teFASTER user_list
Name             Category          Group                Access All
xiv_development  xiv_development
xiv_maintenance  xiv_maintenance
admin            storageadmin
technician       technician
adm_itso02       storageadmin
adm_mike02       applicationadmin  EXCHANGE_CLUSTER_01  no

The user adm_mike02 is an applicationadmin with the Full Access right set to no. This user can now perform snapshots of the EXCHANGE_CLUSTER_01 volumes. Because EXCHANGE_CLUSTER_01 is the only cluster (or host) in the group, adm_mike02 can only map those snapshots back to EXCHANGE_CLUSTER_01 itself. This is rarely useful in practice and is not supported in most cases: most servers (operating systems) cannot handle having two disks with the same metadata mapped to the system. To prevent issues on the server, map the snapshot to a host other than the one to which the master volume is mapped. Therefore, to make things practical, a user group is typically associated with more than one host.
Figure 6-17 shows that you can change a password by right-clicking the selected user in the Users window. Then, select Change Password from the context menu.
The Change Password dialog shown in Figure 6-18 is displayed. Enter the New Password and then retype it for verification in the appropriate field (remember that only alphanumeric characters are allowed). Click Update.
Example 6-12 on page 139 shows the same password change procedure using the XCLI. Remember that a user with the storageadmin role is required to change the password on behalf of another user.
Example 6-12 XCLI change user password
C:\>xcli -c 1300203 -u admin -p adminadmin user_update user=adm_mike02 password=workLESS password_verify=workLESS
Command completed successfully
Figure 6-19 illustrates the GUI view of multiple systems when different IDs (or passwords) are used. In this example, the system named ESP has an ID named tester with storage administrator rights. Because the tester ID is not configured on the other XIV Storage Systems, only the ESP system is currently shown as accessible. The user can see the other systems but cannot access them with the tester ID; the unauthorized systems appear in black and white and indicate that the user is unknown. Even if another system has the user ID tester defined, but with a different password, that system is still displayed in the same inaccessible state.
To allow the system manager easy access across systems, it is best to define a universal ID. Figure 6-20 illustrates a universal ID used with five systems. The storage administrator can switch between these systems without having to log on each time with a different user ID. Each system authorized for that ID and password combination is displayed in color with an indication of its status.
The window is split into two sections: the top part contains the management tools, such as wizards in the menu bar and a series of input fields and drop-down menus that act as selection filters. The bottom part is a table displaying the events according to the selection criteria. Use the table title bar or column headings to enable or change the sort direction.
At the time of writing, the XIV GUI is limited to displaying a maximum of 300 events, and the name field (which allows you to filter on the name of the user) is disabled. Consequently, the only way to get events related to a specific user is through the XCLI.

Note: The XIV GUI does not display the user who performed a transaction. A transaction audit with the user name can only be performed with the XCLI.
Event attributes
This section gives an overview of all available event types, event codes, and their severity levels.
Severity levels
Select one of six possible severity levels as the minimal level to be displayed:
- none: Includes all severity levels
- informational: Changes, such as volume deletion, size changes, or host multi-pathing
- warning: Volume usage limits reach 80%, failing message sent
- minor: Power supply power input loss, volume usage over 90%, component TEST failed
- major: Component failed (disk), user system shutdown, volume and pool usage at 100%, UPS on battery, or Simple Mail Transfer Protocol (SMTP) gateway unreachable
- critical: Module failed or UPS failed
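Because each level includes everything more severe than it, selecting a minimal severity is an ordering comparison. The following sketch illustrates that filtering logic (the event codes are made up for illustration; the ordering follows the list above):

```python
# Severity levels ordered from least to most severe; "none" shows everything.
LEVELS = ["none", "informational", "warning", "minor", "major", "critical"]

def visible(events, minimal):
    """Return the events whose severity is at or above the minimal level."""
    threshold = LEVELS.index(minimal)
    return [e for e in events if LEVELS.index(e["severity"]) >= threshold]

events = [
    {"code": "VOLUME_DELETED", "severity": "informational"},
    {"code": "UPS_ON_BATTERY", "severity": "major"},
    {"code": "MODULE_FAILED", "severity": "critical"},
]
print([e["code"] for e in visible(events, "major")])
# -> ['UPS_ON_BATTERY', 'MODULE_FAILED']
```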
Event codes
Refer to the XCLI Reference Guide, GC27-2213-00, for a list of event codes.
Event types
The following event types can be used as filters (specified with the parameter object_type in the XCLI command):
- cons_group: consistency group
- destgroup: event notification group of single destinations (mixture of SMTP/SMS)
- dest: event notification address
- dm: data migration
- host: host
- map: volume mapping
- mirror: mirroring
- pool: pool
- rule: rule
- smsgw: SMS gateway
- smtpgw: SMTP gateway
- target: FC/iSCSI connection
- volume: volume
Command            Description
smsgw_define       Defines a Short Message Service (SMS) gateway.
smsgw_delete       Deletes an SMS gateway.
smsgw_list         Lists SMS gateways.
smsgw_prioritize   Sets the priorities of the SMS gateways for sending SMS messages.
smsgw_rename       Renames an SMS gateway.
smsgw_update       Updates an SMS gateway.
smtpgw_define      Defines an SMTP gateway.
smtpgw_delete      Deletes a specified SMTP gateway.
smtpgw_list        Lists SMTP gateways.
smtpgw_prioritize  Sets the priority of which SMTP gateway to use to send e-mails.
smtpgw_rename      Renames an SMTP gateway.
smtpgw_update      Updates the configuration of an SMTP gateway.
rule_activate      Activates an event notification rule.
rule_create        Creates an event notification rule.
rule_deactivate    Deactivates an event notification rule.
rule_delete        Deletes an event notification rule.
rule_list          Lists event notification rules.
rule_rename        Renames an event notification rule.
rule_update        Updates an event notification rule.
XCLI examples
To illustrate how the commands operate, the event_list command displays the events currently in the system. Example 6-13 shows the first few events logged in our system.
Example 6-13 XCLI event_list

C:\XIV>xcli -c "XIV ESP 1 V10.0 1300203" event_list
Index  Code          Severity       Timestamp            Alerting  Cleared  User
8545   UNMAP_VOLUME  Informational  2008-08-20 08:26:49  no        yes      admin
8546   UNMAP_VOLUME  Informational  2008-08-20 08:26:50  no        yes      admin
8547   UNMAP_VOLUME  Informational  2008-08-20 08:26:51  no        yes      admin
8548   UNMAP_VOLUME  Informational  2008-08-20 08:26:53  no        yes      admin
8549   UNMAP_VOLUME  Informational  2008-08-20 08:26:54  no        yes      admin
.......
Example 6-14 illustrates the command for listing all instances when a user was updated. The USER_UPDATED event is generated when a user's password, e-mail, or phone number is modified. In this example, the -t option is used to display specific fields, such as index, code, description of the event, time stamp, and user name. The description field provides the ID that was modified, and the user field is the ID of the user performing the action.
Example 6-14 View USER_UPDATED event with the XCLI
C:\XIV>xcli -c "XIV ESP 1 V10.0 1300203" -t index,code,description,timestamp,user_name event_list code=USER_UPDATED
Index  Code          Description                               Timestamp   User
425    USER_UPDATED  User with name 'xiv_pfe' was updated.     2008-08-07  admin
426    USER_UPDATED  User with name 'xiv_pfe' was updated.     2008-08-07  admin
441    USER_UPDATED  User with name 'adm_mike02' was updated.  2008-08-07  admin
1279   USER_UPDATED  User with name 'chris' was updated.       2008-08-25  admin
C:\XIV>xcli -c "XIV ESP 1 V10.0 1300203" rule_create rule=test codes=ACCESS_OF_USER_GROUP_TO_CLUSTER_REMOVED,ACCESS_OF_USER_GROUP_TO_HOST_REMOVED,ACCESS_TO_CLUSTER_GRANTED_TO_USER_GROUP,ACCESS_TO_HOST_GRANTED_TO_USER_GROUP dests=relay
Command executed successfully.

A simpler example is setting up a rule notification for when a user account is modified. Example 6-16 creates a rule on the XIV Storage System called ESP that sends a notification whenever a user ID is modified on the system. The notification is transmitted through the relay destination.
Example 6-16 Create a rule for notification with the XCLI
C:\XIV>xcli -c "XIV ESP 1 V10.0 1300203" rule_create rule=user_update codes=USER_UPDATED dests=relay
Command executed successfully.

The same rule can be created in the GUI. From the events menu, select Rules and enter the details in the panel. Refer to Chapter 10, Monitoring, on page 249 for more details about configuring the system to provide notifications and setting up rules.
Chapter 7.
Host connectivity
This chapter discusses the host attachment capabilities of the XIV Storage System. It addresses key aspects of host attachment and reviews concepts and requirements for both the Fibre Channel and Internet Small Computer System Interface (iSCSI) protocols. The information in this chapter applies to the various host operating systems that are compatible with XIV. Operating system-specific information is provided in subsequent chapters of the book.

You can configure the XIV Storage System for the following adapter types and protocols:
- Fibre Channel adapters for support of Fibre Channel Protocol (FCP)
- Ethernet adapters or iSCSI host bus adapters (HBAs) for support of iSCSI over IP Ethernet networks

As explained in Interface Module on page 54, the XIV Storage System has six host Interface Modules. Each host Interface Module contains four Fibre Channel ports, and three of the host Interface Modules also contain two iSCSI ports each. Use these ports to attach hosts and a remote XIV Storage System to the XIV Storage System. To simplify host cabling, the XIV Storage System has an integrated patch panel.
Any host traffic is served through the six Interface Modules (numbers 4-9). Although the XIV Storage System distributes the traffic between I/O modules and Data Modules, it is the storage administrator's responsibility to ensure that host I/Os are equitably distributed among the various Interface Modules. This workload balance must be monitored and reviewed over time as host traffic patterns change.
Important: Host I/Os are not automatically balanced by the system. It is the storage administrators responsibility to ensure that host connections are made to avoid a single point of failure (such as a Module or HBA) and that the host workload is adequately spread across the connections and Interface Modules.
Figure 7-2 Host connectivity end-to-end view: Internal cables compared to external cables
Figure 7-3 on page 150 provides additional details, such as the iSCSI-qualified name and Fibre Channel worldwide port names (WWPNs).
An even more detailed view of host connectivity and variations is given for FCP in 7.2, Fibre Channel (FC) connectivity on page 152 and for iSCSI in 7.3, iSCSI connectivity on page 159.
You can also have a mix (or coexistence) of FC and iSCSI connections to attach various hosts, but do not use both FC and iSCSI connections for the same host. Figure 7-4 shows that it is nevertheless technically possible to use FC and iSCSI connections concurrently, even for attaching the same LUN. This configuration is not supported by most operating systems. For details, refer to the IBM System Storage Interoperability Center (SSIC) at:
http://www.ibm.com/systems/support/storage/config/ssic/displayesssearchwithoutjs.wss?start_over=yes
We mention this capability only because it can be useful, and is solely recommended, when migrating from a former storage system that supports just one of the protocols.

Note: We do not recommend that you use both FCP and iSCSI for shared access to the same LUN from a single host.
5. Install the adapter driver that came with your HBA or download and install an updated adapter driver.
7.2.2 FC configurations
Hosts can attach to the Fibre Channel ports either directly or through an FC fabric. Several configurations are technically possible, and they vary in terms of their cost and the degree of flexibility, performance, and reliability that they provide. A desirable goal is a highly available, high-performance solution: avoid any single point of failure in the connectivity and use as many connections as possible. However, to keep the cost of the solution in line with business requirements, less expensive and less resilient solutions can be justified. Next, we review the three most common FC topologies that are supported.
In this configuration:
- Each host is equipped with dual HBAs. Each HBA (or HBA port) is connected to one or two FC switches.
- Each of the FC switches has a connection to a separate FC port of each of the six Interface Modules.

This configuration has no single point of failure:
- If a module fails, each host remains connected to the other five modules.
- If an FC switch fails, each host remains connected to all modules through the second FC switch.
- If an HBA port fails, the host can still connect over the other HBA port.
Single switch
This configuration is used when only a single switch is available, as shown in Figure 7-6. A better variation of this solution is to use two HBAs for each host, which keeps the attachment resilient to a single HBA failure. Still, the single SAN switch remains a potential single point of failure.
Figure 7-6 FC configurations: Single switch
Figure 7-7 FC configurations: Direct attach
However, in large implementations, this approach makes the zoning management effort grow drastically. A common approach is therefore to create single-initiator, multiple-target zones, as shown in Figure 7-9 on page 156. For more in-depth information about SAN zoning, refer to section 4.7 of the IBM Redbooks publication Introduction to Storage Area Networks, SG24-5470. You can download it from:
http://www.redbooks.ibm.com/redbooks/pdfs/sg245470.pdf
Follow these best practice recommendations:
- For general configurations, zone each host HBA to a single port from each of three Interface Modules, which provides six paths to dual-HBA hosts.
- For high workload applications, consider zoning each HBA to one port from each of the six Interface Modules.
- Do not configure more than 24 logical paths per host. There is no advantage to configuring more than 24 logical paths to a single host, and doing so can actually compromise overall stability.
- Use separate HBAs if you need to attach tape devices to a host that is also connected to the XIV Storage System. Disk and tape traffic are different in nature, and a specific HBA can only be set to be either disk-optimized or tape-optimized. Otherwise, disks run into timeouts and disconnects while tape drives rewind, and backup or restore performance drops significantly.

Note: Use a single-initiator zoning scheme. Do not share a host HBA for disk and tape access.

Zone members are identified either by switch port (hard zoning) or by their HBA worldwide port name (WWPN) (soft zoning). While it is simple to get the switch port number, several specific commands are required to get the WWPNs of the target ports. The next section explains how to get them.
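The path arithmetic behind these recommendations is simple: the number of logical paths a host sees equals HBAs × zoned Interface Modules × ports per module. A quick illustrative sketch that checks a proposed zoning against the 24-path limit:

```python
def logical_paths(hbas, modules, ports_per_module):
    """Number of logical paths a host sees for a given zoning scheme."""
    return hbas * modules * ports_per_module

# General recommendation: 2 HBAs x 3 Interface Modules x 1 port = 6 paths.
general = logical_paths(2, 3, 1)
# High-workload variant: 2 HBAs x 6 Interface Modules x 1 port = 12 paths.
heavy = logical_paths(2, 6, 1)

for name, paths in (("general", general), ("high-workload", heavy)):
    ok = paths <= 24   # more than 24 logical paths can compromise stability
    print(name, paths, "ok" if ok else "too many")
```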
Example 7-1 XCLI: How to get WWPN of IBM XIV Storage System
C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin fc_port_list
Component ID   Status  Currently Functioning  WWPN              Port ID   Role
1:FC_Port:4:4  OK      yes                    5001738000320143  00FFFFFF  Initiator
1:FC_Port:4:3  OK      yes                    5001738000320142  00020C00  Target
1:FC_Port:4:2  OK      yes                    5001738000320141  00010F00  Target
1:FC_Port:4:1  OK      yes                    5001738000320140  00FFFFFF  Target
1:FC_Port:5:4  OK      yes                    5001738000320153  00FFFFFF  Initiator
1:FC_Port:5:3  OK      yes                    5001738000320152  00010E00  Target
1:FC_Port:5:2  OK      yes                    5001738000320151  00010400  Target
1:FC_Port:5:1  OK      yes                    5001738000320150  00020400  Target
1:FC_Port:6:4  OK      yes                    5001738000320163  000000EF  Initiator
1:FC_Port:6:3  OK      yes                    5001738000320162  000000EF  Target
1:FC_Port:6:2  OK      yes                    5001738000320161  00010D00  Target
1:FC_Port:6:1  OK      yes                    5001738000320160  00020500  Target
1:FC_Port:7:4  OK      yes                    5001738000320173  00FFFFFF  Initiator
1:FC_Port:7:3  OK      yes                    5001738000320172  00640900  Target
1:FC_Port:7:2  OK      yes                    5001738000320171  001B0F00  Target
1:FC_Port:7:1  OK      yes                    5001738000320170  00130B00  Target
1:FC_Port:9:4  OK      yes                    5001738000320193  00120C00  Initiator
1:FC_Port:9:3  OK      yes                    5001738000320192  00640800  Target
1:FC_Port:9:2  OK      yes                    5001738000320191  000A1000  Target
1:FC_Port:9:1  OK      yes                    5001738000320190  00120B00  Target
1:FC_Port:8:4  OK      yes                    5001738000320183  00130C00  Initiator
1:FC_Port:8:3  OK      yes                    5001738000320182  00FFFFFF  Target
1:FC_Port:8:2  OK      yes                    5001738000320181  001B0E00  Target
1:FC_Port:8:1  OK      yes                    5001738000320180  00111000  Target
To get the same information from the XIV GUI, select the main view of an XIV Storage System, use the arrow at the bottom (circled in red) to reveal the patch panel, and move the mouse cursor over a particular port to reveal the port details, including the WWPN (refer to Figure 7-10 on page 158).
Figure 7-10 GUI: How to get WWPNs of IBM XIV Storage System
Note: The WWPNs of an XIV Storage System are static. The last two digits of the WWPN indicate from which module and port the WWPN came. As shown in Figure 7-10, the WWPN is 5001738000320151, which means that the WWPN is from module 5 port 2. The ports in the WWPN are numbered from 0 to 3 (instead of 1 to 4). The values that comprise the WWPN are shown in Example 7-2.
Example 7-2 WWPN illustration
If WWPN is 50:01:73:8N:NN:NN:RR:MP
5        NAA (Network Address Authority)
001738   IEEE Company ID
NNNNN    IBM XIV Serial Number in hex
RR       Rack ID (01-ff, 0 for WWNN)
M        Module ID (1-f, 0 for WWNN)
P        Port ID (0-7, 0 for WWNN)
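Given the layout in Example 7-2, the module and port can be read directly from the last two hex digits of a WWPN. The following Python sketch is a hypothetical helper (not part of the XIV tooling) that decodes the fields:

```python
def decode_xiv_wwpn(wwpn: str):
    """Decode an XIV WWPN of the form 50:01:73:8N:NN:NN:RR:MP.

    The last two hex digits carry the module ID and the zero-based port
    index; ports are numbered 0-3 on the wire, so the human-readable
    port number is the index plus one.
    """
    w = wwpn.replace(":", "").lower()
    if len(w) != 16 or not w.startswith("5001738"):
        raise ValueError("not an XIV WWPN: %s" % wwpn)
    return {
        "serial_hex": w[7:12],       # system serial number in hex
        "rack": int(w[12:14], 16),
        "module": int(w[14], 16),
        "port": int(w[15], 16) + 1,  # 0-3 on the wire -> ports 1-4
    }

# The WWPN from Figure 7-10 decodes to module 5, port 2.
print(decode_xiv_wwpn("5001738000320151"))
```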
Migration and Remote Mirroring require an initiator port. Note that in the default IBM XIV Storage System configuration, port 4 (component ID: 1:FC_Port:Module ID:4) of each Interface Module is configured as an initiator.
An iSCSI storage system can use Challenge Handshake Authentication Protocol (CHAP) to authenticate initiators, and initiators can likewise authenticate targets, such as the storage system. CHAP is a method of authenticating iSCSI users. The IBM XIV Storage System does not currently support CHAP. We therefore recommend that you segregate the iSCSI network on a private network.
High availability
Figure 7-11 illustrates the best practice for high availability (HA) iSCSI connectivity. This solution makes the best use of the available iSCSI connectors in the XIV Storage System. Each Interface Module is connected through two ports to two separate Ethernet switches, and each host is connected to both switches. This solution provides a network architecture resilient to the failure of any individual network switch or Interface Module. Use IP(1) to IP(7) and IP(4) to IP(8) to spread traffic from the hosts. No additional management is required for the physical connections on the storage side. In the case of a network failure, hosts can still utilize all Interface Modules and caches.
Note: This High Availability configuration is the best practice for iSCSI connectivity. For the best performance, use a dedicated iSCSI network infrastructure. Aggregation of ports is not possible in this solution.
Single switch
Single switch connectivity must only be used when cost is a major concern or a second Ethernet switch is not available. As shown in Figure 7-12, the host has dual connections, and there are multiple connections to the different iSCSI-equipped Interface Modules for module resiliency. However, the Ethernet switch remains a single point of failure. To achieve hardware high availability, a compromise is to use a resilient Ethernet switch, such as the Cisco Catalyst 6500. With this configuration, you can also bond the two iSCSI connections of a module to get 2 Gbps of bandwidth. In this case, you only require one IP address per link aggregate (also refer to 7.3.2, Link aggregation on page 162).
A possible application for this configuration might be for a small test environment, while the production hosts are connected through FC.
As indicated before, an IQN is also required to uniquely identify the systems for iSCSI communication in the IP network and have it play the role of an iSCSI initiator. The rest of this section explains step-by-step how to use the GUI or XCLI to set up iSCSI communications.
2. The iSCSI connections window opens. Click the Define icon at the top of the window (refer to Figure 7-15) to open the Define IP interface dialog.
3. Enter the name, address, netmask, and gateway in the appropriate fields. The default MTU is 4500. All devices in a network must use the same MTU. If in doubt, set MTU to 1500, because 1500 is the default value for Gigabit Ethernet. Performance might be impacted if the MTU is set incorrectly. In this example, we also set up network aggregation by selecting both available network ports of one module, in this case, module 7 (refer to Figure 7-16).
C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin ipinterface_create ipinterface=iSCSI_module7_bonding address=192.168.1.1 netmask=255.255.255.0 module=1:Module:7 ports=P1,P2
Command executed successfully.
Use the IP addresses shown in this view to connect a host to the corresponding iSCSI port of the XIV Storage System. To check that the connection was successfully established, move the mouse over the Hosts and LUNs icon in the main GUI window and select Host Connectivity from the Hosts and LUNs menu (refer to Figure 7-14 on page 163). A working connection with a specific iSCSI module/port is indicated by a green check mark, as seen in Figure 7-18 on page 165. Note the identifier at the beginning of the line (iqn.), because it is the only way to differentiate FCP from iSCSI connections here. The host (x342_alex) that we used for our illustration had an iSCSI HBA connected to three iSCSI ports on the XIV Storage System, but only one of them is shown as working.
If you need to analyze why a connection is not working, use the XCLI, because it provides additional capabilities for that purpose.
Note the column named Type that displays the role of each port (Management, VPN, or iSCSI). Again, port 2 of module 7 is not shown, because it is not configured yet. Also, note the MTU indicated on an individual port basis. To see a complete list of IP interfaces, including iSCSI, use the command ipinterface_list_ports. A reworked output of this command is shown in Example 7-5. To make it more readable and focus on iSCSI, the output was reworked to include only iSCSI Role ports.
Example 7-5 XCLI to list iSCSI ports with ipinterface_list_ports command
C:\>xcli.exe -c "XIV V10.0 MN00050" -u admin -p adminadmin ipinterface_list_ports
Index  Role   IP Interface  Link Up?  Speed (MB/s)  Full Duplex?  Module
1      iSCSI  M7_P1         yes       1000          yes           1:Module:7
2      iSCSI                no        0             no            1:Module:7
1      iSCSI  M9_P1         yes       1000          yes           1:Module:9
2      iSCSI  iSCSI_M9_P2   yes       1000          yes           1:Module:9
1      iSCSI  M8_P1         yes       1000          yes           1:Module:8
2      iSCSI  iSCSI_M8_P2   yes       1000          yes           1:Module:8
From the XCLI, you can use specific network commands to help in IP problem determination. Next, we illustrate the ipinterface_run_traceroute command and the ipinterface_run_arp command.
Example 7-6 shows the ipinterface_run_traceroute command. In this particular example, we look at the IP connectivity between two XIV Storage Systems. The result confirms that both systems are connected to the same Ethernet switch and that the iSCSI interface (IP 9.155.56.100) on the first system can communicate with iSCSI 9.155.56.58 on the second system. This output indicates that the two systems are in a remote copy relationship and are able to communicate.
Example 7-6 XCLI iSCSI diagnostics with traceroute
C:\>xcli.exe -c "XIV V10.0 MN00050" -u admin -p adminadmin ipinterface_run_traceroute localipaddress=9.155.56.100 remote=9.155.56.58
Command executed successfully.
data=traceroute to 9.155.56.58 (9.155.56.58), 5 hops max, 40 byte packets
data= 1  9.155.56.58 (9.155.56.58)  1.478 ms  0.267 ms  0.091 ms

Example 7-7 illustrates the use of the ipinterface_run_arp command with the following scenario: One host with one iSCSI HBA (IP: 192.168.1.4) is accessing all three modules of the XIV Storage System, and all three paths work. Another host with one iSCSI HBA (IP: 192.168.1.7) has the same configuration, but only two of the paths work. Given that the first host can see a path to each of the modules, we can conclude that the problem is not with an Interface Module, and problem determination must now focus on either the network connection to Interface Module 9 or the second host.
Example 7-7 XCLI iSCSI diagnostics with arp
C:\>xcli.exe -c "XIV V10.0 MN00050" -u admin -p adminadmin ipinterface_run_arp localipaddress=192.168.1.1
Command executed successfully.
data=Address      HWtype  HWaddress          Flags Mask  Iface
data=192.168.1.4  ether   00:C0:DD:07:78:31  C           M7_P1
data=192.168.1.7  ether   00:C0:DD:04:15:BB  C           M7_P1
C:\>xcli.exe -c "XIV V10.0 MN00050" -u admin -p adminadmin ipinterface_run_arp localipaddress=192.168.1.2
Command executed successfully.
data=Address      HWtype  HWaddress          Flags Mask  Iface
data=192.168.1.7  ether   00:C0:DD:04:15:BB  C           M8_P1
data=192.168.1.4  ether   00:C0:DD:07:78:31  C           M8_P1
C:\>xcli.exe -c "XIV V10.0 MN00050" -u admin -p adminadmin ipinterface_run_arp localipaddress=192.168.1.3
Command executed successfully.
data=Address      HWtype  HWaddress          Flags Mask  Iface
data=192.168.1.4  ether   00:C0:DD:07:78:31  C           M9_P1
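The reasoning applied to Example 7-7 can be expressed as set arithmetic: collect, per interface, the host addresses that answered the ARP probe, and report where each host is missing. A hypothetical helper illustrating that logic:

```python
# Hosts seen per interface, as reported by ipinterface_run_arp on each module.
arp_by_iface = {
    "M7_P1": {"192.168.1.4", "192.168.1.7"},
    "M8_P1": {"192.168.1.4", "192.168.1.7"},
    "M9_P1": {"192.168.1.4"},
}

def missing_paths(arp_by_iface):
    """Map each host IP to the interfaces on which it did NOT answer."""
    all_hosts = set().union(*arp_by_iface.values())
    return {h: sorted(i for i, seen in arp_by_iface.items() if h not in seen)
            for h in all_hosts}

gaps = missing_paths(arp_by_iface)
print(gaps["192.168.1.7"])  # the second host is missing only on module 9
# -> ['M9_P1']
```

Because the first host answers on every interface, the gap points at the network path between Interface Module 9 and the second host rather than at the module itself.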
Important: Do not attempt to change the IQN. If a change is required, you must engage IBM support. The IQN is visible as part of the XIV Storage System configuration properties. From the XIV GUI, click Configure System from the system main window menu bar to display the Configure System dialog that is shown in Figure 7-19. For the corresponding XCLI config_get command, refer to Example 7-8.
Figure 7-19 iSCSI: Use XIV GUI to get iSCSI name (IQN)

Example 7-8 iSCSI: Use XCLI to get iSCSI name (IQN)
C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin config_get
Command executed successfully.
default_user=
dns_primary=
dns_secondary=
email_reply_to_address=
email_sender_address=PFE_XIV@de.ibm.com
email_subject_format={severity}: {description}
iscsi_name=iqn.2005-10.com.xivstorage:000050
machine_model=A14
machine_serial_number=MN00050
machine_type=2810
ntp_server=
snmp_community=XIV
snmp_contact=Unknown
snmp_location=Unknown
system_id=50
system_name=XIV V10.0 MN00050
timezone=-7200
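The config_get output in Example 7-8 is plain key=value text, so a script can parse it into a dictionary and pull out fields such as the iSCSI IQN. The sample below reuses a subset of the example output; the parser itself is a generic illustration, not an XIV tool.

```python
# Sketch: parse the key=value lines printed by config_get (Example 7-8).

def parse_config_get(output):
    """Build a dict from config_get-style key=value lines."""
    cfg = {}
    for line in output.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")  # split at the first '='
            cfg[key] = value
    return cfg

sample = """\
iscsi_name=iqn.2005-10.com.xivstorage:000050
machine_model=A14
machine_serial_number=MN00050
machine_type=2810
system_name=XIV V10.0 MN00050
"""

cfg = parse_config_get(sample)
print(cfg["iscsi_name"])  # → iqn.2005-10.com.xivstorage:000050
```

Using str.partition keeps values intact even when they contain further characters after the first equals sign.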
Before we highlight the advantages of one solution over the other, here are several considerations that affect performance in both cases:
- To optimize performance, separate data traffic from storage traffic; in other words, use a separate Ethernet switch for storage-related traffic.
- An iSCSI boot is possible with both types of initiators. However, software initiators are slightly more restrictive (check the vendor documentation for limitations).
- iSCSI is not recommended for latency-sensitive applications. In that case, use FCP.

The key benefits of using hardware initiators (iSCSI HBAs) are:
- Their performance is noticeably faster.
- The traffic that passes through the HBAs does not load the server's CPU to the same extent as traffic that goes through the standard IP stack (as is the case with software initiators).

The key benefits of using software initiators (Ethernet network interface cards (NICs)) are:
- There are no daughter cards, so the cost of that hardware is avoided.
- There is no need for PCI slots (saves two slots).
- Fewer IP addresses are needed, because the two IP addresses used for storage traffic can also be used for data traffic.
- It is the least expensive storage networking solution, covering both data and storage networking and even avoiding the cost of additional switches; however, this method might impact performance.
- It is possible to access other network storage devices, such as network-attached storage (NAS) servers or other file servers, using the same network interfaces as those used for iSCSI.

Note: The core idea of the XIV Storage System is to use commodity hardware and implement functionality in software. Based on this philosophy, the system uses commodity Ethernet adapters for iSCSI traffic.
Table 7-2 iSCSI connectivity parameter maximum values

Parameter                                                                          Maximum value
Maximum number of Interface Modules with iSCSI ports                               3
Maximum number of 1 GB iSCSI ports per Interface Module                            2
Maximum queue depth per iSCSI host port                                            1400
Maximum queue depth per mapped volume per (host port, target port, volume) tuple   256
Maximum iSCSI ports for any connection (host or XDRP)                              6
Maximum number of hosts (defined WWPNs and IQNs)                                   4000
Maximum number of mirroring couplings (number of mirrors)                          128
Maximum number of mirrors on the remote machine                                    128
Maximum number of remote targets                                                   4
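The maxima in Table 7-2 can also be checked mechanically during planning. The sketch below (our own illustration, not an XIV utility) encodes a few of the limits and flags a planned configuration that exceeds them:

```python
# Sketch: check a planned iSCSI configuration against the Table 7-2 maxima.
# The limit values come straight from the table; the checker is illustrative.

ISCSI_LIMITS = {
    "interface_modules_with_iscsi": 3,
    "iscsi_ports_per_module": 2,
    "queue_depth_per_host_port": 1400,
    "hosts": 4000,
    "remote_targets": 4,
}

def violations(planned):
    """Return the parameters in `planned` that exceed the documented maxima."""
    return [k for k, v in planned.items() if v > ISCSI_LIMITS[k]]

plan = {"interface_modules_with_iscsi": 3, "hosts": 4100}
print(violations(plan))  # → ['hosts']
```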
Figure 7-20 Example configuration: an FC host attached through SAN Fabric 1 and SAN Fabric 2, and an iSCSI host (IQN iqn.2000-04.com.qlogic:host.ibm.com, IP(7) and IP(8)) attached through the Ethernet network, both connecting via the patch panel to the XIV Interface Modules 4 through 9
To prepare the hardware:
1. In our scenario, we assume that the systems are already in place and physically cabled (with redundancy) as shown in Figure 7-20. Write down the component names and IDs, as illustrated in Table 7-3 on page 172 for our particular example.
Table 7-3 Example: Required component information

Component                           FC environment                                    iSCSI environment
IBM XIV FC HBAs                     WWPN: 5001738000320nnn                            N/A
                                    nnn for Fabric1: 140, 150, 160, 170, 180, and 190
                                    nnn for Fabric2: 142, 152, 162, 172, 182, and 192
Host HBAs                           HBA1 WWPN: 210100E08BAFA29E                       N/A
                                    HBA2 WWPN: 210000E08B8FA29E
IBM XIV iSCSI IPs                   N/A                                               IP(1): 192.168.1.1, IP(2): 192.168.1.2,
                                                                                      IP(3): 192.168.1.3, IP(4): 192.168.1.4,
                                                                                      IP(5): 192.168.1.5, IP(6): 192.168.1.6
IBM XIV iSCSI IQN (do not change)   N/A                                               iqn.2005-10.com.xivstorage:000050
Host IPs                            N/A                                               IP(7): 192.168.1.7, IP(8): 192.168.1.8
Host iSCSI IQN                      N/A                                               iqn.2000-04.com.qlogic:host.ibm.com
The OS type is also required information. With the current XIV Storage System, it is only relevant for Hewlett-Packard UNIX (HP-UX); all other operating systems use the same host type.
2. If the new server connects through FCP, it is preferable to first configure the network (SAN Fabric 1 and 2) and power on the host server, which populates the XIV Storage System list of WWPN hosts and allows a simple selection of the host adapter WWPN. The configuration steps for the FCP network include zoning:
   a. Log on to the Fabric 1 SAN switch and create a host zone (single initiator):
      Zone name: FChost_HBA1
      Members: 5001738000320140, 5001738000320150, 5001738000320160, 5001738000320170, 5001738000320180, 5001738000320190, 210100E08BAFA29E
   b. Log on to the Fabric 2 SAN switch and create a host zone (single initiator):
      Zone name: FChost_HBA2
      Members: 5001738000320142, 5001738000320152, 5001738000320162, 5001738000320172, 5001738000320182, 5001738000320192, 210000E08B8FA29E
   c. Add the new zones to the current zone set, then save and apply the new configuration.
For iSCSI connectivity, there is no zoning step, because the required IP addresses are simply entered manually.
The subsequent steps can be performed via the XIV GUI or the XCLI. For the XIV GUI, continue with 7.4.2, "Prepare for a new host: XIV GUI" on page 173. For the XCLI, go to 7.4.3, "Prepare for a new host: XCLI" on page 177.
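The single-initiator zones in step 2 follow a simple pattern: each zone contains one host HBA WWPN plus the six XIV target WWPNs for that fabric. The sketch below builds those member lists programmatically using the WWPNs from Table 7-3; it is illustrative only — the actual zoning is performed on the SAN switches, as described above.

```python
# Sketch: build the single-initiator zone member lists from step 2.
# The XIV WWPN pattern and the nnn values are taken from Table 7-3.

XIV_WWPN_TEMPLATE = "5001738000320{nnn}"

def build_zone(name, host_wwpn, fabric_nnn):
    """Return a zone dict: all XIV ports on one fabric plus one host HBA."""
    members = [XIV_WWPN_TEMPLATE.format(nnn=n) for n in fabric_nnn]
    return {"zone": name, "members": members + [host_wwpn]}

fabric1_nnn = ["140", "150", "160", "170", "180", "190"]
zone = build_zone("FChost_HBA1", "210100E08BAFA29E", fabric1_nnn)

print(len(zone["members"]))  # → 7 (six XIV ports plus one host HBA)
```

Generating the member lists this way avoids the transcription errors that are common when twelve 16-digit WWPNs are typed by hand into two switch configurations.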
4. The Hosts window is displayed showing a list of hosts that are already defined, if any. To add a new host or cluster, click either the Add Host or Add Cluster choice in the menu bar (refer to Figure 7-22). In our example, we select Add host.
5. The Define Host dialog is displayed as shown in Figure 7-23. Enter a name for the host. If a cluster definition was created in the previous step, it is available in the cluster drop-down list box. To add a server to a cluster, select a cluster name. Because we do not create a cluster in our example, we select None.
6. Repeat steps 4 and 5 to create a second host definition for the iSCSI-attached host (refer to Figure 7-24 on page 174).
7. Host access to LUNs is granted depending on the host adapter ID. For an FC connection, the host adapter ID is the FC HBA WWPN. For an iSCSI connection, the host adapter ID is the host or HBA IQN. To add a WWPN or IQN to a host definition, right-click the host and select Add Port from the context menu (refer to Figure 7-25).
8. The Add Port dialog is displayed as shown in Figure 7-26. Select port type FC or iSCSI. In this example, the FC host is defined first. Add the WWPN for HBA1 as listed in Table 7-3 on page 172. If the host is correctly connected and has done a port login at least one time, the WWPN is shown in the drop-down list box. Otherwise, you can manually enter the WWPN.
Now, proceed in the same manner to add the second HBA (HBA2) WWPN, as shown in Figure 7-27 on page 175. The XIV Storage System does not care which FC port name is added first.
IBM XIV Storage System: Concepts, Architecture, and Usage
In our example, there is also an iSCSI host. An iSCSI HBA is installed in this host. In the Add port dialog, specify the port type as iSCSI and enter the IQN of the HBA as the iSCSI Name or port name (refer to Figure 7-28).
9. The final configuration step is to map the volume to the host. While still in the Hosts configuration pane, right-click the host to which the volume is to be mapped and select Map Volumes to this Host from the context menu (refer to Figure 7-29).
10.The Volume to Host Mapping window opens. The process of adding a volume to a host definition is straightforward in this panel (refer to Figure 7-30 on page 176): Select an available volume from the left pane. The GUI will suggest a LUN ID to which to map the volume.
Chapter 7. Host connectivity
Click Map and the volume is assigned immediately. Note that the GUI enforces that a volume can only be mapped to one host. (For a cluster, create a cluster definition and map volumes to the cluster definition.)
There is no difference in mapping a volume to an FC or iSCSI host in the XIV GUI volume mapping view. For completeness, add a volume to the iSCSI host defined in this example (refer to Figure 7-31).
11.To complete this example, power up the host server and check connectivity. The XIV Storage System has a real-time connectivity status overview. Select Hosts Connectivity from the Hosts and LUNs menu to access the connectivity status (refer to Figure 7-32).
The host connectivity window is displayed (from the XIV Storage System point of view). In our example, the ExampleFChost was expected to have dual path connectivity to every module. However, only two modules (5 and 6) show as connected (refer to Figure 7-33 on page 177), and the iSCSI host has no connection to module 9.
12. At this stage, the setup of the new FC and iSCSI hosts on the XIV Storage System is complete. Additional OS-dependent configuration steps must still be performed; they are described in the respective OS chapters.
13. It is a best practice to document all changes performed on a production system.
C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin host_define host=ExampleFChost
Command executed successfully.

C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin host_define host=ExampleiSCSIhost
Command executed successfully.

4. Host access to LUNs is granted depending on the host adapter ID. For an FC connection, the host adapter ID is the FC HBA WWPN. For an iSCSI connection, the host adapter ID is the IQN of the host network interface card (NIC) with the software initiator or of the iSCSI HBA. In Example 7-10, the WWPNs of the FC host's HBA1 and HBA2 are added with the host_add_port command by specifying an fcaddress.
Example 7-10 Create FC port and add to host definition
C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin host_add_port host=ExampleFChost fcaddress=210100E08BAFA29E
Command executed successfully.

C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin host_add_port host=ExampleFChost fcaddress=210000E08B8FA29E
Command executed successfully.
In Example 7-11, the IQN of the iSCSI host is added. Note that this is the same host_add_port command, but with the iscsi_name parameter instead of fcaddress.
Example 7-11 Create iSCSI port and add to the host definition
C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin host_add_port host=ExampleiSCSIhost iscsi_name=iqn.2000-04.com.qlogic:host.ibm.com
Command executed successfully.

5. The final configuration step is to map volumes to the host definition. Note that for a cluster, the volumes are mapped to the cluster host definition. Again, there is no difference between FC and iSCSI mapping to a host. Both commands are shown in Example 7-12.
Example 7-12 XCLI example: Map volumes to hosts
C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin map_vol host=ExampleFChost vol=ExampleFChost lun=1
Command executed successfully.

C:\>xcli -c "XIV V10.0 MN00050" -u admin -p adminadmin map_vol host=ExampleiSCSIhost vol=ExampleiSCSIhost lun=1
Command executed successfully.

6. To complete the example, power up the server and check the host connectivity status from the XIV Storage System point of view. Example 7-13 shows the output for both hosts.
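The XCLI steps above (host_define, host_add_port, map_vol) lend themselves to scripting. The sketch below only assembles the command lines shown in the examples so that the whole sequence can be reviewed before it is run; actually executing them would require the xcli binary, valid credentials, and a reachable system.

```python
# Sketch: assemble the XCLI host-definition sequence as argument lists.
# Credentials and system name are the ones used throughout this example.

SYSTEM = ["xcli", "-c", "XIV V10.0 MN00050", "-u", "admin", "-p", "adminadmin"]

def xcli_cmd(*args):
    """Prefix an XCLI subcommand with the connection arguments."""
    return SYSTEM + list(args)

sequence = [
    xcli_cmd("host_define", "host=ExampleFChost"),
    xcli_cmd("host_add_port", "host=ExampleFChost", "fcaddress=210100E08BAFA29E"),
    xcli_cmd("host_add_port", "host=ExampleFChost", "fcaddress=210000E08B8FA29E"),
    xcli_cmd("map_vol", "host=ExampleFChost", "vol=ExampleFChost", "lun=1"),
]

for cmd in sequence:
    print(" ".join(cmd))
```

From here, each argument list could be handed to subprocess.run() on a management workstation where the XCLI is installed.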
Example 7-13 XCLI example: Check host connectivity
In Example 7-13, there is only one path per host FC HBA instead of the expected six paths per host FC HBA, which was intentional in our setup for illustration purposes. Problem determination for a situation like this one starts at the FC fabric zoning and the FC cabling of the XIV Storage System. Similarly, the iSCSI host in our example has one connection missing, to module 9. To investigate, use diagnostic commands on the XIV Storage System to trace the route, check the network cabling to module 9, and check the network switch configuration for a possible firewall or virtual local area network (VLAN) misconfiguration.
7. Setup of the new FC and iSCSI hosts on the XIV Storage System is now complete. The remaining steps are OS dependent and are described in the respective OS chapters.
8. It is a best practice to document changes performed on a production system.
Chapter 8.
Multi-path support
Microsoft provides a multi-path framework and development kit called Microsoft Multi-path I/O (MPIO). The driver development kit allows storage vendors to create Device Specific Modules (DSMs) for MPIO and to build interoperable multi-path solutions that integrate tightly with the Microsoft Windows family of products. MPIO has been extended by IBM to support the XIV Storage System. The current version of the MPIO framework (with the XIV extension) is 1.21, and it also requires the XIV Device Specific Module (DSM) 11.21.

MPIO ensures high availability of data by utilizing multiple paths between the server on which the application executes and the storage where the data is physically stored. Microsoft MPIO support allows the initiator to establish multiple sessions with the same target and aggregate the duplicate devices into a single device exposed to Windows.
In other words, the MPIO framework provides an active/active policy for a Windows host system to connect to the XIV Storage System, and it can handle I/O on any path at any time.
Upgrading an existing MPIO driver requires a complete removal of the old version, followed by a regular installation of the new one.
During the upgrade process, we recommend that you remove all volumes already mapped to the system. As usual, we also recommend that you back up your system configuration prior to the upgrade.

Important: Make sure to use the latest supported MPIO version. At the time of writing this book, it is Version 1.21.

These are the steps of the upgrade process:
1. Download the latest MPIO installer package for the XIV Storage System.
2. Extract the compressed package to a working directory. In our case, it is C:\Program Files\XIV\mpio\1.21\
3. Open a command prompt and navigate to this directory. Make sure that the directory contains the latest MPIO installer files.
4. Issue the following command to remove any previous version of the MPIO framework:

   install -u C:\Program Files\XIV\mpio\1.21\dsmxiv.inf Root\DSMXIV
5. Reboot the server. 6. Verify in Device Manager that Multi-Path Support has been removed from the SCSI and RAID controllers section. If for some reason it is still listed, make sure that the Windows server is not attached to any external storage regardless of the type of connection (FC, iSCSI, SAS, or others). Right-click the device driver and remove it manually. Finally, reboot the server again. 7. Proceed with the normal MPIO installation procedure as explained in Installing MPIO multi-path driver on page 181.
For 2 Gbps FC and high-throughput workloads, each host HBA can be zoned to all six Interface Modules for a total path count of 12. After the connections are in place and the zoning is established, you will see the host worldwide names (WWNs) in the XIV Storage System. To check a specific WWN, use a command as shown in the following example:

xcli -c Redbook fc_connectivity_list | find "210000E08B0B941D"
1:FC_Port:7:2   1   210000E08B0B941D   yes
1:FC_Port:9:2   1   210000E08B0B941D   yes
In the XIV GUI, these ports will be selectable from the drop-down list in the Host Port Definition window, as shown in Figure 5-32 on page 116. For detailed descriptions of host definition and volume mapping, refer to 5.5, "Host definition and mappings" on page 113. After the host definition and volume mapping have been done in the XIV Storage System, issue a Rescan Disk command in the Windows Computer Management window (right-click Disk Management). You get a list of the attached XIV volumes, as shown in Figure 8-4.
The number of IBM 2810XIV SCSI Disk Devices depends on the number of paths from the host to the XIV Storage System. The mapped volume can be seen as illustrated in Figure 8-5 on page 185.
Figure 8-5 Mapped volume appears as a new disk in Windows Server 2003
2. The Installation Options dialog is displayed. Select Initiator Service and software initiator. Refer to Figure 8-7. Important: Do not select MPIO at this time. The MPIO framework is packaged together with a Device Specific Module for XIV Storage System.
3. Read the license agreement, select I Agree and click Next. Refer to Figure 8-8 on page 187.
4. The Microsoft iSCSI software initiator is now being installed. When the process is complete, click Finish as shown in Figure 8-9.
5. To verify the new device installation, check the status in the Device Manager window, under SCSI and RAID controllers. Refer to Figure 8-10 on page 188.
For the detailed description of host definition and volume mapping in XIV, refer to 5.5, "Host definition and mappings" on page 113.
2. In the iSCSI Initiator Properties window, select the Discovery tab and click Add in the Target Portals pane. Use one of your system's iSCSI IP addresses, as defined during the system's installation. To view IP addresses for the iSCSI ports in the XIV GUI, move the mouse cursor over the Hosts and LUNs icons in the main XIV window and select iSCSI Connectivity from the Host and LUNs menu, as shown in Figure 8-12.
Alternatively, you can issue the Extended Command Line Interface (XCLI) command as shown in Example 8-2.
Example 8-2 List iSCSI interfaces

c:\>xcli -c Redbook ipinterface_list | find "iSCSI"
iSCSI_M8_P1   iSCSI   9.155.56.80   255.255.255.0   9.155.56.1   4500   1:Module:8   1
iSCSI_M7_P1   iSCSI   9.155.56.81   255.255.255.0   9.155.56.1   4500   1:Module:7   1
You can see that the iSCSI addresses used in our test environment are 9.155.56.80 and 9.155.56.81. 3. If your host is equipped with more than one Ethernet adapter (Network Interface Card (NIC)), click Advanced (in the Discovery tab of the iSCSI Initiator Properties window) and select iSCSI Software Initiator and your preferred IP address. To add the Target Portal, click OK as shown in Figure 8-13.
4. The XIV Storage System is now being discovered by the initiator. Change to the Targets tab in the iSCSI Initiator Properties window to see the discovered XIV Storage System. Refer to Figure 8-14 on page 190. The storage most likely shows as inactive status.
To activate the connection, click Log On. 5. In the Log On to Target pop-up window, select Enable multi-path as shown in Figure 8-15. You can select the first check box too if you want to automatically restore this connection at the system boot time. Click Advanced.
6. The Advanced Settings window is displayed. Select the Microsoft iSCSI initiator from the Local adapter drop-down. In the Source IP drop-down, click the IP address that is connected to the first iSCSI LAN, and in the Target Portal, select the first available IP address of the XIV Storage System as illustrated in Figure 8-16 on page 191. Click OK. You are returned to the parent window. Click OK again.
7. The iSCSI Target connection status now shows as active and connected. Make sure that the target is in the Connected status. The redundant paths are not yet configured. To do so, repeat this process for all IP addresses in your system. In other words, establish connection sessions to all of the desired XIV iSCSI interfaces from all of your desired source IP addresses. After the iSCSI sessions are created to each target portal, you can see details of the sessions. Select the iSCSI target in the target list (Figure 8-14 on page 190), and click Details to verify the sessions of the connection. Refer to Figure 8-17.
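The "repeat for all IP addresses" rule in step 7 amounts to creating one iSCSI session per (source IP, target portal) pair. The sketch below simply enumerates those pairs for the addresses used in this chapter; it is an illustration of the session count, not a configuration tool.

```python
# Sketch: enumerate the iSCSI sessions to configure — one per
# (host source IP, XIV target portal) combination.

from itertools import product

source_ips = ["192.168.1.7", "192.168.1.8"]        # host NIC/HBA addresses
target_portals = ["9.155.56.80", "9.155.56.81"]    # XIV iSCSI interfaces (Example 8-2)

sessions = list(product(source_ips, target_portals))
for src, tgt in sessions:
    print(f"log on from {src} to {tgt}")

print(len(sessions))  # → 4
```

With two host NICs and two XIV iSCSI interfaces, four sessions give full path redundancy; adding a third XIV interface raises the count to six.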
Depending on your environment, numerous sessions can appear, according to what you have configured. If you have already mapped volumes to the host system, you will see them under the Devices tab. If no volumes are mapped to this host yet, you can allocate them now. Another way to verify your allocated disks is to open the Windows Device Manager as shown in Figure 8-18.
Figure 8-18 Windows Device Manager with XIV disks connected through iSCSI
Interoperability
The IBM XIV Storage System supports different versions of the AIX operating system, through either Fibre Channel (FC) or iSCSI connectivity. Various FC Host Bus Adapters (HBAs) are supported. Supported IBM-branded Emulex HBAs and IBM HBA firmware versions can be found at:
http://www.software.ibm.com/webapp/set2/firmware/gjs
For the supported versions of AIX and their hardware environments, refer to the latest XIV interoperability information at the following Web site:
http://www.ibm.com/systems/support/storage/config/ssic/displayesssearchwithoutjs.wss?start_over=yes
Prerequisites
If the current AIX operating system level installed on your system is not a level that is compatible with XIV, you must upgrade the system prior to attaching the XIV storage. To determine the maintenance package or technology level currently installed on your system, use the oslevel command as shown in Example 8-3.
Example 8-3 Determine current AIX version and maintenance level
# oslevel -g
Fileset                     Actual Level        Maintenance Level
-----------------------------------------------------------------------------
bos.rte                     5.3.8.0             5.3.0.0

At the time of writing this book, a binary AIX patch is required prior to the installation, depending on the current oslevel. These patches are shown in Table 8-1.
Table 8-1 Details of AIX patches

AIX level             Patch
Before AIX 5.3 TL7    Not supported
AIX 5.3 TL7           IZ28969
AIX 5.3 TL8           IZ28970
AIX 5.3 TL9           IZ28047
After AIX 5.3 TL9     iFIX/APAR is not required
AIX 6.1 TL0           IZ28002
AIX 6.1 TL1           IZ28004
AIX 6.1 TL2           IZ28079
After AIX 6.1 TL2     iFIX/APAR is not required
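Table 8-1 can be encoded as a small lookup so that a preparation script can report the required iFIX/APAR for a given AIX level. The entries below mirror the table; the helper function and the level-string format are our own illustration.

```python
# Sketch: Table 8-1 as a lookup. Levels not in the table fall back to
# "Not supported" (matching the "Before AIX 5.3 TL7" row); the two "After"
# ranges need no patch.

AIX_PATCHES = {
    "5.3 TL7": "IZ28969",
    "5.3 TL8": "IZ28970",
    "5.3 TL9": "IZ28047",
    "6.1 TL0": "IZ28002",
    "6.1 TL1": "IZ28004",
    "6.1 TL2": "IZ28079",
}

def required_patch(level):
    """Return the APAR name, None if no patch is needed, or 'Not supported'."""
    if level in ("after 5.3 TL9", "after 6.1 TL2"):
        return None  # iFIX/APAR is not required
    return AIX_PATCHES.get(level, "Not supported")

print(required_patch("5.3 TL8"))  # → IZ28970
```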
The following best practices documents describe system planning and support procedures for the AIX operating system: http://www.software.ibm.com/webapp/set2/sas/f/best/aix_service_strategy_v3.pdf Download AIX upgrades from the IBM Fix Central Web site: http://www.ibm.com/eserver/support/fixes/fixcentral/main/pseries/aix Before further configuring your host system or the XIV Storage System, make sure that the physical connectivity between the XIV and the POWER system is properly established. In addition to proper cabling, if using FC switched connections, you must ensure a correct zoning (using the WWPN numbers of the AIX host). Refer to 7.2, Fibre Channel (FC) connectivity on page 152 for the recommended cabling and zoning setup.
# lsdev -Cc adapter | grep fcs
fcs0 Available 02-00 4Gb FC PCI Express Adapter (df1000fe)
fcs1 Available 00-00 4Gb FC PCI Express Adapter (df1000fe)
fcs2 Available 00-01 4Gb FC PCI Express Adapter (df1000fe)
This example shows that, in our case, we have three FC ports. Another useful command that is shown in Example 8-5 returns not just the ports, but also where the Fibre Channel adapters reside in the system (in which PCI slot). This command can be used to physically identify in what slot a specific adapter is placed.
Example 8-5 Locating FC adapters
# lsslot -c pci | grep fcs U789D.001.DQD73N0-P1-C2 PCI-E capable, Rev 1 slot with 8x lanes U789D.001.DQD73N0-P1-C6 PCI-E capable, Rev 1 slot with 8x lanes
To obtain the Worldwide Port Name (WWPN) of each of the POWER system FC adapters, you can use the lscfg command, as shown in Example 8-6 on page 195.
Example 8-6 Finding Fibre Channel adapter WWN

# lscfg -vl fcs0
  fcs0             U1.13-P1-I1/Q1    FC Adapter
Part Number.................00P4494
EC Level....................A
Serial Number...............1A31005059
Manufacturer................001A
Feature Code/Marketing ID...2765
FRU Number..................00P4495
Network Address.............10000000C93318D6
ROS Level and ID............02C03951
Device Specific.(Z0)........2002606D
Device Specific.(Z1)........00000000
Device Specific.(Z2)........00000000
Device Specific.(Z3)........03000909
Device Specific.(Z4)........FF401210
Device Specific.(Z5)........02C03951
Device Specific.(Z6)........06433951
Device Specific.(Z7)........07433951
Device Specific.(Z8)........20000000C93318D6
Device Specific.(Z9)........CS3.91A1
Device Specific.(ZA)........C1D3.91A1
Device Specific.(ZB)........C2D3.91A1
Device Specific.(YL)........U1.13-P1-I1/Q1
You can also print the WWPN of an HBA directly by issuing this command:

lscfg -vl <fcs#> | grep Network

The # stands for the instance of the FC HBA that you want to query. After you have identified the FC adapter in the system, use the lsattr command to list its attributes. Refer to Example 8-7.
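The same extraction that the "| grep Network" pipe performs can be done in a script: pull the hexadecimal WWPN out of the "Network Address" line of the lscfg output. The sample text below is taken from Example 8-6; the regular expression is our own illustration.

```python
# Sketch: extract the WWPN from "lscfg -vl fcsN" output.

import re

lscfg_output = """\
Part Number.................00P4494
Network Address.............10000000C93318D6
ROS Level and ID............02C03951
"""

def extract_wwpn(text):
    """Return the hex WWPN after the dotted 'Network Address' label, or None."""
    m = re.search(r"Network Address\.+([0-9A-F]+)", text)
    return m.group(1) if m else None

print(extract_wwpn(lscfg_output))  # → 10000000C93318D6
```

The extracted value is exactly what is entered (or selected) as the port name in the XIV Host Port Definition window.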
Example 8-7 Listing FC adapter attributes in AIX operating system
# lsattr -El fcs0
bus_intr_lvl                  Bus interrupt level                                  False
bus_io_addr    0xffc00        Bus I/O address                                      False
bus_mem_addr   0xffebf000     Bus memory address                                   False
init_link      al             INIT Link flags                                      True
intr_msi_1     66085          Bus interrupt level                                  False
intr_priority  3              Interrupt priority                                   False
lg_term_dma    0x800000       Long term DMA                                        True
max_xfer_size  0x100000       Maximum Transfer Size                                True
num_cmd_elems  200            Maximum number of COMMANDS to queue to the adapter   True
pref_alpa      0x1            Preferred AL_PA                                      True
sw_fc_class    2              FC Class for Fabric                                  True
At this point, you can define the AIX host system on the XIV Storage and assign FC ports for the WWPNs. If the FC connection was correctly done, the zoning enabled, and the FC adapters are in an available state on the host, these ports will be selectable from the drop-down list in the Host Port Definition window of the XIV Graphical User Interface. Refer to Figure 5-32 on page 116.
For the detailed description of host definition and volume mapping, refer to 5.5, Host definition and mappings on page 113.
It might happen that disk drives with an Other FC SCSI Disk Drive description appear in the system if FC discovery (cfgmgr) was run before the previously mentioned package installation was completed. In that case, remove these drives and run the discovery procedure again. The removal and running the cfgmgr procedure are illustrated in Example 8-9.
Example 8-9 Cleanup and reconfiguration
# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk2 Available 20-58-02 Other FC SCSI Disk Drive
hdisk3 Available 20-58-02 Other FC SCSI Disk Drive
# rmdev -dl hdisk2
hdisk2 deleted
# rmdev -dl hdisk3
hdisk3 deleted
# cfgmgr -l fcs0
# cfgmgr -l fcs1

Now, when we list the disks, we see the correct number of disks from the storage, labeled as XIV disks, as shown in Example 8-10.
Example 8-10 XIV labeled FC disks
# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available 00-00-02 IBM 2810XIV Fibre Channel Disk
hdisk2 Available 00-00-02 IBM 2810XIV Fibre Channel Disk
The management of MPIO devices is described in the online guide System Management Guide: Operating System and Devices for AIX 5L from the AIX documentation Web site at:
http://publib16.boulder.ibm.com/pseries/en_US/aixbman/baseadmn/manage_mpio.htm
Consequently, we recommend that you switch the algorithm to fail_over and assign a greater queue_depth of 32 (a typical value).

Tip: Use the fail_over algorithm with a queue depth of 32. Then, load balance the I/Os across the FC adapters and paths by setting the path priority attribute for each LUN so that 1/nth of the LUNs are assigned to each of the n FC paths.

A fail-over algorithm can be used in such a way that it is superior to a load balancing algorithm (such as round_robin). First, consider that any load balancing algorithm must consume CPU and memory resources to determine the best path to use. Second, it is possible to set up fail-over LUNs so that the load is balanced across the available FC adapters.

Consider an example with two FC adapters. Assume that we correctly lay out our data so that the I/Os are balanced across the LUNs, which is usually a best practice. Then, if we assign half the LUNs to FC adapter A and half to FC adapter B, the I/Os are evenly balanced across the adapters. There might be times when I/Os to one LUN on adapter A are higher than to a LUN on adapter B. The question to ask then is: will the additional load on the adapter have a significant impact on I/O latency? In most cases, because the FC adapters are capable of handling more than 30,000 IOPS, we are unlikely to bottleneck at the adapter and add significant latency to the I/O.

There is also a priority attribute for paths, which can be used to specify a preference for the path used for I/Os (as of this writing, the lspath man page incorrectly refers to this as the weight attribute). The effect of the priority attribute depends on whether the hdisk algorithm attribute is set to fail_over or round_robin:
- For algorithm=fail_over, the path with the higher priority handles all the I/Os; if that path fails, the other path is used. After a path failure and recovery, if you have APAR IY79741 installed, I/Os are redirected down the path with the highest priority; otherwise, if you want the I/Os to go down the primary path, you must use chpath to disable the secondary path and then re-enable it. If the priority attribute is the same for all paths, the first path listed by lspath -Hl <hdisk> is the primary path. So, you can set the primary path by setting its priority value to 1, the next path's priority (in case of path failure) to 2, and so on.
- For algorithm=round_robin, if the priority attributes are the same, I/Os go down each path equally. If you set pathA's priority to 1 and pathB's to 255, then for every I/O going down pathA, 255 I/Os are sent down pathB.

To change the path priority of an MPIO device, use the chpath command. Refer to Example 8-14 on page 199 for an illustration.
The lspath command can also be used to read the attributes of a given path to an MPIO-capable device, as shown in Example 8-13. It is also good to know that the <connection> information is either "<SCSI ID>,<LUN ID>" for SCSI devices (for example, 5,0) or "<WWN>,<LUN ID>" for FC devices.
Example 8-13 The lspath command reads attributes of the 0 path for hdisk2
# lspath -AHE -l hdisk2 -p fscsi0 -w "5001738000cb0181,0"
attribute  value               description   user_settable
scsi_id    0x120d00            N/A           False
node_name  0x5001738000cb0000  FC Node Name  False
priority   1                   Priority      True

The chpath command is used to perform change operations on a specific path. It can change either the operational status or the tunable attributes associated with a path; it cannot perform both types of operations in a single invocation. Example 8-14 illustrates the use of the chpath command with an XIV Storage System. It sets the primary path to fscsi1 using the first path listed (there are two paths from the switch to the storage for this adapter). Then, for the next disk, we set the priorities to 4, 1, 2, and 3, respectively. Assuming that the I/Os are relatively balanced across the hdisks, this balances the I/Os evenly across the paths.
Example 8-14 The chpath command
# lspath -l hdisk34 -F"status parent connection"
Enabled fscsi1 5001738000230190,21000000000000
Enabled fscsi1 5001738000230180,21000000000000
Enabled fscsi3 5001738000230190,21000000000000
Enabled fscsi3 5001738000230180,21000000000000
# chpath -l hdisk34 -p fscsi1 -w 5001738000230180,21000000000000 -a priority=2
path Changed
# chpath -l hdisk34 -p fscsi3 -w 5001738000230190,21000000000000 -a priority=3
path Changed
# chpath -l hdisk34 -p fscsi3 -w 5001738000230180,21000000000000 -a priority=4
path Changed

The rmpath command unconfigures or undefines (or both) one or more paths to a target device. It is not possible to unconfigure (undefine) the last path to a target device using the rmpath command; the only way to do that is to unconfigure the device itself (for example, with the rmdev command). Refer to the man pages of the MPIO commands for more information.
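The rotation applied in Example 8-14 generalizes: for hdisk i across n paths, give path (i mod n) priority 1, the next path priority 2, and so on, so that each hdisk has a different primary path. The sketch below computes those assignments; the path names are illustrative, and on AIX the resulting priorities would be applied with chpath as shown above.

```python
# Sketch: rotate fail_over path priorities across hdisks so that primary
# paths (priority 1) are spread evenly over all available paths.

def priorities_for_hdisk(index, paths):
    """Map each path to its priority for the hdisk at position `index`."""
    n = len(paths)
    return {paths[(index + k) % n]: k + 1 for k in range(n)}

# Illustrative names for the four (adapter, target-port) paths of Example 8-14.
paths = ["fscsi1-a", "fscsi1-b", "fscsi3-a", "fscsi3-b"]

print(priorities_for_hdisk(0, paths))  # first hdisk: priorities 1, 2, 3, 4
print(priorities_for_hdisk(1, paths))  # next hdisk: priorities 4, 1, 2, 3
```

For hdisk index 1, the first path receives priority 4 and the others 1, 2, 3 — the same "4, 1, 2, 3" rotation the example describes.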
To make sure that your system is equipped with the required filesets, run the lslpp command as shown in Example 8-15. We used the AIX Version 5.3 operating system with Technology level 9 in our illustrations.
Example 8-15 Verifying installed iSCSI filesets in AIX
# lslpp -la "*.iscsi*"
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  devices.common.IBM.iscsi.rte
                            5.3.0.60  COMMITTED  Common iSCSI Files
                             5.3.8.0  COMMITTED  Common iSCSI Files
  devices.iscsi.disk.rte    5.3.0.60  COMMITTED  iSCSI Disk Software
                             5.3.7.0  COMMITTED  iSCSI Disk Software
  devices.iscsi.tape.rte    5.3.0.30  COMMITTED  iSCSI Tape Software
  devices.iscsi_sw.rte      5.3.0.60  COMMITTED  iSCSI Software Device Driver
                             5.3.8.0  COMMITTED  iSCSI Software Device Driver
Path: /etc/objrepos
  devices.common.IBM.iscsi.rte
                            5.3.0.60  COMMITTED  Common iSCSI Files
                             5.3.8.0  COMMITTED  Common iSCSI Files
  devices.iscsi_sw.rte      5.3.0.60  COMMITTED  iSCSI Software Device Driver
                             5.3.8.0  COMMITTED  iSCSI Software Device Driver

At the time of writing, only the AIX iSCSI software initiator is supported for connecting to the XIV Storage System.
Volume Groups
To avoid configuration problems and error log entries when you create Volume Groups using iSCSI devices, follow these guidelines:
- Configure Volume Groups that are created using iSCSI devices to be in an inactive state after reboot. After the iSCSI devices are configured, manually activate the iSCSI-backed Volume Groups. Then, mount any associated file systems. Volume Groups are activated during a different boot phase than the iSCSI software driver; for this reason, it is not possible to activate iSCSI Volume Groups during the boot process.
- Do not span Volume Groups across non-iSCSI devices.
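The manual activation sequence (varyonvg, then mount) can be wrapped in a small script run after the iSCSI initiator is up. The sketch below is ours: the volume group name iscsivg and the mount point are hypothetical, varyonvg and mount are the AIX commands, and both are made overridable so the sequence can be exercised as a dry run on a non-AIX host:

```shell
# Activate an iSCSI-backed volume group and mount its file system
# after boot. VARYONVG/MOUNT default to the real AIX commands but
# can be overridden (for example with echo) for a dry run.
activate_iscsi_vg() {
    vg=$1 mnt=$2
    ${VARYONVG:-varyonvg} "$vg" || return 1   # bring the VG online first
    ${MOUNT:-mount} "$mnt"                    # then mount its file system
}

# Dry run: print the commands that would be executed
VARYONVG="echo varyonvg" MOUNT="echo mount" activate_iscsi_vg iscsivg /iscsi_fs
```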
I/O failures
To avoid I/O failures:
- If connectivity to iSCSI target devices is lost, I/O failures occur. To prevent I/O failures and file system corruption, stop all I/O activity and unmount iSCSI-backed file systems before doing anything that will cause long-term loss of connectivity to the active iSCSI targets.
- If a loss of connectivity to iSCSI targets occurs while applications are attempting I/O activities with iSCSI devices, I/O errors will eventually occur. It might not be possible to unmount iSCSI-backed file systems, because the underlying iSCSI device stays busy.
- File system maintenance must be performed if I/O failures occur due to loss of connectivity to active iSCSI targets. To do file system maintenance, run the fsck command.
# lsattr -El iscsi0 | grep initiator_name
initiator_name iqn.com.ibm.de.mainz.p6-570-lab-2v17.hostid.099b325a iSCSI Initiator Name True

6. The Maximum Targets Allowed field corresponds to the maximum number of iSCSI targets that can be configured. If you reduce this number, you also reduce the amount of network memory pre-allocated for the iSCSI protocol driver during configuration.

After the software initiator is configured, define the iSCSI targets that will be accessed by the iSCSI software initiator. To specify those targets:
1. First, determine one of the iSCSI IP addresses of the XIV Storage System. To get that information, select iSCSI Connectivity from the Host and LUNs menu as shown in Figure 8-20 on page 202, or issue the XCLI command shown in Example 8-17.
Example 8-17 List iSCSI interfaces
c:\>xcli -c Redbook ipinterface_list | find "iSCSI"
iSCSI_M8_P1  iSCSI  9.155.56.80  255.255.255.0  9.155.56.1
iSCSI_M7_P1  iSCSI  9.155.56.81  255.255.255.0  9.155.56.1
You can see that our current iSCSI addresses are 9.155.56.80 and 9.155.56.81.
2. The next step is to find the iSCSI qualified name (IQN) of the XIV Storage System. To get this information, navigate to the basic system view in the XIV GUI, right-click the XIV Storage box itself, and select Properties. The System Properties window appears as shown in Figure 8-21.
If you are using XCLI, issue the config_get command. Refer to Example 8-18.
Example 8-18 The config_get command in XCLI
C:\>xcli -c ESP config_get | find "iscsi"
iscsi_name=iqn.2005-10.com.xivstorage:000203

3. Go back to the AIX system and edit the /etc/iscsi/targets file to include the iSCSI targets needed during device configuration.

Note: The iSCSI targets file defines the name and location of the iSCSI targets that the iSCSI software initiator attempts to access. This file is read any time that the iSCSI software initiator driver is loaded. Each uncommented line in the file represents an iSCSI target. iSCSI device configuration requires that the iSCSI targets can be reached through a properly configured network interface. Although the iSCSI software initiator can work using a 10/100 Ethernet LAN, it is designed for use with a gigabit Ethernet network that is separate from other network traffic.
Include your specific connection information in the targets file as shown in Example 8-19. Insert a HostName, PortNumber, and iSCSIName similar to what is shown in this example.
Example 8-19 Inserting connection information into /etc/iscsi/targets file in AIX operating system
9.155.56.80 3260 iqn.2005-10.com.xivstorage:000203

4. After editing the /etc/iscsi/targets file, enter the following command at the AIX prompt:
cfgmgr -l iscsi0
This command reconfigures the software initiator driver, causes the driver to attempt to communicate with the targets listed in the /etc/iscsi/targets file, and defines a new hdisk for each LUN found on the targets.

Note: If the appropriate disks are not defined, review the configuration of the initiator, the target, and any iSCSI gateways to ensure correctness. Then, rerun the cfgmgr command.

If you want to further configure parameters for iSCSI software initiator devices, use SMIT.
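Each targets entry is just "<HostName> <PortNumber> <iSCSIName>" on one line. A small sketch of ours that assembles and appends such a line without duplicating existing entries; the file path is parameterized so the sketch does not touch a live /etc/iscsi/targets:

```shell
# Append an iSCSI target entry ("HostName PortNumber iSCSIName")
# to a targets file, skipping exact duplicates.
add_iscsi_target() {
    file=$1 host=$2 port=$3 iqn=$4
    line="$host $port $iqn"
    # -x: whole-line match, -F: fixed string (no regex)
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

tgt=$(mktemp)
add_iscsi_target "$tgt" 9.155.56.80 3260 iqn.2005-10.com.xivstorage:000203
add_iscsi_target "$tgt" 9.155.56.80 3260 iqn.2005-10.com.xivstorage:000203  # duplicate, ignored
cat "$tgt"    # prints the single entry once
rm -f "$tgt"
```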
to the first place in the mapping view and it will replace the management LUN to your volume and assign the zero value to it. To see the mapping method, refer to 5.5.1, Managing hosts and mappings with XIV GUI on page 113.
8.3 Linux
There are several organizations (distributors) that bundle the Linux kernel, tools, and applications to form a distribution, a package that can be downloaded or purchased and installed on a computer. Several of these distributions are commercial; others are not. The Linux kernel, along with the tools and software needed to run an operating system, is maintained by a loosely organized community of thousands of (mostly) volunteer programmers.
8.3.1 Support issues that distinguish Linux from other operating systems
Linux differs from proprietary operating systems in many ways:
- No single person or organization can be held responsible or called for support.
- Depending on the target group, the distributions differ widely in the kind of support that is available.
- Linux is available for almost all computer architectures.
- Linux is rapidly changing.
All of these factors make it difficult to promise and provide generic support for Linux. As a consequence, IBM has decided on a support strategy that limits the uncertainty and the amount of testing. IBM only supports the major Linux distributions that are targeted at enterprise clients:
- Red Hat Enterprise Linux
- SUSE Linux Enterprise Server
These distributions have release cycles of about one year, are maintained for five years, and require you to sign a support contract with the distributor. They also have a schedule for regular updates. These factors mitigate the issues listed previously. The limited number of supported distributions also allows IBM to work closely with the vendors to ensure interoperability and support. Details about the supported Linux distributions and supported SAN boot environments can be found in the System Storage Interoperation Center (SSIC):
http://www-03.ibm.com/systems/support/storage/config/ssic/displayesssearchwithoutjs.wss?start_over=yes
1. Install the HBAs on the Linux server and configure the options according to the HBA manufacturer's instructions. Check the Fibre Channel physical connection from your host to the XIV Storage System.
2. Make configuration changes and install the additional packages required on the Linux host to support the XIV Storage System.
3. Configure Device-Mapper multipathing.
4. Configure the host and volumes, and define host mappings in the XIV Storage System.
5. Reboot the Linux server to discover the volumes created on the XIV Storage System.

The environment used to prepare the examples in the remainder of this section consisted of an IBM System x x345 server with QLogic QLA2340 HBAs, running Red Hat Enterprise Linux 5.2.
[root@x345-tic-30 ~]# tar -xvzf qla2xxx-v8.02.14_01-dist.tgz
qlogic/
qlogic/drvrsetup
qlogic/libinstall
qlogic/libremove
qlogic/qla2xxx-src-v8.02.14_01.tar.gz
qlogic/qlapi-v4.00build12-rel.tgz
qlogic/README.qla2xxx
[root@x345-tic-30 ~]# cd qlogic/
[root@x345-tic-30 qlogic]# ./drvrsetup
Extracting QLogic driver source...
Done.
[root@x345-tic-30 qlogic]# cd qla2xxx-8.02.14/
[root@x345-tic-30 qla2xxx-8.02.14]# ./extras/build.sh install
QLA2XXX -- Building the qla2xxx driver, please wait...
Installing intermodule.ko in /lib/modules/2.6.18-92.el5/kernel/kernel/
QLA2XXX -- Build done.
QLA2XXX -- Installing the qla2xxx modules to /lib/modules/2.6.18-92.el5/kernel/drivers/scsi/qla2xxx/...

Set the queue depth to 127, disable the failover mode for the driver, and set the timeout for a PORT-DOWN status before returning I/O back to the OS to 1 in /etc/modprobe.conf. Refer to Example 8-21 for details.
Example 8-21 Modification of /etc/modprobe.conf for the XIV
[root@x345-tic-30 qla2xxx-8.02.14]# cat >> /etc/modprobe.conf << EOF
> options qla2xxx ql2xmaxqdepth=127
> options qla2xxx qlport_down_retry=1
> options qla2xxx ql2xfailover=0
> EOF
[root@x345-tic-30 qla2xxx-8.02.14]# cat /etc/modprobe.conf
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptspi
alias scsi_hostadapter2 qla2xxx
install qla2xxx /sbin/modprobe qla2xxx_conf; /sbin/modprobe --ignore-install qla2xxx
remove qla2xxx /sbin/modprobe -r --first-time --ignore-remove qla2xxx && { /sbin/modprobe -r --ignore-remove qla2xxx_conf; }
alias qla2100 qla2xxx
alias qla2200 qla2xxx
alias qla2300 qla2xxx
alias qla2322 qla2xxx
alias qla2400 qla2xxx
options qla2xxx ql2xmaxqdepth=127
options qla2xxx qlport_down_retry=1
options qla2xxx ql2xfailover=0

We now have to build a new RAM disk image, so that the driver is loaded by the operating system loader after a boot. Next, we reboot the Linux host as shown in Example 8-22.
Example 8-22 Build a new ram disk image
[root@x345-tic-30 qla2xxx-8.02.14]# cd /boot/
[root@x345-tic-30 boot]# cp -f initrd-2.6.18-92.el5.img initrd-2.6.18-92.el5.img.bak
[root@x345-tic-30 boot]# mkinitrd -f initrd-2.6.18-92.el5.img 2.6.18-92.el5
[root@x345-tic-30 boot]# reboot

Broadcast message from root (pts/1) (Tue Aug  5 13:57:28 2008):

The system is going down for reboot NOW!
[root@x345-tic-30 ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - SELinux is fully disabled.
SELINUX=disabled
# SELINUXTYPE= type of policy in use. Possible values are:
#       targeted - Only targeted network daemons are protected.
#       strict - Full SELinux protection.
SELINUXTYPE=targeted

Download the udev package from the following Web site:
IBM XIV Storage System: Concepts, Architecture, and Usage
https://launchpad.net/udev/main/

Compile the udev package and install the scsi_id_t10 package as illustrated in Example 8-24.
Example 8-24 Compilation of udev and installation of scsi_id_t10
[root@x345-tic-30 ~]# tar jxvf udev-095.tar.bz2
.
.
[root@x345-tic-30 ~]# cd udev-095
[root@x345-tic-30 udev-095]# cd extras/scsi_id
[root@x345-tic-30 scsi_id]# patch -l -p 1 scsi_serial.c << EOF
> 40a41
> > { SCSI_ID_T10_VENDOR, SCSI_ID_NAA_DONT_CARE, SCSI_ID_ASCII },
> 59d59
> < { SCSI_ID_T10_VENDOR, SCSI_ID_NAA_DONT_CARE, SCSI_ID_ASCII },
> EOF
patching file scsi_serial.c
Hunk #2 FAILED at 60.
1 out of 2 hunks FAILED -- saving rejects to file scsi_serial.c.rej
[root@x345-tic-30 scsi_id]# patch -l -p 1 scsi_serial.c << EOF
> 40a41
> > { SCSI_ID_T10_VENDOR, SCSI_ID_NAA_DONT_CARE, SCSI_ID_ASCII },
> 59d59
> < { SCSI_ID_T10_VENDOR, SCSI_ID_NAA_DONT_CARE, SCSI_ID_ASCII },
> EOF
patching file scsi_serial.c
Hunk #2 succeeded at 42 (offset -18 lines).
[root@x345-tic-30 scsi_id]# make -C ../.. EXTRAS=extras/scsi_id USE_STATIC=true 2>&1 | grep -v warning
make: Entering directory `/root/udev-095'
GENHDR  udev_version.h
CC      udev_device.o
CC      udev_config.o
CC      udev_node.o
CC      udev_db.o
CC      udev_sysfs.o
CC      udev_rules.o
CC      udev_rules_parse.o
CC      udev_utils.o
CC      udev_utils_string.o
CC      udev_utils_file.o
CC      udev_utils_run.o
CC      udev_libc_wrapper.o
AR      libudev.a
RANLIB  libudev.a
CC      udev.o
LD      udev
CC      udevd.o
LD      udevd
CC      udevtrigger.o
LD      udevtrigger
CC      udevsettle.o
LD      udevsettle
CC      udevcontrol.o
LD      udevcontrol
207
CC      udevmonitor.o
LD      udevmonitor
CC      udevinfo.o
LD      udevinfo
CC      udevtest.o
LD      udevtest
CC      udevstart.o
LD      udevstart
make[1]: Entering directory `/root/udev-095/extras/scsi_id'
GENHDR  scsi_id_version.h
CC      scsi_id.o
CC      scsi_serial.o
LD      scsi_id
make[1]: Leaving directory `/root/udev-095/extras/scsi_id'
make: Leaving directory `/root/udev-095'
[root@x345-tic-30 scsi_id]# /bin/cp -f scsi_id /lib/udev/scsi_id_t10
[root@x345-tic-30 scsi_id]# ln -s -f /lib/udev/scsi_id_t10 /sbin
[root@x345-tic-30 scsi_id]# cd ../../..
[root@x345-tic-30 ~]# /bin/rm -rf udev-095
[root@x345-tic-30 ~]# chkconfig --add multipathd
[root@x345-tic-30 ~]# chkconfig --level 2345 multipathd on
[root@x345-tic-30 ~]# /bin/cp -p /etc/multipath.conf /etc/multipath.conf.`date +%d-%m-%Y.%H:%M:%S`
[root@x345-tic-30 ~]# cat > /etc/multipath.conf << EOF
> blacklist {
>     device {
>         vendor "IBM-ESXS"
>     }
>     device {
>         vendor "LSILOGIC"
>     }
>     device {
>         vendor "ATA"
>     }
> }
> devices {
>     device {
>         vendor "IBM"
>         product "2810XIV"
>         selector "round-robin 0"
>         path_grouping_policy multibus
>         rr_min_io 1000
>         getuid_callout "/sbin/scsi_id_t10 -g -u -s /block/%n"
>         path_checker tur
>         failback immediate
>         no_path_retry queue
>     }
> }
> EOF
[root@x345-tic-30 ~]# cat /proc/scsi/qla2xxx/2 | grep scsi-qla0-adapter-port
scsi-qla0-adapter-port=210000e08b08e7c4;
[root@x345-tic-30 ~]# cat /proc/scsi/qla2xxx/3 | grep scsi-qla1-adapter-port
scsi-qla1-adapter-port=210000e08b0b973c;

Create and map two volumes to the Linux host, as described in 5.5, Host definition and mappings on page 113.
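The WWPNs needed for the host definition can also be pulled out of these /proc lines programmatically. A small sketch of ours (the helper name is hypothetical), run here against the captured line rather than a live /proc entry:

```shell
# Extract the WWPN from a qla2xxx /proc line of the form
# "scsi-qla0-adapter-port=210000e08b08e7c4;"
wwpn_from_proc_line() {
    # strip everything up to '=' and the trailing ';'
    printf '%s\n' "$1" | sed -e 's/.*=//' -e 's/;$//'
}

wwpn_from_proc_line "scsi-qla0-adapter-port=210000e08b08e7c4;"   # 210000e08b08e7c4
```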
[root@x345-tic-30 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: LSILOGIC Model: 1030 IM          Rev: 1000
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: IBM      Model: IC35L018UCD210-0 Rev: S5BS
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 08 Lun: 00
  Vendor: IBM      Model: 32P0032a S320  1 Rev: 1
  Type:   Processor                        ANSI SCSI revision: 02
Host: scsi0 Channel: 01 Id: 00 Lun: 00
  Vendor: IBM      Model: IC35L018UCD210-0 Rev: S5BS
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: 2810XIV-LUN-0    Rev: 10.0
  Type:   RAID                             ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 02 Lun: 00
  Vendor: IBM      Model: 2810XIV          Rev: 10.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 02 Lun: 01
  Vendor: IBM      Model: 2810XIV          Rev: 10.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 02 Lun: 00
  Vendor: IBM      Model: 2810XIV          Rev: 10.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 02 Lun: 01
  Vendor: IBM      Model: 2810XIV          Rev: 10.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
[root@x345-tic-30 ~]# cat /proc/partitions
major minor  #blocks  name

   8     0   17921024 sda
   8     1     104391 sda1
   8     2   17816085 sda2
   8    16   17921835 sdb
   8    17   17920476 sdb1
 253     0   33652736 dm-0
 253     1    2031616 dm-1
   8    32   16777216 sdc
   8    48   16777216 sdd
   8    64   16777216 sde
   8    80   16777216 sdf
 253     2   16777216 dm-2
 253     3   16777216 dm-3
[root@x345-tic-30 ~]# multipathd -k"show topo"
1IBM_2810XIV_MN000320016dm-2 IBM,2810XIV
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 2:0:2:0 sdc 8:32 [active][ready]
 \_ 3:0:2:0 sde 8:64 [active][ready]
1IBM_2810XIV_MN000320017dm-3 IBM,2810XIV
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 2:0:2:1 sdd 8:48 [active][ready]
 \_ 3:0:2:1 sdf 8:80 [active][ready]
[root@x345-tic-30 ~]# multipathd -k"list paths"
hcil    dev dev_t pri dm_st    chk_st  next_check
0:0:2:0 sdb 8:16  1   [undef]  [ready] [orphan]
2:0:2:0 sdc 8:32  1   [active][ready] XXXXXXX... 15/20
2:0:2:1 sdd 8:48  1   [active][ready] XXXXXXX... 15/20
3:0:2:0 sde 8:64  1   [active][ready] XXXXXXX... 15/20
3:0:2:1 sdf 8:80  1   [active][ready] XXXXXXX... 15/20
[root@x345-tic-30 ~]# multipathd -k"list maps status"
name                     failback  queueing paths dm-st
1IBM_2810XIV_MN000320016 immediate on       2     active
1IBM_2810XIV_MN000320017 immediate on       2     active
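When scripting health checks, the output of multipathd -k"show topo" can be summarized, for example by counting the [active][ready] paths per map. A sketch of ours using awk against a captured sample (the sample is condensed from the output above; the helper name is hypothetical):

```shell
# Count [active][ready] paths per multipath map in
# 'multipathd -k"show topo"' style output read from stdin.
# Prints one "map_name count" line per map.
ready_paths_per_map() {
    awk '
        /^1IBM/               { map = $1 }       # map header line
        /\[active\]\[ready\]/ { count[map]++ }   # one ready path for that map
        END { for (m in count) print m, count[m] }
    ' | sort
}

sample='1IBM_2810XIV_MN000320016dm-2 IBM,2810XIV
\_ 2:0:2:0 sdc 8:32 [active][ready]
\_ 3:0:2:0 sde 8:64 [active][ready]
1IBM_2810XIV_MN000320017dm-3 IBM,2810XIV
\_ 2:0:2:1 sdd 8:48 [active][ready]
\_ 3:0:2:1 sdf 8:80 [active][ready]'

printf '%s\n' "$sample" | ready_paths_per_map
```

In a live check you would pipe the real multipathd output in and alert when a map's count drops below the expected number of paths.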
[root@HS20-tic-15 ~]# cat >> /etc/modprobe.conf << EOF
> options qla2xxx qlport_down_retry=1
> EOF

Now, we have to build a new RAM disk image, so that the driver is loaded by the operating system loader after a boot. Next, reboot the Linux host as shown in Example 8-29.
Example 8-29 Build a new ram disk image
[root@HS20-tic-15 ~]# cd /boot
[root@HS20-tic-15 boot]# cp -f initrd-2.6.18-92.el5.img initrd-2.6.18-92.el5.img.bak
[root@HS20-tic-15 boot]# mkinitrd -f initrd-2.6.18-92.el5.img 2.6.18-92.el5
[root@HS20-tic-15 boot]# reboot

Broadcast message from root (pts/1) (Fri Aug 22 08:47:56 2008):

The system is going down for reboot NOW!
[root@HS20-tic-15 boot]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - SELinux is fully disabled.
SELINUX=disabled
# SELINUXTYPE= type of policy in use. Possible values are:
#       targeted - Only targeted network daemons are protected.
#       strict - Full SELinux protection.
SELINUXTYPE=targeted
Add udev rules for the XIV Storage System to set the device loss timeout to 1 second and the queue depth for the devices to 127, as shown in Example 8-31.
Example 8-31 Add udev rules for the XIV Storage System
[root@HS20-tic-15 ~]# cat > /etc/udev/rules.d/45-xiv-devs.rules << EOF
> SUBSYSTEM=="block", ACTION=="add", KERNEL=="sd*[!0-9]", SYSFS{model}=="2810XIV", RUN+="/bin/sh -c 'echo 127 > /sys\$devpath/device/queue_depth'"
> SUBSYSTEM=="fc_remote_ports", ACTION=="add", SYSFS{port_name}=="0x5001738000*", RUN+="/bin/sh -c 'echo 1 > /sys\$devpath/dev_loss_tmo'"
> EOF

Download the udev package from the following Web site:
https://launchpad.net/udev/main/
Compile the udev package and install the scsi_id_t10 package as illustrated in Example 8-32.
Example 8-32 Compilation of udev and installation of scsi_id_t10
[root@HS20-tic-15 ~]# cd udev-095
[root@HS20-tic-15 udev-095]# cd extras/scsi_id/
[root@HS20-tic-15 scsi_id]# patch -l -p 1 scsi_serial.c << EOF
> 40a41
> > { SCSI_ID_T10_VENDOR, SCSI_ID_NAA_DONT_CARE, SCSI_ID_ASCII },
> 59d59
> < { SCSI_ID_T10_VENDOR, SCSI_ID_NAA_DONT_CARE, SCSI_ID_ASCII },
> EOF
patching file scsi_serial.c
Hunk #2 FAILED at 60.
1 out of 2 hunks FAILED -- saving rejects to file scsi_serial.c.rej
[root@HS20-tic-15 scsi_id]# patch -l -p 1 scsi_serial.c << EOF
> 40a41
> > { SCSI_ID_T10_VENDOR, SCSI_ID_NAA_DONT_CARE, SCSI_ID_ASCII },
> 59d59
> < { SCSI_ID_T10_VENDOR, SCSI_ID_NAA_DONT_CARE, SCSI_ID_ASCII },
> EOF
patching file scsi_serial.c
Hunk #2 succeeded at 42 (offset -18 lines).
[root@HS20-tic-15 scsi_id]# make -C ../.. EXTRAS=extras/scsi_id USE_STATIC=true 2>&1 | grep -v warning
make: Entering directory `/root/udev-095'
GENHDR  udev_version.h
CC      udev_device.o
CC      udev_config.o
CC      udev_node.o
CC      udev_db.o
CC      udev_sysfs.o
CC      udev_rules.o
CC      udev_rules_parse.o
CC      udev_utils.o
CC      udev_utils_string.o
CC      udev_utils_file.o
CC      udev_utils_run.o
CC      udev_libc_wrapper.o
AR      libudev.a
RANLIB  libudev.a
CC      udev.o
LD      udev
CC      udevd.o
LD      udevd
CC      udevtrigger.o
LD      udevtrigger
CC      udevsettle.o
LD      udevsettle
CC      udevcontrol.o
LD      udevcontrol
CC      udevmonitor.o
LD      udevmonitor
CC      udevinfo.o
LD      udevinfo
CC      udevtest.o
LD      udevtest
CC      udevstart.o
LD      udevstart
make[1]: Entering directory `/root/udev-095/extras/scsi_id'
GENHDR  scsi_id_version.h
CC      scsi_id.o
CC      scsi_serial.o
LD      scsi_id
make[1]: Leaving directory `/root/udev-095/extras/scsi_id'
make: Leaving directory `/root/udev-095'
[root@HS20-tic-15 scsi_id]# /bin/cp -f scsi_id /lib/udev/scsi_id_t10
[root@HS20-tic-15 scsi_id]# ln -s -f /lib/udev/scsi_id_t10 /sbin
[root@HS20-tic-15 scsi_id]# cd ../../..
[root@HS20-tic-15 ~]# /bin/rm -rf udev-095
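Once the rules from Example 8-31 are active, their effect is simply that a value has been written into the matching sysfs attribute files. The sketch below, which is ours, reads queue_depth for every sd* device so you can confirm the value 127 took hold after a reboot; SYSROOT is a hypothetical override so the check can be exercised against a scratch tree instead of a live /sys:

```shell
# Print "attribute-file value" for every sd* queue_depth attribute
# under the sysfs root (default /sys, overridable via SYSROOT).
check_queue_depth() {
    root=${SYSROOT:-/sys}
    for qd in "$root"/block/sd*/device/queue_depth; do
        [ -e "$qd" ] && printf '%s %s\n' "$qd" "$(cat "$qd")"
    done
}

# Exercise against a scratch tree standing in for /sys
tree=$(mktemp -d)
mkdir -p "$tree/block/sdb/device"
echo 127 > "$tree/block/sdb/device/queue_depth"   # as the udev rule would
SYSROOT=$tree check_queue_depth
rm -rf "$tree"
```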
[root@HS20-tic-15 ~]# chkconfig --add multipathd
[root@HS20-tic-15 ~]# chkconfig --level 2345 multipathd on
[root@HS20-tic-15 ~]# /bin/cp -p /etc/multipath.conf /etc/multipath.conf.`date +%d-%m-%Y.%H:%M:%S`
[root@HS20-tic-15 ~]# cat > /etc/multipath.conf << EOF
> blacklist {
>     device {
>         vendor "IBM-ESXS"
>     }
>     device {
>         vendor "LSILOGIC"
>     }
>     device {
>         vendor "ATA"
>     }
> }
Chapter 8. OS-specific considerations for host connectivity
> devices {
>     device {
>         vendor "IBM"
>         product "2810XIV"
>         selector "round-robin 0"
>         path_grouping_policy multibus
>         rr_min_io 1000
>         getuid_callout "/sbin/scsi_id_t10 -g -u -s /block/%n"
>         path_checker tur
>         failback immediate
>         no_path_retry queue
>     }
> }
> EOF
[root@HS20-tic-15 ~]# cat /sys/class/fc_host/host1/port_name
0x210000e08b853458
[root@HS20-tic-15 ~]# cat /sys/class/fc_host/host2/port_name
0x210100e08ba53458
Create and map two volumes to the Linux host, as described in 5.5, Host definition and mappings on page 113.
[root@HS20-tic-15 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM-ESXS Model: ST973401LC    FN Rev: B41D
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: 2810XIV-LUN-0    Rev: 10.0
  Type:   RAID                             ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: 2810XIV-LUN-0    Rev: 10.0
  Type:   RAID                             ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 02 Lun: 00
  Vendor: IBM      Model: 2810XIV          Rev: 10.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 02 Lun: 01
  Vendor: IBM      Model: 2810XIV          Rev: 10.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: 2810XIV          Rev: 10.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 01
  Vendor: IBM      Model: 2810XIV          Rev: 10.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: 2810XIV-LUN-0    Rev: 10.0
  Type:   RAID                             ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 02 Lun: 00
  Vendor: IBM      Model: 2810XIV-LUN-0    Rev: 10.0
  Type:   RAID                             ANSI SCSI revision: 05
[root@HS20-tic-15 ~]# cat /proc/partitions
major minor  #blocks  name

   8     0   71687000 sda
   8     1     104391 sda1
   8     2   71577607 sda2
   8    16   16777216 sdb
   8    32   16777216 sdc
   8    48   16777216 sdd
   8    64   16777216 sde
 253     0   69533696 dm-0
 253     1    2031616 dm-1
 253     2   16777216 dm-2
 253     3   16777216 dm-3
[root@HS20-tic-15 ~]# multipathd -k"show topo"
1IBM_2810XIV_MN0003201AAdm-2 IBM,2810XIV
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:2:0 sdb 8:16 [active][ready]
 \_ 2:0:0:0 sdd 8:48 [active][ready]
1IBM_2810XIV_MN0003201ABdm-3 IBM,2810XIV
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:2:1 sdc 8:32 [active][ready]
 \_ 2:0:0:1 sde 8:64 [active][ready]
[root@HS20-tic-15 ~]# multipathd -k"list paths"
hcil    dev dev_t pri dm_st    chk_st  next_check
1:0:2:0 sdb 8:16  1   [active][ready] X......... 2/20
1:0:2:1 sdc 8:32  1   [active][ready] X......... 2/20
2:0:0:0 sdd 8:48  1   [active][ready] X......... 2/20
2:0:0:1 sde 8:64  1   [active][ready] X......... 2/20
[root@HS20-tic-15 ~]# multipathd -k"list maps status"
name                     failback  queueing paths dm-st
1IBM_2810XIV_MN0003201AA immediate on       2     active
1IBM_2810XIV_MN0003201AB immediate on       2     active
[root@x345-tic-30 ~]# rpm -ivh iscsi-initiator-utils-6.2.0.868-0.7.el5.i386.rpm
warning: iscsi-initiator-utils-6.2.0.868-0.7.el5.i386.rpm: Header V3 DSA signature: NOKEY, key ID 37017186
Preparing...                ########################################### [100%]
   1:iscsi-initiator-utils  ########################################### [100%]
[root@x345-tic-30 ~]# chkconfig --add iscsi
[root@x345-tic-30 ~]# chkconfig --level 2345 iscsi on
[root@x345-tic-30 ~]# iscsiadm -m discovery -t sendtargets -p 9.155.50.27
9.155.50.27:3260,2 iqn.2005-10.com.xivstorage:000050
9.155.50.34:3260,4 iqn.2005-10.com.xivstorage:000050
192.168.1.1:3260,1 iqn.2005-10.com.xivstorage:000050
192.168.1.2:3260,3 iqn.2005-10.com.xivstorage:000050
192.168.1.3:3260,5 iqn.2005-10.com.xivstorage:000050
[root@x345-tic-30 ~]# iscsiadm -m discovery -t sendtargets -p 9.155.50.34
9.155.50.27:3260,2 iqn.2005-10.com.xivstorage:000050
9.155.50.34:3260,4 iqn.2005-10.com.xivstorage:000050
192.168.1.1:3260,1 iqn.2005-10.com.xivstorage:000050
192.168.1.2:3260,3 iqn.2005-10.com.xivstorage:000050
192.168.1.3:3260,5 iqn.2005-10.com.xivstorage:000050
[root@x345-tic-30 ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:92d581f8b30

Create and map two volumes to the Linux host, as described in 5.5, Host definition and mappings on page 113. Now, reboot the host and you will see disks dm-4 and dm-5 as illustrated in Example 8-38.
Example 8-38 iSCSI multi-pathing output
[root@x345-tic-30 ~]# multipathd -k"show topo"
1IBM_2810XIV_MN000320016dm-2 IBM,2810XIV
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 2:0:2:0 sdc 8:32 [active][ready]
 \_ 3:0:2:0 sde 8:64 [active][ready]
1IBM_2810XIV_MN000320017dm-3 IBM,2810XIV
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 2:0:2:1 sdd 8:48 [active][ready]
 \_ 3:0:2:1 sdf 8:80 [active][ready]
1IBM_2810XIV_MN000320052dm-4 IBM,2810XIV
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][enabled]
 \_ 5:0:0:0 sdh 8:112 [active][ready]
 \_ 4:0:0:0 sdg 8:96 [active][ready]
1IBM_2810XIV_MN000320053dm-5 IBM,2810XIV
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][enabled]
 \_ 5:0:0:1 sdj 8:144 [active][ready]
 \_ 4:0:0:1 sdi 8:128 [active][ready]
[root@x345-tic-30 ~]# multipathd -k"list paths"
hcil    dev dev_t pri dm_st    chk_st  next_check
0:0:2:0 sdb 8:16  1   [undef]  [ready] [orphan]
2:0:2:0 sdc 8:32  1   [active][ready] XXX....... 7/20
2:0:2:1 sdd 8:48  1   [active][ready] XXX....... 7/20
3:0:2:0 sde 8:64  1   [active][ready] XXX....... 7/20
3:0:2:1 sdf 8:80  1   [active][ready] XXX....... 7/20
4:0:0:0 sdg 8:96  1   [active][ready] XXX....... 7/20
5:0:0:0 sdh 8:112 1   [active][ready] XXX....... 7/20
4:0:0:1 sdi 8:128 1   [active][ready] XXX....... 7/20
5:0:0:1 sdj 8:144 1   [active][ready] XXX....... 7/20
[root@x345-tic-30 ~]# multipathd -k"list maps status"
name                     failback  queueing paths dm-st
1IBM_2810XIV_MN000320016 immediate on       2     active
1IBM_2810XIV_MN000320017 immediate on       2     active
1IBM_2810XIV_MN000320052 immediate on       2     active
1IBM_2810XIV_MN000320053 immediate on       2     active
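Scripts often need only the portal/IQN pairs from the sendtargets discovery output shown earlier, for example to keep just the routable 9.155.x portals. A sketch of ours (the helper name is hypothetical), run against a captured sample rather than a live iscsiadm session:

```shell
# Print "portal iqn" pairs from 'iscsiadm -m discovery' output,
# keeping only portals whose address starts with a given prefix.
# The trailing ",<tag>" after the port is stripped.
portals_for_prefix() {
    prefix=$1
    awk -v p="$prefix" 'index($1, p) == 1 { sub(/,.*/, "", $1); print $1, $2 }'
}

sample='9.155.50.27:3260,2 iqn.2005-10.com.xivstorage:000050
192.168.1.1:3260,1 iqn.2005-10.com.xivstorage:000050
9.155.50.34:3260,4 iqn.2005-10.com.xivstorage:000050'

printf '%s\n' "$sample" | portals_for_prefix 9.155.
```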
Customizing /kernel/drv/scsi_vhci.conf
Add the following lines to /kernel/drv/scsi_vhci.conf and enable STMS by entering stmsboot -e, as shown in Example 8-39. For detailed information about STMS, refer to:
http://dlc.sun.com/pdf/819-5604-17/819-5604-17.pdf
Example 8-39 Multi-pathing configuration and enabling STMS
bash-3.00# cat >> /kernel/drv/scsi_vhci.conf << EOF
> device-type-scsi-options-list =
> "IBM     2810XIV", "symmetric-option";
> symmetric-option = 0x1000000;
> EOF
bash-3.00# stmsboot -e
WARNING: stmsboot operates on each supported multipath-capable controller
         detected in a host. In your system, these controllers are
/devices/pci@8,600000/SUNW,qlc@1/fp@0,0
/devices/pci@8,600000/SUNW,qlc@2/fp@0,0
/devices/pci@9,600000/SUNW,qlc@2/fp@0,0

If you do NOT wish to operate on these controllers, please quit stmsboot
and re-invoke with -D { fp | mpt } to specify which controllers you wish
to modify your multipathing configuration for.

Do you wish to continue? [y/n] (default: y) y
Checking mpxio status for driver fp
Checking mpxio status for driver mpt
WARNING: This operation will require a reboot.
Do you want to continue ? [y/n] (default: y) y
The changes will come into effect after rebooting the system.
Reboot the system now ? [y/n] (default: y) y
The first two HBAs in our example are the SG-XPCI1FC-QL2 HBAs, and the third HBA is a controller for the local disks. Create and map two volumes to the Solaris host, as described in 5.5, Host definition and mappings on page 113. You now see the two volumes and two paths for WWPN 5001738000320190 on controller c2 and 5001738000320170 on controller c3. Refer to Example 8-41. Before the disks can be used, they must be labeled using the format command. If the disks are visible via the cfgadm command but do not show up when using the format command, enter the devfsadm -vCc disks command to clean up and repopulate the /dev namespace.
Example 8-41 Discovery of the volumes
bash-3.00# cfgadm -lao show_FCP_dev
Ap_Id                    Type         Receptacle   Occupant       Condition
c1                       fc-private   connected    configured     unknown
c1::500000e010183a51,0   disk         connected    configured     unknown
c1::500000e0102e9dd1,0   disk         connected    configured     unknown
c2                       fc-fabric    connected    configured     unknown
c2::5001738000320190,0   disk         connected    configured     unknown
c2::5001738000320190,1   disk         connected    configured     unknown
c2::5001738000cb0161     unavailable  connected    unconfigured   failed
c2::5001738000cb0181,0   array-ctrl   connected    configured     unknown
c3                       fc-fabric    connected    configured     unknown
c3::5001738000320170,0   disk         connected    configured     unknown
c4t001738000032014Ad0: configured with capacity of 15.98GB
c4t001738000032014Bd0: configured with capacity of 15.98GB
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0102e9dd1,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e010183a51,0
       2. c4t001738000032014Ad0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g001738000032014a
       3. c4t001738000032014Bd0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g001738000032014b
Specify disk (enter its number): 2
selecting c4t001738000032014Ad0
[disk formatted]
Disk not labeled. Label it now? y
FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> disk
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0102e9dd1,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e010183a51,0
       2. c4t001738000032014Ad0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g001738000032014a
       3. c4t001738000032014Bd0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g001738000032014b
Specify disk (enter its number)[2]: 3
selecting c4t001738000032014Bd0
[disk formatted]
Disk not labeled. Label it now? y
format> quit
bash-3.00# iscsiadm add discovery-address 9.155.50.27:3260
bash-3.00# iscsiadm list discovery-address -v 9.155.50.27:3260
Discovery Address: 9.155.50.27:3260
        Target name: iqn.2005-10.com.xivstorage:000050
                Target address: 192.168.1.1:3260, 1
        Target name: iqn.2005-10.com.xivstorage:000050
                Target address: 9.155.50.27:3260, 2
        Target name: iqn.2005-10.com.xivstorage:000050
                Target address: 192.168.1.2:3260, 3
        Target name: iqn.2005-10.com.xivstorage:000050
                Target address: 9.155.50.34:3260, 4
        Target name: iqn.2005-10.com.xivstorage:000050
                Target address: 192.168.1.3:3260, 5
bash-3.00# iscsiadm add static-config iqn.2005-10.com.xivstorage:000050,9.155.50.27:3260
bash-3.00# iscsiadm add static-config iqn.2005-10.com.xivstorage:000050,9.155.50.34:3260
bash-3.00# iscsiadm list static-config
Static Configuration Target: iqn.2005-10.com.xivstorage:000050,9.155.50.27:3260
Static Configuration Target: iqn.2005-10.com.xivstorage:000050,9.155.50.34:3260
bash-3.00# iscsiadm modify discovery --static enable
bash-3.00# iscsiadm list initiator-node
Initiator node name: iqn.1986-03.com.sun:01:0003ba4dbd8a.489acacf
Initiator node alias: v480-1
        Login Parameters (Default/Configured):
                Header Digest: NONE/-
                Data Digest: NONE/-
        Authentication Type: NONE
        RADIUS Server: NONE
        RADIUS access: unknown
        Configured Sessions: 1
bash-3.00# devfsadm -i iscsi
bash-3.00# format
Searching for disks...done

c4t0017380000320056d0: configured with capacity of 15.98GB
c4t0017380000320057d0: configured with capacity of 15.98GB
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0102e9dd1,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e010183a51,0
       2. c4t001738000032014Ad0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g001738000032014a
       3. c4t001738000032014Bd0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g001738000032014b
       4. c4t0017380000320056d0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g0017380000320056
       5. c4t0017380000320057d0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g0017380000320057
Specify disk (enter its number): 4
selecting c4t0017380000320056d0
[disk formatted]
Disk not labeled. Label it now? y
FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0102e9dd1,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e010183a51,0
       2. c4t001738000032014Ad0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g001738000032014a
       3. c4t001738000032014Bd0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g001738000032014b
       4. c4t0017380000320056d0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g0017380000320056
       5. c4t0017380000320057d0 <IBM-2810XIV-10.0 cyl 2046 alt 2 hd 128 sec 128>
          /scsi_vhci/ssd@g0017380000320057
Specify disk (enter its number)[4]: 5
selecting c4t0017380000320057d0
[disk formatted]
Disk not labeled. Label it now? y
format> quit
8.5 VMware
The XIV Storage System currently supports the VMware high-end virtualization solution, Virtual Infrastructure 3, and the included VMware ESX Server 3.5. At the time of writing this book, SAN boot was supported. Details about the supported VMware ESX Server versions, supported SAN boot environments, and remote boot via iSCSI can be found at the System Storage Interoperation Center (SSIC) Web site:
http://www.ibm.com/systems/support/storage/config/ssic/displayesssearchwithoutjs.wss?start_over=yes
The environment used to prepare the examples in the remainder of this section consisted of an IBM BladeCenter HS20 equipped with QMC 2462 HBAs and running VMware ESX Server 3.5 Update 2.
For detailed information about how to use these LUNs with virtual machines, refer to the VMware guides, available at the following Web sites:
http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_admin_guide.pdf
http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_3_server_config.pdf
3. Add Network for VMkernel.
4. Enable the iSCSI Software Adapter.
5. Configure the host, volumes, and host mapping in the IBM XIV.
6. Discover the iSCSI targets.
Firewall settings
Before you start configuring the iSCSI storage, make sure that the firewall settings on the VMware host allow the software iSCSI client to connect to the iSCSI target. The firewall settings can be found in the Configuration tab under Security Profile. Verify that the Software iSCSI client is enabled as shown in Figure 8-23.
2. Click Add Networking to initialize the Add Network Wizard. The Add Network Wizard is shown in Figure 8-25.
3. Select VMkernel as the Connection Type and click Next. The next Add Network Wizard window is displayed as shown in Figure 8-26 on page 227.
4. Select an unused network adapter to create a new virtual switch as illustrated in Figure 8-26, and click Next. 5. Fill in the Connection settings, such as the network address and the subnet mask for the VMkernel as shown in Figure 8-27 on page 228, and click Next.
6. The next window shows a summary of your settings. If correct, click Finish (refer to Figure 8-28).
The Networking panel now shows an additional virtual switch (vSwitch) for the VMkernel as illustrated in Figure 8-29.
Click Properties (Figure 8-30) to display the iSCSI Initiator (iSCSI Software Adapter) Properties panel as shown in Figure 8-31 on page 230.
2. In the iSCSI initiator pane under the General tab (Figure 8-31), click Configure to enable the iSCSI Software Adapter as illustrated in Figure 8-32.
Figure 8-32 iSCSI Software Adapter general properties: Enable iSCSI initiator
3. Check Enabled and click OK. A message will appear as shown in Figure 8-33.
Figure 8-33 iSCSI Software Adapter general properties: Enable iSCSI initiator message
4. With VMware ESX Server 3.5, it is no longer necessary to define a service console network port on the same vSwitch as the VMkernel, so click No (Figure 8-33 on page 230). The General tab now displays the iSCSI initiator properties (iSCSI name and alias) as shown in Figure 8-34.
5. Close the Properties panel, and the iSCSI Software Adapter is now visible as vmhba32 as illustrated in Figure 8-35.
2. Click Add and define the IP addresses for the iSCSI ports on the XIV as depicted in Figure 8-37.
Figure 8-37 iSCSI Software Adapter Dynamic Discovery: Add Send Targets Server
After adding all iSCSI ports available on the XIV Storage System, they will be visible in the Dynamic Discovery tab as displayed in Figure 8-38 on page 233.
3. Create and map two volumes to the VMware ESX host, as described in 5.5, Host definition and mappings on page 113. After a rescan on the ESX host, you will see the two volumes as LUN 1 and LUN 2 over two paths. This information is visible in the Configuration tab of the Storage Adapter panel for vmhba32 as illustrated in Figure 8-39 on page 234. In accordance with the SCSI standard, the XIV Storage System maps itself to LUN 0 in every mapping. This LUN serves as the well-known LUN for that mapping, and the host can issue SCSI commands to it that are not related to any specific volume.
Chapter 9.
Performance characteristics
The XIV Storage System is a high-performance disk storage subsystem. Chapter 2, XIV logical architecture and concepts on page 7 described the XIV Storage System's massive parallelism, disk utilization, and unique caching algorithms. These characteristics, inherent to the system design, guarantee optimized, consistent performance: as increased stress is applied to the system, the XIV Storage System maintains a consistent performance level. This chapter further explores the concepts behind this high performance, provides best practice recommendations for connecting to an XIV Storage System, and explains how to use the statistics monitor that is provided by the XIV Storage System.
Having a large pipe permits the XIV Storage System to use small cache pages. A large pipe between the disks and the cache allows the system to perform many small requests in parallel, which improves performance. If the pipe were small, the system would have to serialize requests or group small requests together to maximize the data throughput between the disks and the cache. A large pipe therefore enables the XIV system to manage many small tasks that can be accommodated by small cache pages.

A Least Recently Used (LRU) algorithm is the basis for the cache management algorithm. Combined with the small cache pages, the LRU algorithm becomes very efficient, allowing the system to achieve a high hit ratio for frequently used data. In other words, the efficiency of the cache for small transfers is very high when the host is accessing the same data set.

Because of the efficiency of the cache, the prefetching algorithm is very aggressive. The algorithm starts with a small number of pages and gradually increases the number of pages prefetched until an entire partition, 1 MB, is read into cache. Specifically, the algorithm starts with two pages (8 KB). If the access results in a cache hit, the algorithm doubles the amount of data prefetched. In this example, the next prefetch requests four pages (16 KB) of data. The algorithm continues to double the prefetch size until a cache miss occurs or the maximum prefetch size of 1 MB is reached. Because the modules are managed independently, if a prefetch crosses a module boundary, the logically adjacent module (for that volume) is notified so that it can begin pre-staging the data into its local cache.
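The doubling progression just described can be sketched in a few lines. This is a minimal illustration of the prefetch-size growth only, not the actual XIV implementation; the function name and the 4 KB page constant (two pages = 8 KB, per the description above) are assumptions for the sketch:

```python
PAGE_SIZE_KB = 4            # assumed cache page size (two pages = 8 KB)
MAX_PREFETCH_KB = 1024      # one full partition: 1 MB

def prefetch_sizes():
    """Yield the successive prefetch sizes (in KB): the size doubles on
    every cache hit until an entire 1 MB partition would be staged."""
    size = 2 * PAGE_SIZE_KB                 # start with two pages (8 KB)
    while size <= MAX_PREFETCH_KB:
        yield size
        size *= 2                           # double after each cache hit

# A cache miss at any step would stop the doubling instead.
print(list(prefetch_sizes()))  # → [8, 16, 32, 64, 128, 256, 512, 1024]
```

The eight steps from 8 KB to 1 MB show why a hot sequential stream is staged into cache very quickly.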
disk drive. With the XIV Storage System, the rebuild is not focused on one disk; instead, the work is spread across all the disks in the system. Because each disk performs only a small percentage of the work, the impact to the host is minimal. After the disk is repaired, the system enters a redistribution phase: in the background, the system slowly moves data back onto the new disk, which causes the new disk to be heavily utilized as the data is written to it. Because this work occurs in the background, the host encounters only a small impact to performance.
Resynchronization is the process of re-establishing the connection to the remote system after a link failure. In this situation, only the modified data is transferred to the remote XIV Storage System in order to speed up the recovery process. Recovery time therefore depends on the amount of data that changed between the time of the link failure and the time that the recovery process completes.
Similarly, if you connect multiple hosts and have multiple connections, make sure to spread all of the connections evenly across the Interface Modules.
the system experiences better performance. If the transfer is smaller than the maximum host transfer size, the host only transfers the amount of data that it has to send. Refer to Chapter 8, OS-specific considerations for host connectivity on page 179 or to the vendor hardware manuals for queue depth recommendations.

Due to the distributed nature of the XIV Storage System, high performance is achieved through parallelism: the system maintains a high level of performance as the number of parallel transactions to the volumes increases. Ideally, the host workload is tailored to use multiple threads or to spread the work across multiple hosts.
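The recommendation above, to drive the system with many parallel requests rather than one serial stream, can be illustrated with a small sketch. The do_io function is a hypothetical stand-in for a real I/O request against an XIV volume, not an XIV API:

```python
from concurrent.futures import ThreadPoolExecutor

def do_io(block: int) -> int:
    """Stand-in for one I/O request; a real workload would read or
    write `block` on an XIV volume here."""
    return block * 2  # placeholder work

# Issuing many requests in flight at once keeps the XIV grid busy;
# a single serial stream would leave most modules idle.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(do_io, range(64)))

print(len(results))  # → 64
```

The same effect can be obtained with asynchronous I/O or by spreading the workload across multiple hosts, as the text suggests.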
Select Statistics from the Monitor menu as shown in Figure 9-3 to display the default Monitor view, which is shown in Figure 9-4 on page 242 and displays the system IOPS for the past 24 hours. The X-axis of the graph represents time and can vary from minutes to months. The Y-axis of the graph is the selected measurement; the default measurement is IOPS. The statistics monitor also illustrates latency and bandwidth.
The other options in the statistics monitor act as filters for separating data. These filters are separated by the type of transaction (reads or writes), cache properties (hits compared to misses), or the transfer size of I/O as seen by the XIV Storage System. Refer to Figure 9-5 for a better view of the filter pane.
The filter pane allows you to select multiple items within a specific filter, for example, if you want to see reads and writes separated on the graph. By holding down Ctrl on the keyboard and selecting the read option and then the write option, both items are displayed on the graph.
As shown in Figure 9-6, one of the lines represents the reads and the other line represents the writes. On the GUI, these lines are drawn in separate colors to differentiate the metrics. This selection process can be performed on the other filter items as well.
In certain cases, the user needs to see multiple graphs at one time. On the right side of the filter pane, there is a selection to add graphs (refer to Figure 9-5 on page 242). Up to four graphs are managed by the GUI. Each graph is independent and can have separate filters. Figure 9-7 on page 244 illustrates this concept. The top graph is the IOPS for the day with the reads and writes separated. The second graph displays the bandwidth for several minutes with reads and writes separated, which provides quick and easy access to multiple views of the performance metrics.
There are several additional filters available, such as filtering by host, volumes, or targets. These items are defined on the left side of the filter pane. When you click one of these filters, a dialog window appears. Highlight the item to filter on and then click Click to select; the highlighted item moves to the lower half of the dialog box. To generate the graph, click the green check mark located on the lower right side of the dialog box. The new graph is generated with the name of the filter at the top of the graph. Refer to Figure 9-8 on page 245 for an example of this filter.
On the left side of the chart in the blue bar, there are several tools to assist you in managing the data. The top two tools (magnifying glasses) zoom in and out for the chart, and the second set of two tools adjusts the X-axis and the Y-axis for the chart. Finally, the bottom two tools allow you to export the data to a comma-separated file or print the chart to a printer. Figure 9-9 shows the chart toolbar in more detail.
C:\XIV>xcli -c MN00033 time_list
Time       Date         Time Zone    Daylight Saving Time
00:48:15   2008-07-29   US/Arizona   no

After the system time is obtained, the statistics_get command can be formatted and issued. The statistics_get command requires several parameters to operate: a starting or ending time point, a count for the number of intervals to collect, the size of the interval, and the units related to that size. The TimeStamp is derived from the previous time_list command. Example 9-2 provides a description of the command.
Example 9-2 The statistics_get command format
statistics_get < start=TimeStamp | end=TimeStamp > count=N interval=IntervalSize resolution_unit=< minute|hour|day|week|month >

To further explain this command, assume that you want to collect 10 intervals, each covering one minute, and that the point of interest occurred on 28 July 2008, starting at 25 minutes after 00 hours. Note that the statistics_get command allows you to gather performance data from any time period. The time stamp is formatted as YYYY-MM-DD.hh:mm:ss, where YYYY represents a four-digit year, MM is the two-digit month, and DD is the two-digit day. After the date portion of the time stamp, you specify the time, where hh is the hour, mm is the minute, and ss represents the seconds. In order to save the data, redirect the output to a file for post-processing. Example 9-3 shows an example of this command, and Figure 9-10 on page 247 shows an example of the output of the statistics. The output displayed is a small portion of the data provided.
Example 9-3 The statistics_get command example
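For scripting, the time stamp can also be built programmatically. The following sketch assembles a command line of the form discussed above; the system name MN00033 is taken from the surrounding examples, and the assembled string follows that pattern as an assumption, not captured output:

```python
from datetime import datetime

def xcli_timestamp(dt: datetime) -> str:
    """Format a point in time the way statistics_get expects:
    YYYY-MM-DD.hh:mm:ss, as used in the examples in this chapter."""
    return dt.strftime("%Y-%m-%d.%H:%M:%S")

# Ten one-minute intervals starting at 28 July 2008, 00:25:00.
start = xcli_timestamp(datetime(2008, 7, 28, 0, 25, 0))
cmd = (f"xcli -c MN00033 statistics_get start={start} "
       f"count=10 interval=1 resolution_unit=minute")
print(cmd)
```

Building the time stamp this way avoids the formatting mistakes that the command otherwise rejects.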
Extending this example, assume that you want to filter out a specific host defined in the XIV Storage System. By using the host filter in the command, you can specify for which host you want to see performance metrics, which allows you to refine the data that you are analyzing. Refer to Example 9-4 for an example of how to perform this operation and Figure 9-11 for a sample of the output for the command.
Example 9-4 The statistics_get command using the host filter
C:\XIV>xcli -c MN00033 statistics_get start=2008-07-28.00:25:00 host=23a5372 count=10 interval=1 resolution_unit=minute > data.out
Figure 9-11 Output from the statistics_get command using the host filter
In addition to the filter just shown, the statistics_get command can filter by iSCSI name, host worldwide port name (WWPN), volume name, module, and many other fields. As an additional example, assume that you want to see the workload on the system for a specific module. The module filter breaks out the performance on the specified module. Example 9-5 pulls the performance statistics for module 6 during the same time period as the previous examples.
Example 9-5 The statistics_get command using the module filter
C:\XIV>xcli -c MN00033 statistics_get start=2008-07-28.00:25:00 module=6 count=10 interval=1 resolution_unit=minute > data.out
Figure 9-12 Output from statistics_get command using the module filter
Chapter 10.
Monitoring
This chapter describes the various methods and functions that are available to monitor the XIV Storage System. It also shows how you can gather information from the system in real time, in addition to the self-monitoring, self-healing, and automatic alerting function implemented within the XIV software. Furthermore, this chapter also discusses the Call Home function and remote support and repair.
Status bar indicators located at the bottom of the window indicate the overall operational levels of the XIV Storage System:

The first indicator, on the left, shows the amount of soft or hard storage capacity currently allocated to Storage Pools and provides alerts when certain capacity thresholds are reached. As the physical, or hard, capacity consumed by volumes within a Storage Pool passes certain thresholds, the color of this meter indicates that additional hard capacity might need to be added to one or more Storage Pools. Clicking the icon on the right side of the indicator bar that represents up and down arrows toggles the view between hard and soft capacity. Our example indicates that the system has a usable hard capacity of 79113 GB, of which 84%, or 66748 GB, is actually used. You can also get more detailed information and perform more accurate capacity monitoring by looking at Storage Pools (refer to 5.3.1, Managing Storage Pools with XIV GUI on page 94).

The second indicator, in the middle, displays the number of I/O operations per second (IOPS).

The third indicator, on the far right, shows the general system status and, for example, indicates when a redistribution is underway. In our example, the general system status indicator shows that the system is undergoing a Rebuilding phase, which was triggered by a failing disk (Disk 7 in Module 7) as shown in Figure 10-2.
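The percentage shown in the example can be checked with a quick calculation using the values from the status bar:

```python
# Values read from the example capacity indicator.
used_gb, total_gb = 66748, 79113

print(round(100 * used_gb / total_gb))  # → 84
```

The same arithmetic applies whether the indicator is toggled to hard or soft capacity.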
Monitoring events
To get to the Events window, select Events from the Monitor menu as shown in Figure 10-3. Extensive information and many events are logged by the XIV Storage System. The system captures entries for problems with various levels of severity, including warnings and other informational messages. These informational messages include detailed information about logins, configuration changes, and the status of attached hosts and paths. All of the collected data can be reviewed in the Events window that is shown in Figure 10-3.
Because many events are logged, the number of entries is typically huge. To get a more useful and workable view, there is an option to filter the events logged. Without filtering the events, it is extremely difficult to find the entries for a specific incident or information. Figure 10-4 shows the possible filter options for the events.
If you double-click a specific event in the list, you get more detailed information about that particular event, along with a recommendation about what action to take. Figure 10-5 on page 253 shows details for a critical event where a module failed. For this type of event, you must immediately contact IBM XIV support.
Event severity
The events are classified into a level of severity depending on their impact on the system. Figure 10-6 gives an overview of the criteria and meaning of the various severity levels.
The five severity levels are:

Informational: For information only; no impact on or danger to system operation.
Warning: Informs the user that something in the system has changed, but with no impact on the system.
Minor: A part has failed, but the system is still fully redundant and there is no operational impact.
Major: A part has failed and redundancy is temporarily affected (for example, a failing disk).
Critical: One or more parts have failed, and redundancy and machine operation can be affected.
Event configuration
The events monitor window offers a Configuration option in the Toolbar (refer to Figure 10-7) that lets you configure Call Home notifications and rules for specific events. Clicking the Configuration icon starts the Configuration Wizard, which guides you through the settings to define rules.
For further information about event notification rules, refer to 10.2, Call Home and remote support on page 273.
Monitoring statistics
The Statistics monitor, which is shown in Figure 10-8, provides information about the performance and workload of the IBM XIV.
There is flexibility in how you can visualize the statistics. Options are selectable from a control pane located at the bottom of the window, which is shown in Figure 10-9.
For detailed information about performance monitoring, refer to 9.3, Performance statistics gathering with XIV on page 240.
System monitoring
Several XCLI commands are available for system monitoring. We illustrate several of these commands next. For complete information about these commands, refer to the XCLI Users Guide, which is available at:
http://publib.boulder.ibm.com/infocenter/ibmxiv/r2/index.jsp
The state_list command, shown in Example 10-1, gives an overview of the general status of the system. In the example, the system is operational, and no shutdown is pending.
Example 10-1 The state_list command
xcli> XCLI -c clr26 state_list
Command completed successfully
system_state   off_type=off   safe_mode=no   shutdown_reason=No Shutdown   system_state=on   target_state=on

In Example 10-2, the system_capacity_list command shows an overview of used and free capacity system-wide. In the example, the usable capacity is 79113 GB, with 11355 GB of free capacity. The difference of these two values gives the capacity used.
Example 10-2 The system_capacity_list command
xcli> XCLI -c clr26 system_capacity_list
Soft    Hard    Free Hard   Free Soft   Spare Modules   Spare Disks   Target Spare Modules
79113   79113   11355       11355       1               3             1

In Example 10-3, the version_get command displays the current version of the XIV code installed on the system. Knowing the current version of your system assists you in determining when upgrades are required.
Example 10-3 The version_get command
C:\XIV>xcli -c clr26 version_get
Version
XIV_SYS-10.0-P0803

In Example 10-4 on page 256, the time_list command is used to retrieve the current time from the XIV Storage System. This time is normally set at the time of installation. Knowing the current system time is required when reading statistics or events. In certain cases, the system time might differ from the current time at the user's location, and therefore, knowing when something occurred according to the system time assists with debugging issues. In the example provided, the current system time and current user time are displayed.
C:\XIV>xcli -c clr26 time_list & date /T & time /T
Time       Date         Time Zone   Daylight Saving Time
08:57:15   2008-08-19   GMT         no
Tue 08/19/2008
10:58 AM
xcli> XCLI -c clr26 component_list filter=NOTOK
Component ID   Status   Currently Functioning
1:Disk:4:9     Failed   no

As shown in Example 10-6, the disk_list command provides more in-depth information for any individual disk in the XIV Storage System, which might be helpful in determining the root cause of a disk failure. If the command is issued without the disk parameter, all the disks in the system are displayed.
Example 10-6 The disk_list command
C:\XIV>xcli -c clr26 disk_list disk=1:Disk:13:11
Component ID   Status   Currently Functioning   Capacity (GB)   Target Status
1:Disk:13:11   Failed   yes                     1TB

C:\XIV>xcli -c clr26 disk_list disk=1:Disk:13:10
Component ID   Status   Currently Functioning   Capacity (GB)   Target Status
1:Disk:13:10   OK       yes                     1TB
In Example 10-7, the module_list command displays details about the modules themselves. If the module parameter is not provided, all the modules are displayed. In addition to the status of the module, the output describes the number of disks, number of FC ports, and number of iSCSI ports.
Example 10-7 The module_list command
C:\XIV>xcli -c clr26 module_list module=1:Module:4
Component ID   Status   Currently Functioning   Type              Data Disks   FC Ports   iSCSI Ports
1:Module:4     OK       yes                     p10hw_auxiliary   12           4          0
In Example 10-8 on page 257, the ups_list command describes the current status of the Uninterruptible Power Supply (UPS) components. It provides details about when the last test was performed and its results. Equally important is the current battery charge level: a battery that is not fully charged can cause problems in case of a power failure.
Example 10-9 shows the switch_list command that is used to display the current status of the switches.
Example 10-9 The switch_list command
The psu_list command that is shown in Example 10-10 lists all the power supplies in each of the modules. There is no option to display an individual Power Supply Unit (PSU).
Example 10-10 The psu_list command
C:\XIV>xcli -c clr26 psu_list
Component ID   Status   Currently Functioning   Hardware Status
1:PSU:1:1      OK       yes                     OK
1:PSU:1:2      OK       yes                     OK
1:PSU:2:1      OK       yes                     OK
1:PSU:2:2      OK       yes                     OK
1:PSU:3:1      OK       yes                     OK
1:PSU:3:2      OK       yes                     OK
1:PSU:4:1      OK       yes                     OK
1:PSU:4:2      OK       yes                     OK
1:PSU:6:1      OK       yes                     OK
1:PSU:6:2      OK       yes                     OK
1:PSU:7:1      OK       yes                     OK
1:PSU:7:2      OK       yes                     OK
1:PSU:9:1      OK       yes                     OK
1:PSU:9:2      OK       yes                     OK
1:PSU:10:1     OK       yes                     OK
1:PSU:10:2     OK       yes                     OK
1:PSU:11:1     OK       yes                     OK
1:PSU:11:2     OK       yes                     OK
1:PSU:12:1     OK       yes                     OK
1:PSU:12:2     OK       yes                     OK
1:PSU:13:1     OK       yes                     OK
1:PSU:13:2     OK       yes                     OK
1:PSU:14:1     OK       yes                     OK
1:PSU:14:2     OK       yes                     OK
1:PSU:15:1     OK       yes                     OK
1:PSU:15:2     OK       yes                     OK
1:PSU:8:1      OK       yes                     OK
1:PSU:8:2      OK       yes                     OK
1:PSU:5:1      OK       yes                     OK
1:PSU:5:2      OK       yes                     OK
Events
Events can also be handled with the XCLI. Several commands are available to list, filter, close, and send notifications for events. Many commands and parameters are available; you can obtain detailed and complete information in the IBM XIV XCLI User Manual. Here, we only illustrate several options of the event_list command.
Example 10-11 The event_list command
xcli> XCLI -c clr26 event_list max_events=100
Index   Code                Severity        Timestamp    Alert
38454   HOST_MULTIPATH_OK   Informational   2008-08-11   no
38473   UNMAP_VOLUME        Informational   2008-08-11   no
38549   HOST_DISCONNECTED   Informational   2008-08-11   no
38542   SES_POWER_SUP_FAI   Major           2008-07-23   no
39539   MODULE_FAILED       Critical        2008-07-29   no
Several parameters can be used to sort and filter the output of the event_list command. Refer to Table 10-1 for a list of the most commonly used parameters.
Table 10-1   The event_list command parameters

Name           Description                                                    Syntax and example
max_events     Lists a specific number of events                              <event_list max_events=100>
after          Lists events after the specified date and time                 <event_list after=2008-08-11.04:04:27>
before         Lists events before the specified date and time                <event_list before=2008-08-11.14:43:47>
min_severity   Lists events with the specified and higher severities          <event_list min_severity=major>
alerting       Lists events for which an alert was sent or for which          <event_list alerting=yes>
               no alert was sent                                              <event_list alerting=no>
cleared        Lists events for which an alert was cleared or for which       <event_list cleared=yes>
               the alert was not cleared                                      <event_list cleared=no>
These parameters can be combined for better filtering. In Example 10-12, two filters were combined to limit the amount of information displayed. The first parameter max_events only allows three events to be displayed. The second parameter is the date and time that the events must not exceed. In this case, the next closest event to the date provided is event 573. The event occurred approximately 12 minutes before the cutoff time.
Example 10-12 The event_list command with max_events and date filter
C:\XIV>xcli -c clr26 event_list max_events=3 before=2008-08-11.14:56:14
Index   Code                    Severity        Timestamp             Alerting   Cleared
573     DISK_FINISHED_PHASEIN   Informational   2008-08-11 14:44:11   no         yes
574     DISK_FINISHED_PHASEIN   Informational   2008-08-11 14:44:11   no         yes
575     DISK_FINISHED_PHASEIN   Informational   2008-08-11 14:44:11   no         yes
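The gap of approximately 12 minutes between the returned events (14:44:11) and the cutoff time (14:56:14) can be checked directly:

```python
from datetime import datetime

cutoff = datetime(2008, 8, 11, 14, 56, 14)   # before= value
event = datetime(2008, 8, 11, 14, 44, 11)    # timestamp of event 573

# Minutes between the newest matching event and the cutoff.
print((cutoff - event).seconds // 60)  # → 12
```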
The event list can also be filtered on severity. Example 10-13 displays all the events in the system that contain a severity level of Major and all higher levels, such as Critical.
Example 10-13 The event_list command filtered on severity
Certain events generate an alert message that does not stop until the event has been cleared. These events are called alerting events and can be viewed in the GUI or with a separate XCLI command. After an alerting event is cleared, it is removed from this list, but it remains visible with the event_list command.
Example 10-14 The event_list_uncleared command
Monitoring statistics
The statistics gathering mechanism is a powerful tool. The XIV Storage System continually gathers performance metrics and stores them internally. Using the XCLI, data can be retrieved and filtered by using many metrics. Example 10-15 provides an example of gathering the statistics for 10 days, with each interval covering an entire day. The system is given a time stamp as the ending point for the data. Due to the magnitude of the data being provided, it is best to redirect the output to a file for further post-processing. Refer to Chapter 9, Performance characteristics on page 235 for a more in-depth view of performance.
Example 10-15 The statistics_get command
C:\XIV>xcli -c clr26 statistics_get count=10 interval=1 resolution_unit=day end=2008-08-11.14:56:14 > perf.out

The usage_get command is a powerful tool that provides details about the current utilization of pools and volumes. The system saves the usage every hour for later retrieval. This command works in the same way as the statistics_get command: you specify the time stamp to begin or end the collection and the number of entries to collect. In addition, you need to specify the pool name or the volume name.
Example 10-16 The usage_get command by pool
C:\XIV>xcli -c clr26 usage_get pool=WindowsPool max=10 start=2008-08-20.00:00:00
Time                  Volume Usage (MB)   Snapshot Usage (MB)
2008-08-20 00:00:00   32768               0
2008-08-20 01:00:00   32768               0
2008-08-20 02:00:00   32768               0
2008-08-20 03:00:00   32768               0
2008-08-20 04:00:00   32768               0
Note that the usage is displayed in MB. Example 10-17 shows that the volume Red_vol_1 is using 78 MB of space. The time when the data was written to the device is also recorded; in this case, the host wrote data to the volume for the first time on 14 August 2008.
Example 10-17 The usage_get command by volume
C:\XIV>xcli -c clr26 usage_get vol=Red_vol_1 max=10 start=2008-08-14.05:00:00
Time                  Volume Usage (MB)   Snapshot Usage (MB)
2008-08-14 05:00:00   0                   0
2008-08-14 06:00:00   0                   0
2008-08-14 07:00:00   0                   0
2008-08-14 08:00:00   69                  0
2008-08-14 09:00:00   69                  0
2008-08-14 10:00:00   78                  0
2008-08-14 11:00:00   78                  0
2008-08-14 12:00:00   78                  0
2008-08-14 13:00:00   78                  0
2008-08-14 14:00:00   78                  0
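After redirecting usage_get output to a file, post-processing is straightforward. The following sketch parses data rows in the layout shown in these examples; the exact column layout may differ between XCLI versions, so treat the parsing rule as an assumption:

```python
def parse_usage_line(line):
    """Parse one data row of usage_get output into
    (timestamp, volume_usage_mb, snapshot_usage_mb); return None
    for header or blank lines."""
    parts = line.split()
    if len(parts) != 4 or parts[0].count("-") != 2:
        return None
    date, time, vol_mb, snap_mb = parts
    return f"{date} {time}", int(vol_mb), int(snap_mb)

sample = "2008-08-14 10:00:00   78   0"
print(parse_usage_line(sample))  # → ('2008-08-14 10:00:00', 78, 0)
```

Feeding every line of the redirected file through this function yields a time series that can be plotted or aggregated.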
SNMP agent
An SNMP agent is a daemon process that provides access to the MIB objects on IP hosts on which the agent is running. An SNMP agent, or daemon, is implemented in the IBM XIV software and provides access to the MIB objects defined in the system. The SNMP daemon can send SNMP trap requests to SNMP managers to indicate that a particular condition exists on the agent system, such as the occurrence of an error.
SNMP manager
An SNMP manager can be implemented in two ways. It can be a simple command tool that collects information from SNMP agents, or it can be composed of multiple daemon processes and database applications. This type of complex SNMP manager provides monitoring functions using SNMP and typically has a graphical user interface for operators. The SNMP manager gathers information from SNMP agents and accepts trap requests sent by SNMP agents. In addition, the SNMP manager generates traps when it detects status changes or other unusual conditions while polling network objects. IBM Director is an example of an SNMP manager with a GUI interface.
SNMP trap

A trap is a message sent from an SNMP agent to an SNMP manager without a specific request from the SNMP manager. SNMP defines six generic types of traps and allows you to define enterprise-specific traps. The trap structure conveys the following information to the SNMP manager:
- Agent's object that was affected
- IP address of the agent that sent the trap
- Event description (either a generic trap or an enterprise-specific trap, including the trap number)
- Time stamp
- Optional enterprise-specific trap identification
- List of variables describing the trap
SNMP communication
The SNMP manager sends SNMP get, get-next, or set requests to SNMP agents, which listen on UDP port 161, and the agents send back a reply to the manager. The SNMP agent can be implemented on any kind of IP host, such as UNIX workstations, routers, and network appliances, and also on the XIV Storage System. You can gather various kinds of information about specific IP hosts by sending SNMP get and get-next requests, and you can update the configuration of IP hosts by sending the SNMP set request. The SNMP agent can send SNMP trap requests to SNMP managers, which listen on UDP port 162. The trap requests sent from SNMP agents can be used to send warning, alert, or error notification messages to SNMP managers. Figure 10-10 on page 262 illustrates the characteristics of SNMP architecture and communication.
[Figure 10-10: SNMP architecture and communication. An SNMP manager such as IBM Director listens for, and replies to, trap traffic on UDP port 162 and communicates over the IP network with SNMP agents, whose objects are described by a Management Information Base (MIB).]
You can configure an SNMP agent to send SNMP trap requests to multiple SNMP managers.
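The transport side of this exchange can be sketched in a few lines. The toy below (a Python illustration of the UDP transport only; real SNMP traps are ASN.1 BER-encoded PDUs, and the well-known ports are 161 for agent requests and 162 for traps) binds an ephemeral port so it runs without root privileges:

```python
import socket

# A "manager" socket waits for a datagram and an "agent" socket sends one to
# it, mirroring the unsolicited nature of a trap. The payload is a placeholder,
# not a real SNMP message; port 0 stands in for the privileged UDP port 162.
manager = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
manager.bind(("127.0.0.1", 0))
manager.settimeout(5)
trap_port = manager.getsockname()[1]

agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
agent.sendto(b"cold-start trap (placeholder payload)", ("127.0.0.1", trap_port))

# The manager receives the message without ever having sent a request.
payload, sender = manager.recvfrom(4096)
agent.close()
manager.close()
```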
3. The Define Destination dialog is now open. Enter a Destination Name (a unique name of your choice) and the IP address or Domain Name System (DNS) name of the server where the SNMP management software is installed. Refer to Figure 10-13 on page 264.
4. Click Define to add the SNMP manager as a destination for SNMP traps. Your XIV Storage System is now set up to send SNMP traps to the defined SNMP manager, which will process the received traps according to the MIB file.
A typical IBM Director environment includes:
- Management servers: One or more servers on which IBM Director Server is installed
- Managed systems: Servers, workstations, and any computer managed by IBM Director
- Management consoles: Servers and workstations from which you communicate with one or more IBM Director Servers
- SNMP devices: Network devices, disk systems, or computers that have SNMP agents installed or embedded (such as the XIV Storage System)

Figure 10-14 on page 265 depicts a typical IBM Director management environment.
[Figure 10-14: A typical IBM Director management environment. A management console (IBM Director Console installed) communicates over TCP/IP with a management server (IBM Director Server installed, which also installs IBM Director Agent and IBM Director Console); the management server reaches SNMP devices over TCP/IP and various other protocols.]
2. In the MIB Management window, click File → Select MIB to Compile.
3. In the Select MIB to Compile window that is shown in Figure 10-16, specify the directory and file name of the MIB file that you want to compile, and click OK. A status window indicates the progress.
When you compile a new MIB file, it is also automatically loaded into the Loaded MIBs directory and is ready for use. To load an already compiled MIB file:
1. In the MIB Management window, click File → Select MIB to Load.
2. Select the MIB that you want to load in the Available MIBs window, then click Add, Apply, and OK.
This action loads the selected MIB file, and IBM Director is ready to be configured for monitoring the IBM XIV.
In the Discovery Preferences window that is shown in Figure 10-18 on page 269, follow these steps to discover XIV Storage Systems:
a. Click the Level 0: Agentless System tab.
b. Click Add to bring up a window to specify whether you want to add a single address or an address range. Select Unicast Range.
Note: Because each XIV system is presented through three IP addresses, select Unicast Range when configuring the auto-discovery preferences.
c. Enter the address range for the XIV systems in your environment. You can also set the Auto-discover period and the Presence Check period.
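The idea behind a unicast range entry can be illustrated briefly. The sketch below (a Python illustration; the function name and the addresses, drawn from the 192.0.2.0/24 documentation range, are hypothetical) expands an inclusive start/end pair into the individual addresses a discovery sweep would probe, which is convenient because each XIV system is reached through three consecutive management IPs:

```python
import ipaddress

def unicast_range(start, end):
    """Expand an inclusive IPv4 address range into individual addresses."""
    lo = int(ipaddress.IPv4Address(start))
    hi = int(ipaddress.IPv4Address(end))
    return [str(ipaddress.IPv4Address(i)) for i in range(lo, hi + 1)]

# A three-address range covering one system's management interfaces:
addrs = unicast_range("192.0.2.10", "192.0.2.12")
```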
After you have set up the Discovery Preferences, the IBM Director will discover the XIV Storage Systems and add them to the IBM Director Console as seen in Figure 10-19.
At this point, the IBM Director and IBM Director Console are ready to receive SNMP traps from the discovered XIV Storage Systems. With the IBM Director, you can display general information about your IBM XIVs, monitor the Event Log, and browse more information.
Event Log
To open the Event Log, right-click the entry corresponding to your XIV Storage System and select Event Log from the pop-up menu that is shown in Figure 10-21.
The Event Log window can be configured to show the events for a defined time frame or to limit the number of entries to display. Selecting a specific event will display the Event Details in a pane on the right side of the window as shown in Figure 10-22.
Event actions
Based on the SNMP traps and the events, you can define different Event Actions with the Event Actions Builder, as illustrated in Figure 10-23. Here, you can define several actions for IBM Director to perform in response to specific traps and events. IBM Director offers a wizard to help you define an Event Action Plan. Start the wizard by selecting Tasks → Event Action Plans → Event Action Plan Wizard in the IBM Director Console window. The wizard guides you through the setup. The window in Figure 10-23 shows IBM Director configured to send an e-mail for all events (to a predefined e-mail address or to a group of e-mail addresses).
From the toolbar, click Configuration to invoke the Configuration wizard. The wizard will guide you through the configuration of Gateways, Destinations, and Rules. The wizard Welcome window is shown in Figure 10-25 on page 274.
Gateways
The wizard will first take you through the configuration of the gateways. Click Next to display the Events Configuration - Gateway dialog. The steps are: 1. Click Define Gateway. The Gateway Create Welcome panel shown in Figure 10-26 appears. Click Next.
2. The Gateway Create - Select gateway type panel displays as shown in Figure 10-27 on page 275.
3. Type: The wizard asks for the type of the gateway: SMTP for e-mail notification or SMS for notification by SMS message. Check either SMTP or SMS.
4. The next steps differ for SMTP and SMS. Our illustration from now on is for SMTP; the steps for SMS are similarly self-explanatory. Enter:
- Name: The gateway name of the SMTP or SMS gateway, depending on the previously selected Type.
- Address: The IP address or DNS name of the SMTP gateway.
- Sender: In case of e-mail problems, such as a wrong e-mail address, a response e-mail is sent to this address. You can either enter an address for this server or use the system-wide global address:
  - Use default sender
  - Use new sender address
Destinations
Next, the Configuration wizard will guide you through the setup of the destinations where you configure e-mail addresses or SMS receivers. The Welcome panel is displayed. Click Next to proceed. The Select Destination type panel, shown in Figure 10-28, is displayed.
Here you configure:
- Type: The event notification destination type, which can be a destination group (containing other destinations), an SNMP manager for sending SNMP traps, an e-mail address for e-mail notification, or a mobile phone number for SMS notification:
  - SNMP
  - EMAIL
  - SMS
  - Group of Destinations

Depending on the selected type, the remaining configuration information differs but is self-explanatory. Our illustration from now on is for SNMP. Enter:
- Destination Name: The name of the new destination. Use a meaningful name that you can remember; usually, there will be many destinations in a system.
- SNMP: The DNS name or the IP address of the SNMP manager.
Rules
At this point, the rules for the event notification can be defined. The rules are either based on the severity, an event code, or a combination of both the severity and the event code. The Welcome panel is displayed. Click Next. The Rule Create - Rule name panel shown in Figure 10-29 on page 277 is displayed.
To define a rule, configure:
- Rule Name: Enter a name for the new rule. Names are case-sensitive and can contain letters, digits, or the underscore character (_). You cannot use the name of an already defined rule.
- Rule condition setting: Select severity if you want the rule to be triggered by severity, event code if you want the rule to be triggered by a specific event, or both severity and event code for events that can have multiple severities depending on a threshold of certain parameters:
  - Severity only
  - Event Code only
  - Both severity and event code
- Select the severity trigger: Select the minimum severity to trigger the rule's activation. Events of this severity or higher trigger the rule.
- Select the event code trigger: Select the event code to trigger the rule's activation.
- Rule destinations: Select destinations and destination groups to be notified when the event condition occurs. You can select one or more existing destinations or define a new destination (refer to Figure 10-30).
- Rule snooze: Defines whether the system repeatedly alerts the defined destinations until the event is cleared. If so, a snooze time must be selected:
  - Check Use snooze timer
  - Snooze time in minutes
- Rule escalation: Defines whether the system sends alerts via other rules if the event is not cleared within a certain time. If so, an escalation rule and time must be specified:
  - Check Use escalation rule
  - Escalation Rule
  - Escalation time in minutes
  - Create Escalation Rule
The SMTP gateway is defined with the smtpgw_define command:

C:\XIV>xcli -c clr26 smtpgw_define smtpgw=test address=test.ibm.com from_address=xiv@us.ibm.com
Command executed successfully.
C:\XIV>xcli -c clr26 smtpgw_list
Name      Address        Priority
relay_de  9.149.165.228  1
test      test.ibm.com   2

The SMS gateway is defined in a similar way. The difference is that its fields can use tokens to create variable text instead of static text. When specifying the address to which to send the SMS message, tokens can be used instead of hard-coded values; the message body also uses a token so that the error message is sent instead of hard-coded text. Example 10-19 provides an example of defining an SMS gateway. The tokens available for the SMS gateway definition are:
- {areacode}: This escape sequence is replaced by the destination's mobile or cellular phone number area code.
- {number}: This escape sequence is replaced by the destination's local cellular number.
- {message}: This escape sequence is replaced by the text to be shown to the user.
- \{, \}, \\: These symbols are replaced by {, }, or \, respectively.
Example 10-19 The smsgw_define command
C:\XIV>xcli -c clr26 smsgw_define smsgw=test email_address={areacode}{number}@smstest.ibm.com subject_line="XIV System Event Notification" email_body={message} Command executed successfully.
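The token mechanism above can be modeled to see how an address such as the one in Example 10-19 is produced. The sketch below (a Python illustration; `expand_sms_tokens` is a hypothetical helper, not part of the XCLI) substitutes the three tokens and honors the \{, \}, and \\ escapes:

```python
def expand_sms_tokens(template, area_code, number, message):
    r"""Expand the SMS-gateway escape sequences in a template string.

    Follows the token list above: {areacode}, {number}, {message}, plus the
    escapes \{, \}, and \\. A hand-rolled scanner is used so that escaped
    braces are not mistaken for token delimiters. Illustrative sketch only.
    """
    tokens = {"areacode": area_code, "number": number, "message": message}
    out, i = [], 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template) and template[i + 1] in "{}\\":
            out.append(template[i + 1])   # escaped literal brace or backslash
            i += 2
        elif ch == "{":
            end = template.index("}", i)  # token name between the braces
            out.append(tokens[template[i + 1:end]])
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# The address template from Example 10-19, expanded for one destination:
addr = expand_sms_tokens("{areacode}{number}@smstest.ibm.com", "555", "5555555", "")
```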
When the gateways are defined, the destination settings can be defined. There are three types of destinations:
- SMTP (e-mail)
- SMS
- SNMP
Example 10-20 provides an example of creating a destination for each of the three types of notifications. For the e-mail notification, the destination receives a test message every Monday at 12:00. Each destination can be set to receive notifications on multiple days of the week at multiple times.
Example 10-20 Destination definitions
C:\XIV>xcli -c clr26 dest_define dest=emailtest type=EMAIL email_address=test@ibm.com smtpgws=ALL heartbeat_test_hour=12:00 heartbeat_test_days=Mon
Command executed successfully.
C:\XIV>xcli -c clr26 dest_define dest=smstest type=SMS area_code=555 number=5555555 smsgws=ALL
Command executed successfully.
C:\XIV>xcli -c clr26 dest_define dest=snmptest type=SNMP snmp_manager=9.9.9.9
Command executed successfully.
C:\XIV>xcli -c clr26 dest_list
Name          Type   Email Address        User  Area Code  Phone Number  SNMP Manager
BladeC4_W2K3  SNMP
relay         EMAIL  moscheka@de.ibm.com
emailtest     EMAIL  test@ibm.com
smstest       SMS                               555        5555555
snmptest      SNMP                                                       9.9.9.9
Finally, the rules can be set for which messages can be sent. Example 10-21 provides two examples of setting up rules. The first rule is for SNMP and e-mail messages and all messages, even informational messages, are sent to the processing servers. The second example creates a rule for SMS messages. Only critical messages are sent to the SMS server, and they are sent every 15 minutes until the error condition is cleared.
Example 10-21 Rule definitions
C:\XIV>xcli -c clr26 rule_create rule=emailtest min_severity=informational dests=emailtest,snmptest
Command executed successfully.
C:\XIV>xcli -c clr26 rule_create rule=smstest min_severity=critical dests=smstest snooze_time=15
Command executed successfully.
C:\XIV>xcli -c clr26 rule_list
Name         Minimum Severity  Event Codes  Except Codes  Destinations  Active  Escalation Only
markus_rule  none              all                                              no
emailtest    Informational                                                      no
smstest      Critical                                                           no
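The min_severity filtering that these rules perform can be sketched compactly. In the Python model below, the severity ladder (informational < warning < minor < major < critical) is an assumption for illustration; consult the XCLI reference for the exact set of severities the system defines:

```python
# Assumed severity ordering, lowest to highest.
SEVERITIES = ["informational", "warning", "minor", "major", "critical"]

def destinations_for(event_severity, rules):
    """Return the destinations of every rule whose min_severity is met."""
    level = SEVERITIES.index(event_severity)
    hit = []
    for rule in rules:
        if level >= SEVERITIES.index(rule["min_severity"]):
            hit.extend(rule["dests"])
    return hit

# The two rules created in Example 10-21, modeled as dictionaries:
rules = [
    {"name": "emailtest", "min_severity": "informational",
     "dests": ["emailtest", "snmptest"]},
    {"name": "smstest", "min_severity": "critical", "dests": ["smstest"]},
]
critical_dests = destinations_for("critical", rules)  # both rules fire
warning_dests = destinations_for("warning", rules)    # only the e-mail rule fires
```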
Example 10-22 shows illustrations of deleting rules, destinations, and gateways. It is not possible to delete a destination if a rule is using that destination, and it is not possible to delete a gateway if a destination is pointing to that gateway.
Example 10-22 Deletion of notification setup
C:\XIV>xcli -c clr26 -y rule_delete rule=smstest
Command executed successfully.
C:\XIV>xcli -c clr26 -y rule_delete rule=emailtest
Command executed successfully.
C:\XIV>xcli -c clr26 -y dest_delete dest=smstest
Command executed successfully.
C:\XIV>xcli -c clr26 -y dest_delete dest=emailtest
Command executed successfully.
C:\XIV>xcli -c clr26 -y dest_delete dest=snmptest
Command executed successfully.
C:\XIV>xcli -c clr26 -y smsgw_delete smsgw=test
Command executed successfully.
C:\XIV>xcli -c clr26 -y smtpgw_delete smtpgw=test
Command executed successfully.
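The deletion-order constraint described above (rules before destinations, destinations before gateways) can be expressed as two precondition checks. This is a Python sketch of the dependency logic only; the function names and data layout are illustrative, not part of the XCLI:

```python
def can_delete_destination(dest, rules):
    """A destination can be deleted only if no rule still references it."""
    return all(dest not in rule["dests"] for rule in rules)

def can_delete_gateway(gw, destinations):
    """A gateway can be deleted only if no destination still points to it."""
    return all(gw not in d.get("gateways", []) for d in destinations)

# Modeled after Example 10-22: a rule using "smstest" and a destination
# pointing at gateway "test".
rules = [{"name": "smstest", "dests": ["smstest"]}]
destinations = [{"name": "smstest", "gateways": ["test"]}]

blocked = can_delete_destination("smstest", rules)      # a rule still uses it
allowed = can_delete_destination("unused_dest", rules)  # nothing references it
gw_blocked = can_delete_gateway("test", destinations)   # a destination uses it
```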
Remote connection
The Remote Support Center has two ways to connect to the system. Depending on the client's choice, the support specialist can connect either by using a modem dial-up connection or, if provided and agreed to by the client, a secure, high-speed VPN connection. These possibilities are depicted in Figure 10-31. In case of problems, the remote specialist can analyze problems and also assist an IBM SSR dispatched on-site in repairing the system or in replacing field-replaceable units (FRUs). Remote access is protected by different passwords for different access levels to prevent unauthorized remote access. For details, you can also refer to Chapter 6, Security on page 125.
[Figure 10-31: Remote support connection options. The IBM XIV Remote Support Center can reach the IBM XIV Storage System either over a secure, high-speed VPN connection through the Internet and the firewalls of the client and IBM, or over a modem-to-modem dial-up connection through a telephone line.]
To enable remote support, you must allow an external connection, either:
- A telephone line
- An Internet connection through your firewall that allows IBM to use a VPN connection to your XIV Storage System
[Figure: Remote support problem flow. The IBM XIV sends a problem notification, or the user reports the problem to the IBM XIV Support Center; the Support Center initiates parts for on-site dispatch and informs the CMC (Call Management Center); the problem is solved.]
Either a call from the user or an e-mail notification generates an IBM internal problem record and alerts the IBM XIV Support Center. A Support Center specialist remotely connects to the system and evaluates the situation to decide what further actions to take to solve the reported issue:
- Remote repair: Depending on the nature of the problem, a specialist fixes the problem while connected.
- Data collection: The specialist collects data from the system for analysis in order to develop an action plan to solve the problem.
- On-site repair: The specialist provides an action plan, including needed parts, to the Call Management Center (CMC) to initiate an IBM SSR repairing the system on-site.
- IBM SSR assistance: The specialist supports the IBM SSR during on-site repair via the remote connection.

The architecture of the IBM XIV is self-healing. Failing units are logically removed from the system automatically, which greatly reduces the potential impact of the event and results in service actions being performed in a fully redundant state. For example, if a disk drive fails, it is automatically removed from the system. The process has minimal impact on performance, because only a small part of the available resources has been removed. The rebuild time is fast, because most of the remaining drives participate in redistributing the data. Due to this self-healing mechanism, with most failures there is no need for urgent action, and service can be performed at a convenient time. The IBM XIV will be in a fully redundant state, which mitigates issues that might otherwise arise if a failure occurs during a service action.
Chapter 11. Copy functions
The XIV Storage System has a rich set of copy functions suited to various data protection scenarios, which enable clients to set up business continuance, data migration, and online backup solutions. This chapter provides an overview of the snapshot and Volume Copy functions of the XIV product and describes the requirements, application range, and implementation of these two copy functions in an enterprise environment. The Remote Mirror function is covered in Chapter 12, Remote Mirror on page 323.
11.1 Snapshots
A snapshot is a point-in-time copy of a volume's data. The XIV snapshot is based on several innovative technologies to minimize degradation of, or impact on, system performance.
The alternative to redirect on write is the copy on write function. Most systems do not move the location of the volume's data. Instead, when the disk subsystem receives a change, it copies the volume's data to a new location for the point-in-time copy. When the copy is complete, the disk system commits the newly modified data. Therefore, each individual modification takes longer, because the entire block must be copied before the change can be made.

As the storage assigned to snapshots becomes fully utilized, the XIV Storage System invokes a deletion mechanism to protect itself from overutilizing the assigned pool space. It is important to note that the snapshot pool does not need to be full before a deletion occurs. If the space assigned for snapshots reaches its limit on a single physical disk, the entire snapshot is deleted on all disks (refer to Figure 11-2). If you know in advance that an automatic deletion is possible, the pool can be expanded to accommodate additional snapshots. This expansion requires that there is available space on the system for the Storage Pool. Refer to Resizing Storage Pools on page 97 for details about adding more space to an existing Storage Pool.
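The redirect-on-write behavior can be illustrated with a toy pointer model. This Python sketch is purely illustrative (the real system works on fixed-size partitions striped across many disks): a snapshot copies only the pointer table, and a later write lands in a fresh partition, so the snapshot keeps seeing the original data.

```python
class RedirectOnWriteVolume:
    """Toy model of redirect-on-write snapshots (illustrative only)."""

    def __init__(self, data):
        self.store = dict(enumerate(data))  # partition id -> data
        self.next_id = len(self.store)
        self.ptrs = list(self.store)        # the volume's partition pointers
        self.snapshots = {}

    def snapshot(self, name):
        # Instant: only the pointer table is copied; no data moves.
        self.snapshots[name] = list(self.ptrs)

    def write(self, index, data):
        # Redirect on write: new data lands in a fresh partition and only the
        # volume's pointer changes; snapshot pointers keep the old partition.
        self.store[self.next_id] = data
        self.ptrs[index] = self.next_id
        self.next_id += 1

    def read(self, ptrs=None):
        return [self.store[p] for p in (ptrs if ptrs is not None else self.ptrs)]

vol = RedirectOnWriteVolume(["A", "B"])
vol.snapshot("snap1")
vol.write(0, "A2")
current = vol.read()                        # the volume sees the new data
frozen = vol.read(vol.snapshots["snap1"])   # the snapshot still sees the original
```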
[Figure 11-2: Snapshot space on a single disk. Snapshot 3 allocates a partition and Snapshot 1 is deleted, because there must always be at least one free partition available for any subsequent snapshot.]
Each snapshot has a deletion priority property that is set by the user. There are four priorities, with 1 being the highest priority and 4 being the lowest. The system uses this priority to determine which snapshot to delete first: the lowest-priority snapshot becomes the first candidate for deletion. If multiple snapshots have the same deletion priority, the XIV system deletes the snapshot that was created first. Refer to Deletion priority on page 291 for an example of working with deletion priorities.

A snapshot also has a unique ability to be unlocked. By default, a snapshot is locked on creation and is read-only. Unlocking a snapshot allows the user to modify the data in the snapshot for post-processing. When unlocked, the snapshot takes on the properties of a volume and can be resized or modified. As soon as the snapshot has been unlocked, the modified property is set. The modified property cannot be reset, even if the snapshot is relocked without modification.
In some cases, it might be important to duplicate a snapshot. When duplicating a snapshot, the duplicate snapshot points to the original data and has the same creation date as the original snapshot, if the first snapshot has not been unlocked. This feature can be beneficial when the user wants to have one copy for a backup and another copy for testing purposes. If the first snapshot is unlocked and the duplicate snapshot already exists, the creation time for the duplicate snapshot does not change, and the duplicate snapshot points to the original snapshot. If a duplicate snapshot is created from the unlocked snapshot, the creation date is the time of duplication, and the duplicate snapshot points at the original snapshot.

An application can utilize many volumes on the XIV Storage System; for example, a database application can span several volumes for journaling and user data. In this case, the snapshot for the volumes must occur at the same moment in time so that the journal and data are synchronized. The Consistency Group allows the user to perform the snapshot on all the volumes assigned to the group at the same moment in time, therefore enforcing data consistency.

The XIV Storage System creates a special snapshot related to the Remote Mirroring functionality. During the recovery process of a lost link, the system creates a snapshot of all the volumes in the system. This snapshot is used if the synchronization process fails, so that the data can be restored to a point of known consistency. A special value of the deletion priority is used to prevent this snapshot from being automatically deleted. Refer to 11.1.4, Snapshot with Remote Mirror on page 310 for an example of this snapshot.
Creating a snapshot
Snapshot creation is a simple and easy task to accomplish. Using the Volumes and snapshots view, right-click the volume and select Create Snapshot. Figure 11-3 depicts how to make a snapshot of volume redbook_markus_01.
The new snapshot is displayed in Figure 11-4. The XIV Storage System uses a specific naming convention: the name of the volume, followed by the word snapshot, and then a number or count of snapshots for the volume. The snapshot is the same size as the master volume; however, the view does not display how much space the snapshot has used. The view shown in Figure 11-4 provides three other details:
- The locked property of the snapshot. By default, a snapshot is locked at the time of creation.
- The modified property, displayed to the right of the locked property. In this example, the snapshot has not been modified.
- The creation date. For this example, the snapshot was created on 6 August 2008 at 19:17.
After making a snapshot, the next option is to create a duplicate snapshot for backup purposes. The duplicate has the same creation date as the first snapshot, and it also has a similar creation process. From the Volumes and snapshots view, right-click the snapshot to duplicate. Select Duplicate from the menu to create a new duplicate snapshot. Figure 11-5 provides an example of duplicating the snapshot redbook_markus_01.snapshot_00001.
After selecting Duplicate from the menu, the duplicate snapshot is displayed directly under the original snapshot. Note, the creation date of the duplicate snapshot in Figure 11-6 is the same creation date as the original snapshot. Even though it is not shown, the duplicate snapshot points to the master volume, not the original snapshot.
An example of creating a snapshot and a duplicate snapshot with the Extended Command Line Interface (XCLI) is provided in Example 11-1.
Example 11-1 Creating a snapshot and a duplicate with the XCLI
xcli -c MZ_PFE_1 snapshot_create vol=redbook_markus_03
xcli -c MZ_PFE_1 snapshot_duplicate snapshot=redbook_markus_03.snapshot_00003

After the snapshot is created, it must be mapped to a host in order to access the data. This action is performed in the same way as mapping a normal volume; refer to 5.5, Host definition and mappings on page 113 for the process. It is important to note that a snapshot is an exact replica of the original volume. Certain hosts do not properly handle having two volumes with the exact same metadata describing them; in these cases, you must map the snapshot to a different host to prevent failures. A snapshot can only be created in the volume's Storage Pool; it cannot be created in a different Storage Pool than the one that owns the volume. If a volume is moved to another Storage Pool, its snapshots move with it to the new Storage Pool (provided that there is enough space).
The GUI displays all the volumes in a list. Scroll down to the snapshot of interest and select it by clicking its name. Details of the snapshot are displayed in the upper right panel. The volume redbook_markus_01 contains a snapshot 00001 and a duplicate snapshot 00002. The snapshot and the duplicate snapshot have the same creation date of 2008-07-30 13:29:35, as shown in Figure 11-8. In addition, the snapshot is locked, it has not been modified, and it has a deletion priority of 1 (the highest priority, meaning it is deleted last). Along with these properties, the tree view shows a hierarchical structure of the snapshots. This structure provides details about restoring and overwriting snapshots: any snapshot can be overwritten by any parent snapshot, and any child snapshot can restore a parent snapshot or a volume in the tree structure. In Figure 11-8, the duplicate snapshot is a child of the original snapshot; in other words, the original snapshot is the parent of the duplicate snapshot. This structure has nothing to do with how the XIV Storage System manages the snapshot pointers but is intended to provide an organizational flow for snapshots.
Example 11-2 is an example of viewing the snapshot data in the XCLI. Due to space limitations, only a small portion of the data is displayed from the output.
Example 11-2 Viewing the snapshots on the XCLI
xcli -c MZ_PFE_1 snapshot_list vol=redbook_markus_03
Name                              Size (GB)  Master Name
redbook_markus_03.snapshot_00003  17         redbook_markus_03
redbook_markus_03.snapshot_00004  17         redbook_markus_03
Deletion priority
Deletion priority enables the user to rank the importance of the snapshots within a pool. In the current example, the duplicate snapshot redbook_markus_01.snapshot_00002 is not as important as the original snapshot redbook_markus_01.snapshot_00001; therefore, its deletion priority is reduced. If the snapshot space becomes full, the duplicate snapshot is deleted first, even though the original snapshot is older. To modify the deletion priority, right-click the snapshot in the Volumes and snapshots view and select Change Deletion Priority, as shown in Figure 11-9 on page 292.
After clicking Change Deletion Priority, select the desired deletion priority from the dialog window and accept the change by clicking OK. Figure 11-10 shows the four options that are available for setting the deletion priority. The lowest priority setting is 4, which causes the snapshot to be deleted first. The highest priority setting is 1, and these snapshots are deleted last. All snapshots have a default deletion priority of 1, if not specified on creation.
Figure 11-11 confirms that the duplicate snapshot has had its deletion priority lowered to 4. As shown in the upper right panel, the Delete Priority is reporting a 4 for snapshot redbook_markus_01.snapshot_00002.
To change the deletion priority for the XCLI, just specify the snapshot and new deletion priority, as illustrated in Example 11-3 on page 293.
Snapshot restoration
The XIV Storage System provides the ability to restore the data from a snapshot back to the master volume, which can be helpful when data was modified incorrectly and you want to return to the earlier version. From the Volumes and snapshots view, right-click the snapshot and click Restore. This action causes a dialog box to appear; click OK to perform the restoration. Figure 11-12 illustrates selecting the Restore action on the snapshot redbook_marcus_01.snapshot_00001.

After you perform the restore action, you return to the Volumes and snapshots panel. The process is instantaneous, and none of the properties (creation date, deletion priority, modified property, or locked property) of the snapshot or the volume have changed. Specifically, the process modifies the pointers of the master volume so that they are equivalent to the snapshot's pointers. This change only occurs for partitions that have been modified. On modification, the XIV Storage System stored the data in a new partition and pointed the master volume at the new partition; the snapshot's pointer did not change and remained pointing at the original data. The restoration process resets the volume's pointer back to the original data and frees the modified partition space.
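Restoration as a pointer operation can be shown in a few lines. This is a toy Python model (partition ids and data values are invented): the volume's pointer table is simply reset to the snapshot's, which is why the restore is instantaneous and why the since-modified partition becomes free.

```python
# Partition store: partition id -> data. Partition 2 holds data written after
# the snapshot was taken; the snapshot still references the original partition 1.
store = {0: "orig", 1: "orig", 2: "new"}
volume_ptrs = [0, 2]      # the volume was redirected to partition 2 on write
snapshot_ptrs = [0, 1]    # the snapshot kept the original pointers

# Restore: copy the snapshot's pointer table back to the volume.
volume_ptrs = list(snapshot_ptrs)

# The partition modified since the snapshot is no longer referenced.
freed = set(store) - set(volume_ptrs)
restored = [store[p] for p in volume_ptrs]
```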
The XCLI provides more options for restoration than the GUI. With the XCLI, you can restore a snapshot to a parent snapshot (Example 11-4). The GUI only allows a snapshot to be restored to the master volume. If the target snapshot is not specified, the data is restored to the master volume. In addition, you must specify the -y option when issuing the command, which tells the XCLI to respond affirmatively when prompted for validation to run the command. Important: The XCLI provides more functionality with the snapshots than the GUI. In this case, the XCLI allows the snapshot to be restored to another snapshot and not just the master volume.
Example 11-4 Restoring a snapshot to another snapshot
Overwriting snapshots
Certain situations require the snapshot to be refreshed or updated with the latest changes to the data. For instance, a backup application requires the latest copy of the data to perform its backup operation. The overwrite operation resets the snapshot's pointers to the master volume's current data; therefore, all pointers to the original point-in-time data are lost, and the snapshot appears new. From the Volumes and Snapshots view, right-click the snapshot to overwrite. Select Overwrite from the menu, and a dialog box appears. Click OK to validate the overwriting of the snapshot. Figure 11-13 illustrates overwriting the snapshot named redbook_marcus_01.snapshot_00001.
It is important to note that the overwrite process modifies the snapshot properties and pointers when involving duplicates. Figure 11-14 shows two changes to the properties. The snapshot named redbook_marcus_01.snapshot_00001 has a new creation date. The duplicate snapshot still has the original creation date. However, it no longer points to the original snapshot; instead, it points to the master volume according to the snapshot tree, which prevents a restoration of the duplicate to the original snapshot. If the overwrite occurs on the duplicate snapshot, the duplicate creation date is changed, and the duplicate is now pointing to the master volume.
Figure 11-14 Snapshot tree after the overwrite process has occurred
The XCLI performs the overwrite operation through the snapshot_create command. There is an optional parameter in the command to specify which snapshot to overwrite. If the optional parameter is not used, the master volume is overwritten.
Example 11-5 Overwriting a snapshot
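The body of this example is not reproduced here. A sketch of the overwrite invocation through snapshot_create, assuming the optional parameter described above is named overwrite (an assumption; verify the parameter name against your XCLI reference):

```
xcli -c MZ_PFE_1 -y snapshot_create vol=redbook_marcus_01 overwrite=redbook_marcus_01.snapshot_00001
```

If the optional parameter is omitted, a new snapshot is created rather than an existing one being overwritten.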
Unlocking a snapshot
At certain times, it might be beneficial to modify the data in a snapshot. This feature is useful for performing tests on a set of data or for other types of data mining activities. There are two scenarios to investigate when unlocking snapshots. The first scenario is to unlock the duplicate. By unlocking the duplicate, none of the snapshot properties are modified, and the structure remains the same. This method is straightforward and provides a backup of the master volume along with a working copy for modification. To unlock the snapshot, simply right-click the snapshot and select Unlock, as shown in Figure 11-15.
The results in the Snapshots Tree window show that the Locked property is off and the Modified property is on for redbook_markus_01.snapshot_00002. Even if the volume is relocked or overwritten with the original master volume, the modified property remains on. Also, note that in Figure 11-16, the structure is unchanged; the parent of the duplicate is still redbook_markus_01.snapshot_00001. If an error occurs in the modified duplicate snapshot, the duplicate snapshot can be deleted, and the original snapshot duplicated a second time to restore the information.
For the second scenario, the original snapshot is unlocked and not the duplicate. Figure 11-17 on page 296 shows the new property settings for redbook_markus_01.snapshot.00001. At this point, the duplicate snapshot mirrors the unlocked snapshot, because both snapshots still point to the original data. While the unlocked snapshot is modified, the duplicate snapshot references the original data. If the unlocked snapshot is deleted, the duplicate snapshot remains, and its parent becomes the master volume. Because the hierarchical snapshot structure was unmodified, the duplicate snapshot can be overwritten by the original snapshot. The duplicate snapshot can be restored to the master volume. Based on the results, this process is no different from the first scenario. There is still a backup and a working copy of the data.
Unlocking a snapshot works the same way as unlocking a volume. Again, the -y parameter needs to be specified in order to provide an affirmative response to the validation request.
Example 11-6 Unlocking a snapshot
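The body of this example is not reproduced here. A sketch of the unlock invocation, assuming the command is vol_unlock, the counterpart of the vol_lock command covered in the next section (parameter names are assumptions; verify against your XCLI reference):

```
xcli -c MZ_PFE_1 -y vol_unlock vol=redbook_markus_01.snapshot_00002
```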
Locking a snapshot
If the changes made to a snapshot need to be preserved, you can lock an unlocked snapshot. Figure 11-18 shows locking the snapshot named redbook_markus_01.snapshot.00001. From the Volumes and snapshots panel, right-click the snapshot to lock and select Lock. The snapshot is locked immediately.
The locking process completes immediately, preventing further modification to the snapshot. In Figure 11-19, the snapshot redbook_markus_01.00001 shows that both the locked and the modified properties are on. Even though there has not been a change to the snapshot since it was relocked, the system does not remove the modified property.
The XCLI lock command (vol_lock), which is shown in Example 11-7, is almost a mirror operation of the unlock command. Only the actual command changes, but the same operating parameters are used when issuing the command.
Example 11-7 Locking a snapshot
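The body of this example is not reproduced here. A sketch of the vol_lock invocation, using the same operating parameters as the unlock command (the vol parameter name is an assumption; verify against your XCLI reference):

```
xcli -c MZ_PFE_1 -y vol_lock vol=redbook_markus_01.snapshot_00001
```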
Deleting a snapshot
When a snapshot is no longer needed, you can delete it. Figure 11-20 illustrates how to delete a snapshot. In this case, the modified snapshot redbook_markus_01.snapshot.00001 is no longer needed. To delete the snapshot, right-click it and select Delete from the menu. A dialog box appears requesting that you validate the operation.
Figure 11-21 no longer displays the snapshot redbook_markus_01.snapshot.00001. Note that the volume and the duplicate snapshot are unaffected by the removal of this snapshot. In fact, the duplicate becomes the child of the master volume. The XIV Storage System provides the ability to restore the duplicate snapshot to the master volume or to overwrite the duplicate snapshot from the master volume even after deleting the original snapshot.
The snapshot delete command (snapshot_delete) operates much like the snapshot creation command. Refer to Example 11-8. The -y parameter needs to be specified so that the validation prompt is answered affirmatively.
Example 11-8 Deleting a snapshot
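The body of this example is not reproduced here. A sketch of the deletion, assuming snapshot_delete takes the snapshot name through a snapshot parameter (an assumption; verify against your XCLI reference):

```
xcli -c MZ_PFE_1 -y snapshot_delete snapshot=redbook_markus_01.snapshot_00001
```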
With this scenario, a duplicate does not cause automatic deletion to occur. Because a duplicate is a mirror copy of the original snapshot, it does not create additional allocations in the Storage Pool.
Approximately one minute later, the oldest snapshot (XIV_ORIG_VOL.snapshot_00006) is removed from the display. The Storage Pool is 51 GB in size, with a snapshot size of 34 GB, which is enough for one snapshot (refer to Storage Pool relationships on page 23). If the master volume is unmodified, many snapshots can exist within the pool, and automatic deletion does not occur. If there were two snapshots and two volumes, it might take longer to trigger the deletion, because the volumes utilize different portions of the disks, and the snapshots might not immediately overlap.

To examine the details of the scenario: at the point where the second snapshot is taken, a partition is in the process of being modified. The first snapshot caused a redirect on write, and a partition was allocated from the snapshot area in the Storage Pool. Because the second snapshot occurs at a different time, it generates a second partition allocation in the Storage Pool; there is no space available for this second allocation, so the oldest snapshot is deleted. Figure 11-23 on page 299 shows that the master volume XIV_ORIG_VOL and the newest snapshot XIV_ORIG_VOL.snapshot.00007 are present. The oldest snapshot, XIV_ORIG_VOL.snapshot.00006, was removed.
To determine the cause of removal, you must go to the Events panel under the System menu. Refer to Chapter 10, Monitoring on page 249 for more details about managing events. As shown on Figure 11-24 on page 300, the event SNAPSHOT_DELETED_DUE_TO_POOL_EXHAUSTION is logged. The snapshot name XIV_ORIG_VOL.snapshot.00006 and time 2008-07-31 15:17:31 are also logged for future reference.
After selecting the Create option from the menu, a dialog window appears. Enter the name of the Consistency Group. Because the volumes are added during creation, it is not possible to change the pool name. Figure 11-26 shows the process of creating a Consistency Group. After the name is entered, click Create.
Viewing the volumes displays the owning Consistency Group. As in Figure 11-27, the two volumes contained in the xiv_volume_copy pool are now owned by the xiv_db_cg Consistency Group. The volumes are displayed in alphabetical order and do not reflect a preference or internal ordering.
In order to obtain details about the Consistency Group, the GUI provides a panel to view the information. Under the Volumes menu, select Consistency Groups. Figure 11-28 on page 302 illustrates how to access this panel.
This selection sorts the information by Consistency Group. The panel allows you to expand the Consistency Group and see all the volumes owned by that Consistency Group. In Figure 11-29, there are two volumes owned or contained by the xiv_db_cg Consistency Group. In this example, a snapshot of the volumes has not been created.
From the Consistency Group view, you can create a Consistency Group without adding volumes. On the menu bar at the top of the window, there is an icon to add a new Consistency Group. By clicking the Add Consistency Group icon shown in Figure 11-30, a creation dialog box appears as shown in Figure 11-26 on page 301. You then provide a name and the Storage Pool for the Consistency Group.
When created, the Consistency Group appears in the Consistency Groups view of the GUI (Figure 11-31). The new group does not have any volumes associated to it. A new Consistency Group named xiv_db_cg is created. The Consistency Group cannot be expanded yet, because there are no volumes contained in the Consistency Group xiv_db_cg.
Using the Volumes view in the GUI, select the volumes to add to the Consistency Group. You can select multiple volumes by holding Ctrl down and clicking the desired volumes. After selecting the desired volumes, right-click the volumes and select Add to Consistency Group. Figure 11-32 shows two volumes being added to a Consistency Group: xiv_vmware_1 and xiv_vmware_2.
After selecting the volumes to add, a dialog box appears asking for the Consistency Group to which to add the volumes. Figure 11-33 on page 304 adds the volumes to the xiv_db_cg group. Clicking OK completes the operation.
Using the XCLI, the process takes two steps: first, create the Consistency Group; then, add the volumes. Example 11-9 shows setting up a Consistency Group and adding volumes with the XCLI.
Example 11-9 Creating Consistency Groups and adding volumes with the XCLI
xcli -c MZ_PFE_1 cg_create cg=xiv_new_cg pool=redbook_markus xcli -c MZ_PFE_1 cg_add_vol cg=xiv_new_cg vol=redbook_markus_03 xcli -c MZ_PFE_1 cg_add_vol cg=xiv_new_cg vol=redbook_markus_04
The new snapshots are created and displayed beneath the volumes in the Consistency Group view (Figure 11-35 on page 305). These snapshots have the same creation date and time. Each snapshot is locked on creation and has the same defaults as a regular snapshot. The snapshots are contained in a group structure (called a snapshot group) that allows all the snapshots to be managed by a single operation.
Adding volumes to a Consistency Group does not prevent you from creating a single volume snapshot. If a single volume snapshot is created, it is not displayed in the Consistency Group view. The single volume snapshot is also not consistent across multiple volumes. However, the single volume snapshot works according to all the rules defined in 11.1.2, Volume snapshots on page 288. With the XCLI, when the Consistency Group is set up, it is simple to create the snapshot: one command creates all the snapshots within the group at the same moment in time.
Example 11-10 Creating a snapshot group
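The body of this example is not reproduced here. A sketch of the single command that snapshots the whole group, using the cg_snapshots_create command that the MySQL backup script later in this chapter also invokes (the group name is taken from the preceding figures; verify option casing against your XCLI release):

```
xcli -c MZ_PFE_1 cg_snapshots_create cg="xiv_db_cg"
```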
To obtain details about a Consistency Group, you can select Snapshots Group Tree from the Volumes menu. Figure 11-37 on page 307 shows where to find the group view.
From the Snapshots Group Tree view, you can see many details. Select the group to view on the left panel by clicking the group snapshot. The right panes provide more in-depth information about the creation time, the associated pool, and the size of the snapshots. In addition, the Consistency Group view points out the individual snapshots present in the group. Refer to Figure 11-38 on page 308 for an example of the data that is contained in a Consistency Group.
To display all the Consistency Groups in the system, issue the cg_list command.
Example 11-11 Listing the Consistency Groups
xcli -c MZ_PFE_1 cg_list
Name                  Pool Name
Group1                GCHI_THIN_01
EXCH_CLU_CONSGROUP    GCHI_THIN_01
snapshot_test         snapshot_test
Tie                   xiv_pool
mirror_cg             redbook_mirror
xiv_db_cg             xiv_volume_copy
MySQL Group           redbook_markus
xiv_new_cg            redbook_markus
More details are available by viewing all the Consistency Groups within the system that have snapshots. The groups can be unlocked or locked, restored, or overwritten. All the operations discussed in the snapshot section are available with the snap_group operations. Example 11-12 on page 309 illustrates the snap_group_list command.
xcli -c MZ_PFE_1 snap_group_list
Name                            CG            Snapshot Time        Deletion Priority
xiv_db_cg.snap_group_00001      xiv_db_cg     2008-08-07 18:59:06  1
MySQL Group.snap_group_00001    MySQL Group   2008-08-08 18:16:53  1
xiv_new_cg.snap_group_00001     xiv_new_cg    2008-08-08 20:39:57  1
In order to delete a Consistency Group with the XCLI, you must first remove all the volumes one at a time. As in Example 11-13, each volume in the Consistency Group is removed first. Then, the Consistency Group is available for deletion. As with the GUI, the snapshots do not have to be deleted in order to delete the Consistency Group. Deletion of the Consistency Group does not delete the individual snapshots.
Example 11-13 Deleting a Consistency Group
xcli -c MZ_PFE_1 cg_remove_vol vol=redbook_markus_03
xcli -c MZ_PFE_1 cg_remove_vol vol=redbook_markus_04
xcli -c MZ_PFE_1 cg_delete cg=xiv_new_cg

It is also possible to automate the process of removing and deleting the Consistency Group using the XCLI. Working with the Linux version of the XCLI, the data is extracted from the output of the volume list command, and then a removal process is performed. In Example 11-14, you need to specify the Consistency Group to delete. The Consistency Group name must be distinct from the volume names so that the script finds the accurate data.
Example 11-14 Automated Consistency Group deletion script
cg_name=CHRIS1_CG
volume=$(xcli -c xiv_esp vol_list | grep $cg_name | awk '{ print $1 }')
for v in $volume; do xcli -c xiv_esp cg_remove_vol vol=$v; done
xcli -c xiv_esp cg_delete cg=$cg_name
For more details about Remote Mirror, refer to Chapter 12, Remote Mirror on page 323. Important: The special snapshot is created regardless of the amount of pool space on the target pool. If the snapshot causes the pool to be overutilized, the mirror remains inactive. The pool must be expanded to accommodate the snapshot, and then, the mirror can be reestablished.
<systems>
  <system>
    <name value="XIV V10.0 MN00050"/>
    <management id="1">
      <ip value="9.155.56.100"/>
      <port value="7778"/>
    </management>
    <management id="2">
      <ip value="9.155.56.101"/>
      <port value="7778"/>
    </management>
    <management id="3">
      <ip value="9.155.56.102"/>
      <port value="7778"/>
    </management>
    <active value="2"/>
    <serial value="MN00050"/>
    <last>
      <noRacks value="1"/>
      <utilizationP value="84"/>
    </last>
  </system>
</systems>

During the installation of the XIV Storage System API for VSS, the process requests the XML file containing the configuration details. Use the full path of the XML file defined in step one. The Windows server is now ready to perform snapshots on the XIV Storage System. Refer to your application documentation for completing the VSS setup.
On the Linux host, the two volumes are mapped onto separate file systems. The first volume, xiv_pfe_1, maps to volume redbook_markus_09, and the second volume, xiv_pfe_2, maps to redbook_markus_10. These volumes belong to the Consistency Group MySQL Group, so that when the snapshot is taken, snapshots of both volumes are taken at the same moment. Two items must be configured to perform the backup. First, the XIV XCLI must be installed on the server, so that the backup script can invoke the snapshot instead of relying on human intervention. Second, the database needs to have incremental backups enabled. To enable the incremental backup feature, MySQL must be started with the --log-bin option (Example 11-16). This option enables binary logging and allows database restorations.
Example 11-16 Starting MySQL
./bin/mysqld_safe --no-defaults --log-bin=backup

The database is installed on /xiv_pfe_1. However, a pointer (symbolic link) in /usr/local is made, which allows all the default settings to coexist while the database is stored on the XIV volume. To create the pointer, use the command in Example 11-17. Note that the source directory needs to be changed for your particular installation. You can also install the MySQL application on a local disk and change the default data directory to be on the XIV volume.
Example 11-17 MySQL setup
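The body of this example is not reproduced here. The idea is a single symbolic link from MySQL's default location to the copy on the XIV volume; the real paths (/xiv_pfe_1/mysql and /usr/local/mysql) are assumptions to adjust for your installation, so the sketch below demonstrates the link against temporary directories:

```shell
# On the real host, the command would be simply:
#   ln -s /xiv_pfe_1/mysql /usr/local/mysql
# Demonstrated here with temporary stand-in directories:
src=$(mktemp -d)/mysql      # stands in for the MySQL tree on /xiv_pfe_1
mkdir -p "$src"
link_parent=$(mktemp -d)    # stands in for /usr/local
ln -s "$src" "$link_parent/mysql"
ls -ld "$link_parent/mysql" # shows the link pointing at the XIV-volume copy
```

With the link in place, MySQL's defaults resolve to the XIV volume without any configuration changes.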
The backup script is simple; depending on the implementation of your database, it might be too simple. However, the script in Example 11-18 does force an incremental backup and copies the data to the second XIV volume. The script then locks the tables so that no more data can be modified. While the tables are locked, the script initiates a snapshot, which saves everything for later use. Finally, the tables are unlocked.
Example 11-18 Script to perform backup
# Report the time of backing up
date
# First, flush the tables. This can be done while running and
# creates an incremental backup of the DB at a set point in time.
/usr/local/mysql/bin/mysql -h localhost -u root -ppassword < ~/SQL_BACKUP
# Because the mysql daemon was run specifying the binary log name of
# "backup", the files can be copied to the backup directory on another disk
cp /usr/local/mysql/data/backup* /xiv_pfe_2
# Second, lock the tables so that a snapshot can be performed.
/usr/local/mysql/bin/mysql -h localhost -u root -ppassword < ~/SQL_LOCK
# XCLI command to perform the backup
# ****** NOTE: User ID and password are set in the user profile ******
/root/XIVGUI/xcli -c xiv_pfe cg_Snapshots_create cg="MySQL Group"
# Unlock the tables so that the database can continue in operation.
/usr/local/mysql/bin/mysql -h localhost -u root -ppassword < ~/SQL_UNLOCK

When issuing commands to the MySQL database, the root password can be stored in an environment variable rather than in the script (it appears in the script in Example 11-18 for simplicity). Storing the password outside the script allows it to perform the action without requiring user intervention. For the script to invoke the MySQL database, the SQL statements are stored in separate files and piped into the MySQL application. Example 11-19 provides the three SQL statements that are issued to perform the backup operation.
Example 11-19 SQL commands to perform backup operation
SQL_BACKUP:  FLUSH TABLES
SQL_LOCK:    FLUSH TABLES WITH READ LOCK
SQL_UNLOCK:  UNLOCK TABLES

Before running the backup script, a test database, called redbook, is created. The database has one table, called chapter, which contains the chapter name, author, and pages. The table has two rows of data that define information about the chapters in the redbook. Figure 11-42 on page 314 shows the information in the table before the backup is performed.
Now that the database is ready, the backup script is run. Example 11-20 is the output from the script. Then, the snapshots are displayed to show that the system now contains a backup of the data.
Example 11-20 Output from the backup process
[root@x345-tic-30 ~]# ./mysql_backup
Mon Aug 11 09:12:21 CEST 2008
Command executed successfully.
[root@x345-tic-30 ~]# /root/XIVGUI/xcli -c xiv_pfe snap_group_list cg="MySQL Group"
Name                          CG           Snapshot Time        Deletion Priority
MySQL Group.snap_group_00006  MySQL Group  2008-08-11 15:14:24  1
[root@x345-tic-30 ~]# /root/XIVGUI/xcli -c xiv_pfe time_list
Time      Date        Time Zone      Daylight Saving Time
15:17:04  2008-08-11  Europe/Berlin  yes
[root@x345-tic-30 ~]#

To show that the restore operation is working, the database is dropped (Figure 11-43 on page 315) and all the data is lost. After the drop operation is complete, the database is permanently removed from MySQL. It is possible to perform a restore action from the incremental backup. For this example, the snapshot function is used to restore the entire database.
The restore script, which is shown in Example 11-21, stops the MySQL daemon and unmounts the Linux file systems. Then, the script restores the snapshot and finally remounts and starts MySQL.
Example 11-21 Restore script
[root@x345-tic-30 ~]# cat mysql_restore
# This restoration overwrites everything in the database and puts the
# data back to when the snapshot was taken. It is also possible to do
# a restore based on the incremental data; this script does not handle
# that condition.
# Report the time of the restore
date
# First, shut down mysql
mysqladmin -u root -ppassword shutdown
# Unmount the file systems
umount /xiv_pfe_1
umount /xiv_pfe_2
# List all the snap groups
/root/XIVGUI/xcli -c xiv_pfe snap_group_list cg="MySQL Group"
# Prompt for the group to restore
echo "Enter Snapshot group to restore: "
read -e snap_group
# XCLI command to perform the restore
# ****** NOTE: User ID and password are set in the user profile ******
/root/XIVGUI/xcli -c xiv_pfe snap_group_restore snap_group="$snap_group"
# Mount the file systems
mount /dev/dm-2 /xiv_pfe_1
mount /dev/dm-3 /xiv_pfe_2
# Start the MySQL server
cd /usr/local/mysql
./configure

The output from the restore action is shown in Example 11-22.
Example 11-22 Output from the restore script
[root@x345-tic-30 ~]# ./mysql_restore
Mon Aug 11 09:27:31 CEST 2008
STOPPING server from pid file /usr/local/mysql/data/x345-tic-30.mainz.de.ibm.com.pid
080811 09:27:33 mysqld ended
Name                          CG           Snapshot Time        Deletion Priority
MySQL Group.snap_group_00006  MySQL Group  2008-08-11 15:14:24  1
Enter Snapshot group to restore:
MySQL Group.snap_group_00006
Command executed successfully.
NOTE: This is a MySQL binary distribution. It's ready to run, you don't
need to configure it!
To help you a bit, I am now going to create the needed MySQL databases
and start the MySQL server for you. If you run into any trouble, please
consult the MySQL manual, that you can find in the Docs directory.
Installing MySQL system tables... OK
Filling help tables... OK
To start mysqld at boot time you have to copy
support-files/mysql.server to the right place for your system
PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !
To do so, start the server, then issue the following commands:
./bin/mysqladmin -u root password 'new-password'
./bin/mysqladmin -u root -h x345-tic-30.mainz.de.ibm.com password 'new-password'
Alternatively you can run:
./bin/mysql_secure_installation
which also gives the option of removing the test databases and anonymous
user created by default. This is strongly recommended for production
servers. See the manual for more instructions.
You can start the MySQL daemon with:
cd . ; ./bin/mysqld_safe &
You can test the MySQL daemon with mysql-test-run.pl
cd mysql-test ; perl mysql-test-run.pl
Please report any problems with the ./bin/mysqlbug script!
The latest information about MySQL is available on the Web at http://www.mysql.com
Support MySQL by buying support/licenses at http://shop.mysql.com
Starting the mysqld server. You can test that it is up and running with the command:
./bin/mysqladmin version
[root@x345-tic-30 ~]# Starting mysqld daemon with databases from /usr/local/mysql/data

When complete, the data is restored and the redbook database is available, as shown in Figure 11-44.
11.2.1 Architecture
The Volume Copy feature provides an instantaneous copy of data from one volume to another volume. Utilizing the same functionality as the snapshot, the system modifies the target volume to point at the source volume's data. After the pointers are modified, the host has full access to the data on the volume. After the XIV Storage System completes the setup of the pointers to the source data, a background copy of the data is performed. The data is copied from the source volume to a new area on the disk, and the pointers of the target volume are then updated to use this new space. The copy operation is done in a way that minimizes the impact to the system. If the host performs an update before the background copy is complete, a redirect on write occurs, which allows the volume to be readable and writable before the Volume Copy completes.
From the dialog box, select redbook_chris_01 and click OK. The system then asks that you validate the copy action. The XIV Storage System instantly performs the update process and displays a completion message. When the copy process is complete, the volume is available for use. Figure 11-46 provides an example of the volume selection.
To create a Volume Copy with the XCLI, the source and target volumes must be specified in the command. In addition, the -y parameter must be specified to provide an affirmative response to the validation questions.
Example 11-23 Performing a Volume Copy
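The body of this example is not reproduced here. A sketch of the invocation, assuming the command is vol_copy with vol_src and vol_trg parameters (verify the names against your XCLI reference); the volumes are those used in the VMware scenario that follows:

```
xcli -c MZ_PFE_1 -y vol_copy vol_src=xiv_vmware_1 vol_trg=xiv_vmware_2
```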
To perform the Volume Copy, use the following sequence:
1. Validate the configuration for your host. With VMware, you need to ensure that the hard disk assigned to the virtual machine is a Mapped Raw LUN. For a disk directly attached to a server, SAN boot needs to be enabled, and the target server needs the XIV volume discovered.
2. Shut down the source server or OS. If the source remains active, there might be data in memory that is not synchronized to disk. If this step is skipped, unexpected results can occur.
3. Perform the Volume Copy from the source volume to the target volume.
4. Power on the new system.

A demonstration of the process is simple using VMware. Starting with the VMware resource window, power off the virtual machines for both the source and the target. The summary described in Figure 11-48 shows that both XIV Source VM (1), the source, and XIV Source VM (2), the target, are powered off.
Looking at the XIV Storage System before the copy (Figure 11-49), xiv_vmware_1 is mapped to XIV Source VM (1) in VMware and has utilized 1 GB of space. This information shows that the OS is installed and operational. The second volume, xiv_vmware_2, is the target volume for the copy; it is mapped to XIV Source VM (2) and shows 0 GB used. At this point, the OS has not been installed on that virtual machine, and thus the OS is not usable.
Because the virtual machines are powered off, simply initiate the copy process as just described. Selecting xiv_vmware_1 as the source, copy the volume to the target xiv_vmware_2. The copy completes immediately and is available for usage. To verify that the copy is complete, the used area of the volumes must match as shown in Figure 11-50.
After the copy is complete, simply power up the new virtual machine to use the new operating system. Both servers usually boot up normally with only minor modifications to the host. In this example, the server name needed to be changed, because there were two servers on the network with the same name. Refer to Figure 11-51.
Figure 11-52 on page 322 shows the second virtual machine console with the Windows operating system powered on.
Chapter 12. Remote Mirror
This chapter describes the basic characteristics, the options, and the available interfaces for Remote Mirroring. It also includes step-by-step procedures for setting it up and removing the mirror.
As mentioned earlier, synchronous copy ensures that a host write operation is written to both the primary and secondary systems. Synchronous copy issues an acknowledgement of the write to the host (application) only after both copies are written, to maintain consistent data.
The storage system can serve dual functions as both primary and secondary machines at the same time. For example, if System A is primary for Application 1 and System B is the remote copy for Application 1, System B can also be the primary for Application 2 with System A as the remote site for Application 2, providing a bidirectional capability for Remote Mirroring. However, this type of setup is not appropriate for protecting against a complete disaster at the primary site.

There are multiple features of Remote Mirroring on the XIV Storage System, including disaster recovery, backups, recovery from single volume media errors, role switchovers, synchronization, and coupling, which are discussed in detail later in this chapter.

The target volumes on the secondary system must be created before any mirroring can be configured. This task is usually performed by the storage administrator, who verifies that the target volumes are equal in size to the source volumes on the primary system. After a Remote Mirror has been created, it must first complete initialization before it is activated. The initialization process copies all the data from the primary volume to the secondary volume. This initialization is performed only once, during the initial setup of the Remote Mirror; once complete, the volumes are considered synchronized. At this time, Remote Mirroring is considered active and continues to keep the copies synchronized by writing all data to the primary copy, followed by writing the data to the secondary copy, and then acknowledging to the host that the write operation has completed.
12.1.2 Boundaries
Currently, the XIV Storage System has the following boundaries or limits:
- Maximum remote systems: The maximum number of remote systems that can be attached to a single primary is four.
- Number of Remote Mirrors: The maximum number of Remote Mirrors allowed on a single primary at any one time is 128.
- Distance: Distance is only limited by the response time of the medium used.
- Consistency Groups: There is no support for Consistency Groups within Remote Mirroring.
- Snapshots: Snapshots are allowed with either the primary or secondary volumes without stopping the mirror.
located near each other. Bandwidth considerations must be taken into account when planning the infrastructure to support the Remote Mirroring implementation. Knowing when the peak write rate occurs for systems attached to the storage helps with planning the number of paths needed to support the Remote Mirroring function and any future growth. There must always be a minimum of two paths configured for redundancy within Remote Mirroring, and these paths must be dedicated to Remote Mirror.

When the protocol has been selected, determine which ports on the XIV Storage System will be used. The port settings are easily displayed using the Extended Command Line Interface (XCLI) command fc_port_list for Fibre Channel or ipinterface_list for iSCSI. This leads us into the second item in the list.

Fibre Channel paths for Remote Mirroring have slightly more setup requirements, and we look at this interface first. As seen in Example 12-1, in the column titled Role, each Fibre Channel port is identified as either a target or an initiator. Simply put, a target in a Remote Mirror configuration is the port that receives data from the primary system (that is, a port on the secondary system), while an initiator is the port that sends the data (on the primary system). In this example, there are three initiators configured. Initiators, by default, are configured on FC:X:4 (X is the module number). In this highlighted example, port 4 in module 6 is configured as the initiator.
Example 12-1 The fc_port_list output command
WWPN              Port ID   Role
5001738000210143  0001000D  Target
5001738000210142  000000EF  Target
5001738000210141  00000000  Target
5001738000210140  006D0B13  Target
5001738000210163  00FFFFFF  Initiator
5001738000210162  000000EF  Target
5001738000210161  000000EF  Target
5001738000210160  00681A13  Target
5001738000210173  00FFFFFF  Target
5001738000210172  00FFFFFF  Target
5001738000210171  000000EF  Target
5001738000210170  00060213  Target
5001738000210183  000000EF  Initiator
5001738000210182  0075000B  Target
5001738000210181  00611913  Target
5001738000210180  00760000  Target
5001738000210193  000000EF  Initiator
5001738000210192  00613913  Target
5001738000210191  0075000A  Target
5001738000210190  00060214  Target
5001738000210153  0001000E  Target
5001738000210152  000000EF  Target
5001738000210151  000000EF  Target
5001738000210150  00711000  Target
The iSCSI connections are shown in Example 12-2 on page 327 using the command ipinterface_list. The output has been truncated to show just the iSCSI connections in which we are interested here. The command will also display all Ethernet connections and settings. In this example, we have two connections displayed for iSCSI, one connection in module 7 and one connection in module 8.
C:\Documents and Settings\Administrator\My Documents\xcli>xcli -c "XIV MN00033" ipinterface_list
Name        Type   IP Address     Network Mask   Default Gateway  MTU   Module      Ports
nextrabam4  iSCSI  19.11.237.208  255.255.254.0  0.0.0.0          9000  1:Module:7  1,2
nextrabam5  iSCSI  19.11.237.209  255.255.254.0  0.0.0.0          9000  1:Module:8  1,2

Alternatively, a single port can be queried by selecting a system in the GUI, followed by selecting Targets Connectivity. Right-click a specific port and select Properties, the output of which is shown in Figure 12-2. This particular port is configured as a target suitable for the secondary port.
Another way to query the port configuration is to select the desired system, click the curved arrow (at the bottom right of the window) to display the ports on the back of the system, and mouse over a port as shown in Figure 12-3 on page 328. This view displays all the information that is shown in Figure 12-2.
Similar information can be displayed for the iSCSI connections by using the GUI, as shown in Figure 12-4. This view can be reached either by right-clicking the Ethernet port (similar to the Fibre Channel port shown in Figure 12-3) or by selecting the system and then selecting Hosts and LUNs → iSCSI Connectivity. This sequence displays the same two iSCSI definitions that are shown with the XCLI command.
By default, Fibre Channel ports 3 and 4 (target and initiator, respectively) in every module are intended to be used for Remote Mirroring. For example, port 4 in module 5 (initiator) on the local machine is connected to port 3 in module 4 (target) on the remote machine. When setting up a new system, it is best to reserve these ports for Remote Mirroring. If a port role does need to be changed (the third item in the list), you can change it with either the XCLI or the GUI. Use the XCLI command fc_port_config to change a port as shown in Example 12-3 on page 329. Using the output from fc_port_list, we obtain the fc_port name to be used in the command and change the port role to either initiator or target as needed.
Example 12-3 Changing the port role with fc_port_config

C:\Documents and Settings\Administrator\My Documents\xcli>xcli fc_port_config fc_port=1:FC_Port:4:4 role=initiator
Command completed successfully

C:\Documents and Settings\Administrator\My Documents\xcli>xcli -c MN00033 fc_port_list
Component ID   Status  Currently Functioning  WWPN
1:FC_Port:4:4  OK      yes                    5001738000210143
To perform the same function with the GUI, select the primary system and choose Remote → Targets Connectivity as shown in Figure 12-5. Alternatively, this same view can be accessed through Hosts and LUNs → Targets Connectivity.
With the Targets Connectivity view displayed, select the desired port to change, right-click the port, and select Configure from the pop-up menu, which is shown in Figure 12-6 on page 330.
This option opens a configuration window as shown in Figure 12-7, which allows the port to be enabled (or disabled), its role defined as Target, Initiator, or Dual, and finally, the speed for the port configured (Auto, 1Gbps, 2Gbps, or 10Gbps).
Planning is important for determining how many Remote Mirroring copy pairs will exist, which is the fourth item for discussion in this section. The current limit on simultaneous copies is 128. In addition, a single primary system is limited to a maximum of four secondary systems. In this configuration, the maximum number of Remote Mirrors allowed is still 128 for the primary; these mirrors can be spread across multiple secondaries. Each secondary is also limited to 128 Remote Mirrors (so it can, for example, be a secondary to more than one primary, or a secondary to one system and a primary to another). If the XIV Data Migration feature is also used, the combined number of Remote Mirrors and volumes in a data migration is limited to 128 on a single system, which is covered in Chapter 13, Data migration on page 353.
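The limits above are easy to run past when mirrors and data migrations share one system. As a small planning sketch (the counts are illustrative; in practice they come from mirror_list and the migration view):

```shell
# Sketch: sanity-check the combined Remote Mirror and Data Migration count
# against the 128-pair limit before creating new mirrors. All counts below
# are hypothetical examples, not output from a real system.
MAX_PAIRS=128
current_mirrors=100      # e.g. pairs reported by mirror_list
current_migrations=20    # volumes in active data migrations
new_mirrors=10           # pairs you plan to add

total=$((current_mirrors + current_migrations + new_mirrors))
if [ "$total" -le "$MAX_PAIRS" ]; then
    echo "OK: $total of $MAX_PAIRS pairs used"
else
    echo "LIMIT EXCEEDED: $total exceeds $MAX_PAIRS"
fi
# prints: LIMIT EXCEEDED: 130 exceeds 128
```

In this hypothetical case, ten of the planned mirrors would have to wait until migrations complete and free up pairs.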
12.1.4 Coupling
There are two ways to configure coupling, which define the action that Remote Mirroring will take upon a failure:
- Best effort coupling: Writes to the primary volume continue when there is a failure of the links between the primary and secondary. Remote Mirroring is set to an unsynchronized state. All updates to the primary volume are recorded so that only these updates are written to the secondary volume after the problem has been resolved.
- Mandatory coupling: When a failure in the communication between the primary and secondary system is detected, all writes to the primary volume are prohibited. This extreme action is taken only when the two systems must be synchronized at all times, and it is only rarely used.

Coupling settings can be changed at any time, even after a failure. However, a mandatory coupling configuration cannot be deactivated without first changing it to the best effort coupling type. Coupling always starts in standby mode, which means that the mirror has not yet been activated, so no data has been written to the secondary. The primary system keeps track of all changes to the volumes so that, when the mirror is activated, all updates made on the primary since the mirror was created are written to the secondary system. This mode can also be used during system maintenance at the secondary so that the primary system does not generate alerts. Coupling can only be removed while in standby mode. Transitions between standby and active modes are performed on the primary volume by using the XCLI or the GUI. An example of configuring mirroring and coupling is shown in Figure 12-8 and Figure 12-9.
12.1.5 Synchronization
The synchronization process runs when the Remote Mirror is first set up, as well as any time that there has been a failure that has been recovered and the mirroring operations are able to continue. Synchronization ensures that the secondary volume will receive all the changes that were done to the primary volume while the coupling was not available. Synchronization consists of the following states:
- Initialization: Data from the primary is being copied to the secondary.
- Synchronized: Both copies are consistent.
- Timestamp: Taken to keep track of when the coupling became non-operational.
- Unsynchronized: Remote Mirroring is non-operational.
Figure 12-10 illustrates the Initialization state of a newly created mirror as well as the Synchronized state. Figure 12-10 also displays the role of each volume, the volume name on the remote system, and the name of the remote system.
Note: The terms Primary and Secondary denote the two possible designated mirroring roles of a peer. In contrast, the terms Master and Subordinate denote, respectively, the active role of the peer that accepts write requests from hosts and the active role of the peer that is being synchronized. The purpose of this scheme is to record the original peer roles while being able to change the active role as required.
When a link failure occurs, the primary system must start tracking changes to the Remote Mirror source volumes so that these changes can be copied to the secondary after recovery. These changes are known as the uncommitted data. When recovering from a link failure, best effort coupling performs the following steps to synchronize the data:
- If the secondary is still in a synchronized state, a snapshot of the last consistent volume is created and renamed using the time stamp created at the time of the link failure (we discuss this last consistent snapshot later). This snapshot allows the use of the secondary volume in the event of a failure at the primary site during the synchronization process. Although the secondary might not have all the updates from the primary, it will at least have a consistent copy to work with.
- The primary system synchronizes the secondary volume, copying all the uncommitted data.
- After synchronization is complete between the two systems for all volumes, the primary deletes the snapshot.
An example of this process is shown in 12.1.10, Recovering from a failure on page 351 at the end of this chapter.
If the link was broken (due to a failure in the physical paths or a loss of the primary system), a snapshot of the last consistent state will exist, as explained in 12.1.5, Synchronization on page 332. The storage administrator has the option of using the most recent version, which might not be consistent, or reverting to the last consistent snapshot, which we illustrate later in this chapter. After the old primary system has been recovered, its role needs to be switched so that updates to the new primary can be copied to the old primary. In addition, any data that was not committed from the old primary to the old secondary (the new primary) is kept in a data list that the old primary sends to the new primary. This list consists of any writes that occurred on the old primary before the role switchover. It is up to the new primary to merge this list with the list of updates that it has been tracking in order to bring the two systems into a synchronized state. To illustrate this process, we use an example of a link failure between the two sites:
1. A link failure occurs between the two sites (A is the primary and B is the secondary).
2. Production writes continue on A, and A starts to generate a list of the changes that are being made.
3. Role switchover occurs on B. B now becomes the new primary, and production writes are switched to this site.
4. B now starts to generate a list of changes that need to be copied to the secondary (A).
5. Role switchover occurs on A (A becomes secondary), and the links are recovered.
6. A sends its list of uncommitted updates to B.
7. B merges the two lists, and updates are applied to the primary (B) as needed, as well as sent to the secondary (A).
8. The mirror becomes synchronized.
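The merge in step 7 is a set union of the two change lists. This is a conceptual illustration only; the XIV system tracks changed regions internally and at the block level, so the plain-text lists of changed regions below simply stand in for that bookkeeping:

```shell
# Conceptual sketch only: merging the uncommitted-change lists from the
# old primary (A) and new primary (B). Region names are hypothetical.
cat > /tmp/changes_A.txt <<'EOF'
vol1:block100
vol1:block200
EOF
cat > /tmp/changes_B.txt <<'EOF'
vol1:block200
vol1:block300
EOF

# The union of the two lists is the set of regions the new primary must
# resynchronize to the secondary:
sort -u /tmp/changes_A.txt /tmp/changes_B.txt
```

The region changed on both sides (block200 here) appears once in the merged list; the final content for such a region comes from the current primary, B.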
GUI example
At this point, we continue after the setup reviewed in 12.1.3, Initial setup on page 325, which assumes that the Fibre Channel ports have been properly defined as sources and targets, the Ethernet switch has been updated for jumbo frames, and all the physical paths are in place. Follow these steps:
1. We start by selecting a primary system and then choosing Remote → Targets Connectivity as shown in Figure 12-11 on page 335.
2. After this option is selected, a new window is displayed as shown in Figure 12-12 on page 336. In this example, we see the available Fibre Channel ports on the primary system. At the top of the window, there are two options: Add Target (+ symbol) and Select Target. Selecting Add Target opens the dialog box that allows the target system to be defined, as shown in Figure 12-13 on page 336.
3. In Figure 12-13, the Target Type is selected from the drop-down list. The two options in this list are Mirroring and Data Migration. Mirroring is the first type in the list and will be the default. Select Mirroring to set up Remote Mirror. The next item required is the Target Name. The drop-down list displays all the systems configured in the GUI. If the target system has not yet been added to the GUI, it will not be displayed in the list.
Choose the desired target after it is displayed in the list. The final required item is the Target Protocol. There are two protocols contained in the drop-down list: FC (Fibre Channel) and iSCSI. In this example, we have chosen FC as the protocol. After the selection has been made, click Define. After the target system has been defined, it will be displayed in the window as shown in Figure 12-14 on page 337.
4. The next step is to create the paths (Figure 12-15 on page 338). At this point, we assume that all the physical paths and zoning have been completed.
5. To set up the Remote Mirror paths using iSCSI, you must first get the iSCSI name of the secondary system to be used in the setup on the primary. To get the name, select the secondary system in the GUI view and at the top of the window, click Configure System. This option will open a window as shown in Figure 12-16. Copy the entry in the box called iSCSI Name to use later.
6. The next step is to configure the IP address of the port or ports that will be used for Remote Mirroring. After obtaining the network information from the network administrator (Name, IP, Gateway, and Netmask), select Hosts and LUNs → iSCSI Connectivity. The results are displayed in Figure 12-17 on page 339. This particular example already has two iSCSI connections defined, named nextrabam4 and nextrabam5.
7. At the top of the window in Figure 12-17 is an option called + Define. Click Define to open the window where you define a new iSCSI connection (Figure 12-18).
8. In Figure 12-18, we see the new interface with all the options defined. There are five required fields in this window: Name, Address, Netmask, Module, and Port Number. The name, address, and netmask were obtained from the network administrator. The module and port numbers are defined based on where the Remote Mirror connections will physically be plugged in. One other important field that needs to be set for Remote Mirroring is the Maximum Transmission Unit (MTU). The default listed when opening this window is 4500. Change this value to a higher number for jumbo frames; in this example, we entered 9000.
9. After selecting Remote → Targets Connectivity, we get the IP panel shown in Figure 12-19. Once again, select Add Target as we did with Fibre Channel.
10. This process is the same as creating a Fibre Channel target; however, this time we select iSCSI as the protocol, which prompts us to enter the iSCSI Initiator Name that we found in Figure 12-16 on page 338. Enter this name as shown in Figure 12-20.
11. After clicking Define, the target system is now displayed in the window as shown in Figure 12-21 on page 341.
12. We are now ready to create the iSCSI path for Remote Mirroring. The ports available for Remote Mirror paths are those shown in white, which have previously been configured with IP information. Click and hold the IP Interface in the desired module on the primary system, drag to the IP Interface in the target module, and release the mouse button. This action defines a path, as shown in Figure 12-22 on page 342. By right-clicking the path, you can delete, activate, or deactivate it. Now that the paths are set up, the remaining steps are the same for both Fibre Channel and iSCSI.
We are now ready to create the mirrors.
13. Verify the target pool information on the secondary system by selecting the secondary system in the GUI and then selecting Pools → Storage Pools. This step is shown in Figure 12-23 on page 343.
14. After this information is determined, select the primary system in the GUI and then choose Remote → Remote Mirroring, which opens the window that is shown in Figure 12-24 on page 344.
15. At the top of the window in Figure 12-24, choose Create Mirror.
16. This action opens the Create Mirror configuration window shown in Figure 12-25 on page 345. On the Master Volume line, select from the drop-down list the name of the primary (source) volume that will be mirrored to the secondary. Then, select the Target System from its drop-down list; this list contains any secondary systems configured in one of the previous steps. Next, enter the Slave Pool name, which is the target pool on the secondary system. The Slave Volume name is filled in automatically with the same name as the Master Volume; it can be left as the default or changed. Notice beneath these lines that there is a check box to select if the target volume on the secondary system needs to be created. If this box is selected, the Slave Volume name can be any name that we choose, such as the default, the default with RM appended to indicate that it is a Remote Mirror, or any other name meaningful to the storage administrator. If the target volume already exists on the secondary system, its name must be entered on the Slave Volume line, and the Create slave box must not be checked. An example of this window is shown in Figure 12-25 on page 345.
17. After the mirror has been created, it is displayed in the Remote Mirroring window as seen in Figure 12-26. By default, the mirror is inactive after it is created. This list includes the name of the source volume on the primary system, its size in gigabytes (GB), the volume's current role (M for master or S for slave), a link indicator that shows the status of the paths (link up or link down), the state of the mirror (active or deactivated), the status of the mirror (unsynchronized, initializing, or synchronized), the remote volume name that was just defined, and the name of the secondary system. The GUI window can be resized to better display any of the information in this table if needed.
18. To activate the mirror, right-click anywhere on its line, which shows a context menu as seen in Figure 12-26.
19. There are many options in the context menu that is shown in Figure 12-26 on page 345:
- Configure Mirror (available after the mirror has reached the synchronized state)
- Delete Mirror (available only if the mirror is not active)
- Activate (available if the mirror is not active)
- Deactivate
- Switch Roles (only available if the mirror is synchronized)
- Change Roles (the mirror must be deactivated to change roles)
- Show Target Connectivity (displays the window that is shown in Figure 12-22 on page 342 for iSCSI and Figure 12-15 on page 338 for Fibre Channel)
- Properties (displays all the properties of the mirror)
For our purposes, we select Activate to start the initialization of the mirror. Figure 12-27 displays mirrors in different states: the source volume cirrus12__02 has not been activated yet, volume cirrus12__03 is in the Initialization state, and the remaining volumes are in the Synchronized state.
20. Checking the state of the mirrors on the secondary system shows a status of Consistent for the mirrors that are displayed as Synchronized on the primary system; the other mirrors are Inactive and in the Initialization state (the same as on the primary system). This information is shown in Figure 12-28.
XCLI example
Here, we describe the steps that are required to set up a single Remote Mirror copy by using the XCLI:
1. We first need to check the initiator and target configuration of the ports to be used on the primary and secondary systems. Using the XCLI command fc_port_list, we can see that the system in Example 12-4 on page 347, named XIV1, is configured correctly to be used as the primary system (port 4 = initiator).
2. In Example 12-5, we have the output from the same command for the secondary system, which is also ready to be used (port 3 = target).
Example 12-5 Port listing information on the secondary storage

Component ID   Status  Currently Functioning  WWPN              Port ID   Role
1:FC_Port:6:4  OK      yes                    5001738000CB0163  000000E8  Initiator
1:FC_Port:6:3  OK      yes                    5001738000CB0162  000000E8  Target
1:FC_Port:6:2  OK      yes                    5001738000CB0161  00120E00  Target
1:FC_Port:6:1  OK      yes                    5001738000CB0160  00011100  Target
1:FC_Port:5:4  OK      yes                    5001738000CB0153  000000E8  Initiator
1:FC_Port:5:3  OK      yes                    5001738000CB0152  00FFFFFF  Target
1:FC_Port:5:2  OK      yes                    5001738000CB0151  00310000  Target
1:FC_Port:5:1  OK      yes                    5001738000CB0150  00FFFFFF  Target
1:FC_Port:8:4  OK      yes                    5001738000CB0183  00FFFFFF  Initiator
1:FC_Port:8:3  OK      yes                    5001738000CB0182  00FFFFFF  Target
1:FC_Port:8:2  OK      yes                    5001738000CB0181  00120D00  Target
1:FC_Port:8:1  OK      yes                    5001738000CB0180  00300000  Target
1:FC_Port:4:4  OK      yes                    5001738000CB0143  00FFFFFF  Initiator
1:FC_Port:4:3  OK      yes                    5001738000CB0142  00FFFFFF  Target
1:FC_Port:4:2  OK      yes                    5001738000CB0141  00300F00  Target
1:FC_Port:4:1  OK      yes                    5001738000CB0140  00FFFFFF  Target
1:FC_Port:9:4  OK      yes                    5001738000CB0193  00130F00  Initiator
1:FC_Port:9:3  OK      yes                    5001738000CB0192  000000E8  Target
1:FC_Port:9:2  OK      yes                    5001738000CB0191  00130D00  Target
1:FC_Port:9:1  OK      yes                    5001738000CB0190  00310F00  Target
1:FC_Port:7:4  OK      yes                    5001738000CB0173  00120F00  Initiator
1:FC_Port:7:3  OK      yes                    5001738000CB0172  000000E8  Target
1:FC_Port:7:2  OK      yes                    5001738000CB0171  00130E00  Target
1:FC_Port:7:1  OK      yes                    5001738000CB0170  00011100  Target
3. Next, we check the primary system to determine whether any targets have been configured by using the XCLI command target_list. Example 12-6 illustrates the expected output when no targets have yet been configured on the system.
Example 12-6 The target_list command
C:\Documents and Settings\Administrator\My Documents\xcli>xcli -c XIV1 target_list
2008-07-25 11:58:22,265 DEBUG (XIVProperties.java:buildSystemsFromFile.292) Going to build systems from file: C:\Documents and Settings\Administrator\My Documents\xcli\xivconfigs.xml
No Remote Targets are defined

4. Before setting up the secondary machine as an available target with the XCLI command target_mirroring_allow, we use the target_list command against the XIV1 system to check for any existing definitions, as shown in Example 12-7.
Example 12-7 The target_list command
xcli>xcli -c XIV1 target_list
No Remote Targets are defined

5. In Example 12-8, we see the output of the same command on a system that has two targets defined. In this case, each target is defined as a different SCSI type (iSCSI and FC).
Example 12-8 The target_list command with defined targets
xcli>xcli -c "XIV MN00033" target_list
Name               SCSI Type  Connected
XIV V10.0 MN00035  iSCSI      yes
prime              FC         no

6. Next, we set up the secondary machine as an available target by using the XCLI command target_mirroring_allow. In Example 12-9, we see the command executed on the primary system XIV1 using xiv_lab_02 as the secondary system.
Example 12-9 Command to set up secondary machine as target
xcli>xcli -c XIV1 target_mirroring_allow target=xiv_lab_02
Command completed successfully

7. As with the GUI, we must get the iSCSI name of the target system before setting up the Remote Mirror paths. We get this name by executing the XCLI command config_get, as shown in Example 12-10. The output that we need is indicated in bold in this example. The output from iscsi_name will be used to define the secondary.
Example 12-10 Listing the iSCSI name

xcli>xcli -c "XIV V10.0 MN00035" config_get
Command executed successfully.
default_user=
dns_primary=
dns_secondary=
email_reply_to_address=
email_sender_address=
email_subject_format={severity}: {description}
iscsi_name=iqn.2005-10.com.xivstorage:000035
machine_model=A14
machine_serial_number=6000035
machine_type=2810
ntp_server=
snmp_community=XIV
snmp_contact=Unknown
snmp_location=Unknown
system_id=35
system_name=XIV V10.0 MN00035
timezone=0

8. The iSCSI port can then be configured by using the command ipinterface_create, which requires the IP address, netmask, module, and port. After this command has been executed, we can list the configured iSCSI target ports on the secondary by using the target_port_list command, as shown in Example 12-11.
Example 12-11 iSCSI target listed in XCLI
xcli>xcli -c "XIV MN00033" target_port_list target="XIV V10.0 MN00035"
Target Name        Port Type  Active  WWPN              iSCSI Address  Port
XIV V10.0 MN00035  iSCSI      yes     0000000000000000  9.11.237.157
XIV V10.0 MN00035  iSCSI      yes     0000000000000000  9.11.237.158
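Rather than copying the iscsi_name value out of the config_get output in Example 12-10 by hand, it can be extracted with a one-line filter. A sketch, assuming the output has been saved to a file in the format shown there:

```shell
# Sketch: pull just the iSCSI qualified name out of saved config_get
# output (abridged sample below matches the format of Example 12-10).
cat > /tmp/config_get.txt <<'EOF'
machine_model=A14
iscsi_name=iqn.2005-10.com.xivstorage:000035
machine_type=2810
EOF

iscsi_name=$(sed -n 's/^iscsi_name=//p' /tmp/config_get.txt)
echo "$iscsi_name"
# prints: iqn.2005-10.com.xivstorage:000035
```

The resulting value is what you paste into the target definition for the secondary system.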
9. We next need to make sure the MTU settings are correct for both iSCSI interfaces on the primary and secondary systems. We verify the settings by using the command ipinterface_list, which will display all IP and iSCSI interfaces configured. For brevity, we have just shown the output for iSCSI in Example 12-12. The settings for this primary system are correct for each iSCSI interface configured (MTU = 9000). As previously stated, the secondary and Ethernet switch will also require this change to run Remote Mirroring with iSCSI paths.
Example 12-12 Check iSCSI MTU settings
xcli>xcli -c "XIV MN00033" ipinterface_list
Type   IP Address    Network Mask   MTU   Ports
iSCSI  9.11.237.208  255.255.254.0  9000  1,2
iSCSI  9.11.237.209  255.255.254.0  9000  1,2
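The MTU check in step 9 can be automated against saved ipinterface_list output. A sketch, assuming the column layout of the fuller listing shown in Example 12-2 (Name, Type, IP Address, Network Mask, Default Gateway, MTU, Module, Ports); the interface names and the deliberately wrong MTU below are illustrative:

```shell
# Sketch: flag any iSCSI interface whose MTU is not 9000 in saved
# ipinterface_list output. Field 2 is the type and field 6 the MTU,
# per the listing format used in this chapter.
cat > /tmp/ipif.txt <<'EOF'
nextrabam4 iSCSI 9.11.237.208 255.255.254.0 0.0.0.0 9000 1:Module:7 1,2
nextrabam5 iSCSI 9.11.237.209 255.255.254.0 0.0.0.0 4500 1:Module:8 1,2
EOF

bad=$(awk '$2 == "iSCSI" && $6 != 9000 { print $1 }' /tmp/ipif.txt)
if [ -z "$bad" ]; then
    echo "all iSCSI interfaces at MTU 9000"
else
    echo "fix MTU on: $bad"
fi
# prints: fix MTU on: nextrabam5
```

Remember that the same check applies on both systems, and that the Ethernet switch in between must also be configured for jumbo frames.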
10. We are now ready to define the secondary system from the primary, which is done by using the target_define command as shown in Example 12-13. In this example, we have specified the protocol as iSCSI. This command can also be executed with the protocol set to FC, which is shown in Example 12-14. After the target (secondary) has been defined with either protocol, the mirrors can be created; no unique commands for iSCSI or FC are required after this point.
Example 12-13 Defining the iSCSI target
xcli>xcli -c "XIV MN00033" target_define target="XIV V10.0 MN00035" protocol=iSCSI Command executed successfully.
Example 12-14 Defining the FC target
xcli>xcli -c "XIV MN00033" target_define target="XIV V10.0 MN00035" protocol=FC Command executed successfully.
11. Next, we can define the mirror by using the mirror_create command. In this case, unlike with the GUI, the volume must already exist on the secondary system (the GUI offers the option to create the volume on the secondary system); however, later we show how to create the volume on the secondary system with the XCLI. As seen in Example 12-15, we have specified the volume on the primary system (cirrus12__06), the secondary system, and the volume on the secondary (cirrus12__06RM).
Example 12-15 Create a mirror
xcli>xcli -c "XIV MN00033" mirror_create local_volume=cirrus12__06 target="XIV V10.0 MN00035" slave=cirrus12__06RM
Command executed successfully.

12. We can list the mirrors at this point by using the command mirror_list, which is shown in Example 12-16. In this example, we have several mirrors that are unsynchronized, and the newly created mirror shows its status as Initializing. We now have to activate the new mirror to start the copy by using the mirror_activate command, as shown in Example 12-17.
Example 12-17 Activate the mirror

xcli>xcli -c "XIV MN00033" mirror_activate vol=cirrus12__06
Command executed successfully.

13. All of the Remote Mirror commands executed are logged in the event list and can be viewed with the XCLI command event_list. An example of this event list is shown in Example 12-18. This command is useful for reviewing which commands were issued and when.
Example 12-18 The event_list in XCLI from the primary system while setting up iSCSI Remote Mirror
2008-08-19 14:54:36  TARGET_PORT_ADD                    yes  admin
2008-08-19 14:54:36  TARGET_ISCSI_CONNECTIVITY_CREATE   yes  admin
2008-08-19 14:54:37  TARGET_CONNECTION_ESTABLISHED      yes
2008-08-19 14:58:03  HOST_NO_MULTIPATH_ONLY_ONE_MODULE  yes
2008-08-19 14:58:14  HOST_NO_MULTIPATH_ONLY_ONE_MODULE  yes
2008-08-19 15:00:42  HOST_MULTIPATH_OK                  yes
2008-08-19 15:05:36  HOST_MULTIPATH_OK                  yes
Alternatively, to create the volume on the secondary by using the XCLI, we need to include two additional options with the mirror_create command. The one piece of information required, as with the GUI, is the name of the pool on the secondary system, which can always be found by issuing the XCLI command pool_list. In Example 12-19, we have created another mirror and indicated that the volume on the secondary must be created (create_slave=yes) in the pool called target_test.
Example 12-19 Create Remote Mirror
$ xcli -c "XIV MN00033" mirror_create local_volume=cirrus12__01 target="XIV V10.0 MN00035" slave=cirrus12__01 create_slave=yes remote_pool=target_test
Command executed successfully.

Using XCLI commands to set up the mirrors can be extremely convenient when a large number of mirrors must be created, because all the commands can be placed into a single script and executed at one time.
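Such a script can be as simple as a loop over volume names that emits one mirror_create command per volume, following the syntax of Example 12-19. This sketch only prints the commands for review; run them (or pipe the output to a shell) once verified. The system, pool, and volume names are illustrative:

```shell
# Sketch (not from the book): generate mirror_create commands for a batch
# of volumes. Names follow the examples in this chapter but are illustrative.
PRIMARY='XIV MN00033'
TARGET='XIV V10.0 MN00035'
POOL='target_test'

gen_mirror_cmds() {
    # One mirror_create command per volume, with the slave volume created
    # on the secondary (create_slave=yes) in the chosen pool.
    for vol in "$@"; do
        echo "xcli -c \"$PRIMARY\" mirror_create local_volume=$vol" \
             "target=\"$TARGET\" slave=$vol create_slave=yes remote_pool=$POOL"
    done
}

gen_mirror_cmds cirrus12__01 cirrus12__02 cirrus12__03
```

Generating the commands first, instead of executing immediately, makes it easy to review the full batch against the 128-pair limit before anything is created.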
Chapter 13. Data migration
This chapter introduces the XIV Storage System embedded data migration function.
13.1 Overview
As with any data center change, whatever the reason for your data migration, it is preferable to avoid disrupting or disabling active applications where possible. While there are many options available for migrating data from one storage system to another, the XIV Storage System includes a Data Migration feature that enables the easy movement of data, at times large amounts of data, from an existing storage system to the XIV Storage System. This feature enables the production environment to continue functioning during the transfer of data, with very little downtime for the business applications. Figure 13-1 illustrates what the data migration environment might look like.
The IBM XIV Data Migration solution offers a smooth data transfer, because it:
- Enables the immediate connection of a host server to the XIV storage, providing the user with direct access to all the data even before it has been copied to the XIV Storage System
- Synchronizes the two storage systems by transparently copying the data to the XIV Storage System as a background process with minimal performance impact
- Supports data migration from any storage vendor
- Can be set up with either Fibre Channel or iSCSI interfaces

During the entire process, the host server is connected to the XIV Storage System. The XIV Storage System handles all read and write requests from the host server, even if the data is not yet resident on the XIV Storage System. In other words, during the data migration, the data transfer is transparent to the host, and the data is available for immediate access. The XIV Storage System manages the data migration by simulating host behavior. When connected to the storage device containing the source data, it looks and behaves like an initiator, or host. After the connection is established, the storage device containing the source data believes that it is receiving read requests from a host, when in fact the XIV Storage System is reading the data and storing it internally. If the XIV Storage System detects a media error on the non-IBM storage system, this error is reflected on the XIV Storage System at the same block, even though the block has not actually failed.
It is important that the connections between the two storage systems remain intact during the entire migration process. If at any time during the migration the communication between the storage systems fails, the process fails, and all writes from the host also fail. 13.2, Handling I/O requests on page 355 discusses this process in detail. The process of migrating data is performed at a volume level as a background process. As with Remote Mirroring, the Data Migration facility has the following boundaries:
- A maximum of four targets can be configured for the system, including any combination of Remote Mirroring and Data Migration
- The maximum number of logical unit numbers (LUNs) that can be configured at any one time between Remote Mirroring and Data Migration is 128

Throughout this chapter, the XIV Storage System is considered the target, while the other storage system is known as the source for data migration. This terminology is also used in Remote Mirroring, and both functions share the same terminology for setting up paths for transferring data. To maintain consistency with the way that the commands are used, the source system in a Data Migration scenario is referred to as a target when setting up paths between the XIV Storage System and the storage from which data will be migrated.
Source updating
This method for handling write requests ensures that both storage systems are updated with the writes. The source system remains updated during the migration process, and the two storage systems are identical throughout the background copy process. Similar to synchronous Remote Mirroring, the write commands are only acknowledged from the XIV Storage System to the host after writing to itself, writing to the source array, and receiving an acknowledgement from the source array. An important aspect of selecting this option is that if there is a communication failure between the target and source storage systems or any other error that causes a write to fail to the source system, the XIV Storage System also fails the write operation to the host. By failing the update, the systems are guaranteed to remain consistent. However, if there is a failure between the source and target systems, the host is no longer capable of updating the data on the XIV Storage System. Important: This option might cause an interruption to the production systems in the case of a failure on the source device.
No source updating
This method for handling write requests is much more tolerant of communication failures between the two storage systems. In this scenario, the source array is not updated with any write requests from the host system. The source and target arrays are not synchronized at any time during the data migration. An example of selecting which method to use is shown in Figure 13-2. The box must be checked to select source updating, shown here as Keep Source Updated. Without this box checked, write requests are only written to the target system.
Host definitions
First, define the host server that is involved in the data migration, using the same methods that you use to define any server attached to the XIV Storage System. Chapter 7, Host connectivity on page 147 describes how to define the host server. In addition, the XIV Storage System must be defined as a host on the source storage system. The volumes created on the XIV Storage System for the data migration must contain exactly the same number of blocks as the volumes on the system being migrated, which is verified upon activation.
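The block-count rule above can be illustrated with a small helper. This is a hedged sketch, assuming the conventional 512-byte SCSI logical block; the function names are hypothetical:

```python
SCSI_BLOCK_SIZE = 512  # bytes per logical block (assumed, standard SCSI)

def required_xiv_blocks(source_lun_bytes):
    """Return the block count an XIV migration volume must be created with."""
    if source_lun_bytes % SCSI_BLOCK_SIZE:
        raise ValueError("source LUN size is not a whole number of blocks")
    return source_lun_bytes // SCSI_BLOCK_SIZE

def sizes_match(source_blocks, xiv_blocks):
    """Activation-time check: block counts must be equal, not merely close."""
    return source_blocks == xiv_blocks
```

For example, a 10 GiB source LUN requires an XIV volume of exactly 20 971 520 blocks; a volume that is even one block larger or smaller fails the verification at activation.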
Connectivity
The next step is to set up connectivity between the two storage systems. If the data migration is done over iSCSI (the target storage must also support iSCSI), both storage devices must have their ports defined and configured. It is also important to enable jumbo frames on the Ethernet switch to which the systems are connected, as well as on the systems themselves when the ports are configured. If the data migration is carried out over Fibre Channel, zones must be created between the two storage systems, as well as between the host server and the XIV Storage System. If you use more than one path (recommended for Fibre Channel), create two zones, each including one port from each storage system.
After defining the target, you must add ports to the target and then create the paths. Using the GUI, right-click the target system (also referred to as the source storage system) and select Add Port as shown in Figure 13-5 on page 358. After the port is defined, the worldwide port name (WWPN) or iSCSI definition must be included. The example shown in Figure 13-6 on page 358 shows the window where you define the WWPN for a Fibre Channel port definition.
Note: The host and data migration paths can be created with either iSCSI or Fibre Channel, and the two protocols can be mixed across paths, as long as only one protocol is used for each path.
After creating the definition, we are ready to test the data migration.
After the test completes successfully, we are ready to activate the migration.
13.3.3 Activate
It is now time to connect the host server to the XIV Storage System and activate the data migration. This option is shown in Figure 13-8 for the GUI; the same step can be performed with the XCLI command dm_activate. From this point on, with the data migration in progress, all of the host server's reads and writes are sent to the XIV Storage System, which can retrieve data from, and optionally write data to, the source storage system.
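The full test-and-activate flow can also be driven from the XCLI. The command names below (dm_define, dm_test, dm_activate, dm_list, dm_deactivate, dm_delete) are the ones this chapter refers to, but the parameter names shown are illustrative only; consult the XCLI Reference Guide, GC27-2213, for the exact syntax:

```
# Define the migration relationship between the XIV volume and the
# LUN on the source system (parameter names are illustrative)
dm_define vol=xiv_migration_vol target=legacy_array lun=5 source_updating=yes

# Verify that the XIV Storage System can read the source LUN
dm_test vol=xiv_migration_vol

# Start the background copy; host I/O now goes through the XIV system
dm_activate vol=xiv_migration_vol

# Monitor progress, then end the relationship when the copy completes
dm_list
dm_deactivate vol=xiv_migration_vol
dm_delete vol=xiv_migration_vol
```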
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.
Other publications
These publications are also relevant as further information sources:

- IBM XIV Storage System Installation and Service Manual, GA32-0590
- IBM XIV Storage System XCLI Manual, GC27-2213
- IBM XIV Storage System Introduction and Theory of Operations, GC27-2214
- IBM XIV Storage System Host System, GC27-2215
- IBM XIV Storage System Model 2810 Installation Planning Guide, GC52-1327-01
- IBM XIV Storage System Pre-Installation Network Planning Guide for Customer Configuration, GC52-1328-01
- XCLI Reference Guide, GC27-2213-00
- Host System Attachment Guide for Windows - Installation Guide:
  http://publib.boulder.ibm.com/infocenter/ibmxiv/r2/index.jsp
- The iSCSI User Guide:
  http://download.microsoft.com/download/a/e/9/ae91dea1-66d9-417c-ade4-92d824b871af/uguide.doc
- AIX 5L System Management Concepts: Operating System and Devices:
  http://publib16.boulder.ibm.com/pseries/en_US/aixbman/admnconc/hotplug_mgmt.htm#mpioconcepts
- System Management Guide: Operating System and Devices for AIX 5L:
  http://publib16.boulder.ibm.com/pseries/en_US/aixbman/baseadmn/manage_mpio.htm
- Host System Attachment Guide for Linux, which can be found at the XIV Storage System Information Center:
  http://publib.boulder.ibm.com/infocenter/ibmxiv/r2/index.jsp
- Sun StorEdge Traffic Manager Software 4.4 Release Notes:
  http://dlc.sun.com/pdf/819-5604-17/819-5604-17.pdf
- Fibre Channel SAN Configuration Guide:
  http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_san_cfg.pdf
- Basic System Administration (VMware Guide):
  http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_admin_guide.pdf
- Configuration of iSCSI initiators with VMware ESX 3.5 Update 2:
  http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_iscsi_san_cfg.pdf
- ESX Server 3 Configuration Guide:
  http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_3_server_config.pdf
Online resources
These Web sites are also relevant as further information sources:

- IBM XIV Storage Web site:
  http://www.ibm.com/systems/storage/disk/xiv/index.html
- System Storage Interoperability Center (SSIC):
  http://www.ibm.com/systems/support/storage/config/ssic/index.jsp
- SNIA (Storage Networking Industry Association) Web site:
  http://www.snia.org/
- IBM Director Software Download Matrix page:
  http://www.ibm.com/systems/management/director/downloads.html
- IBM Director documentation:
  http://www.ibm.com/systems/management/director/
Index
A
Active Directory integration 126 address space 18 admin 86, 91, 127128, 131, 157, 164165 Agent See IBM Director, Agent AIX 179, 193194 fileset 196 supported version 193 alerting event 259 application administrator 127129 user groups 128 applicationadmin 127128, 134 automatic deletion 287288, 298 autonomic features 34 availability 8, 10, 15 connectivity 148150 connectivity adapters 46 Consistency Group 22, 42, 100102, 143, 288, 300301 create 300301 delete 309 expand 302 new snapshots 304 snapshot groups 102 Storage Pool 300, 302 xiv_windows_1 volume 305 context menu 132133, 164, 174175, 345 select events 142 cooling 53 copy functions 285 copy on write 16, 287 copy pairs 330 Core Services 266 coupling 325, 332 created Storage Pool actual size 96
B
background copy 318, 355 backup 285, 288289 backup script 312314 bandwidth 13, 15, 46, 52, 60 Basic configuration 64 battery 48, 256 best-effort coupling 331 block 16, 18, 103, 108, 112 block-designated capacity 18 boot device 53 buffer 57
D
Data Collection 283 data distribution algorithm 47 data integrity 15, 22, 43 data migration 4, 11, 16, 43, 56, 143, 285, 330, 353355 host definitions 357 Data Module 9, 35, 5051, 55, 66, 236 data redundancy 1516, 33 data stripe 237 daemon 260 default IP address 86, 92 gateway 74 definable 56 delete snapshot 287, 297 deletion priority 23, 33, 287288, 290292 depleted capacity 97, 103 depletion 23 destage 22, 36 destination 262, 264, 276 destination type 276 direct connection 154, 162 Director Console 266267, 269 dirty data 3536, 44 Disaster recovery 35, 42 disaster recovery 35, 42, 325, 333 disk drive 57, 93, 236, 238, 283 reliability standards 43 disk scrubbing 43 disk_list 256 distribution 3739 dm_activate 359
C
cache 89, 11, 46, 5253 buffer 57 page 237 caching management 236 call home 249, 253 capacity 79, 81, 84 depleted 95 unallocated 24, 26, 29 category 126127, 136 CE/SSR 70, 7778, 283 cg_delete 309 cg_list 308 cg_move 101102 client 265266 Command Line Interface 80 Compact Flash Card 53 component_list 256 computing resource 10, 12 configuration flow 84
dm_deactivate 359 dm_define 358 dm_list 359 dm_test 359 DM-MPIO 208, 213 DNS 263, 275276 duplicate 288290 duplicate snapshot 288289, 291 creation date 288289, 291 creation time 288
H
hard capacity 79, 81, 84 depletion 33 hard pool size 2728 hard size 94, 98, 107 hard space 23, 25 hard storage capacity 89, 251 hard system size 2829, 32 hard_size 102 hardware 4547 high availability 34, 42 host transfer size 239 host connectivity 55, 149, 152 Host definition 116 host HBA 156, 239 host server 152, 169170, 354355, 357 example power 176 I/O requests 355
E
E-mail notification 85 enclosure management card 53 encryption 282 Ethernet connection 162 Ethernet fabric 89 Ethernet switch 56, 148, 161162, 334, 357 event 252253, 258259 alerting 259 severity 250, 252253, 258259 event code 276 event_list 258259
I
IBM development and support 127 IBM Director 260, 262, 264 Agent 265266 components 264265 Core Services 266 enhanced functionality 265 main component 265 MIB file 266267 Server 264266 IBM Redbooks publication Introduction 155 IBM SSR 6970, 74 IBM System Storage Interoperability Center 151152 IBM XIV 46, 4849, 6465, 251252, 254 data migration solution 354 Data Module 66 FC HBAs 172 final checks 78 hardware 61, 66 hardware component 61 installation 77 internal network 62 iSCSI IPs 172 iSCSI IQN 172 personnel 75 power components 62 rack 77 remote support 70 remote support center 281 repair 61 SATA disks 58 Serial Number 158 software 4, 260, 273 Software System 92
F
failure 324, 331332 fan 53 fc_port_config 328329 fc_port_list 326, 328, 346 Feature Code overview 65 Fibre Channel host I/O 70 Fibre Channel connectivity 55 Fibre Channel port definition 357 Fibre Channel ports 55, 328, 334335 filter 242244 free capacity 97, 99 full installation 82 Function icons 88
G
Gateway 85, 274275, 278 gateway 56 GHz clock 52 GigE adapter 51 given disk drive transient, anomalous service time 44 given XIV Storage System common command execution syntax 91 Goal Distribution 20 goal distribution 3739 priority 38 graceful shutdown 3637 graceful shutdown sequence 36 grid architecture 8, 10, 12 grid topology 13, 34, 36 GUI 8081, 84
storage 66, 355, 360 Storage Manager 4 Storage System 70 support 56, 61, 70, 109, 273 Support Center 61, 273 Support Structure 61 technician 38 XCLI User Manual 258 IBM XIV Storage Manager 4, 80, 90, 122 Manager GUI 5 Manager window 131 Subsystem 15 System 7274 System patch panel 170 XCLI 5 IBM XIV Storage System grid overview 12 initialization 325, 346 initiator 325326, 328, 354 Intel Xeon 52 interface 1213 Interface Module 9, 35, 5051, 55, 66, 69, 71, 85, 117, 122, 147149 2-port 4Gb FC PCI Express Adapters 55 FC port number 117 inutoc 196 IOPS 240241, 243 IP address 56, 69, 7374, 86, 160161, 189190, 201, 261, 275276, 349 space 75 ipinterface_list 326, 349 iSCSI 5456, 147149, 354, 357358 connectivity 56 initiator 56 ports 5556 target 56 iSCSI connection 7374, 159, 174, 177, 339 iSCSI connectivity 122, 160161, 163 iSCSI HBA 147, 164, 166 iSCSI host 165, 169, 174 iSCSI name 166167, 338, 348 iscsid.conf 216
local site 325, 332333 lock Snapshot 313 lock_behavior 103 locking 19, 33 logical structure 16, 20, 22 logical volume 34, 15, 18 LUN Map 115, 117 identification number 115 LUNs table 117119
M
MacOS 80, 85 Main display 88 Maintenance Module 49, 61 managed system 265266 See also IBM Director, Agent management console 80, 265 See also IBM Director, Console Management Information Base (MIB) 262 management server 265 See also IBM Director, Server Management workstation 85 management workstation 80, 8586 Mandatory Coupling 333 mandatory coupling 331 mapping 1516, 84, 90, 111 master volume 16, 20, 23, 99, 138, 238, 286, 289, 293 duplicate snapshot 289 maximum volume count 18 memory 5253, 62 Menu bar 88 meta data 51 metadata 9, 286, 290 metrics 240, 243, 247 MIB 262 MIB extensions 262 MIB file 266267 migration 92, 108 migration paths 357358 mirror_create local_volume 350351 Modem 61 modem 273, 281282 module_list 256 monitor statistics 255 monitoring 249251 monolithic architecture 10 monolithic system 11 MTU 56, 85, 327, 339, 349 default 159, 163 maximum 159, 163 multipathing 194, 196, 204 MySQL 308309, 312
J
just-in-time 93
K
KB 16, 237, 286
L
latency 238, 241 left pane 175 link aggregation 56 link aggregation group 56 Linux 179, 204205 queue depth 181, 197198 load balancing 12, 14
N
naming convention 289 Network mask 85
O
OK button 120, 292293 On-site Repair 283 original snapshot 288289, 291 duplicate snapshot points 288 mirror copy 298
P
parallelism 8, 10, 14, 47, 60, 62, 235, 238, 240 partition 1617 patch panel 5556, 59, 70, 78, 147149, 250 PCI Express 52, 55 PCI-e 236 performance 235237 metrics 240, 243, 247 phase-out 15, 38, 41 phases-out 34 phone line 61 physical capacity 34, 8, 15, 18, 102 physical disk 1617 pointer 286, 293, 312 pool size 2628, 9698 hard 98 soft 98 pool soft size 26 pool_change_config 101, 103 pool_delete 101102 pool_rename 101102 pool_resize 101102 port configuration 327 port role 328 ports 328, 334335 Power on sequence 37 power outage 49 power supply 48, 50, 54 power-off 49 predefined user admin 129 role 126, 129 prefetch 13 pre-fetching 237 primary site 325, 332333 primary system 325326, 329 source volume 345 source volumes 332 primary volume 20, 325, 331332 problem record 283 pseudo random distribution MB partitions 236 pseudo-random distribution algorithm 16
RAID striping 42 RAM disk 206, 211 raw capacity 46 raw read error count 44 RBAC 126 read command 111 readonly 127128, 135 rebuild 15, 19, 37 redirect on write 16, 20, 286, 298, 318 redirect-on-write 4 redistribution 21, 34, 38, 238 redundancy 7, 1416, 33 redundancy-supported reaction 35, 44 Registered State Change Notification 155 regular pool 24, 26, 28, 98 regular Storage Pool 26, 2930, 107 final reserved space 31 snapshot reserve space 31 remote connection 61 Remote Mirror 35, 151152, 285, 310, 324, 326, 355 initiator 336, 341 target 336, 341 Remote Mirroring 151152 function 326 implementation 326 window 343, 345 remote mirroring 9, 56, 76, 151, 158, 237238, 324325, 328, 360 consistency groups 325 multiple features 325 Remote Repair 283 remote site 324, 333 Remote Support 250, 281 remote support 249, 273, 281 reserve capacity 23, 25, 27 resiliency 7, 14, 34 resize Volume 109 resource utilization 14 resume 23, 37 Right click 90, 132, 327 role 127128 Role Based Access Control 126 role switchover 333334 roles 126128 RSCN 155 rule 277279
S
same LUN 116117, 122, 151 SAN boot 319 SAS adapter 52 SATA disk 46, 5758 scalability 10, 13, 15 script 80, 90, 122 scrubbing 21, 43 secondary site 324, 333 secondary system 325326, 331 same command 347 target pool 342 secondary volume 325, 331332
Q
queue depth 181, 197198
R
rack 4547 rack door 48 RAID 15, 19, 21, 103
sector count 44 Secure Sockets Layer 86 security 125126, 137 self-healing 3, 3334, 43 Serial-ATA specification supporting key features 57 Server See IBM Director, Server serviceability 33, 43 severity 250, 252253 shell 80, 101, 112 shutdown sequence 36 sizing 240 SMS 250, 273, 275 message tokens 278 SMS Gateway 275 SMTP 275, 278279 SMTP gateway 144, 275, 278 snap_group 308, 315 Snapshot 16, 23, 33 automatic deletion 287288, 298 creation 288, 307 deletion priority 287, 291292 details 307 duplicate 288290 lock 313 performance 238 restore 315316 size 307 snapshot 13, 16, 19, 285287 delete 287, 297 duplicate 288290 locked 287, 289 naming convention 289 reserve capacity 23, 25, 27 snapshot group 304305, 307 snapshot_delete 297 SNMP 260262 destination 262, 264, 276 SNMP agent 260261 SNMP communication 261 SNMP manager 75, 262, 264 SNMP trap 260261 SNMP traps 262, 264, 269 soft capacity 79, 81, 84 soft pool size 2728 soft size 27, 94, 9798 soft system size 2829, 32 soft_size 102 software services 12 software upgrade 84 Solaris 179, 218219 iSCSI 218, 221 source updating 355356, 360 space depletion 23 space limit 18 spare capacity 2021, 41 spare disk 237 SSL 86 standby mode 331, 333
state 109, 111, 117 state_list 255 static allocation 107, 110 statistics 235, 240241 monitor 255 statistics_get 246248 statistics_get command 246247 Status bar 89, 251 STMS 218 storage administrator 1415, 17, 79, 86, 129, 131, 138, 273, 325, 333, 344 Storage Management software 8081, 89 installation 81 Storage Networking Industry Association 126 Storage Pool 2223, 89, 93, 170, 173, 177, 251, 287, 290, 297 additional allocations 298 and hard capacity limits 29 available capacity 33 capacity 2425 delete 99 future activity 96 logical volumes 23 overall information 94 required size 97 resize 98 resource levels 110 snapshot area 102 snapshot capacity 97 Snapshot Sets 101 space allocation 25 system capacity 9697 unused hard capacity 31 XCLI 101, 121 Storage Pools 14, 18 create 96 storage space 22, 93, 97, 103 storage system 84, 103, 108, 126, 129, 133, 149, 151152, 325, 333, 354355, 357 Storage System software 80 storage virtualization 3, 1415 innovative implementation 14 storageadmin 127128, 131 striping 239 suspend 22 switch_list 257 switching 13 switchover 333334 synchronization 325, 332, 351 synchronous 324 SYSFS 210, 212, 214 System level thin provisioning 2829 System Planar 5153 system quiesce 36 system services 9 system size 2829, 32 hard 2829 soft 2829 system time 255
system_capacity_list 255
T
target 325327 Target Protocol 336 target volume 111112, 318, 320, 344 source volume 111 target_list 348 target_mirroring_allow 348 TCO 43 TCP port 86 technician 127128, 135 telephone line 281282 thick-to-thin provisioning 16 Thin Provisioning 94, 108 thin provisioning 34, 16, 24, 26, 94, 102 system level 28 time 255 time_list 246, 255256 TimeStamp 246 timestamp 332 TLS 86 token 278 Toolbar 88 toolbar 115, 117, 120, 263, 273 total cost of ownership 43 transfer size 239 Transient system 41 Transport Layer Security 86 trap 261
vol_move 101102 Volume 99100, 104 resize 109 soft size 27 state 109, 111, 117 Volume Copy 285, 317, 319 OS image 319 volume count 18 Volume Shadow Copy Service (VSS) 311 volume size 25, 107109 VPN 61, 273, 281282 VSS 23
W
Welcome panel 274, 276
X
XCLI 5, 26, 80, 85, 90, 126, 134135, 156, 163164, 240, 246, 255256, 258, 290292, 326, 328, 331 XCLI command dm_delete 359360 event_list 141 XCLI utility 9092 XIV GUI 84, 86, 94, 128129, 140, 157, 163164, 260, 263, 273, 311 Viewing events 142 XIV Storage hardware 80, 85, 88 Management 75, 78 Management GUI 88, 108, 110 Management software 8081, 90 Management software compatibility 80, 180 Manager 80 Manager installation file 81 port 86, 116 System 13, 810, 4547, 6364, 80, 125127, 147148, 151, 235237, 250, 252, 255, 285287, 324326, 353355 System API 311 System grid architecture 1213 System installation 70 system reliability 42 System time 246 Systems 103, 140, 151, 164, 166 XIV storage administrator 150 XIV Storage Manager 8081, 84 XIV Storage System 7980, 84, 249251 administrator 86 API 311 architecture 8, 10, 12 configuration 103, 116 design 3 grid architecture 12 hardware 64, 74 installation 64 internal operating environment 14, 33 iSCSI configuration 73 logical architecture 16
U
UDP 261 unallocated capacity 24, 26, 29 uncommitted data 332 unlocked 106, 109, 111 UNMAP_VOLUME 145, 258 unsynchronized 331, 333, 345 UPS module 46, 48 UPS module complex 48 ups_list 256257 usable capacity 21, 46 usable space 21 USB to Serial 60 user group 127129 Access Control 132133 Unauthorized Hosts/Clusters 133 users 126128 users location 255
V
version_get 255 Virtual Private Network 61 virtualization 1415, 103 VMware 319320 VMware ESX 223224, 231, 362 vol_copy 319 vol_lock 297
logical hierarchy 17 main GUI window 164, 173 management functionality 85 Management main window 87 Overview 1 point 176, 178 rack 6061 reserves capacity 21 serial number 166 snapshot functionality 311 software 4, 80 stripes data 236 support 65 time 237238, 246 use 91, 116, 122 virtualization 1415 WWPN 157158, 170 XIV subsystem 239, 286287 XIV V10.0 MN00050 145, 157, 164165, 311 xiv_development 127128 xiv_maintenance 127128, 136 XML configuration file 311
Z
zoning 155156, 172