Basic Storage Networking Technology Certified Storage Engineer (SCSE, S10-201)

Study Guide / Book for SNIA Certified Storage Engineer (SCSE, S10-201) http://www.rootkit.nl/files/book_snia_certified_storage_engineer_s10-...
SNIA Certified Storage Engineer (SCSE) book / study guide (S10-201)

Michael Boelen - rootkit.nl

Usage notes
Last updated: 23 June 2009
Goal: Provide a study guide for SNIA SCSE, S10-201
Audience: Storage administrators and architects
License: Creative Commons license
Notes: All information in this study guide is collected from books and internet sources. Although terms and data were checked, information can be incorrect or missing. This book is a guide that collects information about SAN and NAS technology and serves as a preparation guide for the S10-201 exam. This book is a work in progress. Suggestions or input are appreciated (Contact form).

Progress:
Stage 1: Initial writing (100%)
Stage 2: Markup (70%)
Stage 3: Extend information (1%)
1. Explain and recognize basic Storage Networking Technology Components and Concepts (9%)
1.1 Compare and contrast how the disk technologies of Fibre Channel, ATA, SATA, SCSI, and SAS operate
ATA (IDE)
- Also known as parallel ATA (PATA)
- 8 or 16-bit interface
- Maximum theoretical speed 100 MB/s (ATA-6)

Fibre Channel:
A 24-bit address consists of the following 3 parts (in order): Domain (1-239), Area (0-255) and Node Address (the AL_PA), i.e. an 8-bit Domain ID, an 8-bit Area ID and an 8-bit Port ID.
- Domain: The domain is a unique number assigned to each switch in a logical fabric. A domain ID assigned to a switch can range from 1 to 239. This number comprises the first 8 bits of the FCID.
- Area: The 8-bit area field is assigned by the switch as well. It can range from 0 to 255. In some third-party switches this number is assigned by using the physical port number (that is, port 3 out of 16 ports), limiting availability on some operating systems. The Cisco MDS assigns these sequentially regardless of the physical port number.
- Port: The port field is also 8 bits, ranging from 0 to 255. This field is unique in that it is also used to assign the arbitrated loop physical address (ALPA) for devices that use loop. In the context of a device that is not using arbitrated loop, it is common to see the field set to 0, although this is not required.
Source: http://www.cisco.com/en/US/prod/collateral/ps4159/ps6409/ps4358/prod_white_paper0900aecd80285738_ns512_Networking_Solutions_White_Paper.html

SAS (Serial Attached SCSI):
- Max 128 devices (first generation), max 256 devices (second generation)
- Max 3 Gb/s, will be 6 Gb/s in near future
- Hot-pluggable
- SAS devices can communicate with both SATA and SCSI devices (the backplanes of SAS devices are identical to SATA devices).
- A specific difference between SCSI and SAS devices is the addition in SAS devices of two data ports, each of which resides in a different SAS domain. This enables redundancy (failover): if one path fails, there is still communication along a separate and independent path.
1 sur 17
14/01/2012 18:37
SATA (Serial ATA):
- Serial link
- Current standard maximum speed 6 Gbit/s
- Most disks currently can't saturate the 1.5 Gbit/s
- Uses native command queuing to deal with incoming actions
- 7-pin connector for data, 15-pin connector for power
- When converting SAS to SATA use an adapter or cable. Example: http://www.cs-electronics.com/sas-products.htm

SCSI
- Parallel
- Up to 320 MB/s (Ultra-320 SCSI) or even 640 MB/s (Ultra-640 SCSI)
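The 24-bit Fibre Channel address layout described above (8-bit Domain, 8-bit Area, 8-bit Port) can be illustrated with a few bit operations. This is only a sketch; the function name `split_fcid` and the sample address are made up for illustration.

```python
def split_fcid(fcid: int) -> tuple[int, int, int]:
    """Split a 24-bit Fibre Channel ID into (domain, area, port)."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("FCID must fit in 24 bits")
    domain = (fcid >> 16) & 0xFF  # first 8 bits: domain ID of the switch
    area = (fcid >> 8) & 0xFF     # middle 8 bits: area
    port = fcid & 0xFF            # last 8 bits: port (AL_PA on a loop)
    return domain, area, port


# Hypothetical FCID 0x0A1B00: domain 10, area 27, port 0
print(split_fcid(0x0A1B00))
```

A port value of 0, as in the sample, matches the common case of a device that does not use arbitrated loop.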
Define differences between serial and parallel approaches within a configuration

PATA: Master/Slave, shared bus
SATA: Serial ATA, point-to-point topology, no shared bus
Parallel technologies have disadvantages like skewing (bits don't arrive at the same time).
Serial approaches often use 8b/10b encoding to avoid the skewing issues which parallel solutions have. The 2 extra bits are also used for:
- Clock recovery
- DC balance
- Special characters (localization)
- Error detection
SAS expander: forwarding
http://www.freebsd.org/doc/en/articles/storage-devices/scsi.html
http://www.storagereview.com/articles/200406/20040625TCQ_1.html?page=0%2C4
http://support.dell.com/support/edocs/storage/p62517/en/chapterb.htm

Related terms
Tagged Command Queuing (TCQ)
Technology built into some ATA and SCSI hard drives. It allows the operating system to queue up multiple read and write requests to a hard drive at the same time. This helps the system to optimize the order in which it executes read and write commands, without the operating system having to take care of the queuing. SCSI tagged command queuing (TCQ) applies to the device, device controller, firmware and device driver.
Native Command Queuing (NCQ)
A more intelligent queuing mechanism than TCQ. It works by incorporating queuing into the disk, device controller, firmware and device driver (operating system). All these parts work together to achieve maximum efficiency. See http://www.wdc.com/en/library/sata/2579-001076.pdf
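To put the 8b/10b overhead mentioned in this section into numbers: every 8 data bits travel as 10 bits on the wire, so only 80% of the line rate carries payload. The sketch below applies that to the SATA line rates named earlier; the function name is illustrative and protocol overhead above the encoding layer is ignored.

```python
def payload_mb_per_s(line_rate_gbit: float) -> float:
    """Usable MB/s on a serial link that uses 8b/10b encoding."""
    line_bits_per_s = line_rate_gbit * 1e9
    data_bits_per_s = line_bits_per_s * 8 / 10  # 8 data bits per 10 line bits
    return data_bits_per_s / 8 / 1e6            # bits -> bytes -> megabytes


for rate in (1.5, 3.0, 6.0):
    print(f"{rate} Gbit/s line rate -> {payload_mb_per_s(rate):.0f} MB/s payload")
```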
1.2 Describe Array Technology/Virtualization

Goal: Hiding real disks from the application
Virtualization has several layers, including:
- Host: Application, HBA, OS
- Network: Switch, Router, Gateway
- Storage: Array, Library, Device
File/Record virtualization: one or more objects are visible as one
File system virtualization: combining multiple data sources into one big chunk
Tape media: better utilization of tape drives
Pros of virtualization:
- Backup & Restore
- Clustering
- Snapshots
- Replication
- Migration
- Transformation
- Caching
- Security
- Quality of Storage Services & Policies
- Pooling
Describe virtualization implementation techniques and management strategies (e.g., in-band and out-of-band)
host-based:
storage-based: main reasons are segmentation and security. Segmentation/virtualization helps in performing upgrades, migrating data etc.
Switch-based virtualization (in-band / out-of-band):
- in-band: control and data travel the same path. Pros are easier installation (no specific software required), and offloading and performance optimizations in the data path are possible.
- out-of-band: control and data have their own path
1.3 Define SAS and SATA technology

See 1.1
SATA: using Native Command Queueing. See http://searchstorage.techtarget.com/tip/0,289483,sid5_gci1131788,00.html
- SAS devices cannot plug into SATA controllers
- SATA devices can plug into SAS controllers
Identify a legal vs. illegal SAS topology layout

Legal topology:
- Directly attached to initiator
- Attached to expander
Illegal topology:
- More than one fan-out expander per SAS domain
Explain the routing mechanism that occurs in a SAS expander topology

Direct routing: SAS host to directly attached devices
Table routing: SAS host to other expander devices
Subtractive routing: forward unresolved connection requests when neither direct nor table routing succeeds
Fan-out expanders
- Never use subtractive routing, but table routing instead. Usually fan-out expanders have a bigger routing table
- Maximum of one fan-out expander in a SAS domain
- Often at the top of the chain
Edge expanders
- May use subtractive routing. Subtractive routing happens upstream (to other expanders) and direct routing downstream.
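The order in which an expander tries the three routing methods above can be sketched as a simple lookup chain. This is an illustrative model only, not expander firmware: the function name, the device names and the port labels are all hypothetical.

```python
from typing import Optional


def route(dest: str, direct: set[str], table: dict[str, str],
          subtractive_port: Optional[str]) -> str:
    """Pick the routing method for one connection request."""
    if dest in direct:                # device attached to this expander
        return "direct"
    if dest in table:                 # device known via the routing table
        return f"table via {table[dest]}"
    if subtractive_port is not None:  # unresolved: hand it upstream
        return f"subtractive via {subtractive_port}"
    return "rejected"


# Edge expander: small table, unresolved requests go upstream.
print(route("disk9", {"disk1", "disk2"}, {}, "uplink"))
# Fan-out expander: no subtractive port, relies on its routing table.
print(route("disk9", set(), {"disk9": "phy4"}, None))
```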
2. Perform Storage Networking Administration (24%)

2.1 Optimize redundancy within a switched environment; adapt to changing needs and demands
Use multipathing software that supports both load balancing and path failover. Red Hat Linux (and others as well) has Device Mapper Multipath, Solaris 10 has MPxIO and IRIX has XVM. Another advantage is that firmware can be upgraded without disrupting the service. This can be achieved by using multiple paths to a target and disabling one path temporarily.
2.2 Explain HBA configuration parameters; justify the reasons for each parameter setting
QueueDepth
If the number of outstanding I/Os per device is expected to be above 32, then QueueDepth needs to be increased. Usually the vendor of the storage and/or HBAs has documents describing how to adjust the value and how to find the value with the best performance. A common approach is to divide the storage array's total queue length by the number of HBAs. If QueueDepth is undersized, there can be a performance degradation due to Storport throttling of its device queue.
I/O coalesce
I/O coalesce controls the number of CPU interrupts, for more efficient CPU utilization. Turn on the I/O coalesce parameter in high-performance environments. However, when adjusting the related parameters it's important to find the most suitable values. Reducing the number of interrupts can cause poor performance. It depends mainly on the workload. CoalesceMsCnt is the count in milliseconds, CoalesceRspCnt is the count of pending responses.

Parameter               Short  Values
ConnectionOption        CO     0-3
DataRate                DR     0-3
FrameSize               FR     512, 1024, 2048
HardLoopID              HD     0-125
ResetDelay              RD     0-255
EnableBIOS              EB     0, 1
EnableHardLoopID        HL     0, 1
EnableFCPErrRecovery    EF     0, 1
ExecutionThrottle       ET     1-65535
EnableExtendedLogging   EL     0, 1
LoginReTryCount         LR     0-255
EnableLipReset          LP     0, 1
PortDownRetryCount      PD     0-255
EnableLIPFullLogin      FL     0, 1
LinkDownTimeOut         LT     0-240
EnableTargetReset       TR     0, 1
MaximumLUNsPerTarget    ML     0, 8, 16, 32, 64, 128, 256
LinkDownError           LD     0, 1
FastErrorReporting      FE     0, 1

Parameter                       QLogic default setting                         EMC-approved setting
Data Rate                       0 (1 Gb/s)                                     2 (AutoSelect)
Execution Throttle              16                                             256
Connection options (topology)   2 (Loop preferred, otherwise point-to-point)   2 (Loop preferred, otherwise point-to-point)
Loop Reset Delay                5                                              5
Enable LIP Full Login           Yes                                            Yes
Enable Target Reset             No                                             Yes
Port Down Retry Count           8                                              45
Link Down Timeout               30                                             45
LUNs Per Target                 8                                              256
Adapter Hard Loop ID            Enabled                                        Disabled
Hard Loop ID                    0                                              0
Descending Search LoopID        0                                              1
Operation Mode                  0                                              0
Interrupt Delay Times           0                                              0
Enable Interrupt (24xx HBAs)    No                                             No
Execution Throttle: Specifies the maximum number of I/O commands allowed to execute on an HBA port. When a port's execution throttle is reached, no new commands are executed until the current command finishes. Default: 256. Range: 1-256. OS: Windows.
Frame Size: Specifies the size of a Fibre Channel frame per I/O. Default: 2048. Range: 512-2048. OS: All.
Fibre Channel Data Rate: Specifies the HBA adapter data rate. When set to Auto, the adapter auto-negotiates the data rate with the connecting SAN device. Default: Auto. Range: 1 (Auto), 2 (1Gb), 3 (2Gb), 4 (4Gb). OS: All.
Maximum Queue Depth: Specifies the maximum number of I/O commands allowed to execute/queue on an HBA port. Default: 32. Range: 1-65535. OS: VMware ESX.
Maximum Scatter Gather List Size: Specifies the size of the list of DMA items that are reported to the SCSI mid-level per I/O request. Default: 32. Range: 1-255. OS: VMware ESX.
Maximum Sectors: Specifies the maximum number of disk sectors that are reported to the SCSI mid-level per I/O request. Default: 512. Range: 512, 1024, 2048. OS: VMware ESX.
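The QueueDepth rule of thumb from the start of this section (divide the storage array's total queue length by the number of HBAs) is simple arithmetic. The sketch below is only a starting point under that assumption; the function name and the example figures are hypothetical, and vendor documentation should take precedence.

```python
def per_hba_queue_depth(array_queue_length: int, hba_count: int) -> int:
    """Rule-of-thumb queue depth per HBA: array queue length / number of HBAs."""
    if array_queue_length < 1 or hba_count < 1:
        raise ValueError("queue length and HBA count must be positive")
    # Round down so the HBAs together never exceed the array's queue.
    return max(1, array_queue_length // hba_count)


# Hypothetical array with a queue length of 2048 shared by 16 HBAs
print(per_hba_queue_depth(2048, 16))
```

Rounding down is a deliberate choice here: it keeps the sum of all HBA queues at or below what the array can accept.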
2.3 Define troubleshooting methodologies and tools within scenarios

SAN zoning problems cause the majority of issues. Common problems are:
- Missing targets from the host zone
- Host zone configured to see the wrong targets
- Incorrect WWN alias(es) resulting from new or replaced hardware
- New zone(s) not added to the active configuration
Switch zoning modifications are the most common change that occurs in a SAN, which explains the increased chance of mistakes. Also, there is no way to automate zoning, since it requires human decisions to determine initiator and target accessibility.
Host HBA issues occur almost as frequently as SAN zoning problems.
Disk zoning / LUN masking provides another layer of manual configuration that can lead to problems.
FC cabling problems: use a clear naming and cabling convention to avoid problems and speed up debugging.
Explain reasons to add or remove Inter Switch Links (ISLs)
Adding and removing ISLs is the result of connecting or disconnecting E-ports (Expansion ports).
Reasons:
- Load sharing
- Failover
- Connecting fabrics, increasing throughput
- Adding links to an existing ISL trunk
Analyze port log-in, fabric log-in and process log-in

Fabric Login (FLOGI): Login after connecting to a fabric switch.
- Related ports: F_Port to N_Port (or NL_Port)
- Related information: WWN, S_ID, Protocol, Fibre Class, Zoning
Port Login (PLOGI): Two node ports establish a connection between each other (often a Fibre Channel HBA connection to a switch).
- Related ports: N_Port to N_Port
- Related information: WWN, S_ID, ULP, Fibre Class, BB Credit
Process Login (PRLI): Process login is used to set up the environment between related processes on an originating N_Port and a responding N_Port.
- Related ports: ULP (SCSI-3 to SCSI-3)
- Related information: LUN
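The three logins above happen in a fixed order: fabric login first, then port login, then process login. A small sketch of that ordering, useful when reading a port log and checking which step a device got stuck on; the function name and the list-based input are illustrative assumptions.

```python
from typing import Optional

LOGIN_ORDER = ("FLOGI", "PLOGI", "PRLI")


def next_login(completed: list[str]) -> Optional[str]:
    """Return the next login step, or None when all three are done."""
    if tuple(completed) != LOGIN_ORDER[:len(completed)]:
        raise ValueError(f"logins out of order: {completed}")
    if len(completed) == len(LOGIN_ORDER):
        return None
    return LOGIN_ORDER[len(completed)]


print(next_login([]))                  # device has not logged in yet
print(next_login(["FLOGI", "PLOGI"]))  # fabric and port login are done
```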
Isolate bandwidth issues and errors related to time outs

Bandwidth issues are often found on the ISLs, where paths come together. Monitoring bandwidth usage is important in tracing the source of these kinds of problems. A common symptom of this kind of problem is SCSI time out errors.
Identify process to add a configured switch to an existing fabric

Brocade:
- Clear configuration (configDefault or cfgClear)
- Copy configuration from another switch (or backup)
- Save configuration (cfgSave)
Set time out values, buffer-to-buffer settings

- Configure network parameters
- Configure fabric parameters (BB Credit, R_A_TOV, E_D_TOV, switch PID format, Domain ID)
- Enable/Disable ports
- Configure port speeds
- Configure Zoning
BB Credit
Configure the number of buffers that are available to attached devices for frame receipt. Default: 16. Values range from 1 to 16.
R_A_TOV
Resource allocation time out value. This works with the E_D_TOV to determine switch actions when presented with an error condition
E_D_TOV
Error detect time out value. This timer is used to flag potential error condition when an expected response is not received within the set time
Set communications mode between two fabrics

Brocade switches: interopmode set to 1 to talk to other vendors (note: it needs to be enabled on all switches within the fabric)
M-EOS switches: use open mode
Notes: According to the documentation the domain ID must be between 97 and 127 for interoperability (depending on mode and vendor)

Changes after activation of interoperability mode (switch feature: changes if interoperability is enabled):
- Domain IDs: Some vendors cannot use the full range of 239 domains within a fabric. For example, with McData switches domain IDs are restricted to the range 97-127. This is to accommodate McData's nominal restriction to this same range. They can either be set up statically (the Cisco MDS switch accepts only one domain ID; if it does not get that domain ID it isolates itself from the fabric) or preferred (if it does not get its requested domain ID, it accepts any assigned domain ID).
- Timers: All Fibre Channel timers must be the same on all switches, as these values are exchanged by E ports when establishing an ISL. The timers are F_S_TOV, D_S_TOV, E_D_TOV, and R_A_TOV.
  - F_S_TOV: Verify that the Fabric Stability Time Out Value timers match exactly.
  - D_S_TOV: Verify that the Distributed Services Time Out Value timers match exactly.
  - E_D_TOV: Verify that the Error Detect Time Out Value timers match exactly.
  - R_A_TOV: Verify that the Resource Allocation Time Out Value timers match exactly.
- Trunking: Trunking is not supported between two different vendors' switches. This feature may be disabled on a per port or per switch basis.
- Default zone: The default zone behavior of permit (all nodes can see all other nodes) or deny (all nodes are isolated when not explicitly placed in a zone) may change.
- Zoning attributes: Zones may be limited to the pWWN, and other proprietary zoning methods (physical port number) may be eliminated. Note: Brocade uses the cfgsave command to save the fabric-wide zoning configuration. This command does not have any effect on Cisco MDS 9000 Family switches if they are part of the same fabric. You must explicitly save the configuration on each switch in the Cisco MDS 9000 Family.
- Zone propagation: Some vendors do not pass the full zone configuration to other switches; only the active zone set gets passed. Verify that the active zone set or zone configuration has correctly propagated to the other switches in the fabric.
- VSAN: Interop mode only affects the specified VSAN.
- TE ports and PortChannels: TE ports and PortChannels cannot be used to connect Cisco MDS to non-Cisco MDS switches. Only E ports can be used to connect to non-Cisco MDS switches. TE ports and PortChannels can still be used to connect a Cisco MDS to other Cisco MDS switches even when in interop mode.
- FSPF: The routing of frames within the fabric is not changed by the introduction of interop mode. The switch continues to use src-id, dst-id, and ox-id to load balance across multiple ISL links.
- Domain reconfiguration disruptive: This is a switch-wide impacting event. Brocade and McData require the entire switch to be placed in offline mode and/or rebooted when changing domain IDs.
- Domain reconfiguration nondisruptive: This event is limited to the affected VSAN. Only Cisco MDS 9000 Family switches have this capability: only the domain manager process for the affected VSAN is restarted and not the entire switch.
- Name server: Verify that all vendors have the correct values in their respective name server database.
- IVR: IVR-enabled VSANs can be configured in any interop mode.

Brocade's msplmgmtdeactivate command must explicitly be run prior to connecting from a Brocade switch to either Cisco MDS 9000 Family switches or to McData switches. This command uses Brocade proprietary frames to exchange platform information, which Cisco MDS 9000 Family switches and McData switches do not understand. Rejecting these frames causes the common E ports to become isolated.
Validate interoperability among vendors

ARP can be an issue; there are two protocols:
- FARP
- ARP over FCP
FCIP can assist in combining hardware from several vendors
Validate domain IDs on switches

Each switch has a unique domain ID. A fabric permits up to 239 switches and therefore allows 239 domain IDs. Even when using separate fabrics, it's good practice to avoid reusing the same domain IDs, which makes merging fabrics in the future a lot easier.
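The two checks described above (each domain ID within 1-239, and no domain ID reused, even across separate fabrics) can be sketched as follows. The data layout, a mapping of fabric name to switch name to domain ID, and all names in the example are assumptions for illustration.

```python
def check_domain_ids(fabrics: dict[str, dict[str, int]]) -> list[str]:
    """Report domain IDs that are out of range or reused across fabrics."""
    problems = []
    seen: dict[int, str] = {}  # domain ID -> first switch that uses it
    for fabric, switches in fabrics.items():
        for switch, domain in switches.items():
            label = f"{fabric}/{switch}"
            if not 1 <= domain <= 239:
                problems.append(f"{label}: domain {domain} outside 1-239")
            elif domain in seen:
                problems.append(
                    f"{label}: domain {domain} already used by {seen[domain]}")
            else:
                seen[domain] = label
    return problems


# Hypothetical layout: two fabrics that both use domain ID 2
fabrics = {
    "fabricA": {"sw1": 1, "sw2": 2},
    "fabricB": {"sw3": 2, "sw4": 240},
}
for problem in check_domain_ids(fabrics):
    print(problem)
```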
Connect switch to a fabric

Before connecting a switch, clear its configuration first.
Brocade:
1. Login as root
2. switchdisable
3. cfgdisable
4. cfgclear
5. passwddefault
6. portstatsclear
7. portlogclear
8. reboot
9. configUpload
2.5 Identify results of ISL oversubscription

Common oversubscription ratio: 7:1
ISL ports should be monitored. An ISL port performing at 80% capacity could indicate possible oversubscription.
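An oversubscription ratio such as 7:1 compares the bandwidth of the edge ports that can send traffic over the ISLs with the bandwidth of the ISLs themselves. The sketch below computes that ratio and applies the 80% utilization threshold from the text; the function names and the example port counts are hypothetical.

```python
def isl_oversubscription(edge_ports: int, edge_speed_gbit: float,
                         isl_count: int, isl_speed_gbit: float) -> float:
    """Ratio of total edge-port bandwidth to total ISL bandwidth."""
    if isl_count < 1 or isl_speed_gbit <= 0:
        raise ValueError("at least one ISL with a positive speed is required")
    return (edge_ports * edge_speed_gbit) / (isl_count * isl_speed_gbit)


def isl_needs_attention(utilization_pct: float, threshold_pct: float = 80.0) -> bool:
    """Flag an ISL port that runs at or above the monitoring threshold."""
    return utilization_pct >= threshold_pct


# Hypothetical edge switch: 28 host ports and 4 ISLs, all at 4 Gbit/s
print(f"{isl_oversubscription(28, 4.0, 4, 4.0):.0f}:1")
print(isl_needs_attention(85.0))
```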
2.6 Create/configure and modify zone sets

Brocade
Create initial fabric configuration:
Switch1:admin> cfgcreate "Fabric1", "LinuxNode1Zone1"
Once the configuration is created, additional zones can be added with the cfgadd command:
Switch1:admin> cfgadd "Fabric1", "LinuxNode1Zone2"
Switch1:admin> cfgsave

Effective configuration: active set, loaded in memory. Can be saved with cfgSave.
Defined configuration: saved set on flash, can be loaded with cfgEnable.
Implement zoning for single server and cluster applications

xxx
Create backup of zone database prior to zone modification

Brocade: configUpload (to FTP)
Configure zones within a redundant fabric

Important: First apply configuration change to fabric 1. When the change is successful it can be applied to fabric 2.
Explain how zone is stored and distributed throughout the fabric

A new switch will gain the configuration of an existing fabric. Default zone membership includes all ports or WWNs that do not have a specific membership association. Access between default zone members is controlled by the default zone policy.
Explain the possible zoning conflicts that cause fabric segmentation

Brocade switch: fabstatsshow (show reasons for fabric segmentation)
Type mismatch: Occurs when the name of a zone object in one fabric is also used for a different type of zone object in the other fabric.
Example:
Fabric A: alias: Mkt_Host 1,16
Fabric B: zone: Mkt_Host 1,16
Content mismatch: Occurs when the name and type of a zone object in one fabric is also used in the other fabric but the content or order is different.
Example:
Fabric A: alias: Eng_Stor wwn1; wwn2
Fabric B: alias: Eng_Stor wwn2; wwn1
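Both mismatch types above can be detected before a merge by comparing the zone objects of the two fabrics by name. The sketch below models each fabric as a mapping from object name to (type, ordered members); that data layout and the function name are assumptions, and the sample data mirrors the two examples above.

```python
def zone_conflicts(fabric_a: dict, fabric_b: dict) -> dict[str, str]:
    """Compare zone objects that exist under the same name in both fabrics."""
    conflicts = {}
    for name in fabric_a.keys() & fabric_b.keys():
        type_a, members_a = fabric_a[name]
        type_b, members_b = fabric_b[name]
        if type_a != type_b:
            conflicts[name] = "type mismatch"
        elif tuple(members_a) != tuple(members_b):
            # Order matters: the same members in another order also conflict.
            conflicts[name] = "content mismatch"
    return conflicts


fabric_a = {
    "Mkt_Host": ("alias", ("1,16",)),
    "Eng_Stor": ("alias", ("wwn1", "wwn2")),
}
fabric_b = {
    "Mkt_Host": ("zone", ("1,16",)),
    "Eng_Stor": ("alias", ("wwn2", "wwn1")),
}
for name, reason in sorted(zone_conflicts(fabric_a, fabric_b).items()):
    print(name, "->", reason)
```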
Perform fabric merge without zoning conflict

Tips:
- Clear the device if it was part of another fabric
- Brocade: Switches in a fabric will not merge unless the PID formats are exactly the same
- Different time out values on E-ports can cause fabric segmentation
- Segmentation errors can occur if a switch has a bigger zone database than the allowed maximum size. Usually the oldest/lightest switch determines how big the database can be within a fabric.
- Different VSANs on both fabrics
- ACL/allow list on VSAN, blocking (valid) traffic
- The name of a zone in Fabric A should not be used for a different type of zone in Fabric B. For example, if you create a zone named myZone in Fabric A, you should not use the same name as an alias, zone configuration, or zoneset name in Fabric B. In this scenario, merging the fabrics will cause a zone type mismatch.
- If an alias, zone, zoneset, or zone configuration name is the same on both Fabric A and Fabric B, but the content between the two fabrics is different, the fabrics will not merge.

Take the following steps as you prepare to merge SAN fabrics:
1. Check for conflicting Domain IDs on both fabrics before merging. Usually the lowest WWN will get the principal role.
2. Check for conflicting zone definitions before merging.
3. Verify that the fabric islands have the same feature licenses before merging.
4. Verify that all switch parameters are compatible with the fabric before merging.
5. Use the same hardware as much as possible.
6. Merge the fabrics using one ISL at a time.
Explain instances of zone name clash

Clash can happen when:
- pWWN and FC ID are not unique between fabrics
- Same zone name is used, but with different members or different order
Configure active zone sets

A zone set consists of one or more zones. Often only one zone set can be active (the SAN should be idle or shut down to change the configuration).
2.8 Identify best practices for storage allocation in Fibre Channel SAN
Adding storage to a new host
EMC:
- Create raid pool
- Bind LUN
- Create storage pool
- Register host
- Present LUN to host
Upgrading
EMC: Extend LUN
NetApp: Extend volume or iSCSI LUN
3. Manage Storage Networks (21%)

3.1 Compare Storage Device Management to Storage Network Management
Discriminate among the components, characteristics and functions

Hub: older devices which send incoming data to all ports
Switch: common devices which have an increased throughput compared with hubs, due to the point-to-point connection
Director: chassis with switch blades
Create volumes in NAS environment

NetApp:
- Create aggregate and add disks to it
- Create volume
- Configure characteristics of volume (minimal read-ahead, snapshots etc)
Contrast scalability issues between SAN and NAS

NAS: file based (commonly NFS/CIFS, sometimes iSCSI)
SAN: block based (Fibre Channel, iSCSI)
SANs scale better, since they don't reach practical limits that easily/quickly. NAS filers have a maximum number of concurrent users / data throughput before additional filers have to be added.
NAS filers are usually easier to manage and provide easy access to data for Unix and Windows clients via NFS/CIFS.
Identify business context for NAS (e.g., email repository, content archiving)
NAS is often used for sharing documents, file stores, content archiving, email repositories, backups
Identify business context for SAN (e.g., database repository, data replication)
Storage with low latency demands like databases and OLTP. Also mass storage demands including data replication.
3.2 Describe Configuration Management Elements

xxx
Explain HBA Configuration Management Elements

xxx
Construct host-side configuration of HBAs

xxx
Identify Virtual HBA (e.g., iSCSI, VN Port)

Virtual HBA: a port within, for example, a virtual machine guest.
VN port: Virtual Node port, connected to a virtual node (e.g. host or storage device).
Define OS-based technology concepts

xxx
3.3 Explain Change Management Process (ITIL)
Identify steps needed to bring environment back to a controlled situation (e.g., host is swapped out or a device is changed)
xxx
Implementing decommission of hardware (e.g., classify information to understand proper disposal methods, erasure of passwords, configs and zone sets, disk, tape, and data)
Cisco devices: clear zone database (clears zone information of VSAN)
Passwords: clear passwords
Configs: clear configuration before reusing or throwing hardware away.
Zone sets: xxx
Disk: xxx
Tape: Remove from catalog (remove or 'expire' the tape media) and use the company's disposal method.
3.4 Optimize redundancy within a switched environment

- At least 2 HBAs in each host / storage array, if possible
- Don't use too many ISLs
3.5 Apply steps to add a configured switch to an existing fabric (e.g., verify that domain ID is unique, insure zone names are unique, backup existing zone before changes, validate existing admin account has unique username/password on new switch)
3.6 Using scenarios, illustrate reasons to add or remove ISLs (Inter Switch Links)
Increasing throughput, connecting more fabrics together.
Determine impact of adding an ISL (e.g., more options for SAN expansion, allows configuration to take full advantage of ports)
More ISLs means a better usage of the ports (and less oversubscription needed). Also expansion of the SAN is possible.
Determine impact of removing an ISL (e.g., degraded performance)

Degraded performance, possible increased latency
3.7 Identify processes that occur on a switch during a fabric merge (e.g., name services, protocol sequence, and principle switch selection)
While merging, the following processes happen:
- Zoneset passing
- Name server distribution
- Negotiation of (shortest) paths
- Principal switch selection/negotiation (lowest WWN usually wins)
3.8 Using scenarios, illustrate common blocking problems to fabric merge

xxx
Selection of switch as primary (e.g., lowest worldwide name)

- Lowest domain id
- Lowest worldwide name
Awareness of fabric behavior upon merge (e.g., takes 5-10 minutes to stabilize because of background processes)
Tips: - Use one ISL at a time
Activation of new production zone sets once the merge is complete (e.g., two switches on Fabric A, and one HBA going to each fabric)
3.9 Using scenarios, determine appropriate methodologies and tools for troubleshooting zone sets
Validation of host and LUNs
Validation of HBA logged into fabric
Validation of zone set

Brocade: zoneShow
Validation of active zone library

Brocade: cfgShow
Validation of storage subsystem being logged into the switch
3.10 Predict the symptoms when the distance limitations between long-wave and shortwave fiber have been exceeded
Explain why there are excessive SCSI re-transmit errors (e.g., intermittent loss of signal)
- Signal loss
- Oversubscription
3.11 Create or modify zone sets using best practices

xxx
3.12 Using scenarios, illustrate additional conflicts that could cause fabric segmentation
(see initial reasons in 2.7) If an Extended Fabrics port is to be installed on a SilkWorm 2000 Series switch, the fabric wide configuration parameter fabric.ops.mode.longDistance must be set to 1 on all switches operating within the fabric. Additionally, each long distance port must be set using the portCfgLongDistance command. Each of the two ports within a long distance ISL must be configured identically, otherwise fabric segmentation will occur.
Validate switch modes are set to be the same

xxx
Verify ISLs are working correctly

Example messages on Brocade:
0x1023fc60 (tThad): Apr 3 22:11:44 WARNING FW-ABOVE, 3, eportTXPerf004 (E Port TX Performance 4) is above high boundary. current value : 95462 KB/s. (faulty)
Normal message:
0x1023fc60 (tThad): Apr 3 22:11:52 WARNING FW-BELOW, 3, eportTXPerf004 (E Port TX Performance 4) is below low boundary. current value : 12591 KB/s. (normal)

Brocade: portErrShow
         frames   enc   crc   too   too   bad   enc  disc  link  loss  loss  frjt  fbsy
       tx    rx    in   err  shrt  long   eof   out    c3  fail  sync   sig
=======================================================================================
 4:  617m  2.8g     0     2     0     0     0  268k     0     0     2     9     0     0  << switch_one
 4:  2.8g  617m     0    29     0     0     0     1   333     0     1     5     0     0  << switch_two
Possible causes:
- Length of cabling
- GBIC issue
- Dirty SFP
More information: Brocade portErrShow.pdf
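When comparing portErrShow rows from two switches, it can help to turn a row into named counters. The sketch below parses a row like the ones shown above. It assumes the suffixes k, m and g stand for thousands, millions and billions, and the column names are shortened forms of the two header lines; both are assumptions, not taken from Brocade documentation.

```python
from decimal import Decimal

COLUMNS = ["frames_tx", "frames_rx", "enc_in", "crc_err", "too_shrt",
           "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail",
           "loss_sync", "loss_sig", "frjt", "fbsy"]
SUFFIX = {"k": 10**3, "m": 10**6, "g": 10**9}  # assumed meaning of suffixes


def counter_value(text: str) -> int:
    """Convert a counter such as '617m' or '29' into an integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIX:
        return int(Decimal(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def parse_row(line: str) -> tuple[int, dict[str, int]]:
    """Split 'port: v1 v2 ...' into the port number and named counters."""
    port, _, rest = line.partition(":")
    values = [counter_value(v) for v in rest.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} counters, got {len(values)}")
    return int(port), dict(zip(COLUMNS, values))


port, counters = parse_row("4: 617m 2.8g 0 2 0 0 0 268k 0 0 2 9 0 0")
print(port, counters["crc_err"], counters["enc_out"], counters["loss_sync"])
```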
4. Perform Data Protection and Recovery (14%)
4.1 Describe the different back-up and restore configurations

Make daily/weekly backups of all available configurations. Most vendors have a way to download the configuration of switches and store it. If needed, adjust available tooling.
Describe the technical advantages and disadvantages of each configuration (i.e., performance)
xxx
Identify external requirements that are uniquely satisfied by serverless backup or third-party copy
xxx
4.2 Analyze potential backup problems (e.g., open file, out of space, virus scanner)
xxx
Using scenarios, analyze the trade-offs with disk-to-tape, back-up window, media, silo (e.g., low cost, portable, but slow)
xxx
Using scenarios, explain advantages of disk-to-disk method (e.g., physical space, space on media, security and access to data)
xxx
Using scenarios, explain the advantages of off-host (e.g., dedicated back-up server, speed vs. cost)
xxx
Using scenarios, explain advantage of LAN-free (e.g., tapes and disks on a dedicated fabric)
- Low overhead on servers
- High speed
- Tape devices and backup disks could be zoned or placed in a dedicated fabric.
Explain ways to maximize user time and minimize back-up window

Use LAN-free, serverless backups, snapshot technology, or backup from a passive node.
4.3 Ensure Fibre Channel Security

Physical security: do not allow physical access to unauthorized people.
- Prevent physical access
- Prevent remote access through IP security measures (e.g. putting devices into a specific VLAN)
- Hard zone the devices
- Lock down E_port creation (Brocade: portCfgEport)
- Disable ports (Brocade: portCfgPersistentDisable)
Data encryption: store data encrypted when needed. If needed, encrypt data before putting it on the wire.
Zoning:
- hard
- soft
- mixed
LUN masking: exports a LUN only to the systems which are allowed to use it.
Show how to implement port authentication protocols

- CHAP
- FCAP
Perform processes to secure a fabric

Host isolation refers to ensuring only one initiator (host) per SAN zone, which prevents a misbehaving HBA or host driver from interfering with any of the other hosts in the SAN.
Compare the difference between hard and soft zoning regarding security
Hard zoning: members of a zone are physical ports, also known as port zoning
Soft zoning: WWNs or pWWNs are members of a zone, happens within a fabric switch. Software zoning lets you create symbolic names for the zones and zone members.
Explain the process to configure secure management access to Fibre Channel switches
Use protocols with encryption like SSH (instead of telnet) and HTTPS (instead of HTTP).
4.4 Explain how to recover a clustered storage configuration

xxx
5. Implement Storage Networks (17%)

5.1 Define the role of bridges and the differences between PCI-X and PCI-e
PCIe-to-PCI-X bridges allow access for legacy devices.
PCI-X
- Uses conventional PCI technology, and is the double-wide version of PCI with up to 4 times the clock speed. It was needed for hardware like gigabit, Fibre Channel and Ultra320 SCSI cards.
- A PCI-X v1.0 slot runs at 133 MHz
- If a conventional PCI card is installed in a PCI-X slot, the clock speed of other PCI-X slots may be reduced.
PCI Express is a totally new approach, so PCI Express cards can neither be installed in conventional PCI nor in PCI-X slots, nor can conventional PCI cards or PCI-X cards be installed in a PCI Express slot.
PCI Express
- 1x PCI-e cards will fit in 1x, 4x, 8x and 16x PCI-e slots.
- 4x PCI-e cards will fit in 4x, 8x and 16x PCI-e slots.
- 8x PCI-e cards will fit in 8x and 16x PCI-e slots.
- 16x PCI-e cards will fit in 16x PCI-e slots.
So a fast 16x PCI-e card will not work in an 8x (or lower) slot.
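The PCI Express fit rules listed above reduce to one comparison: a card fits when the slot has at least as many lanes as the card. A minimal sketch of that rule follows; the function name is illustrative, and slots that are mechanically open at the end are deliberately not modelled.

```python
PCIE_WIDTHS = (1, 4, 8, 16)


def card_fits(card_lanes: int, slot_lanes: int) -> bool:
    """True when a PCI-e card of the given width fits the given slot."""
    if card_lanes not in PCIE_WIDTHS or slot_lanes not in PCIE_WIDTHS:
        raise ValueError("widths must be one of 1, 4, 8 or 16")
    return card_lanes <= slot_lanes


print(card_fits(4, 16))  # 4x card in a 16x slot
print(card_fits(16, 8))  # 16x card in an 8x slot
```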
5.2 Compare the RAID levels and implementation (e.g., hardware, software, host-based)
RAID 0: RAID 1: RAID 2: RAID 3: RAID 4: RAID 5: RAID 6: RAID 0+1: RAID 1+0:
Hardware vs. software: hardware RAID usually performs better, because a dedicated controller does the work instead of the host CPU.
Describe technical benefits and limitations of the different RAID levels

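RAID 5: slow when writing. Every small write has to update both a data block and the matching parity block, so it keeps two disks busy (and, as a read-modify-write, costs four I/Os: read old data, read old parity, write new data, write new parity). With an even number of disks, this means only half as many writes as reads can run at the same time (8 disks = 8 reads or 4 writes).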
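To illustrate why a small RAID 5 write touches two disks, here is a minimal Python sketch of XOR parity. The helper names are hypothetical, and the block layout is simplified to a single stripe; it is not how any particular controller is implemented.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks byte by byte (this is how RAID 5 parity is built)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity for a read-modify-write of one block.

    The caller has to read old_data and old_parity (2 reads) and then write
    new_data and the returned parity (2 writes): four I/Os on two disks.
    """
    return xor_blocks(old_parity, old_data, new_data)


def max_concurrent_small_writes(disks: int) -> int:
    """Each small write occupies a data disk and a parity disk."""
    return disks // 2
```

Updating the parity this way gives the same result as recomputing it from all data blocks in the stripe, which is why the other data disks do not need to be read.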
5.3 Implementing Switch Technology Differentiate among Core/Edge, Cascaded and Mesh designs
Cascaded: inexpensive, easy to extend. However, low reliability and low scalability.
Ring: same as the cascaded topology, but with better reliability.
Core/Edge: best flexibility and reliability. Multi-layer design. Examples: tiered hybrid
Mesh: can be full or partial. Good for any-to-any traffic. The downside is that the ISLs use up valuable ports.
Explain fan-in and fan-out ratios

Fan-out: ratio of storage ports to hosts (e.g. 1:4).
Fan-in: ratio of hosts to storage ports (e.g. 7:1).
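As a small worked example of the two ratios, the Python sketch below reduces host and storage port counts to the notation used above. The function names and the sample counts are made up for illustration.

```python
from math import gcd


def _ratio(left: int, right: int) -> str:
    """Reduce two positive counts to the smallest whole-number ratio."""
    if left <= 0 or right <= 0:
        raise ValueError("counts must be positive")
    divisor = gcd(left, right)
    return f"{left // divisor}:{right // divisor}"


def fan_in(hosts: int, storage_ports: int) -> str:
    """Hosts per storage port, e.g. 14 hosts on 2 storage ports."""
    return _ratio(hosts, storage_ports)


def fan_out(storage_ports: int, hosts: int) -> str:
    """Storage ports per host, the same relation seen from the storage side."""
    return _ratio(storage_ports, hosts)
```

With 14 hosts sharing 2 storage ports, `fan_in(14, 2)` gives "7:1"; with 2 storage ports serving 8 hosts, `fan_out(2, 8)` gives "1:4".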
Identify the slot to place the HBA for maximum performance and reliability
When using SSD: always use a single port per PCI-e HBA card. Do not attempt to use multiple ports on your HBA cards, as the SSD bandwidth will be limited by the PCI bus.
Avoid putting more HBAs in a server than the bus throughput can support.
5.4 Implementing Virtualization

xxx Tape libraries can be virtualized (VTL: virtual tape library) to make applications believe they are writing to a normal tape unit. In reality these virtual tapes are disks (or parts of disks), which perform much better than conventional tape units.
Explain the reasons for virtualizing servers (e.g., ability to failover, load balance, fully utilize physical assets)
Better hardware utilization, lower power consumption, more centralized management, and load balancing, clustering and failover possibilities by placing VMs on different hosts.
5.5 Implementing NAS

xxx
List NFS/CIFS common parameters (e.g., which OS, journaling level, stateful/stateless)
NFS: UDP or TCP, port 2049, versions 2, 3 and 4, usually Linux/Solaris. No intervention is needed when failing over. NFS (versions 2 and 3) is stateless, even when TCP is used as transport: a failure is transparent for client and server, and recovering doesn't need actions like rebooting the system to free up resources or states. NFS version 4 is stateful.
CIFS: TCP, port 445, usually Windows, stateful, intervention required at failover because the state has to be recovered. With CIFS, the client maintains the connection and the state of open files, directories and various other aspects of them. Being a "stateful" protocol is a problem when the underlying connection is lost: the client does not know when to recreate the connection. File content is cached via a cooperative process between client and server code, and this is where problems can occur. The state survives only as long as the session between the server and the client survives, and this session survives only as long as the underlying network connection (generally TCP/IP) survives. See http://www.snia.org/images/tutorial_docs/Networking/JimPinkertonSMB2_Big_Improvements_Remote_FS_Protocol-v3.pdf
Explain when no block level access is significant or insignificant (e.g., FSCK-CHKDSK, forensics)
When using file-level protocols, the NAS itself is responsible for the integrity of its local file system, so clients cannot run tools like fsck or chkdsk against it. When data is served via block-based access (SAN/iSCSI), however, the host that uses the LUN has to perform file system checks and forensics itself.
Compare NDMP with standard NAS file level back-up (e.g., scalability, block vs. file, offloading of work to NAS unit)
xxx
6. Monitor Storage Networking Performance (9%)
6.1 Use tools to assess the performance of a network storage environment for analysis
Switch performance: Brocade example:
switch1:admin> portPerfShow 5
     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15  Total
-------------------------------------------------------------------------------------------------------
     0     0   21m   28m   31m     0  8.4m     0   28m   21m   31m     0  8.4m     0     0     0   178m
     0     0   20m   29m   31m     0   10m     0   29m   20m   31m     0   10m     0     0     0   182m
     0     0   18m   36m   31m     0   14m     0   36m   18m   31m     0   14m     0     0     0   201m
     0     0   17m   34m   30m     0  7.0m     0   34m   17m   31m     0  7.0m     0     0     0   179m
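When such samples are collected for a baseline, it helps to turn them into numbers. The Python sketch below parses one sample row of the output above; the helper names are hypothetical, and treating the "k", "m" and "g" suffixes as decimal multipliers is an assumption, so check the switch documentation for the exact units.

```python
# Assumption: suffixes are decimal multipliers (k = 10^3, m = 10^6, g = 10^9).
SUFFIXES = {"k": 1e3, "m": 1e6, "g": 1e9}


def parse_rate(token: str) -> float:
    """Convert a token such as '21m' or '0' into a plain number."""
    token = token.strip().lower()
    if token and token[-1] in SUFFIXES:
        return float(token[:-1]) * SUFFIXES[token[-1]]
    return float(token)


def parse_sample(line: str) -> tuple[list[float], float]:
    """Split one sample row into per-port values and the trailing total."""
    values = [parse_rate(token) for token in line.split()]
    return values[:-1], values[-1]


def busiest_port(ports: list[float]) -> int:
    """Index of the port with the highest throughput (first one on a tie)."""
    return max(range(len(ports)), key=ports.__getitem__)
```

Feeding it the first sample row from the example returns 16 per-port values plus the total, and reports port 4 as the busiest.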
HBA performance: xxx
Establish baselines (e.g., performance-based, trending, configuration, as built)

Use tools like MRTG, Cacti and RRDtool to create initial baselines.
Use a time server across environments for log correlation, security, discovery process and troubleshooting
Time synchronization is important for troubleshooting, when trying to debug issues and compare log events with error messages. It also matters for security breaches and/or events, to trace back all steps in an investigation.
Protocol: NTP
Port: 123
Brocade switches: configure time on the principal switch. Other switches will use the principal switch to synchronize time.
Correct time also helps when following the discovery process that is triggered by RSCN (Registered State Change Notification). When a new disk array is attached to the fabric, the switch it is connected to (and only that switch) notifies the HBAs registered in its notification list, and they can start discovering new devices/LUNs.
Discovery process
SCSI discovery process: in the modern SCSI transport protocols, there is an automated process of "discovery" of the IDs. SSA initiators "walk the loop" to determine what devices are there and then assign each one a 7-bit "hop-count" value.
Serial Storage Architecture (SSA) is a serial interface developed by IBM. It basically runs the SCSI-2 software protocol over a serial link. The advantages of SSA compared to SCSI:
It is far easier to configure and cable: no termination needed.
It is built with HA features. The SSA loop architecture (as opposed to a SCSI bus) has no SPOF. If part of a loop fails, the device driver will automatically and transparently reconfigure itself to make sure all SSA devices can be accessed without any noticeable interruption.
It uses no SCSI ID addressing, which means no hassle with setting up the adapters.
The SSA loop can transport 4 times 20 MByte/s: two independent reads and two independent writes across each loop direction. Current actual adapter implementations allow for 35 MByte/s per adapter.
SSA uses no bus arbitration, as opposed to SCSI. Instead, a network-like scheme is used: data is sent and received in 128-byte packets, and all devices on the loop can request time slots independently. SCSI in turn needs bus arbitration, which can lead to performance deadlocks if an initiator doesn't release the bus in time.
SSA allows for 25 meters between each two devices. In addition, there is a fiber-optic extender which allows for data transfers over 50-micrometer optical cables over distances up to 2.4 km. This makes it even suitable for site disaster recovery if configured properly.
Most SSA adapters support two independent loops, which makes it possible to attach mirrored disks to different loops for higher availability.
The SSA loops are symmetrical, twisted-pair, potential free. No TERMPWR potential shift problem.
FC-AL initiators use the LIP (Loop Initialization Protocol) to interrogate each device port for its WWN (World Wide Name). For iSCSI, because of the unlimited scope of the (IP) network, the process is quite complicated. These discovery processes occur at power-on/initialization time and also if the bus topology changes later, for example if an extra device is added.
Analyze performance implications on the fabric involving RAID, caching and connectivity configurations (i.e., identifying potential bottlenecks among these indicators)
xxx
Cache
Optimizing cache usage can give a great performance gain on the storage: more data can be served quickly from the cache, instead of from the much slower disks. While having cache memory is usually a good thing, it can be disabled if the workload consists only of small random reads, which hardly benefit from it.
NetApp: sysstat -x 5
EMC Navisphere (CLI): navicli -h XXX getcache
Example:
# navicli -h 192.168.29.133 getcache -pdp -high -low
Prct Dirty Cache Pages = 51
High Watermark: 80
Low Watermark: 60
When 80% of the cache is dirty (high watermark), the array flushes the cache down to 60% (low watermark); currently 51% is dirty.
RAID level
Using the RAID level that best balances safety and read and/or write speed is important. By using different RAID levels for the different storage tiers, much of the data processing can be improved.
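The high/low watermark behavior from the cache example above (start flushing at 80% dirty, stop at 60%) can be sketched in Python as a small state machine. This is an illustrative model only: the class name is hypothetical and it does not reflect how any array implements flushing internally.

```python
class WatermarkFlusher:
    """Tracks whether watermark flushing is active for a write cache."""

    def __init__(self, high: float = 80.0, low: float = 60.0) -> None:
        if not 0 <= low < high <= 100:
            raise ValueError("expected 0 <= low < high <= 100")
        self.high = high
        self.low = low
        self.flushing = False

    def update(self, dirty_pct: float) -> bool:
        """Feed the current percentage of dirty pages; return True while flushing."""
        if dirty_pct >= self.high:
            self.flushing = True       # high watermark reached: start flushing
        elif dirty_pct <= self.low:
            self.flushing = False      # flushed down to the low watermark: stop
        return self.flushing
```

With the values from the example, 51% dirty pages does not trigger flushing; once 80% is reached, flushing stays active until the level is back at 60%.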
Monitor, collect, and analyze trending information to avoid bottlenecks or resource constraints on the system architecture
Monitoring logs is probably the most basic form of tracking the health of any system. Checking trends with tools like RRDtool and SNMP also gives valuable information about the health and growth rate of the systems involved. Monitoring tools like Nagios, Zabbix etc. are useful to respond to problems in time. Brocade switches provide the commands portPerfShow and portErrShow.
6.2 Develop and follow steps for problem resolution

xxx
Analyze Resolve problem; document problem tracking, root cause analysis, problem resolution, problem prevention timeline
Root cause analysis (RCA): document describing the events around a major issue/problem. It usually contains the problem description, a timeline of events, the problem resolution/solution and follow-up actions.
Analyze and document compliance/non-compliance to customer Service Level Agreement

xxx
6.3 Assess methods to reduce performance impacts when adding long distance connections
Use a proper amount of buffer-to-buffer credits.
Use asynchronous replication instead of synchronous to prevent huge (application) delays, if the RPO can be higher than zero.
Set the speed on both sides of the link to a fixed value (instead of auto-negotiation).
Analyze when an increase in buffer-to-buffer credit is necessary

The buffer-credit method is a form of storage distance extension. Buffer-to-buffer credits determine how many frames can be sent (and remain unacknowledged) before an acknowledgment has to come back. It is comparable with the window size in TCP connections. If the fiber optic cable span is so long that the sender runs out of credits before the first acknowledgments return, the throughput drops sharply. Increasing the number of buffer credits gets around this problem, so an increase is necessary when the link gets longer or faster.
Brocade formula: Buffer Credits = ((Distance in km) * (Data Rate) * 1000) / 2112
Brocade switches can also use LD mode (dynamic long distance mode) to automatically adjust the buffer-to-buffer credit value.
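The Brocade formula above is easy to evaluate in a few lines of Python. In this sketch the function name is hypothetical, the data rate is assumed to be the line rate in Gbit/s (for example 2.125 for a 2 Gbit/s link), and rounding up to a whole credit is a choice made here, not part of the formula.

```python
import math

# Divisor from the formula above (maximum Fibre Channel frame payload in bytes).
FRAME_PAYLOAD_BYTES = 2112


def buffer_credits(distance_km: float, data_rate: float) -> int:
    """Evaluate (distance in km * data rate * 1000) / 2112, rounded up.

    data_rate is assumed to be the line rate in Gbit/s, e.g. 1.0625, 2.125 or 4.25.
    """
    if distance_km < 0 or data_rate <= 0:
        raise ValueError("distance must be >= 0 and data rate must be > 0")
    return math.ceil(distance_km * data_rate * 1000 / FRAME_PAYLOAD_BYTES)
```

Under these assumptions a 10 km link at 2.125 comes out at 11 credits, and a 50 km link at 4.25 at 101 credits, which shows how quickly the requirement grows with distance and speed.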
Use LSANs or VSANs to isolate traffic such that only required traffic is transferred
VSAN: virtual SAN or virtual fabric, to achieve isolation without the need to set up a physically separate fabric. If a switch does not support VSANs, create a SAN as small as possible, but with room for growth.
LSAN: logical SAN, sharing (zone) information across fabrics (LSAN zones are usually prefixed with "lsan_").
Explain when to use compression/encryption and in which sequence

Order: compression first, then encryption. Encrypted data looks random and hardly compresses, so compressing after encryption gains nothing. Compression is useful for information which is text based and compresses well. It is not useful for links that are already encrypted (like VPN tunnels), or for compact formats like compressed audio, video and images.
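The effect of the ordering can be demonstrated with Python's zlib module. This is a sketch under a stated assumption: no real encryption is performed, and pseudo-random bytes merely stand in for ciphertext, which is statistically similar for the purpose of compression.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Size after compression divided by the original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Text-like data with a lot of repetition compresses very well.
text = b"Storage area networks move block data between hosts and arrays. " * 500

# Stand-in for encrypted data: pseudo-random bytes of the same length.
rng = random.Random(42)
ciphertext_like = bytes(rng.getrandbits(8) for _ in range(len(text)))

text_ratio = compressed_ratio(text)
cipher_ratio = compressed_ratio(ciphertext_like)
```

Here `text_ratio` ends up as a small fraction of the original size, while `cipher_ratio` stays at roughly 1.0, which is why data should be compressed before it is encrypted.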
7. Provide Storage Networking Business Continuance (6%)

7.1 Describe archiving/nearline
Nearline storage is used to tier storage using cheaper storage, usually with a bigger capacity. It is meant for information which does not need high-performance storage at that moment and can be stored on a lower-performance (and cheaper) array. One of the common uses is archiving of information or additional backups.
Define Content Addressable Storage (CAS) (e.g., hand-offs)

Content Addressable Storage/Content Addressed Storage (CAS) and Fixed Content Storage (FCS) are different names for storage of documents which don't change over time, addressed by their content rather than by their location. If the same document is stored from multiple places, it is only kept once. Information is accessed by using specific IDs, generated at the time of creation on the CAS system.
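The idea of content addressing can be sketched in a few lines of Python. The class below is hypothetical and keeps everything in memory; using SHA-256 as the ID is an assumption for illustration, since real CAS products use their own addressing schemes.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the ID is derived from the content itself."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store the content and return its ID; identical content is kept once."""
        address = hashlib.sha256(content).hexdigest()
        self._objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        """Retrieve content by the ID that was returned when it was stored."""
        return self._objects[address]

    def __len__(self) -> int:
        return len(self._objects)
```

Storing the same document twice returns the same ID and occupies a single object, which matches the "only kept once" behavior described above.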
7.2 Identify protocols and technologies best used for implementing business recovery solutions
DWDM or IP extenders (in combination with FCIP or iFCP).
7.3 Identify techniques and processes to be used as part of a business continuance solution
Host-based replication: LAN-based replication: SAN-based replication: CDP (Continuous Data Protection)
7.4 Explain how to perform data transfers, migrations, and replications

Synchronous replication: source and target both need to acknowledge the write before the application is notified.
Asynchronous replication: the source acknowledges the write and notifies the application; afterwards the data gets replicated to the target device.
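The difference in application-visible latency can be estimated with a simple model. The Python sketch below is only a rough approximation: the function names are hypothetical, it assumes about 5 microseconds of one-way delay per km of fiber, assumes the local and remote writes are issued in parallel, and ignores protocol and equipment overhead.

```python
# Assumption: light in fiber travels roughly 200,000 km/s, i.e. ~0.005 ms per km one way.
FIBER_DELAY_MS_PER_KM = 0.005


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay to the remote site and back."""
    return 2 * distance_km * FIBER_DELAY_MS_PER_KM


def sync_ack_ms(local_write_ms: float, remote_write_ms: float, distance_km: float) -> float:
    """Synchronous: the application waits until source AND target have acknowledged."""
    return max(local_write_ms, round_trip_ms(distance_km) + remote_write_ms)


def async_ack_ms(local_write_ms: float, remote_write_ms: float, distance_km: float) -> float:
    """Asynchronous: the application is notified after the local write only."""
    return local_write_ms
```

With 1 ms writes on both arrays and 100 km between the sites, the synchronous acknowledgment takes about 2 ms in this model versus 1 ms for asynchronous, and the gap grows linearly with distance.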
Resolving Fabric Merge Conflicts

Whenever two fabrics merge, SDV (Cisco SAN Device Virtualization) merges its database. A merge conflict can occur when there is a run-time information conflict or configuration mismatch.
Run-time conflicts can occur due to:
Identical pWWNs being assigned to different virtual devices
The same virtual devices being assigned different pWWNs
The virtual device and virtual FC ID being mismatched
A blank commit is a commit operation that does not contain configuration changes, and enforces the SDV configuration of the committing switch fabric-wide. A blank commit operation resolves merge conflicts by pushing the configuration from the committing switch throughout the fabric, thereby reinitializing the conflicting virtual devices. Exercise caution while performing this operation, as it can easily take some virtual devices offline.
Merge failures resulting from a pWWN conflict can cause a failure with the device alias as well. A blank commit operation on a merge-failed VSAN within SDV should resolve the merge failure in the device alias.
You can avoid merge conflicts due to configuration mismatch by ensuring that:
The pWWN and device alias entries for a virtual device are identical (in terms of primary and secondary).
There are no virtual device name conflicts across VSANs in fabrics.
Zoning conflict parameters
When merging two fabrics, zoning information from the two previously separated fabrics is merged as much as possible into the new fabric. Sometimes, zoning inconsistency can occur and zoning information cannot be merged. Segmentation due to zoning will usually be flagged by an error message that says "Fabric segmented, zone conflict" appearing in the error logs. One of the solutions is to make sure zoning information on both switches is consistent before bringing up the ISL.
Upgrading firmware on Brocade switches
The internal process is as follows:
1. The firmwareDownload -s command is entered, and you respond to the prompts.
2. Firmware is downloaded to the secondary partition.
3. Primary and secondary boot pointers are swapped.
4. The CP boots from the firmware in the new primary partition.
Say no to autocommit and yes to reboot after download. After a few days of stable operation, run the firmwareCommit command; the new firmware is then copied to the secondary partition as well.
http://www.cisco.com/en/US/products/ps5989/prod_troubleshooting_guide_chapter09186a008067a309.html
Sources used:
http://www.scsita.org/aboutscsi/sas/tutorials/SAS_General_overview_public.pdf
http://www.directron.com/ncqvstcq.html