
SAN Troubleshooting

Rene Burema Brocade Communications March, 2008

Product Knowledge is Valuable


Problem determination requires you to be able to identify:
- Products, associated port numbers, and LED status
- Switch and port status
- License requirements
- Related compatibility information

Available resources include:
- Brocade FOS Documentation
- Brocade Connect and/or Brocade Partner Sites
- Training materials including Products, FRUs and LEDs (Web-based training module associated with this course)
- Brocade switch provider information including compatibility matrices

March 2008

SAN Troubleshooting Basics

2008 Brocade Communications Systems, Inc. All rights reserved.

Common SAN Problems


Many common SAN problems are related to (in alphabetical order):
- Configuration: Port, device, or switch is not correctly configured
  - Problems accessing a switch or connecting switches or end devices can be related to configuration problems
- Firmware Download: FTP configuration and release.plist confusion
- Licensing: Customers do not have the license to do what they are attempting
  - Problems connecting switches can be related to licensing problems
- Marginal Links: Bad or marginal cables/GBICs/SFPs
  - Problems related to performance or problems that occur when connecting switches or end devices can be related to marginal links
- Zoning: Zoning is not configured correctly
  - Problems that occur when end devices are not able to access each other can be related to zoning


What does the switch status tell you?


What can port status LEDs tell you?


Adding/Replacing a Switch in a Fabric and Resolving Fabric Segmentations

When to Add or Replace a Switch


- Faulty hardware
  - Components on a switch that are not FRUs
    - Motherboard, including FC ports
    - Damaged chassis
- Upgrading to new hardware
  - 2 Gbit/sec to 4 Gbit/sec
  - Port density
  - Increased availability
  - New features: FCR, FCIP, iSCSI
  - Replacing EOL hardware
- Growing your fabric
  - Increased port density per switch
  - Increased number of switches
- Whenever your switch provider recommends



Adding or Replacing a Switch

Any switch added to an existing fabric must be configured properly:
- LAN configuration information
- Fabric configuration information

Your configuration plan should include a checklist that answers the following questions:
- Are special port configurations required?
- Are the correct license keys installed?
- What versions of firmware are running in the fabric?
- Will you be using any additional capabilities, e.g. ACLs, ADs, FCIP?


Adding or Replacing a Switch (cont.)


- Clear previous configuration from the switch
  - Zoning: cfgdisable; cfgclear; cfgsave
  - Switch configuration: configdefault
- Gather all required information for the new or replacement switch using a switch connection checklist
- Configure the new or replacement switch to join an existing fabric


Methods for Configuring


- Use appropriate Fabric OS commands or Web Tools to configure the new or replacement switch
- Use the configdownload command to copy a previously saved backup file to a new or replacement switch, or to restore a configuration to an existing switch
- The Fabric Manager baseline utility can copy the configuration of another switch or a previously saved configuration file


Merging Two Fabrics


Successful merge will create a single fabric with four switches


Fabric Segmentation
Fabric segmentation is generally caused by one of the following conditions:
1. Licensing problems: Switches segment due to value line license limitations
2. Zoning conflicts: The zoning configuration in both fabrics cannot be merged
3. Admin Domain (AD) conflict: The AD configuration and/or AD zoning configurations cannot be merged
4. Fabric parameters conflict: fabric.ops parameters do not match
5. Port parameters conflict: ISL port settings are not compatible. FCIP tunnel settings must match.
6. Domain ID overlap: Two or more switches have the same domain ID
7. Access Control List (ACL): If configuration is strict, all switches must comply

In addition, all switches in a fabric with user-defined ADs 1-254, ACLs, and/or a zoning database size greater than 256K must support the Reliable Commit Service (RCS) protocol

Identify Fabric Segmentations


Primary sources for identifying fabric segmentations:
- switchshow output
  - E_Port state will identify the state of all E_Ports; possible segmentation errors are: Domain Overlap, Zone Conflict or Op Mode Incompatible
- Switch error logs
  - errshow and errdump will capture fabric segmentation events
- fabstatsshow output
  - Lists all the criteria that are exchanged during the ELP process and flags any parameter that is mismatched between the two switches
- Fabric Manager
  - Fabric merge check will identify a fabric segmentation cause


switchshow Output
RSL1_ST01_B20:admin> switchshow
switchName:     RSL1_ST01_B20
switchState:    Online
switchMode:     Native
switchRole:     Principal
switchDomain:   1
switchId:       fffc01
switchWwn:      10:00:00:05:1e:02:12:2c
zoning:         ON (lab1)
Area Port Media Speed State
==============================
  0   0   id    N4   Online    F-Port 10:00:00:00:c9:53:c6:c5
  1   1   id    N2   Online    E-Port segmented, (domain overlap) (Trunk master)
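
When switchshow output has been captured to a text file (one port per line), the segmented E_Ports can be picked out automatically. The following is a minimal Python sketch, not a Brocade tool: the function name segmented_ports is hypothetical, and it assumes the port rows look like the sample above, with the reason in parentheses after the word "segmented".

```python
import re

def segmented_ports(switchshow_text):
    """Return (port, reason) pairs for ports that switchshow reports as segmented."""
    results = []
    for line in switchshow_text.splitlines():
        if "segmented" not in line:
            continue
        fields = line.split()
        # Port rows start with two integers: the area and the port number.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        match = re.search(r"segmented,?\s*\(([^)]*)\)", line)
        reason = match.group(1) if match else "unknown"
        results.append((int(fields[1]), reason))
    return results

# Port rows copied from the sample output in this slide.
SAMPLE = """\
Area Port Media Speed State
==============================
  0   0   id    N4   Online    F-Port 10:00:00:00:c9:53:c6:c5
  1   1   id    N2   Online    E-Port segmented, (domain overlap) (Trunk master)
"""

print(segmented_ports(SAMPLE))  # [(1, 'domain overlap')]
```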


Error Logs Capture Segmentation Events


RSL1_ST01_B20:admin> errshow -r
Fabric OS: v5.1.0c
2006/08/15-11:52:12, [FABR-1001], 204,, WARNING, RSL1_ST01_B20, port 1, domain IDs overlap
2006/08/15-11:45:57, [FABR-1001], 203,, WARNING, RSL1_ST01_B20, port 1, incompatible VC translation link init, ensure it is set to 1 (2)
2006/08/15-11:37:54, [FABR-1001], 202,, WARNING, RSL1_ST01_B20, port 1, Zone conflict

RSL1_ST10_B41:admin> errshow -r
Fabric OS: v5.2.0a
2007/01/31-12:50:27, [FABR-1001], 4,, WARNING, rsl1_st10_b41_1, port 8, ELP rejected by the other switch
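
The FABR-1001 entries follow a regular comma-separated layout, so a saved error log can be reduced to a short list of segmentation events. This is a rough Python sketch under the assumption that each event sits on its own line and uses the field order seen in the samples above (timestamp, message ID, sequence number, an empty field, severity, switch name, port, message). The function name parse_fabr_events is hypothetical.

```python
def parse_fabr_events(errshow_text):
    """Extract FABR-1001 events from saved errshow output, one dict per event."""
    events = []
    for line in errshow_text.splitlines():
        if "[FABR-1001]" not in line:
            continue
        parts = [p.strip() for p in line.split(",")]
        # Expect at least 8 fields and a "port N" entry in the seventh field.
        if len(parts) < 8 or not parts[6].startswith("port "):
            continue
        events.append({
            "time": parts[0],
            "severity": parts[4],
            "switch": parts[5],
            "port": int(parts[6].split()[1]),
            # The message itself may contain commas, so rejoin the remainder.
            "message": ", ".join(parts[7:]),
        })
    return events

# Event lines copied from the first sample log in this slide.
SAMPLE_LOG = """\
Fabric OS: v5.1.0c
2006/08/15-11:52:12, [FABR-1001], 204,, WARNING, RSL1_ST01_B20, port 1, domain IDs overlap
2006/08/15-11:45:57, [FABR-1001], 203,, WARNING, RSL1_ST01_B20, port 1, incompatible VC translation link init, ensure it is set to 1 (2)
2006/08/15-11:37:54, [FABR-1001], 202,, WARNING, RSL1_ST01_B20, port 1, Zone conflict
"""

for event in parse_fabr_events(SAMPLE_LOG):
    print(event["time"], "port", event["port"], "-", event["message"])
```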

fabstatsshow Output
RSL1_ST01_B20:admin> fabstatsshow
Description                      Count
-----------------------------------------
domain ID forcibly changed:      0
E_Port offline transitions:      7  (Last on port 14)
Reconfigurations:                6
Segmentations due to:
    Loopback:                    0
    Incompatibility:             8  < Identifies mismatch
    Overlap:                     0
    Zoning:                      0
    E_Port Segment:              0

What parameters would you compare next? fabric.ops



Licensing Conflicts
Switches can be purchased with value line licenses:
- A value line 2 license enables the switch to exist in a two-domain fabric
- A value line 4 license enables the switch to exist in a four-domain fabric

Behavior when the fabric exceeds the allowable number of domains:
- Prior to Fabric OS v3.1.2/4.2, value line licensed switches in such fabrics segmented
- After Fabric OS v3.1.2/4.2, value line licensed switches in such fabrics have a grace period
  - The switch is allowed to join the fabric but Web Tools access is disabled after 45 days

The following messages continuously display at the CLI even with quietmode on:

0x102b9f00 (tFcph): Jan 31 18:44:15 CRITICAL FABRIC-SIZE_EXCEEDED, 1, Critical fabric size (3) exceeds supported configuration (2). Switch status marginal. Contact Technical Support.
0x102b9f00 (tFcph): Jan 31 18:44:15 CRITICAL FABRIC-WEBTOOL_LIFE, 1, Webtool will be disabled in 44 days 23 hours and 50 minutes
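
The FABRIC-SIZE_EXCEEDED message states both the actual and the licensed fabric size, so the number of excess domains can be read directly from a captured console log. The sketch below is illustrative only: the function name fabric_size_excess is hypothetical, and the regular expression assumes the exact message wording shown above.

```python
import re

# Matches the wording of the FABRIC-SIZE_EXCEEDED message shown in this slide.
SIZE_RE = re.compile(
    r"Critical fabric size \((\d+)\) exceeds supported configuration \((\d+)\)"
)

def fabric_size_excess(message):
    """Return how many domains the fabric is over the licensed limit, or None."""
    match = SIZE_RE.search(message)
    if match is None:
        return None
    actual, allowed = int(match.group(1)), int(match.group(2))
    return actual - allowed

MESSAGE = (
    "0x102b9f00 (tFcph): Jan 31 18:44:15 CRITICAL FABRIC-SIZE_EXCEEDED, 1, "
    "Critical fabric size (3) exceeds supported configuration (2). "
    "Switch status marginal. Contact Technical Support."
)

print(fabric_size_excess(MESSAGE))  # 1
```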


Identify Zoning Conflicts


There are three general types of zoning conflicts:

Type 1. Configuration mismatch: the enabled zone configurations are different
Fabric A: cfgcreate "cfg4", "Red_Zone"
Fabric B: cfgcreate "cfg4", "Red_Zone; Blue_Zone"

sw4100:admin> cfgshow
Defined configuration:
<truncated output>
Effective configuration:
cfg: cfg4
zone: Red_Zone; 1,4; 1,5

sw4900:admin> cfgshow
Defined configuration:
<truncated output>
Effective configuration:
cfg: cfg4
zone: Red_Zone; 1,4; 1,5
zone: Blue_Zone; 2,8; 2,11


Identify Zoning Conflicts (cont.)


Type 2. Type mismatch: The name of a zone object (alias, zone, cfg) in one fabric is used for a different type of zone object in the other fabric
Fabric A: alicreate Device1, 1,1
Fabric B: zonecreate Device1, 1,1; 2,3

sw4100:admin> cfgshow
Defined configuration:
<truncated output>
alias: Device1 1,1
<truncated output>
Effective configuration:
No effective configuration

sw4900:admin> cfgshow
Defined configuration:
<truncated output>
zone: Device1 1,1; 2,3
<truncated output>
Effective configuration:
No effective configuration


Identify Zoning Conflicts (cont.)


Type 3. Content mismatch: The definition of a zone object in one fabric is different from a zone object with the same name in the other fabric (including the order of the zone members)
Fabric A: zonecreate Green_Zone, 1,1; 2,3
Fabric B: zonecreate Green_Zone, 2,3; 1,1

sw4100:admin> cfgshow
Defined configuration:
<truncated output>
zone: Green_Zone 1,1; 2,3
<truncated output>
Effective configuration:
No effective configuration

sw4900:admin> cfgshow
Defined configuration:
<truncated output>
zone: Green_Zone 2,3; 1,1
<truncated output>
Effective configuration:
No effective configuration
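
The three conflict types can be expressed as a small pre-merge check. The Python sketch below is only an illustration of the rules described in these slides, not how Fabric OS or Fabric Manager implements the merge check. The dictionary layout and the function name zoning_conflicts are hypothetical, and it assumes a configuration mismatch only matters when both fabrics have an enabled configuration.

```python
def zoning_conflicts(fabric_a, fabric_b):
    """Classify zoning conflicts between two fabrics as (type, object) pairs.

    Each fabric is a dict with:
      "effective": {cfg_name: [zone names]} or None when nothing is enabled
      "defined":   {object_name: (kind, [members in defined order])}
    """
    conflicts = []
    # Type 1: both fabrics have an enabled configuration and they differ.
    eff_a, eff_b = fabric_a["effective"], fabric_b["effective"]
    if eff_a is not None and eff_b is not None and eff_a != eff_b:
        conflicts.append(("configuration mismatch", "effective configuration"))
    # Types 2 and 3 apply to objects whose names exist in both fabrics.
    shared = sorted(set(fabric_a["defined"]) & set(fabric_b["defined"]))
    for name in shared:
        kind_a, members_a = fabric_a["defined"][name]
        kind_b, members_b = fabric_b["defined"][name]
        if kind_a != kind_b:
            conflicts.append(("type mismatch", name))
        elif members_a != members_b:  # list comparison is order-sensitive
            conflicts.append(("content mismatch", name))
    return conflicts

# Data modeled on the Type 1, 2, and 3 examples in these slides.
fabric_a = {
    "effective": {"cfg4": ["Red_Zone"]},
    "defined": {"Device1": ("alias", ["1,1"]),
                "Green_Zone": ("zone", ["1,1", "2,3"])},
}
fabric_b = {
    "effective": {"cfg4": ["Red_Zone", "Blue_Zone"]},
    "defined": {"Device1": ("zone", ["1,1", "2,3"]),
                "Green_Zone": ("zone", ["2,3", "1,1"])},
}

for kind, name in zoning_conflicts(fabric_a, fabric_b):
    print(kind, "->", name)
```

Comparing member lists rather than sets is deliberate: the slides note that member order counts as part of the definition.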


Identify Zoning Conflicts


Begin by running the switchshow and errshow commands
- Segmentations caused by zoning conflicts are noted as such

sw4100:admin> errshow -r
Fabric OS: v5.1.0c
2006/08/15-11:37:54, [FABR-1001], 202,, WARNING, sw4100, port 1, Zone conflict

To identify the zoning conflict cause, perform the following actions on both fabrics:
- Display the current zone configuration in both fabrics (cfgshow)
- Review the zone configurations in both fabrics for configuration, type, and content mismatches
- Verify that the Advanced Zoning license is installed (licenseshow)


Identify Zoning Conflicts (cont.)


- Use Fabric Manager 5.2 Fabric Merge to check and analyze, and the Offline zoning management tool to correct
  - Copy the existing zoning configuration from an installed switch, and push it to the new switch.
- defzone: check this setting before you connect

sw4100:admin> defzone --show
Default Zone Access Mode
        committed - No Access
        transaction - No Transaction


Resolve Zoning Conflicts


- Use Web Tools or zone editing commands to resolve the mismatches (ali*, cfg*, zone*, defzone*)
- To prevent zone conflicts, clear the zoning database on the new/replacement switch: cfgdisable, cfgclear, cfgsave
  - Set defzone parameters to match the existing fabric
- Use Fabric Manager 5.2+ offline zoning capabilities


Incompatible Switch Parameters


Incompatible switch parameters are reported as incompatibility.

To verify the flow control settings without disrupting the fabric, run the configshow command in both fabrics and look at the fabric.ops parameters:

Parameter                   configshow key
R_A_TOV                     fabric.ops.R_A_TOV
E_D_TOV                     fabric.ops.E_D_TOV
Data field size             fabric.ops.dataFieldSize
Disable device probing      fabric.ops.mode.fcpprobedisable
Suppress class F traffic    fabric.ops.mode.noClassF
Per-frame route priority    fabric.ops.UseCsCtl
BB credit                   fabric.ops.BBcredit
Interop mode                switch.interopMode
PID format                  fabric.ops.mode.pidFormat
Long distance               fabric.ops.mode.longDistance

You can also review these values by uploading the switch configuration file with the configupload command or Fabric Manager baseline
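
Comparing these keys by eye across two switches is error-prone, so a small script can list only the values that differ. The following Python sketch assumes the configshow output from each switch has been saved as text with one "key:value" pair per line. The function names and the sample values are hypothetical.

```python
def parse_configshow(text):
    """Turn saved configshow output (assumed 'key:value' per line) into a dict."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params

# Keys named in the table above: all fabric.ops.* entries plus the interop mode.
WATCHED_PREFIXES = ("fabric.ops.", "switch.interopMode")

def fabric_param_mismatches(config_a, config_b):
    """Return {key: (value_a, value_b)} for watched keys whose values differ."""
    a, b = parse_configshow(config_a), parse_configshow(config_b)
    keys = sorted(k for k in set(a) | set(b) if k.startswith(WATCHED_PREFIXES))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Illustrative values only; they are not taken from a real switch.
SWITCH_A = """\
fabric.ops.R_A_TOV:10000
fabric.ops.E_D_TOV:2000
fabric.ops.BBcredit:16
switch.interopMode:0
"""
SWITCH_B = """\
fabric.ops.R_A_TOV:10000
fabric.ops.E_D_TOV:5000
fabric.ops.BBcredit:16
switch.interopMode:0
"""

print(fabric_param_mismatches(SWITCH_A, SWITCH_B))
# {'fabric.ops.E_D_TOV': ('2000', '5000')}
```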

Incompatible Switch Parameters (cont.)


To change these values at the command line (disruptively):
- First, disable the switch (switchdisable)
- Next, use the Fabric parameters menu in the configure command

sw4100:admin> switchdisable; configure
Configure...
Fabric parameters (yes, y, no, n): [no] yes
Domain: (1..239) [1]
R_A_TOV: (4000..120000) [10000]
E_D_TOV: (1000..5000) [2000]
WAN_TOV: (0..30000) [0]
MAX_HOPS: (7..19) [7]
Data field size: (256..2112) [2112]
Sequence Level Switching: (0..1) [0]
Disable Device Probing: (0..1) [0]
Switch PID Format: (1..2) [2] 1
Per-frame Route Priority: (0..1) [0]
BB credit: (1..16) [16]

- Finally, re-enable the switch (switchenable)



Incompatible Port Parameters


Port-level parameters will cause a segmentation if not set to the same values:
- Basic connections: Port speed, type, licensed, and enabled
- Long-distance connections: Long distance mode, VC Link Init, ISL R_RDY mode, and FCIP tunnel configurations

Verify the current settings by running the portcfgshow command

rsl1_st10_b41_1:admin> portcfgshow 8
Area Number:          8
Speed Level:          AUTO
Trunk Port            ON
Long Distance         LS
VC Link Init          ON
Desired Distance      40 Km
Locked L_Port         OFF
Locked G_Port         OFF
Disabled E_Port       OFF
ISL R_RDY Mode        OFF
RSCN Suppressed       OFF
Persistent Disable    OFF
NPIV capability       ON
Mirror Port           OFF


Incompatible Port Parameters (cont.)


Fabric OS v5.2 Extended Fabrics long-distance modes were revised:
- Modes L0, LE, LD, and LS are supported and can be configured on any FC port
- Modes L0.5, L1 and L2 are supported, but cannot be configured

When upgrading from Fabric OS v5.1 to v5.2, what happens to ports set to mode L0.5, L1, or L2?
- The long-distance mode is still displayed in command line output (switchshow, etc.), but modes L0.5, L1, and L2 cannot be configured
- To change the distance on these ports, use mode LD or LS

When connecting a Fabric OS v5.2 switch to a pre-Fabric OS v5.2 switch, both ports on the link must have the same mode
- Result: Use mode LS or LD


Incompatible Port Parameters (cont.)


Change these settings with the following commands:
- Port speed: portcfgspeed
- Reset to defaults: portcfgdefault
- Port type (L_Port only): portcfglport
- Port type (E_Port or F_Port only): portcfggport
- Port type (E_Port disabled): portcfgeport
- Port disable/enabled: portdisable, portenable
- Port persistently disabled/enabled: portcfgpersistentdisable, portcfgpersistentenable
- Long-distance mode, VC link initialization: portcfglongdistance
- ISL R_RDY mode: portcfgislmode

Verify settings are the same by invoking portcfgshow on both switches and comparing output

Domain ID Conflicts


Domain ID Conflicts (cont.)


Duplicate domain IDs are reported as Domain Overlap or Overlap. To resolve domain ID conflicts, follow these steps:
1. In each fabric, display the assigned domain IDs with the fabricshow or switchshow command
2. Review the command output, and determine those switches whose domain ID must be changed
3. Disable the switch (switchdisable), run the configure command to change the domain ID manually, then enable the switch (switchenable)
4. The switch will now join the fabric with the unique domain ID you assigned

Option: set Insistent domain ID (required for FICON)
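
Step 2 above amounts to intersecting the two sets of domain IDs and picking unused replacements. A minimal Python sketch follows, assuming the domain IDs and switch names have already been collected by hand from fabricshow. The function names and the switch names in the example are hypothetical. The 1..239 range matches the Domain prompt of the configure command.

```python
def overlapping_domains(fabric_a, fabric_b):
    """Return {domain_id: (switch_in_a, switch_in_b)} for IDs used in both fabrics.

    Each fabric is a dict mapping domain ID -> switch name.
    """
    shared = sorted(set(fabric_a) & set(fabric_b))
    return {d: (fabric_a[d], fabric_b[d]) for d in shared}

def free_domain_ids(fabric_a, fabric_b, low=1, high=239):
    """List domain IDs in the valid range that neither fabric uses."""
    used = set(fabric_a) | set(fabric_b)
    return [d for d in range(low, high + 1) if d not in used]

# Hypothetical fabrics: both contain a switch with domain ID 1.
fabric_a = {1: "switch_a1", 2: "switch_a2"}
fabric_b = {1: "switch_b1", 3: "switch_b2"}

print(overlapping_domains(fabric_a, fabric_b))  # {1: ('switch_a1', 'switch_b1')}
print(free_domain_ids(fabric_a, fabric_b)[:3])  # [4, 5, 6]
```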



End Device Troubleshooting


Run supportsave Before and After


- Run supportsave as soon as you experience a problem in your SAN
  - Critical data will be captured if supportsave is run right away
  - Run supportsave prior to all problem determination steps
- If you are unable to resolve the problem, run supportsave again
  - If you have to escalate the problem, send the escalation team both supportsave files


End Device Troubleshooting


End device troubleshooting requires answering the following:
- Is there light from the host or device?
  - A powered off or failed device may not provide light. Without light there will never be a login.
- Does the switch port speed configuration match the attached device speed configuration?
  - Devices and switch ports typically autonegotiate. Verify that the switch port is not locked to a speed the device cannot handle.
- Are the transmission characters synchronized with the switch port?
- How far has the login process progressed?
- Did the device log in properly as a loop and/or fabric device?
- Are the FOS v5.2 ACLs, specifically Device Connection Control (DCC) policies, preventing the device from receiving a response to a login?


End Device Troubleshooting (cont.)


- With the maturation of Fibre Channel, most devices log in as point-to-point via a Fabric Login (FLOGI). Has this occurred?
- Even if the device logs in as loop, it should still proceed to the FLOGI stage to get a Public Loop Address (24-bit address)
- If the end device logs in as loop or Fabric, it will be assigned a 24-bit address
  - Until then, it has no source ID (SID) with which to initiate communication in the fabric


End-to-End Device Connectivity


Use LLFD to Divide and Conquer


End-to-End Device Connectivity (cont.)


Link, Login, Fabric, Devices
Link: Physical and logical connection of device to switch
- Transmission of light/signal
- Negotiation of speed
- Synchronization of characters and words
- Loop/Fabric initialization primitives

Login: Device to switch connectivity
- FLOGI to Fabric Port (FFFFFE)
  - Security Policy Check: Device Connection Control policy (DCC_POLICY) Access Control List (ACL)
  - Switch responses:
    - Accept: Assign fabric unique 24-bit address
    - No response: Do not assign fabric address
- Port Login (PLOGI) to Name Server (FFFFFC)


End-to-End Device Connectivity (cont.)


Link, Login, Fabric, Devices
Fabric
- Name Server Registration (FFFFFC)
  - Device registers to local Name Server
  - Name Server is distributed within the fabric
  - If user-defined Virtual Fabric Admin Domains (ADs) are enabled, the Name Server will only show devices within the current AD
    - AD255 is the Physical Fabric view
    - AD0-AD254 will have a filtered view of the Name Server
  - Device attribute data may be registered:
    - Device Model and Vendor
    - Firmware and Driver revisions
    - Host name
- SCR and RSCN to Fabric Controller (FFFFFD)
  - Initiators register using State Change Registration (SCR)
  - Initiators receive notifications by Name Server of Registered State Change Notifications (RSCNs)



End-to-End Device Connectivity (cont.)


Link, Login, Fabric, Devices
Devices
- Initiator queries Name Server for available devices
  - Response contains devices within the effective zone configuration
  - FC devices are Type 8 (FCP)
  - Devices must successfully be logged into the fabric to exist within the Name Server
  - Initiators are zoned with targets
- Initiator PLOGI to each target device, based upon Name Server query results
- Process Login (PRLI) from initiator to target(s)
  - Provides the end-to-end connectivity for device communication
- Issue Report LUNs and Inquiry to each available device
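
The Link, Login, Fabric, Devices (LLFD) sequence is an ordered checklist: the first stage that fails is where troubleshooting should focus. The Python sketch below illustrates that divide-and-conquer idea only. The check names and the function first_failure are hypothetical labels for observations an administrator would gather with the CLI commands covered in this course.

```python
# Ordered LLFD checks as (stage, check name). The check names are illustrative
# labels chosen for this sketch, not Fabric OS fields.
LLFD_CHECKS = [
    ("Link", "light_detected"),
    ("Link", "speed_negotiated"),
    ("Link", "in_sync"),
    ("Login", "flogi_accepted"),
    ("Login", "plogi_to_name_server"),
    ("Fabric", "registered_in_name_server"),
    ("Devices", "zoned_with_target"),
    ("Devices", "plogi_to_target"),
    ("Devices", "prli_to_target"),
]

def first_failure(observations):
    """Return the first (stage, check) that is not confirmed, or None if all pass.

    observations maps check names to True/False; a missing check counts as failed.
    """
    for stage, check in LLFD_CHECKS:
        if not observations.get(check, False):
            return stage, check
    return None

# Example: the link is up and synchronized, but the FLOGI was not accepted.
observed = {
    "light_detected": True,
    "speed_negotiated": True,
    "in_sync": True,
    "flogi_accepted": False,
}
print(first_failure(observed))  # ('Login', 'flogi_accepted')
```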


Troubleshooting End-to-End Device Connectivity


Start at the switch
- The switch contains a wealth of information concerning the condition of the fabric:
  - Devices that are logged into the fabric
  - Devices registered within the Name Server
  - Which devices are within the same zone

Don't forget about LUN Masking and Persistent Binding
- Storage array may implement LUN Masking
  - Is the initiator WWN (Port or Node) presented to the array properly?
  - Are the correct LUNs made available to the initiator by the array?
- HBAs may use Persistent Binding to specify LUN WWN or 24-bit PID to OS device mapping
  - Is the target LUN WWN (Port or Node) or PID specified correctly in the host file(s)?
  - May require an entry for new or replaced target LUNs


Troubleshooting End-to-End Device Connectivity (cont.)


- If previous steps have been verified, there should be end-to-end device connectivity and communication
- If there is no communication between end devices, use CLI commands to determine where the problem exists. Verify connectivity through the SAN first.
- If everything looks correct from switch CLI commands, use storage and host specific message logs and commands to isolate problems to the end point (initiator or target)


Troubleshooting Starts with switchshow


The first command to enter when you start troubleshooting is switchshow. It shows:
- Whether the switch is online
- Whether an SFP is installed in each port
- Port licensing, e.g. Ports-On-Demand (POD)
- Whether end devices are online

For remote devices, there are several commands to choose from, but start with nscamshow
- Tells if remote devices are seen within the fabric.
- Name Server (ns*) commands are filtered by ADs in FOS v5.2+
- If ADs are implemented, select AD255 (Physical Fabric View):
  rsl1_st15_b20_1:admin> ad --select 255

Next get a view of the fabric configuration with cfgshow or just get a supportsave
- Super command script file. It gets all these commands and more!


Light/Signal
Fibre Channel Layer 0 connectivity
- The actual light transmitted and received over FC cabling
- Use the switchshow command to verify light/signal is being transmitted from a device.
- Use portflagsshow to see if LED is seen.
- Additionally, use sfpshow to verify the SFP is not faulty


Light/Signal (cont.)
Successful light (still no speed/synchronization) output examples
- Use output of switchshow, portshow, and portflagsshow to verify light is being received


Link Speed Negotiation


Speed Negotiation
- Device and switch use special transmission characters to agree upon a transfer speed of 4 Gbit/sec, 2 Gbit/sec, or 1 Gbit/sec
- Speed negotiation starts with the highest possible speed and negotiates down until a speed is agreed upon or the lowest possible speed is attempted without success

CLI output information associated with the port when speed negotiation is successful:
- switchshow: port speed will display the speed and State will display Online
- portshow: port speed will display configured or negotiated speed
- portflagsshow: Physical column output field will display No_Sync or In_Sync
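
The negotiate-down behavior can be modeled in a few lines. This Python sketch is a simplified illustration, not the actual Fibre Channel negotiation protocol: it treats each side as a set of supported speeds in Gbit/sec and walks down from the highest switch speed until both sides agree. The function name negotiate_speed is hypothetical.

```python
def negotiate_speed(switch_speeds, device_speeds):
    """Return the highest speed both sides support, or None if there is none.

    Speeds are in Gbit/sec. This models only the high-to-low search order,
    not the transmission characters exchanged on the wire.
    """
    for speed in sorted(switch_speeds, reverse=True):
        if speed in device_speeds:
            return speed
    return None

# An auto-negotiating 4 Gbit/sec switch port with a 2 Gbit/sec device.
print(negotiate_speed({1, 2, 4}, {1, 2}))  # 2

# A switch port locked to 4 Gbit/sec with the same device never agrees.
print(negotiate_speed({4}, {1, 2}))  # None
```

The second call shows why a port locked to a speed the device cannot handle never reaches the login stage.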


Link Speed Negotiation (cont.)


Unsuccessful Speed Negotiation

switchshow <truncated output>
1 1 id 2G No_Sync

portshow 1 | grep portSpeed
portSpeed: 2Gbps

portflagsshow <truncated output>
1 Offline No_Sync PRESENT

- Ensure port is set to default values: portcfgdefault 1
- Or manually set port to auto negotiate speed: portcfgspeed 1 0


Physical Connectivity
- Physical connectivity between a device and a switch port includes light/signal, speed, and link negotiation processes
- After speed negotiation, the connecting points have to synchronize
- Devices can get into a condition defined as marginal when they go into and out of sync
- Commands that help identify this issue include:
  - porterrshow
  - The errshow output may also have relevant output
- Fabric Watch can greatly augment the event reporting found in the error log (RASLog)


Physical Connectivity (cont.)


porterrshow
- The porterrshow command is very helpful for getting a picture of all ports and their associated error and link related counters
- Using this information, you can quickly isolate problems down to a specific port
- A marginal link is defined as a degraded physical connection; it is not optimally passing data
  - The porterrshow, portstatsshow, and portshow output display counters that help monitor marginal ports
  - Symptoms include poor performance and occasional loss of connectivity
- A delta of the counters can help you isolate a problem to a port and/or the connected HBA or storage device
  - Note that you can clear the port counters using portstatsclear on a per-port/port-group basis (granularity is dependent on FOS version)
  - The link counters cannot be cleared without a reboot
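
Because the link counters cannot be cleared without a reboot, working from a delta between two snapshots is often more practical than reading absolute values. A minimal Python sketch of that delta follows, assuming the counters from two porterrshow runs have already been transcribed into dictionaries. The function name counter_deltas and the sample numbers are hypothetical.

```python
def counter_deltas(before, after):
    """Return {port: {counter: increase}} for counters that grew between snapshots.

    before and after map port number -> {counter name: value}.
    Ports or counters missing from the first snapshot are treated as zero.
    """
    deltas = {}
    for port, counters in after.items():
        base = before.get(port, {})
        changed = {}
        for name, value in counters.items():
            increase = value - base.get(name, 0)
            if increase > 0:
                changed[name] = increase
        if changed:
            deltas[port] = changed
    return deltas

# Illustrative snapshots taken some time apart; only port 1 is degrading.
snapshot_1 = {1: {"enc_out": 10, "crc_err": 0}, 2: {"enc_out": 5, "crc_err": 0}}
snapshot_2 = {1: {"enc_out": 250, "crc_err": 12}, 2: {"enc_out": 5, "crc_err": 0}}

print(counter_deltas(snapshot_1, snapshot_2))
# {1: {'enc_out': 240, 'crc_err': 12}}
```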


Physical Connectivity (cont.)


Use the porterrshow command for initial investigation of marginal links

portstatsclear can be used to clear the error statistics to the left of the dotted line. The other counters are cleared only by a reboot/fastboot.

Physical Connectivity (cont.)


Granularity on ports with high error counters:
- porterrshow
  - Less granularity
  - Good for quickly identifying port(s) of interest
- portstatsshow
  - Good for monitoring exact values of counters


Error Counters
Certain port counters can point to physical link layer issues:
- enc_in: This counter increments when 8b/10b encoding errors are detected within a frame. enc_in errors are always detected on the ingress port.
- crc_err: Indicates corruption within the frame. Always seen on the ingress port but will be passed by the switch unaltered through the fabric (like a trail of bread crumbs).
- enc_in and/or crc_err = Possible bad media (SFP, cable, patch panel)


Error Counters (cont.)


- enc_out: 8b/10b encoding errors NOT associated with frames (IDLE, R_RDY, and various other primitives). This counter increments during speed negotiation prior to login. Locking a port to a speed supported by the end device can be used to isolate issues.
  - Possible bad media (SFP, cable, patch panel)
  - Can cause a performance problem due to buffer recovery
- disc_c3: Class 3 frame has been discarded because it is not routable to a destination address
  - Corrupted or not-online Destination ID (DID)
  - Timeout exceeded (Condor ASIC hold time exceeded)
  - Counter may increment when FC nodes and/or switches rapidly transition between online and offline; look at fabriclog -s output (described in the Logical Connectivity slide later)


Link Counters
- These are point-to-point errors; they do not propagate through the fabric
- Link failures: error conditions that cause a port to drop out of an active state
  - Requires the reconnecting device to FLOGI back into the fabric (no speed negotiation required, since the device does not lose synchronization)
- Loss of sync: occurs when bit and word synchronization on a link is lost
- Loss of signal: occurs when light or an electrical signal is lost on a link
  - Requires the connected device to renegotiate speed and FLOGI back into the fabric
- If you experience device connectivity and/or performance issues and rising link counters, look for:
  - bad cables/SFPs/patch-panel connections
  - repeating cycles of online/offline states in fabriclog -s output


Device Initialization into Fabric


Device Initialization - Port Configuration


Device initialization could be affected by port configuration
- portcfgshow: display port status


Port Configuration (cont.)


- switchshow: display login status; F/L/E or G:
  1 1 id N1 Online G-Port
- portcfglport: Lock port to L-Port to force Loop Initialization prior to FLOGI
  portcfglport <port> <0|1>
- portcfggport: Lock to G-Port if HBA/storage has difficulties negotiating initial Loop Initialization
  portcfggport <port> <0|1>
- portcfg mirrorport: A port configured as a mirror port will prevent HBA/storage login
  portcfg mirrorport <[slot/]port#> --enable
  Disable a mirror port that is configured to connect a device:
  portcfg mirrorport <[slot/]port#> --disable


Login Services
Three different levels of login:
- Fabric Login (FLOGI) is used by an N_Port or NL_Port (Nx_Ports) to establish service parameters with the switch
  - The following information is implicitly captured and put into the Name Server during this process: type; COS; PID; PortName (port WWN); and NodeName (node WWN)
- N_Port Login (PLOGI) is used by one Nx_Port to establish service parameters with another N_Port or NL_Port
- Process Login (PRLI) is used by an upper-level process in one port to establish image pairs and service parameters with the corresponding upper-level process in the other port
  - For example, it can be used to establish the environment between related SCSI processes on an originating Nx_Port and a responding Nx_Port


Fabric Login (FLOGI)

- When devices first connect, their address is 000000 (unless they are loop devices, in which case their address will be 0000pp)
- FLOGI is required before any frame can be sent through the fabric
- FLOGI is sent to well-known address FFFFFE (Fabric F_Port)
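
The 24-bit fabric address that a device receives after FLOGI is conventionally read as three bytes: domain, area, and port (the AL_PA for loop devices). The Python sketch below decodes an address that way and recognizes the well-known addresses mentioned in this course. The function name decode_pid is hypothetical, and the example PID is made up for illustration.

```python
# Well-known addresses named in this course.
WELL_KNOWN = {
    0xFFFFFE: "Fabric F_Port (FLOGI destination)",
    0xFFFFFC: "Name Server",
    0xFFFFFD: "Fabric Controller",
}

def decode_pid(pid):
    """Split a 24-bit Fibre Channel address into domain, area, and port bytes."""
    if not 0 <= pid <= 0xFFFFFF:
        raise ValueError("a Fibre Channel address is 24 bits")
    if pid in WELL_KNOWN:
        return {"well_known": WELL_KNOWN[pid]}
    return {
        "domain": pid >> 16,          # high byte: switch domain ID
        "area": (pid >> 8) & 0xFF,    # middle byte: area
        "port": pid & 0xFF,           # low byte: port, or AL_PA on a loop
    }

print(decode_pid(0x010400))  # {'domain': 1, 'area': 4, 'port': 0}
print(decode_pid(0xFFFFFE))  # {'well_known': 'Fabric F_Port (FLOGI destination)'}
```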


Commands to Check FLOGI Status


switchshow - a successful login displays an F_Port (including its WWN) or L_Port
portshow - a successful login displays the fabric's viewpoint of the device
portFlags - a bit map and English translation of the port's login process
portState - Online
portPhys - In_Sync, receiving light and synchronized
portId - 24-bit Fabric Address, port identifier (PID) of the device
portScn - F_Port; from the fabric's point of view, all end devices that successfully logged in are F_Ports
port WWN(s) of connected device(s) - an F_Port will have one WWN; an FL_Port can have multiple WWNs
Distance and Speed - configuration of the port

portflagsshow - lists the translation of all port login state flags; same as the portshow portFlags output


portshow


portstatsshow BB Credit


portcamshow
Hardware-enforced SID/DID zone tables are kept in the ASIC
portcamshow <port>

Out of CAM entries - zoning changes to session-based
Resource issue - not an actual error condition

portzoneshow - undocumented/unsupported command
Displays the type of zoning (hard, session-based) for each port


Logical Connectivity - fabriclog -s

fabriclog -s supersedes the fabstateshow command
Use it to check for port Online/Offline transitions:

Port 1 transitioned from Offline to Online multiple times
Check physical connectivity for a bad cable, SFPs, patch panel, etc.


Fabric Name Server


Successful port login and registration to Name Server

A port login (PLOGI) to the Name Server can be confirmed by looking at the Name Server information
Verify using the nsshow command
An unsuccessful port login means no information within the Name Server


Fabric Name Server (cont.)


Check for a successful port login with the nsshow -t option, which shows whether the device is an Initiator or a Target


State Change Notification Services


State Change Notification (SCN) - used for internal state change notifications, not external
This is the switch logging that the port is online or is an Fx_Port
This is not sent from the switch to the Nx_Ports!

State Change Registration (SCR) - Nx_Port request to receive notification when something in the fabric changes
FC devices that choose to receive RSCNs must register for this service
Devices send a State Change Registration (SCR) to FFFFFD
Registration indicates that the device wants to be notified of changes
Devices register after PLOGI to the Name Server

Registered State Change Notification (RSCN) - issued by the Fabric Controller Service or an Nx_Port to devices that registered (issued an SCR requesting this notification); only sent to devices within an affected zone


Fabric Controller Services


The Fabric Controller (FFFFFD) Service alerts a device that changes have occurred in the fabric by sending a Registered State Change Notification (RSCN) if:
The device registered to receive RSCNs using an SCR
A new device has been added (within the same zone)
An existing device has been removed (within the same zone)
A zone has been changed
A switch name or IP address changed
The fabric reconfigured

Registration is optional
SCSI initiators normally register
SCSI targets do not register


Changes Within the Fabric


Properly written device drivers will do the following in response to an RSCN:
Query the Name Server for changes related to devices they are (or were) logged into
Initiate a port login for any new devices the Name Server has notified them of within their Virtual Fabric zoning configuration

Sometimes it isn't a device driver issue. Applications can fail if their I/O is not satisfied quickly. ("Quickly" is a relative term.)
If necessary, FOS gives the ability to suppress RSCNs per port.


Device Identification Commands


Use switchshow, nsshow, nscamshow, nsallshow, and nodefind to identify devices in the fabric
nsallshow lists all 24-bit PID addresses within the current fabric (Name Server view of the current AD)
nodefind lists Name Server information for:
Specified alias
Specified WWN
Specified PID address


Devices - End-to-End Connectivity


End-to-end device connectivity communication could be blocked on the switch by:
Zoning
AD configuration
Commands to check include: fcping, cfgshow, and ad --show

End-to-end device connectivity flow
Nx_Port to Nx_Port communication
Initiator to target (similar to SCSI model)
PLOGI/PRLI from Nx_Port to Nx_Port

Name Server Query
Initiators learn about devices of interest based upon FC4 layer type (5 or 8), where 8 = FCP/SCSI and 5 = IP over FC


End-to-End Device Connectivity (cont.)


Use the fcping command to check for end device connectivity and zoning
Response when a device is not online:
rsl1_st15_b41_1:admin> fcping 0x1400e8 0x0a0100
fcping: Error destination port invalid

Response when devices are online, but one does not respond to the fcping ELS ECHO frame:
rsl1_st15_b20_1:admin> fcping 0x0a0000 0x1400e2
Source: 0xa0000
Destination: 0x1400e2
Zone Check: Not Zoned
Pinging 0xa0000 with 12 bytes of data:
received reply from 0xa0000: 12 bytes time:650 usec
<truncated output>
5 frames sent, 5 frames received, 0 frames rejected, 0 frames timeout
Round-trip min/avg/max = 567/618/674 usec
Pinging 0x1400e2 with 12 bytes of data:
Request timed out
<truncated output>
5 frames sent, 0 frames received, 0 frames rejected, 5 frames timeout
Round-trip min/avg/max = 0/0/0 usec
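When fcping output is collected from many device pairs, the per-target summary lines can be pulled out mechanically. The following is a minimal sketch that assumes the output looks like the example above; the function names are hypothetical, and other Fabric OS releases may format the text differently.

```python
import re

# Matches one "Pinging <pid> ..." section up to its frame-count summary.
SECTION_RE = re.compile(
    r"Pinging (?P<target>0x[0-9a-fA-F]+) with \d+ bytes of data:"
    r".*?(?P<sent>\d+) frames sent, (?P<received>\d+) frames received, "
    r"(?P<rejected>\d+) frames rejected, (?P<timeout>\d+) frames timeout",
    re.DOTALL,
)
ZONE_RE = re.compile(r"Zone Check:\s*(.+)")


def fcping_summary(output: str) -> dict:
    """Return frame counters per pinged PID."""
    keys = ("sent", "received", "rejected", "timeout")
    return {
        m.group("target"): {k: int(m.group(k)) for k in keys}
        for m in SECTION_RE.finditer(output)
    }


def zone_check(output: str):
    """Return the text after 'Zone Check:' or None if the line is absent."""
    m = ZONE_RE.search(output)
    return m.group(1).strip() if m else None


def unresponsive(output: str) -> list:
    """PIDs that were pinged but returned no frames."""
    return [t for t, c in fcping_summary(output).items() if c["received"] == 0]


SAMPLE = """\
Source: 0xa0000
Destination: 0x1400e2
Zone Check: Not Zoned
Pinging 0xa0000 with 12 bytes of data:
received reply from 0xa0000: 12 bytes time:650 usec
5 frames sent, 5 frames received, 0 frames rejected, 0 frames timeout
Round-trip min/avg/max = 567/618/674 usec
Pinging 0x1400e2 with 12 bytes of data:
Request timed out
5 frames sent, 0 frames received, 0 frames rejected, 5 frames timeout
Round-trip min/avg/max = 0/0/0 usec
"""

print(zone_check(SAMPLE), unresponsive(SAMPLE))
```

A target with zero received frames is not necessarily faulty: some devices simply do not answer ELS ECHO, which is the situation this slide describes.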


Device to Device Login


Don't forget, devices do not only log into the fabric. Initiators will initiate PLOGIs and PRLIs to other end devices after:
Each device is Online in the switch database
Each device has registered with the Name Server
Devices are zoned together and within the same Virtual Fabric Administrative Domain (AD)

The mechanism for devices to log in to each other through PLOGI is the same as that used for device-to-switch login
The switch acts as a middle-man
Passing PLOGI/PRLI requests and ACCEPT responses, or
Discarding such requests if the devices are not zoned together or in the same AD
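The pass-or-discard decision described above can be modeled in a few lines. This is a deliberately simplified sketch, not how Fabric OS implements zoning: it assumes zones are plain sets of PIDs, ignores Administrative Domains, and uses a hypothetical zone name. The PIDs are taken from the nszonemember and fcping examples in this course.

```python
def switch_forwards_login(src: int, dst: int, online: set, zones: dict) -> bool:
    """Model of the middle-man role: forward a PLOGI/PRLI only when both
    devices are online and share at least one zone."""
    if src not in online or dst not in online:
        return False
    return any(src in members and dst in members for members in zones.values())


# Hypothetical fabric state for illustration only.
online = {0x0A0100, 0x1400E8, 0x0A0000, 0x1400E2}
zones = {"host_disk_zone": {0x0A0100, 0x1400E8}}

print(switch_forwards_login(0x0A0100, 0x1400E8, online, zones))  # zoned pair
print(switch_forwards_login(0x0A0000, 0x1400E2, online, zones))  # not zoned
```

In a real fabric, zone members are usually defined by WWN or by domain and port index, so a check like this would first have to resolve those to PIDs through the Name Server.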


Port Configuration End-to-End


Check port configuration for end-to-end device connectivity
Use nszonemember as a final step to verify that:
End devices have logged into the Name Server, are Online, and are zoned together within the same AD
rsl1_st15_b20_1:admin> nszonemember 0x0a0100
1 local zoned members:
Type Pid COS PortName NodeName SCR
N 0a0100; 2,3;10:00:00:00:c9:22:1f:23;20:00:00:00:c9:22:1f:23; 3
FC4s: FCP
NodeSymb: [30] "Emulex LP8000 FV3.90A7 DV6.02h"
Fabric Port Name: 20:01:00:05:1e:02:0c:77
Permanent Port Name: 10:00:00:00:c9:22:1f:23
Device type: Physical Initiator
Port Index: 1
Share Area: No
Device Shared in Other AD: No

output continued on next slide


Port Configuration End-to-End (cont.)


Check port configuration for end-to-end device connectivity
(nszonemember 0x0a0100 output continued)
1 remote zoned members:
Type Pid COS PortName NodeName
NL 1400e8; 3;21:00:00:04:cf:92:6a:58;20:00:00:04:cf:92:6a:58;
FC4s: FCP
PortSymb: [28] "SEAGATE ST318452FC 0004"
Fabric Port Name: 20:00:00:05:1e:02:aa:7b
Permanent Port Name: 21:00:00:04:cf:92:6a:58
Device type: Physical Target
Port Index: 0
Share Area: No
Device Shared in Other AD: No

Verifies end-to-end zoning within the fabric


When to use an Analyzer?


When all devices are logged into the fabric, zoning is configured properly, and hosts still do not see their targets
When there are I/O disruptions that cannot be isolated with RASLog (errdump) or porterrshow/portstatsshow
When a problem exists within the payload of a transfer
To monitor the health of a system for error statistics and performance problems (the switch also has relevant built-in diagnostic capabilities)
To diagnose protocol problems
A complete look at the FC header and payload
Capture end-to-end protocol information (including ULPs)

To troubleshoot extended fabric communication
An FC analyzer can be installed between the switch and the gateway at each end
Is the transmission the same as the reception?
Can bit, character, and word sync be established?


Port Mirroring - Configuration


Decide the location of the mirror port: on the same ASIC as the SID or DID port
Log in to the physical fabric using an Admin role account
Follow these steps to use port mirroring to capture an FC analyzer trace:
1. Configure the port as a mirror port by invoking the following command:
portcfg mirrorport <[slot/]port#> --enable
To verify the configuration, invoke portcfgshow <[slot/]port#> and switchshow
2. Connect an FC analyzer to the mirror port and verify that it comes online
3. Configure the port mirroring connection between the SID and DID through the mirror port:
portmirror --add <mirrorportnumber> <SourceID> <DestID>
The mirror port must be online
To verify the mirror connection, invoke portmirror --show
4. Start the FC analyzer capture, reproduce the problem, stop the capture, and review the output
5. Remove the port mirror connection with the portmirror --delete command:
portmirror --delete <mirrorportnumber> <SourceID> <DestID>
6. Remove the mirror port configuration (to allow other connections to this port):
portcfg mirrorport <[slot/]port#> --disable
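Because the teardown steps are easy to forget after a capture, it can help to generate the whole command sequence up front. The sketch below only builds command strings from the steps on this slide; it does not talk to a switch, and the function name and the port value in the example are hypothetical.

```python
def mirror_capture_plan(mirror_port: str, source_id: str, dest_id: str) -> dict:
    """Return the CLI commands for a port-mirroring capture, in order."""
    return {
        "setup": [
            f"portcfg mirrorport {mirror_port} --enable",
            f"portcfgshow {mirror_port}",   # verify mirror port configuration
            "switchshow",                   # verify the analyzer port is online
            f"portmirror --add {mirror_port} {source_id} {dest_id}",
            # The slide prints this as "portmirror -show"; the double-dash
            # form is assumed here to match --add and --delete.
            "portmirror --show",
        ],
        "teardown": [
            f"portmirror --delete {mirror_port} {source_id} {dest_id}",
            f"portcfg mirrorport {mirror_port} --disable",
        ],
    }


# Port "2/5" is a made-up example; the SID/DID come from earlier slides.
plan = mirror_capture_plan("2/5", "0x0a0100", "0x1400e8")
for phase in ("setup", "teardown"):
    print(phase)
    for cmd in plan[phase]:
        print("  " + cmd)
```

Keeping setup and teardown as separate lists makes it straightforward to run the capture on the analyzer between the two phases.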


Gathering Switch Support Data for Problem Determination and Escalation


Switch Support Data - Overview


Up to this point, we have gathered details about a switch by running one CLI command at a time
For long-term support of a switch, we need to begin gathering switch support data
Larger, file-oriented data that provides a broader view of the switch
Configuration of parameters
State of FRUs and ports, both currently and in the past

There are several different types of switch support data that can be collected from a Brocade switch, router, or Director:
Switch error logs (RASLogs)
Audit logs
FFDC files
Panic dump and core files
Trace dump files


RASLog - Overview
Starting in Fabric OS v4.4, the System Message Log began to be called the Reliability, Availability, and Serviceability Log (RASLog)
RASLog error messages are defined in one of two groups
External messages - CRITICAL, ERROR, WARNING, and INFO; can be viewed by admin-level users
Internal messages - DEBUG and PANIC; cannot be viewed by admin-level users

There is one RASLog stored in persistent memory
Up to 1024 external messages stored in a non-volatile circular buffer
In blade-based switches, each CP maintains a separate RASLog

In Fabric OS v5.1+, certain security- and zoning-related commands cause an AUDIT flag to be added to error messages


RASLog - Standard Message Format


Fabric OS v4.4+ error messages follow a standard format:
Start Delimiter (customizable): Start
Date (including year) and Time: 2006/03/08-11:59:32
Message Module and Numeric Instance: ZONE-3006
Sequence Number: 9
Audit Flag: AUDIT or FFDC (added in Fabric OS v5.1)
Severity Level (one of four levels): INFO
Switch Name: NDA-ST01-B48
Error description: User: admin, Role: admin, Event: cfgdisable, Status: success, Info: Current zone configuration disabled.
End Delimiter (customizable): End

Start 2006/03/08-11:59:32, [ZONE-3006], 9, AUDIT, INFO, NDA-ST01-B48, User: admin, Role: admin, Event: cfgdisable, Status: success, Info: Current zone configuration disabled. End
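Because the format is fixed, a message can be split into its fields with a single regular expression. This is a sketch under the assumption that messages look exactly like the example above, with the default "Start" and "End" delimiters; the function and field names are illustrative and not part of Fabric OS.

```python
import re
from datetime import datetime

RASLOG_RE = re.compile(
    r"^(?P<timestamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}), "
    r"\[(?P<msg_id>[A-Z0-9]+-\d+)\], "
    r"(?P<seq>\d+), "
    r"(?:(?P<flag>AUDIT|FFDC), )?"          # flag is optional
    r"(?P<severity>CRITICAL|ERROR|WARNING|INFO), "
    r"(?P<switch>[^,]+), "
    r"(?P<description>.*)$"
)


def parse_raslog(line: str, start: str = "Start", end: str = "End"):
    """Parse one external RASLog message; return None if it does not match."""
    text = line.strip()
    # Delimiters are customizable on the switch, so they are parameters here.
    if start and text.startswith(start):
        text = text[len(start):].lstrip()
    if end and text.endswith(end):
        text = text[: -len(end)].rstrip()
    m = RASLOG_RE.match(text)
    if m is None:
        return None
    fields = m.groupdict()
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%Y/%m/%d-%H:%M:%S")
    fields["seq"] = int(fields["seq"])
    return fields


MSG = ("Start 2006/03/08-11:59:32, [ZONE-3006], 9, AUDIT, INFO, NDA-ST01-B48, "
       "User: admin, Role: admin, Event: cfgdisable, Status: success, "
       "Info: Current zone configuration disabled. End")
print(parse_raslog(MSG))
```

Parsed timestamps and severities make it easy to filter a long errdump capture, for example to keep only ERROR and CRITICAL messages from a given time window.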


RASLog - Management
Use the following commands to view the RASLog associated with external messages:
errdump - displays all external messages in the error log with no line breaks (default display order: least-recent to most-recent)
errshow - displays all external messages in the error log with line breaks (default display order: least-recent to most-recent)
Use errdump -r or errshow -r to display error messages in reverse order: most-recent to least-recent
errclear - clears all internal and external messages from the error log (Admin level)

Forward RASLog and Console log entries to a syslogd daemon on a host computer (syslogdipadd)
Especially important on dual-CP systems, as host computer logs maintain a single, sequentially ordered, merged file for both CPs


Audit Log - Overview


The RASLog was designed to capture abnormal, error-related messages, not high-frequency AUDIT events
In Fabric OS v5.1 and earlier, error messages and AUDIT events are sent to the RASLog
In Fabric OS v5.2+, error messages go to the RASLog, and all AUDIT events go only to a new Audit Log


Audit Log Overview (cont.)


The new Audit Log is designed for post-event audits and problem determination
Captured per Virtual Fabric AD
Configurable (off by default)

For a given event, it captures:
Who (user), when (timestamp), what (SAN component), and which AD
Event type
Other event-specific information (description)
Format consistent with DMTF standard

AUDIT messages are always sent to the console, and can be configured to go to syslog servers

Audit Log - Details


Fabric OS v5.2+ continues to audit all Fabric OS v5.1 AUDIT messages
Secure Fabric OS configuration
Security related: SSL, RADIUS, Zone, and password strengthening configuration

Fabric OS v5.2+ can also be configured to audit these tasks:
configdownload (not configupload)
firmwaredownload start, complete, and error messages encountered during download
User-initiated security events related to ACLs
Fabric events related to command execution in other ADs (ad --exec)

In an AD-aware fabric, Audit Log configuration is done from AD255
Commands involved in configuring the Audit Log include:
auditcfg - to enable auditing and define what gets audited (filters)
syslogdipadd - to specify the IP address of a syslog server configured to receive audit messages


FFDC - Overview
To minimize requests for problem recreation from certain Brocade-defined events, Fabric OS captures First Failure Data Capture (FFDC) data
Goal: Allow Brocade engineers to gain insight into problems that are transient, difficult to recreate, or difficult to solve
Triggered by error MSG_IDs that are selected by Brocade engineering
Messages are written to the console and the error log with an FFDC flag

Automatically collects supportshow-like information (based on CLI commands) as readable text when the selected event occurs
A single FFDC event may create one or more FFDC files
Up to 4 MB for all FFDC files combined (if the maximum size is reached, a RASLog message is generated, and periodic console messages are sent)

FFDC files are stored on the switch, and transferred by supportsave (automatically deletes files) or savecore (does not automatically delete files)


FFDC - Configuring
Enable and disable the FFDC functionality with the supportffdc command
Enabled by default - disable only if directed to do so by next-level support
switch:admin> supportffdc
--enable <Enable FFDC>
--disable <Disable FFDC>
--show <Show FFDC state>


FFDC - Capturing
The supportsave command uploads the FFDC data via FTP, and deletes it from the switch
File name indicates the triggering event and a date/time stamp (example: FSSM1005-2006-08-12-114707.ffdc)

The savecore command also uploads the FFDC data via FTP (same file name), but does not delete it from the switch
switch:admin> savecore
following 1 directories contains core files:
[ ]0: /core_files/ffdc_data
Welcome to core files management utility.
Menu
1(or R): Remove all core files
2(or F): FTP all core files
3(or r): Remove marked files
4(or f): FTP marked files
5(or m): Mark Files for action
6(or u): Un Mark Files for action
9(or e): Exit
Your choice:
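Once FFDC files have been uploaded, the triggering event and time can be recovered from the file name alone. The sketch below assumes every name follows the pattern of the single example given on this slide (event, then YYYY-MM-DD-HHMMSS, then .ffdc); the function name is hypothetical.

```python
import re
from datetime import datetime

FFDC_NAME_RE = re.compile(
    r"^(?P<event>[A-Za-z]+\d+)-(?P<stamp>\d{4}-\d{2}-\d{2}-\d{6})\.ffdc$"
)


def parse_ffdc_name(name: str):
    """Return (event, timestamp) for an FFDC file name, or None if no match."""
    m = FFDC_NAME_RE.match(name)
    if m is None:
        return None
    when = datetime.strptime(m.group("stamp"), "%Y-%m-%d-%H%M%S")
    return m.group("event"), when


print(parse_ffdc_name("FSSM1005-2006-08-12-114707.ffdc"))
```

Sorting uploaded files by the parsed timestamp gives a quick timeline of which events fired first.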


Panic Dump and Core Files - Overview


Fabric OS creates panic dump and core files when there are problems in the Fabric OS kernel
Generated when an important Fabric OS daemon no longer responds or terminates unexpectedly
Captures a snapshot of the current state of the switch at the time of the crash; no historical information is retained
Panic dumps are text files; core file contents are encrypted

In a dual-CP Director, each CP can create these files, so always check both CPs


Panic Dump and Core Files (cont.)


To display panic dump files at the command line, enter the pdshow command
switch:admin> pdshow
Could not find any valid pd file!

To upload via FTP or delete panic dump and core files, use the savecore command
switch:admin> savecore -l
/core_files/panic/core.873
/core_files/zoned/core.1234
/core_files/zoned/core.5678
/mnt/core_files/nsd/core.873
/mnt/core_files/panic/core.873
switch:admin> savecore -h 192.168.204.188 -u jsmith -d core_files_here -p password -f /core_files/zoned/,/mnt/core_files/nsd/
/core_files/zoned//core.1234: 1.12 kB 382.60 B/s
/core_files/zoned//core.5678: 1.12 kB 381.95 B/s
/mnt/core_files/nsd//core.873: 1.12 kB 382.53 B/s
Files transferred successfully!


Trace Dump - Overview


The trace functionality is a proactive troubleshooting tool
Included in Fabric OS v4.4+ to aid Fabric OS debugging
Always running, maintaining a historic record of the current and past state of the switch; cannot be disabled
No impact on user data performance

The results from the trace operation are stored in a trace dump file
Triggered by a panic, a timeout, a CRITICAL-level event, or a manual trigger
Binary file, retained in persistent memory
Can be uploaded automatically or manually via FTP


Trace Dump - Implementation


Initiate or remove a trace dump file, or display trace dump status, with the tracedump command
tracedump -n: Create a trace dump manually
tracedump -r: Remove (delete) a trace dump from the switch

Use the traceftp command to manage the uploading (but not deleting) of trace dumps:
traceftp -n: Manually upload trace dumps via FTP
traceftp -e: Enable automatic FTP upload of trace dumps
traceftp -d: Disable automatic FTP upload of trace dumps
With traceftp -e, specify the FTP server to which trace dumps are uploaded with the supportftp command; this is required, or trace dump files will not be automatically uploaded

Web Tools supports some of the traceftp command functionality


Capturing Switch Support Data - Overview


There are several tools that you can use to capture switch support data:
supportshow
supportsave
Fabric Manager
SAN Health


Capturing Switch Support Data - supportshow

supportshow is a script that executes groups of pre-selected Fabric OS and Linux commands, and displays their output at the CLI
To simplify future troubleshooting, use the supportshow output to establish a switch baseline
Documents the switch configuration under good conditions
Future troubleshooting can start by comparing the current supportshow output with the baseline

supportshow takes ADs into consideration:
Command is relevant only in AD0 (no user-defined ADs) or AD255 (with user-defined ADs)
AD must include the switch on which the command is run
Example supportshow response in a non-AD0/AD255 context:
Operation not allowed in AD1-AD254 context
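Comparing a current supportshow capture against the baseline can be done with any text diff tool. As a minimal sketch, the Python standard library's difflib reports only the lines that changed; the function name is hypothetical, and the two short strings stand in for real captures, which would be read from saved files.

```python
import difflib


def changed_lines(baseline: str, current: str) -> list:
    """Return removed (-) and added (+) lines between two captures."""
    diff = difflib.unified_diff(
        baseline.splitlines(),
        current.splitlines(),
        fromfile="baseline",
        tofile="current",
        lineterm="",
        n=0,  # no context lines, only the differences
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Illustrative fragments only; real supportshow output is much longer.
baseline = "switchState: Online\nswitchRole: Principal"
current = "switchState: Online\nswitchRole: Subordinate"

for line in changed_lines(baseline, current):
    print(line)
```

Counters and timestamps change on every capture, so in practice it is worth filtering those lines out before diffing to keep the result readable.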


Capturing Switch Support Data - supportsave

To aid the capture of supportshow information, Fabric OS v4.4 introduced supportsave
Uploads supportshow in a text file whose name indicates the switch name (Director), CP slot (S0, S5), time stamp (200605200014), and SUPPORTSHOW
Also uploads FFDC files, as well as other information
switch:admin> supportsave -h 192.168.1.1 -u anonymous -d tmp
This command will collect RASLOG, TRACE, and supportShow (active CP only) information for the local CP and then transfer them to a FTP server. The operation can take several minutes.
OK to proceed? (yes, y, no, n): [no] y
...
Saving support information for module SUPPORTSHOW...
...rtSave_files/Director-S5-200605200014-SUPPORTSHOW: 1.11 MB 346.39 kB/s

supportsave needs to be run on both the Active and Standby CPs
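When supportsave files from several switches and CPs land in one FTP directory, the file name is the quickest way to tell them apart. The sketch below assumes the four-part naming shown above (switch name, CP slot, YYYYMMDDHHMM time stamp, module); the function name is hypothetical, and files from switches without CP slots may be named differently.

```python
from datetime import datetime


def parse_supportsave_name(name: str):
    """Split '<switch>-<slot>-<timestamp>-<module>' into its parts.

    Splitting from the right keeps switch names that contain hyphens intact.
    Returns None when the name does not have the expected shape.
    """
    parts = name.rsplit("-", 3)
    if len(parts) != 4:
        return None
    switch, slot, stamp, module = parts
    try:
        when = datetime.strptime(stamp, "%Y%m%d%H%M")
    except ValueError:
        return None
    return {"switch": switch, "slot": slot, "time": when, "module": module}


print(parse_supportsave_name("Director-S5-200605200014-SUPPORTSHOW"))
```

Grouping parsed names by switch and slot makes it easy to confirm that both the Active and the Standby CP were captured.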


Capturing Switch Support Data - SAN Health

Another tool that automates the documentation of a SAN is Brocade SAN Health
SAN Health is a free utility that helps you create:
Comprehensive Documentation
Historical Performance Graphs
Detailed Topology Diagrams
Best Practice Recommendations

SAN Health can be run against:
Brocade systems running any version of Fabric OS or XPath OS
McDATA systems running EOS 4.x+


Gathering Switch Support Data - Troubleshooting

Before troubleshooting a Brocade switch, router, or Director, gather all the basic information that you can:
Document the current state of the switch with supportsave: RASLogs, numerous command outputs (supportshow)
Identify user actions taken in the past: Audit logs (if available)

Validate the current state of the switch by reviewing supportshow:
Verify switch access settings (e.g. ipaddrshow)
Check FRU status (e.g. fanshow)
Validate firmware revisions (e.g. firmwareshow)
Check port status, port errors (e.g. porterrshow)

Identify faults on the switch by checking the RASLog (errdump) for error-related messages
As needed, compare time stamps between the RASLog and the Audit Log to determine whether user actions were a problem source
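The time stamp comparison can be sketched as a simple window search: for a given error time, list the audited user actions that happened shortly before it. The function name, the ten-minute window, and the sample events below are all assumptions for illustration; only the cfgdisable time stamp is taken from the RASLog example earlier in this course.

```python
from datetime import datetime, timedelta


def actions_before(error_time, audit_events, window=timedelta(minutes=10)):
    """Return audit events that occurred within `window` before the error."""
    return [
        (when, text)
        for when, text in audit_events
        if timedelta(0) <= error_time - when <= window
    ]


# Hypothetical data: (timestamp, description) pairs from the Audit Log.
audit_events = [
    (datetime(2006, 3, 8, 9, 0, 0), "login by admin"),
    (datetime(2006, 3, 8, 11, 59, 32), "cfgdisable by admin"),
]
error_time = datetime(2006, 3, 8, 12, 3, 0)  # made-up RASLog error time

for when, text in actions_before(error_time, audit_events):
    print(when, text)
```

On dual-CP systems and multi-switch fabrics this only works if clocks are synchronized, so check the time settings before drawing conclusions from a match.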


Gathering Switch Support Data - Escalating to Next-Level Support

If you are escalating an issue to next-level support, gather all the basic and Brocade information from the switch by running supportsave:
RASLogs
supportshow
FFDC files
Trace dumps
Core files and panic dumps
AP blade details

In addition, describe the problem in as much detail as possible:
Affected devices/ports/switches
SAN topology drawing
Previous course of action (timeline, commands run)
Details on recent changes to the fabric (additions/removal/configs)

If available, also capture the Audit logs, so that past user actions can be identified

Fin
