MPIO on AIX
IBM Power Systems Technical Symposium 2011
Agenda
Correctly Configuring Your Disks
► Filesets for disks and multipath code
Multi-path basics
Multi Path I/O (MPIO)
► Useful MPIO Commands
► Path priorities
► Failed Path Recovery and path health checking
► MPIO path management
SDD and SDDPCM
Multi-path code choices for DS4000, DS5000 and DS3950
XIV & Nseries
SAN Boot
Disk configuration
Vendor-supplied ODM definitions, for example:
► HDS: https://tuf.hds.com/gsc/bin/view/Main/AIXODMUpdates
► EMC: ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/
The disk vendor…
Dictates what multi-path code can be used
Supplies the filesets for the disks and multipath code
Supports the components that they supply
A fileset is loaded to update the ODM to support the storage
AIX then recognizes and appropriately configures the disk
Without this, disks are configured using a generic ODM definition
Performance and error handling may suffer as a result
# lsdev -Pc disk displays supported storage
The multi-path code will be a different fileset
Unless using the MPIO that’s included with AIX
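For example, a quick check that the vendor's ODM fileset is in effect (hdisk numbers and descriptions are illustrative); a disk that configures with a generic description such as "Other FC SCSI Disk Drive" is using the generic ODM definition:
# lsdev -Pc disk      (lists the disk types for which ODM definitions are installed)
# lsdev -Cc disk      (shows each configured hdisk and which definition it picked up)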
[Diagram: server, FC switch, and storage port connections: 4 x 4 = 16 paths vs. 2 x 2 + 2 x 2 = 8 paths]
If the links aren't busy, there likely won't be much, if any, savings from use of sophisticated path selection algorithms vs. round robin
Generally, utilization of links is low
Costs of path selection algorithms
► CPU cycles to choose the best path
► Memory to keep track of in-flight IOs down each path, or
► Memory to keep track of IO service times down each path
► Latency added to the IO to choose the best path
[Diagram: dual VIO Server configuration with a VIO client (VIOC)]
The VIOC uses multi-path code that the disk subsystem supports
What is MPIO?
MPIO is an architecture designed by AIX development (released in AIX V5.2)
MPIO is also commonly used as a generic acronym for Multi-Path IO
► In this presentation, MPIO refers specifically to the AIX architecture, not the generic term
MPIO support
Storage Subsystem Family | MPIO code | Multi-path algorithms
IBM ESS, DS6000, DS8000, DS3950, DS4000, DS5000, SVC, V7000 | IBM Subsystem Device Driver Path Control Module (SDDPCM) | fail_over, round_robin, load balance, load balance port
DS3/4/5000 in VIOS | Default FC PCM (recommended) | fail_over, round_robin
IBM XIV Storage System | Default FC PCM | fail_over, round_robin
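On AIX levels that provide the manage_disk_drivers command, it shows and changes which multi-path driver a midrange subsystem family uses. A minimal sketch; the subsystem name and option string below are examples, so check the -l output on your own system first (driver changes take effect after a reboot):
# manage_disk_drivers -l                       (lists subsystem families, the driver presently in use, and the supported drivers)
# manage_disk_drivers -d DS4800 -o AIX_APPCM   (example: select the default AIX PCM for DS4800 disks)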
The disk subsystem vendor specifies what multi-path code is supported for their storage
►The disk subsystem vendor supports their storage, the server vendor generally doesn’t
You can mix multi-path code compliant with MPIO and even share adapters
►There may be exceptions. Contact vendor for latest updates.
HP example: “Connection to a common server with different HBAs requires separate HBA
zones for XP, VA, and EVA”
Generally one non-MPIO compliant code set can exist with other MPIO compliant code sets
►Except that SDD and RDAC can be mixed on the same LPAR
►The non-MPIO compliant code must be using its own adapters
Devices of a given type use only one multi-path code set
►e.g., you can't use SDDPCM for one DS8000 and SDD for another DS8000 on the same AIX instance
Disk using MPIO compliant code sets can share adapter ports
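To see which multi-path code set and PCM a given disk is using (the hdisk number is an example):
# lsdev -Cc disk              (the type/description shows which ODM definition is in use)
# lsattr -El hdisk9 -a PCM    (shows the Path Control Module driving the hdisk)
# lspath -l hdisk9            (lists the paths to the hdisk and their states)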
Path priorities
A Priority Attribute for paths can be used to specify a preference for path IOs. How it works depends on whether the hdisk's algorithm attribute is set to fail_over or round_robin.
Value specified is inverse to priority, i.e. “1” is high priority
algorithm=fail_over
►the path with the highest priority (lowest priority value) handles all the IOs unless there's a path failure.
►the other path(s) will only be used when there is a path failure.
►Set the primary path to be used by setting its priority value to 1, and the next path's priority (in case of path failure) to 2, and so on.
►if the path priorities are the same and algorithm=fail_over, the primary path will be the
first listed for the hdisk in the CuPath ODM as shown by # odmget CuPath
algorithm=round_robin
►If the priority attributes are the same, then IOs go down each path equally.
►In the case of two paths, if you set path A’s priority to 1 and path B’s to 255, then for
every IO going down path A, there will be 255 IOs sent down path B.
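To display the priority (and the other path attributes) of a specific path, something like the following can be used; the hdisk and parent adapter names are examples (add -w "connection" if there is more than one path through that parent):
# lspath -AHE -l hdisk9 -p fscsi0    (shows the attributes, including priority, of hdisk9's path through fscsi0)
# odmget CuPath                      (shows the order in which paths are listed for each hdisk)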
Path priorities
# lsattr -El hdisk9
PCM PCM/friend/otherapdisk Path Control Module False
algorithm fail_over Algorithm True
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
lun_id 0x5000000000000 Logical Unit Number ID False
node_name 0x20060080e517b6ba FC Node Name False
queue_depth 10 Queue DEPTH True
reserve_policy single_path Reserve Policy True
ww_name 0x20160080e517b6ba FC World Wide Name False
…
►Set priorities for half the LUNs to use VIOSa/vscsi0 and half to use VIOSb/vscsi1 (see the chpath sketch below)
►Uses both VIOSs' CPU and virtual adapters
►algorithm=fail_over is the only option at the VIOC for VSCSI disks
With NSeries, have the IOs go to the primary controller for the LUN
►Set via the dotpaths utility that comes with the Nseries filesets
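A minimal sketch of the VSCSI case above, assuming hdisk4 and hdisk5 are two of the LUNs and vscsi0/vscsi1 connect to VIOSa/VIOSb:
# chpath -l hdisk4 -p vscsi0 -a priority=1    (hdisk4 prefers the path through VIOSa)
# chpath -l hdisk4 -p vscsi1 -a priority=2
# chpath -l hdisk5 -p vscsi1 -a priority=1    (hdisk5 prefers the path through VIOSb)
# chpath -l hdisk5 -p vscsi0 -a priority=2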
hcheck_interval
► Defines how often the health check is performed on the paths for a device. The attribute supports a range
from 0 to 3600 seconds. When a value of 0 is selected (the default), health checking is disabled
► Preferably set to at least 2X IO timeout value
hcheck_mode
► Determines which paths should be checked when the health check capability is used:
● enabled: Sends the healthcheck command down paths with a state of enabled
● failed: Sends the healthcheck command down paths with a state of failed
● nonactive: (Default) Sends the healthcheck command down paths that have no active I/O, including
paths with a state of failed. If the algorithm selected is failover, then the healthcheck command is
also sent on each of the paths that have a state of enabled but have no active IO. If the algorithm
selected is round_robin, then the healthcheck command is only sent on paths with a state of failed,
because the round_robin algorithm keeps all enabled paths active with IO.
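For example, to enable health checking on a LUN (values are illustrative; -P stores the change in the ODM so it takes effect once the disk is closed or the system is rebooted):
# chdev -l hdisk9 -a hcheck_interval=60 -a hcheck_mode=nonactive -P
# lsattr -El hdisk9 | grep hcheck        (verify the values now in the ODM)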
Path Recovery
MPIO will recover failed paths if path health checking is enabled with hcheck_mode=nonactive or failed
and the device has been opened
Trade-offs exist:
► Lots of path health checking can create a lot of SAN traffic
► Automatic recovery requires turning on path health checking for each LUN
► Lots of time between health checks means paths will take longer to recover after repair
► Health checking for a single LUN is often sufficient to monitor all the physical paths, but not to recover
them
SDD and SDDPCM also recover failed paths automatically
In addition, SDDPCM includes a health check daemon that provides an automated method of reclaiming failed paths to a closed device.
One should also set up error notification for path failure, so that someone knows
about it and can correct it before something else fails.
This is accomplished by determining the error that shows up in the error log when a path fails (via testing), and then
adding an entry to the errnotify ODM class for that error which calls a script (that you write) to notify someone that a path has failed.
Hint: You can use # odmget errnotify to see what the entries (or stanzas) look like,
then create a stanza and use the odmadd command to add it to the errnotify class.
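A rough sketch of such a stanza; the error label, notification script, and file name are placeholders you would replace based on what your own path-failure test logs:
errnotify:
        en_pid = 0
        en_name = "path_fail_notify"
        en_persistenceflg = 1
        en_label = "LABEL_FROM_YOUR_TEST"
        en_class = "H"
        en_type = "PERM"
        en_method = "/usr/local/bin/notify_path_fail.sh $1"
# odmadd /tmp/pathfail.add                          (load the stanza saved in /tmp/pathfail.add)
# odmget -q "en_name=path_fail_notify" errnotify    (verify the stanza was added)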
IOs for a LUN can be sent to any storage port with Active/Active controllers
LUNs are balanced across controllers for Active/Passive disk subsystems
► So a controller is active for some LUNs, but passive for the others
IOs for a LUN are only sent to the Active controller’s port for disk subsystems with Active/Passive
controllers
MPIO recognizes Active/Passive disk subsystems and sends IOs only to the primary controller
► Except under failure conditions, then the active/passive role switches for the affected LUNs
SDD: An Overview
SDD = Subsystem Device Driver – Pre-MPIO Architecture
Used with IBM ESS, DS6000, DS8000 and the SAN Volume Controller, but
is not MPIO compliant
SDD
Load balancing algorithms
►fo: failover
►rr: round robin
►lb: load balancing (aka df, the default); chooses the adapter with the fewest in-flight IOs
►lbs: load balancing sequential – optimized for sequential IO
►rrs: round robin sequential – optimized for sequential IO
The datapath command is used to examine vpaths, adapters, paths, vpath statistics,
path statistics, adapter statistics, dynamically change the load balancing algorithm,
and other administrative tasks such as adapter replacement, disabling paths, etc.
SDD automatically recovers failed paths that have been repaired via the sddsrv
daemon
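Typical datapath usage (the device number is an example):
# datapath query adapter             (adapter states and error counts)
# datapath query device              (vpaths, their paths, and path states)
# datapath query devstats            (per-vpath IO statistics)
# datapath set device 0 policy rr    (change device 0's load balancing policy to round robin)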
Conclusion:
Load balancing is unlikely to improve performance, especially when
compared to other strategies like algorithm=round_robin or approaches
that balance IO with algorithm=fail_over
SDDPCM: An Overview
SDDPCM is MPIO compliant and can be used with IBM ESS, DS6000, DS8000,
DS4000 (most models), DS5000, DS3950, V7000 and the SVC
► A “host attachment” fileset (populates the ODM) and SDDPCM fileset are both installed
► Host attachment: devices.fcp.disk.ibm.mpio.rte
► SDDPCM: devices.sddpcm.<version>.rte
Path Selection Algorithms
► Default PCM: Fail over (default), Round Robin (excluding VSCSI disks)
► SDDPCM: Fail over, Round Robin, Load Balancing (default), Load Balancing Port
SDDPCM
Load balancing algorithms
► rr - round robin
► lb - load balancing based on in-flight IOs per adapter
► fo - failover policy
► lbp - load balancing port (for ESS, DS6000, DS8000, V7000 and SVC only) based
on in-flight IOs per adapter and per storage port
The pcmpath command is used to examine hdisks, adapters, paths, hdisk statistics, path
statistics, adapter statistics, dynamically change the load balancing algorithm, and other
administrative tasks such as adapter replacement, disabling paths
SDDPCM automatically recovers failed paths that have been repaired via the pcmserv
daemon
► MPIO health checking can also be used, and can be dynamically set via the pcmpath
command. This is recommended. Set the hc_interval to a non-zero value for an
appropriate number of LUNs to check the physical paths.
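Typical pcmpath usage (the device number and values are examples):
# pcmpath query adapter                  (adapter states and per-adapter IO counts)
# pcmpath query device                   (hdisks, their paths, and path states)
# pcmpath query devstats                 (per-hdisk IO and transfer-size statistics)
# pcmpath set device 0 algorithm lb      (change device 0's algorithm to load balancing)
# pcmpath set device 0 hc_interval 60    (dynamically enable health checking on device 0)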
Transfer-size statistics (sample output):
Transfer Size:  <= 512    <= 4k     <= 16K    <= 64K    > 64K
                183162    67388759  35609487  46379563  22703724
…
www-01.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350#AIXSDDPCM
# sddpcm_get_config -Av
output is the same as above
XIV
Nseries/NetApp
Nseries/NetApp has a preferred storage controller for each LUN
Not exactly an active/passive disk subsystem, as the non-preferred
controller can accept IO requests
I/O requests arriving at the non-preferred controller have to be passed to the preferred controller, which adds latency
Install the SAN Toolkit
Ontap.mpio_attach_kit.*
► Provides the dotpaths utility and sanlun commands
► dotpaths sets hdisk path priorities to favor the primary controller
…for every IO going down the secondary path, there will be 255 IOs sent down the primary path
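A rough sketch of installing the toolkit and applying the priorities; the install directory is an example and command options may differ by toolkit version:
# installp -agXYd /tmp/ontap Ontap.mpio_attach_kit    (install the SAN Toolkit filesets)
# dotpaths                                            (set hdisk path priorities to favor each LUN's primary controller)
# sanlun lun show -p                                  (show LUNs with their primary/secondary path information)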
AIX installation
► Boot from installation CD or NIM; this runs the install program
► When you do the installation you'll get a list of disks that will be on the SAN for the
system
► Choose the disks for installing rootvg
► Be aware of disk SCSI reservation policies
● Avoid policies that limit access to a single path or adapter
Only assign the rootvg LUN to the host prior to install, assign data LUNs later, or
Create a LUN for rootvg with a size different than other LUNs, or
Write down LUN ID and storage WWN, or
Use disk with an existing PVID
These criteria can be used to select the LUN from the AIX install program (shown
in following screen shots) or via a bosinst_data file for NIM
1 hdisk2 U8234.EMA.06EF634-V5-C22-T1-W50050768012017C2-L1000000000000
2 hdisk3 U8234.EMA.06EF634-V5-C22-T1-W500507680120165C-L2000000000000
3 hdisk5 U8234.EMA.06EF634-V5-C22-T1-W500507680120165C-L3000000000000
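For NIM, the same selection criteria can go in the bosinst_data resource. A sketch of a target_disk_data stanza, assuming the SAN_DISKID field (WWPN//LUN ID) is available at your AIX level; the WWPN and LUN ID are taken from hdisk3 in the example above and would be replaced with your own values:
target_disk_data:
    PVID =
    SAN_DISKID = 0x500507680120165c//0x2000000000000
    CONNECTION =
    LOCATION =
    SIZE_MB =
    HDISKNAME =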
Session Evaluations
Session Number – SE39
Session Name – Working with San Boot…
Date - Thursday, April 28, 14:30, Lake Down B
Friday, April 29, 13:00, Lake Hart B