Openvms I/o and Storage

OPENVMS I/O AND STORAGE
Tips and Best Practices for good performance
Rafiq Ahamed K OpenVMS Engineering
2011 Hewlett-Packard Development Company, L.P. 2010 The information contained herein is subject to change without notice
Agenda
OpenVMS I/O Facts

I/O
Evolution on Integrity servers to expect from hardware do you know about multipathing on IOPERFORM and I/O
What What Notes
NUMA
Storage Tips and Tricks

EVA
Best Practices Connectivity a sneak into EVA PA
OpenVMS Have
Q&A
9/19/2011
Operating System performance is largely dependent on the underlying hardware
..So know your hardware capabilities
39/19/2011
Integrity server I/O

PCI PCI (66MHz, Interconnects 0.5GB/sec) PCI-X PCIe Gen 1 (133MHz,1GB/sec) (2.5Gb/sec/Lane, PCI-X 250MB/sec) (266MHz, 2GB/sec) Ultra SCSI 320 2G FC 600GB 3G SAS (LSI Logic) 4G FC 1.2TB PCIe Gen 2 (5Gb/sec/Lane, 500MB/sec) 6G SAS (p410i) 8G FC 7.2TB
Core I/O SAN I/O Disk Size
Ultra SCSI 160 1G FC 300GB
# I/O Device 3
12/16
Architecture has evolved drastically for I/O Devices within Integrity, Performance & Scalability - doubling in each new hardware release
9/19/2011
Examples of latest speeds and feeds of I/O on Integrity platforms
Leadership in I/O and Storage on i2 architecture

High performance, reliable and scalable SAS provides a point-to-point connection to each HDD @ 6G Speeds Provides four p410 RAID controllers (one per blade) on BL890c i2, One 410 RAID on rx2800
Configured as RAID 0/1 or HBA mode [Future]
Stripe data within and across multiple p410 RAID controllers (OpenVMS Shadowing)
Striping within provides high Performance. Striping across controllers provides no SPOF storage
Parallel SCSI with rx7640 has a shared bus Ultra160 SCSI
Each BL890c/rx2800 supports eight SFF SAS HDD, up to 7.2TB capacity
9/19/2011
Core I/O on i2 servers

Data shows the impact of p410i Caching and Striping
rx2800 i2 - Core SAS Caching
700 600 2500 2000
IOPS
rx2800 i2 - SAS Logical Disk (Striping)
500
IOPS
400 300 200 100 0 1 2 4 8 16

Load
1500 1000 500 0
32
64
128
256
16
Load
32
64
128
256
W/O Cache
Cache
1 disk with Cache
2 disk with Cache
4 disk with Cache
Use p410i Cache Batter Kit for faster response Stripe across multiple disks to maximize the utilization and throughput
9/19/2011
Customer Concerns..

How is I/O Performance on Integrity Servers? How they compare against my existing high end alpha servers? After migrating to new platform what should I expect? Why is i2 server I/O market differentiator?
9/19/2011
Software capabilities; multipathing
10
Multipathing 1(4)
Multipathing (MP) is a technique to manage multiple paths to storage device through failover and failback mechanisms It helps user to load balance across multiple paths to storage By default multipath will be enabled on OpenVMS OpenVMS MP supports ALUA (Asymmetric Logical Unit Access) [> V8.3]
OpenVMS MP supports FAILOVER and FAILBACK

OpenVMS MP load balances the storage devices across all the paths available
It
spreads # of devices evenly across all the paths during boot time
At any point in time only single path can be active to a device

Users are recommended to use static load balancing techniques
11
9/19/2011
MP Connections Good and Bad

IA64 switch IA64 IA64 switch switch
Shows single controller configurations
switch
HSV
Single
HSV
Two Paths
HSV
Four Paths
Alpha
IA64
Alpha
IA64
switch
switch
switch
switch
HSV
HSV
HSV
HSV
12
9/19/2011
Multipathing 2(4)
Device discovery initiates the path discovery and forms MP set for each device
MC First The
SYSMAN IO AUTO, SDA > SHOW DEVICE DGAxx shows MP Set path discovered is considered primary active path is called current path selection algorithms optimized to support Active/Active arrays
Automatic Always
active optimized (AO) paths are picked for I/O, if no alternative then active nonoptimized (ANO) is picked [ how to fix this will discuss during EVA best practices]
With latest firmware's on storage, very rare you will get connected to ANO
13
9/19/2011
Multipathing 3(4)
VMS switches its path to a LUN when:
An I/O error triggers mount verification (MV)

Device
is not reachable on current path and another path works
MOUNT of a device with current path offline Manual path switch via SET DEVICE/SWITCH/PATH= Some local path becomes available when current path is MSCP
Path
switch from MSCP to local triggered by the poller [not if manual sw]
Note: Any MV might trigger a path switch

MV MV
due to loss of cluster quorum due to SCSI error flush
14
9/19/2011
Multipathing 4(4)
MPDEV_POLLER is light weight, will poll all paths for availability

SET
DEVICE device/POLL/NOPOLL
MV is not bad. It only indicates that OpenVMS validated your device

Shadow
devices can initiate and complete a MV ; Each shadow member operate independently on a path switch
But, MV followed with path switch is indication of failover/failback

The
operator logs will indicate the details DEV device/full will show details of path switch [time etc] SHOW DEV device logs lot of diagnostic information in MPDEV structure
SHOW SDA>
15
9/19/2011
SCSI Error Poller
We have seen customers reporting huge traffic of Unit Attention(UA) in SAN resulting in Cluster hangs, slow disk operations, high mount verifications etc! These UA are initiated due to any changes in SAN like Firmware Upgrades, Bus Reset etc SCSI_ERROR_POLL is poller responsible for clearing the latched errors (like SCSI UA) on all the fibre and SCSI devices, which can otherwise cause confusion in SAN By default the poller will be enabled SYSGEN>SHOW SCSI_ERROR_POLL
16
9/19/2011
Customer/Field Concerns..
OpenVMS Multipathing
After upgrading my SAN components, I see large number of mount verifications, does that indicate problem? Does multipathing do load balance? Or policies? I see too many mount verification messages in operator log, will it impact the volume performance (especially latency) How do I know if my paths are well balanced or not? How do I know my current path is active optimized or not? Does multipathing support Active/Active Arrays, ALUA, third party storage, SAS device, SSD device
17
9/19/2011
Did you know QIO is one of the most heavily used interface in OpenVMS. We want to put it on diet. What should we do? 1. Optimize QIO 2. Replace QIO 3. Provide alternative
18 9/19/2011
IOPERFORM/FastIO

Fast I/O is a performance-enhanced alternative to performing QIOs Substantially reduces the setup time for an I/O request Fast I/O uses buffer objects to eliminate the I/O overhead of manipulating I/O buffers locked memory doubly mapped Performed using the buffer objects and the following system services:
sys$io_setup
,sys$io_perform, sys$io_cleanup
(jacket) /sys$create_bufobj_64
sys$create_bufobj $
dir sys$examples:io_perform.c
System management considerations:

SYSGEN parameter
Creating
MAXBOBMEM limits memory usage (defaults to 100)
VMS$BUFFER_OBJECT_USER identifier is required for process
buffer objects once and using them for the time of application is faster
19
9/19/2011
Impact of IOPERFORM/FASTIO
I/O Data Rate (MB/sec)
Resource usage is reduced by 20-70% depending on load and system

Small
size random workloads doubles the throughput with increased loads size sequential workloads operate same
450 400 350 300 250 200 150 100 50 0 1 2 3 4 Threads 5 6 7
128K_READ 128K_READ_QIO 128K_WRITE 128K_WRITE_QIO
Larger
Throughput (IOPS)
45000 40000 35000 30000 25000 20000 15000 10000 5000 0
8K_READ
8K_READ_QIO
8
Threads
16
32
64
20
9/19/2011
NUMA/RAD Impact
What you should know?
DGA100
P1
22
9/19/2011
BL890c i2 (architecturally 4 Blades conjoined)
NUMA/RAD Impact
In RAD based system, each RAD is made of CPU, Memory and I/O devices The accessibility of I/O devices from local to remote domains results in accessing remote memory and remote interrupt latency
Impact of RAD on I/O Device
3700 3600 3500 IO Rate 3400 3300 3200 3100
Optimized Performance
10-15 % overhead
3000
0 1 2 3 4 RAD #
23 9/19/2011
Opt/sec
RAD Guidelines for I/O

Keep I/O Devices close to process which is heavily accessing it Make use of FASTPATHING efficiently
Make The
sure to FASTPATH the Fibre Devices close to the process which is initiating the I/O
overhead involved in handling the remote I/O can impact the throughput [chart]
FASTPATH algorithms assign the CPU on round robin basis Statically load balance the devices across multiple RADs Make use to SET PROC/AFFINITY to bind processes with high I/O Use SET DEVICE device/PREFERRED_CPUS
24
9/19/2011
STORAGE BEST PRACTICES
25
EVA Differences
Speeds and Feeds
EVA4400
Model Memory /controller pair HSV300 4GBytes
EVA6400
HSV400 8GBytes
EVA8400
HSV450 14/22GBytes
P6300
HSV340 4GBytes
P6500
HSV360 8GBytes
Host Ports / controller pair
4 FC 20 w switches
8 FC
8 FC
Host Port speed Device Ports, # Device Ports, Speed # 3-1/2 drives # 2-1/2 drives Max. Vdisk I/O Read Bandwidth I/O Write Bandwidth Random Read I/O
4Gb/s FC 4 4Gb/s FC 96 0 1024 780 MB/s 590 MB/s 26,000 IOPs
4Gb/s FC
4Gb/s FC
8 4Gb/s FC 216 0 2048 1,250 MB/s 510 MB/s 54,000 IOPs
12 4Gb/s FC 324 0 2048 1,545 MB/s 515 MB/s 54,000 IOPs
8 FC, 0 GbE 4 FC, 8 1GbE 4 FC, 4 10GbE 8Gb/s FC 8Gb/s FC 1Gb/s iSCSI 1Gb/s iSCSI 10Gb/s 10Gb/s iSCSI/FCoE iSCSI/FCoE 8 16
6Gb/s 120 250 1024 1,700 MB/s 600 MB/s 45,000 IOPs 6Gb/s 240 500 2048 1,700 MB/s 780 MB/s 55,000 IOPs
8 FC, 0 GbE 4 FC, 8 1GbE 4 FC, 4 10GbE
27
9/19/2011
General I/O issues reported
After upgrading the OS or applying the patch, I/O response has become slower
We
see 5-6 millisecond delay in completion of I/O compared to yesterday
After moving to new blade in same SAN environment, our CRTL FSYNC is running slow After upgrading, we see additional CPUs ticks for copy, delete and rename Our database is suddenly responding slow Some nodes in cluster see high I/O latency after mid-night
Customer wants to know if this storage is enough for next 5 years

Customer is migrating from older version of EVA to newer version, can you advise
9/19/2011
28
Maximum number of storage performance issues reported are due to mis-configuration of SAN components
29 9/19/2011
Best Practices..1(6)
Number of disks influences performance - Yes

Fill
the EVA with as many disk drives as possible.

have shown linear growth in throughput (small random)
Tests
Number of disk groups influences performance No

In
mixed load environments, it would be ok to have random vs. sequential application disk groups best performance over the widest range of workloads; however, Vraid5 is better for some sequential-write workloads provides the best random write workload performance , but no protection Use for non-critical storage needs
Vraid level influences performance Yes

Vraid1 Vraid0
30
9/19/2011
Fibre channel disk speed - Yes

10K
vs 15K Speed 15K rpm disks provide highest performance

sequential I/O, speed doesnt matter, but capacity matters random I/O, 30-40% gains in request rates is seen
Large-block Small-block Best
price-performance
For the equivalent cost of using 15k rpm disks, consider using more 10k rpm disks.
Combine disks with different performance characteristics in the same disk group
Do
not create separate disk groups to enhance performance
31
9/19/2011
Mixing disk capacities Yes

The No Use
EVA stripes LUN capacity across all the disks in a disk group. The larger disk will have more LUN capacity leading into imbalanced density control over the demand to the larger disks. disks with equal capacity in a disk group.
Read cache management influences performance and always ENABLE LUN count Yes, No
Good
to have few LUNs per controller on Host Requests and Queue Depths OpenVMS Queue depth
Depends Monitor
Transfer size Yes

Impacts
Tune
SEQUENTIAL Workloads
the write transfer size to be a multiple of 8K and no greater than 128K.
OpenVMS
Max Transfer Size is 128K for Disk and 64K for Tapes! DEVICE_MAX_IO_SIZE
9/19/2011
32
SSD performance - Yes, Yes, Yes

SSDs
are about 12 times better
OpenVMS performed 10x times better than FC [next slide details]
Workloads Spread SSDs Monitor
like transaction processing, data mining, and databases such as Oracle are ideal SSDs evenly across all available back end loops and HDDs may be mixed in the same drive enclosure
your application and EVA, accordingly can assign SSD or HDD to individual Controllers, or enable write through mode for SSDs can help [Experiment!!] use SSD drives to keep the critical path data, where the response time is un-compromised
Customers
33
9/19/2011
OpenVMS 8.4 Performance Results

SSD Drive Through EVA
4K Mixed QIO - FC vs SSD
14000 12000 10000 IOPS 8000 6000 4000 2000 0 1 2 4 8 16 Threads QIO_FC QIO_SSD FC 32 64 128 256 Res Time(msec)
OpenVMS V8.4
10x Faster
250 200 150 100 50 0 0
4K Mixed QIO FC vs SSD
10x Faster
50
100
150 Threads SSD
200
250
300
Mixed Load, 8 Disks SSD/FC DG on EVA4400 Smaller IOs(4K/8K) showed 9-10x times sustained increased in IOPS and MB/sec with increase in load for SSD carved LUNs compared to FC With 10 times faster response time, SSD carved LUN was able to deliver 10 time more performances and bandwidth for smaller size IOs
9/19/2011
34
Controller balancing Yes, Yes, Yes

Maximize Manually
the utilization present LUNs simultaneously through both controllers, Ownership is only one
Active/Active
load balance LUN ownership to both controllers (use EVAPerf) either through using CV EVA or using OpenVMS SET DEVICE/SWITCH/PATH='PATH_NAME' 'DEV_NAME' command Path During the initial boot of the EVA the preferred path parameter is read and determines the managing controller [see below figure for options] LUN ownership is reassigned after a failed controller has been repaired the workload as evenly as possible across all the Host Ports
DGA99 answers Inquiry on these ports DGA99 does I/O on these ports HSVxxx HSVxxx
Preferred Verify
Balance
DGA99
35 9/19/2011
Command View EVA Preferred Path Settings
Customer Scenario
Controller Load Imbalance & No Equal Port Load Distribution
36
9/19/2011
Ensure there are no HW issues Specially Battery failure (Cache Battery failure causes to change to Write-Through mode, hence Write performance becomes an issue), Device loop failure etc. Drive reporting timeouts Deploy the array only in supported configurations Stay latest on EVA firmware!! BC and CA have different best practices and beyond the scope of this discussion
37
9/19/2011
Some Points to Remember
Large latencies may be quite natural in some contexts, such as an array processing large I/O requests Array processor utilizations tend to be high under intense, small-block, transaction oriented workloads but low under intense large-block workloads
38
9/19/2011
OpenVMS IO Data
P6500 36G RAID 5 Volume 4G FC Infrastructure
Sequential Read (MB/sec)
45 40 35 30 25 20 15 10 5 0 138 286 367 383 404 411 412 412 MB/sec resp time(msec)
Higher bandwidth can be obtained with larger blocks. Larger blocks can drain the interconnects faster due large data transfers.
128K IOs pushing 4G FC line speeds!
Random Read (IO/sec)

6 resp time(msec) 5 4 3 2 1 0 6562 12935 22287 31888 38136 40052 43452 45504 46202 IOPS
Higher throughputs can be obtained with smaller blocks. Usually smaller blocks need a lot of processing power
8K workloads pushing close to EVA Max Throughputs!
40
9/19/2011
STORAGE PERFORMANCE ANALYSIS TOOLS & REFERENCES
41 9/19/2011
Storage Performance Tools
OpenVMS Host Utilities

T4,
TLViz Disk, FCP, VEVAMON (older EVAs) > FC [for fibre devices], PKR/PKC [for SAS devices] SCSI_INFO.EXE, SCSI_MODE.EXE, FIBRE_SCAN.EXE more..
SDA
SYS$ETC: Many
EVAPerf Command Line EVA Performance Data Collector
EVA Performance Advisor [ Year End Release]

XP Performance Advisor Storage Essential Performance Pack
42
9/19/2011
Salient aspects of HP P6000 Performance Advisor

Slated to release soon Can participate in early adaptor program
Integrated with Command View 10.0 in single pane of glass

User centric design compliance Features:
Dashboard Threshold Key Chart Report
monitoring & notification
metric
Quick setup Events Database
43
9/19/2011
Sizing EVA
HP StorageWorks Sizing Tool
44
9/19/2011
References EVA Performance
HP StorageWorks Enterprise Virtual Array A tactical approach to performance problem diagnosis. HP Document Library
45
9/19/2011
Questions/Comments
Business Manager (Rohini Madhavan)

rohini.madhavan@hp.com
Office of Customer Programs

OpenVMS.Programs@hp.com
46
9/19/2011
THANK YOU
EVA Models - Reference

EVA Model EVA3000 EVA5000 EVA3000 EVA5000 EVA4000 EVA6000 EVA4100 EVA6100 Controller Model HSV100 HSV1 10 HSV101 HSV1 1 1 HSV200 or HSV200-A HSV200 or HSV200-A HSV200-B HSV200-B 5.XXX or 6.XXX (Latest 6220) Firmware
1.XXX, 2.XXX or 3.XXX (Latest 31 10)

4.XXX (Latest 4100)
EVA8000
EVA8100 EVA4400 EVA6400 EVA8400
HSV210 or HSV210-A
HSV210-B HSV300 HSV400 HSV450 09XXXXXX or 10000000 095XXXXX, or 10000000 10000090
P6300
P6500
48 9/19/2011
HSV340
HSV360

Openvms I/o and Storage

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Openvms I/o and Storage

Uploaded by

Copyright:

Available Formats

OPENVMS I/O AND STORAGE

Tips and Best Practices for good performance

Rafiq Ahamed K OpenVMS Engineering

OpenVMS I/O Facts

What What Notes

Storage Tips and Tricks

Best Practices Connectivity a sneak into EVA PA

Operating System performance is largely dependent on the underlying hardware

..So know your hardware capabilities

Integrity server I/O

Core I/O SAN I/O Disk Size

Ultra SCSI 160 1G FC 300GB

Examples of latest speeds and feeds of I/O on Integrity platforms

Leadership in I/O and Storage on i2 architecture

Parallel SCSI with rx7640 has a shared bus Ultra160 SCSI

Each BL890c/rx2800 supports eight SFF SAS HDD, up to 7.2TB capacity

Core I/O on i2 servers

rx2800 i2 - SAS Logical Disk (Striping)

400 300 200 100 0 1 2 4 8 16

1500 1000 500 0

1 disk with Cache

2 disk with Cache

4 disk with Cache

Software capabilities; multipathing

OpenVMS MP supports FAILOVER and FAILBACK

At any point in time only single path can be active to a device

MP Connections Good and Bad

An I/O error triggers mount verification (MV)

is not reachable on current path and another path works

Note: Any MV might trigger a path switch

due to loss of cluster quorum due to SCSI error flush

MPDEV_POLLER is light weight, will poll all paths for availability

MV is not bad. It only indicates that OpenVMS validated your device

But, MV followed with path switch is indication of failover/failback

SCSI Error Poller

System management considerations:

MAXBOBMEM limits memory usage (defaults to 100)

VMS$BUFFER_OBJECT_USER identifier is required for process

Resource usage is reduced by 20-70% depending on load and system

450 400 350 300 250 200 150 100 50 0 1 2 3 4 Threads 5 6 7

128K_READ 128K_READ_QIO 128K_WRITE 128K_WRITE_QIO

BL890c i2 (architecturally 4 Blades conjoined)

RAD Guidelines for I/O

STORAGE BEST PRACTICES

Host Ports / controller pair

4Gb/s FC 4 4Gb/s FC 96 0 1024 780 MB/s 590 MB/s 26,000 IOPs

8 4Gb/s FC 216 0 2048 1,250 MB/s 510 MB/s 54,000 IOPs

12 4Gb/s FC 324 0 2048 1,545 MB/s 515 MB/s 54,000 IOPs

8 FC, 0 GbE 4 FC, 8 1GbE 4 FC, 4 10GbE

General I/O issues reported

see 5-6 millisecond delay in completion of I/O compared to yesterday

Customer wants to know if this storage is enough for next 5 years

Number of disks influences performance - Yes

the EVA with as many disk drives as possible.

Number of disk groups influences performance No

Vraid level influences performance Yes

Fibre channel disk speed - Yes

vs 15K Speed 15K rpm disks provide highest performance

Large-block Small-block Best

not create separate disk groups to enhance performance

Mixing disk capacities Yes

Transfer size Yes

the write transfer size to be a multiple of 8K and no greater than 128K.