Agenda
[Diagram: ESXi architecture. A Virtual Machine (Guest OS with TCP/IP and I/O drivers) runs on the VMM; the ESXi kernel provides the scheduler, memory allocator, virtual NIC, virtual SCSI, virtual switch, file system, and NIC/I/O drivers on top of the physical hardware.]
[Diagram: storage latency stack. GAVG (guest latency) is measured at the VMM/vSCSI layer; KAVG (kernel latency) and QAVG (queue latency) cover the ESX storage stack; DAVG (device latency) covers the driver, HBA, fabric, and array SP.]
Time spent in the ESX storage stack is minimal, so for all practical purposes KAVG ~= QAVG.
In a well-configured system QAVG should be zero.
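The relationships among these counters (GAVG spans the whole path, DAVG only the device portion, and the difference is kernel time) can be sketched as follows; the numbers are hypothetical:

```python
# Back-of-envelope breakdown of esxtop storage latencies (ms).
# GAVG (guest latency) = KAVG (kernel latency) + DAVG (device latency);
# KAVG is almost entirely QAVG (queue latency), since time spent in the
# rest of the ESX storage stack is minimal.

def kernel_latency(gavg_ms: float, davg_ms: float) -> float:
    """KAVG derived from the measured guest and device latencies."""
    return gavg_ms - davg_ms

# Hypothetical sample: 12 ms guest latency, 10 ms of it spent at the device.
gavg, davg = 12.0, 10.0
kavg = kernel_latency(gavg, davg)
print(f"KAVG ~= QAVG ~= {kavg:.1f} ms")  # -> 2.0 ms
```

A well-configured system would show this difference near zero.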
[Diagram: storage queues. GQLEN (guest queue) at the VMM/vSCSI layer; WQLEN (world queue) and AQLEN (adapter queue) in the ESX storage stack and driver; DQLEN (device/LUN queue) at the HBA; SQLEN at the array SP, behind the fabric.]
Reported in esxtop.
DQLEN can change dynamically when SIOC is enabled.
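A quick way to judge whether any of these queue depths come into play is Little's law: average outstanding IOs = IOPS x latency. A minimal sketch, with hypothetical workload numbers:

```python
# Little's law sketch: average outstanding IOs = IOPS * latency (seconds).
# If the result approaches DQLEN (the per-LUN device queue depth reported
# in esxtop), IOs start waiting in the kernel queue and QAVG/KAVG rise.

def outstanding_ios(iops: float, latency_ms: float) -> float:
    return iops * (latency_ms / 1000.0)

dqlen = 32  # typical LUN queue depth; can change dynamically under SIOC
load = outstanding_ios(iops=4000, latency_ms=10)
print(f"~{load:.0f} IOs outstanding vs DQLEN={dqlen}: "
      f"{'queuing likely' if load > dqlen else 'ok'}")
```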
vCenter Operations
Aggregates metrics into workload, capacity, and health scores.
esxtop/resxtop
For live troubleshooting and root-cause analysis; finer granularity (2-second intervals)
Lots of metrics reported
Available in the ESXi shell or the vSphere Management Assistant (vMA)
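Beyond the interactive screens, esxtop/resxtop can log to CSV in batch mode (e.g. `esxtop -b -d 2 -n 10 > out.csv`) for offline analysis. A minimal parsing sketch; the counter name in the sample header is illustrative, not copied from a real capture:

```python
# Sketch: reading esxtop batch-mode output. Batch mode emits a
# perfmon-style CSV: one header row of quoted counter names, then one
# row per sample interval. This sample snippet is synthetic.
import csv
import io

sample = io.StringIO(
    '"Time","\\\\host\\\\Physical Disk(vmhba1)\\\\Commands/sec"\n'
    '"10:00:02","1250"\n'
    '"10:00:04","1310"\n'
)
rows = list(csv.reader(sample))
header, samples = rows[0], rows[1:]
cmds = [float(r[1]) for r in samples]
print(f"avg commands/sec over {len(cmds)} samples: {sum(cmds) / len(cmds):.0f}")
```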
ESXTOP SCREENS
CPU scheduler: c (cpu, default), i (interrupts), p (power management)
Memory scheduler: m (memory)
Virtual switch: n (network)
vSCSI: d (disk adapter), u (disk device), v (disk VM)
[Screenshot annotations: IO commands in flight; IO commands waiting in queue; World ID; world queue length (modifiable via Disk.SchedNumReqOutstanding)]
Different adapters have different queue sizes. The adapter queue can come into play if the total outstanding IOs exceed the adapter queue depth.
[Screenshot: KAVG is non-zero; LUN queue depth is 32]
[Screenshot: 32 IOs in flight and 32 queued, a queuing issue]
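When outstanding IOs exceed DQLEN like this, each extra "wave" of queued IOs adds roughly another device service time to the guest-observed latency. A rough saturation model (an approximation for intuition, not an esxtop formula):

```python
# Rough model: with DQLEN IOs serviced concurrently at DAVG each, a fully
# saturated LUN completes IOs in waves of DQLEN, so
# GAVG ~ DAVG * ceil(outstanding / DQLEN); the excess over DAVG shows up
# as KAVG. This is a back-of-envelope approximation only.
import math

def approx_gavg(davg_ms: float, outstanding: int, dqlen: int) -> float:
    return davg_ms * math.ceil(outstanding / dqlen)

davg, dqlen = 5.0, 32
# 64 outstanding = 32 in flight + 32 queued, as in the screenshot above
gavg = approx_gavg(davg, 64, dqlen)
print(f"GAVG ~ {gavg:.0f} ms (KAVG ~ {gavg - davg:.0f} ms)")
```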
[Diagram: esxtop CPU states over elapsed time. Elapsed time = RUN + RDY + CSTP + WAIT. RDY includes MLMTD (ready but descheduled by a CPU limit); WAIT includes IDLE, SWPWT (blocked on swap), and VMWAIT (guest blocked on I/O).]
[Screenshot: I/O activity to an NFS datastore; system time charged for NFS activity]
[Screenshot: no I/O activity on the NFS datastore; the VM is not using CPU]
[Screenshot: VM blocked, connectivity lost to the NFS datastore]
Low-Latency (SSD) Swap
[Screenshot: some swapping activity; time spent in a blocked state due to swapping]
vSphere 5.0 New Storage Features
VAAI: vSphere Storage APIs for Array Integration, with primitives for Thin Provisioning
VSA Manager
[Diagram: the VSA Manager manages VSA Datastore 1 (Volume 1 plus a replica of Volume 2) and VSA Datastore 2 (Volume 2 plus a replica of Volume 1), exported over VSA NFS IP #1 and VSA NFS IP #2.]
Multi-threaded
Metrics Collected
vscsiStats Details
vscsiStats characterizes the IO of each virtual disk, allowing each different type of workload to be separated into its own container so trends can be observed.
Metrics: I/O size, seek distance, outstanding I/Os, I/O interarrival times, latency
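vscsiStats reports these metrics as histograms rather than averages. A minimal sketch of the idea behind its I/O-size histogram, bucketing request sizes into power-of-two bins (the bin edges and trace values here are illustrative, not vscsiStats' exact bins):

```python
# Sketch: bucketing I/O sizes into power-of-two histogram bins, the way
# vscsiStats presents per-virtual-disk I/O size distributions.
from collections import Counter

def bucket(io_size: int) -> int:
    """Smallest power-of-two bin (bytes, starting at 512) holding the I/O."""
    b = 512
    while b < io_size:
        b *= 2
    return b

sizes = [4096, 4096, 8192, 512, 65536, 8192, 4096]  # hypothetical trace
hist = Counter(bucket(s) for s in sizes)
for edge in sorted(hist):
    print(f"<= {edge:>6} B : {hist[edge]}")
```

A bimodal histogram here would reveal two distinct workloads sharing one virtual disk, which a single average I/O size would hide.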
Miscellaneous Storage Tips and Tricks
Sizing Storage

RAID level   Throughput (MB/s)   IOPS (write)   IOPS (read)
RAID 0       175                 44             110
RAID 5       40                  31             110
RAID 6       30                  30             110
RAID 10      85                  39             110

Rule of thumb: 50 - 150 IOPS per VM

Drive type     MB/sec                     IOPS                          Latency   Use case
FC 4Gb (15k)   100                        200                           5.5 ms    <15 ms latencies
FC 4Gb (10k)   75                         165                           6.8 ms    <15 ms latencies
SAS (10k)      150                        185                           12.7 ms   Streaming
SATA (7200)    140                        38                            12.7 ms   Streaming/Nearline
SATA (7200)    68                         38                            12.7 ms   Nearline
SSD            230 (read) / 180 (write)   25000 (read) / 6000 (write)   <1 ms

~Typical workload: 8K IO size, 45% write, 80% random
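These per-RAID figures reflect the cost of parity writes. A minimal sizing sketch that turns front-end VM IOPS into a disk count using the standard RAID write penalties (the penalty factors are textbook values, not taken from the tables above; the per-disk rate reuses the 185 IOPS of the 10k SAS row):

```python
# Sketch: front-end VM IOPS -> back-end disk IOPS -> spindle count,
# using the standard RAID write penalties (RAID 5 = 4, RAID 6 = 6, ...).
import math

WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def backend_iops(frontend_iops: float, write_frac: float, raid: str) -> float:
    reads = frontend_iops * (1 - write_frac)
    writes = frontend_iops * write_frac * WRITE_PENALTY[raid]
    return reads + writes

def spindles(frontend_iops: float, write_frac: float, raid: str,
             iops_per_disk: float = 185) -> int:
    return math.ceil(backend_iops(frontend_iops, write_frac, raid) / iops_per_disk)

# 100 VMs at ~100 IOPS/VM (rule of thumb), 45% write, on RAID 5:
print(f"{spindles(100 * 100, 0.45, 'RAID 5')} disks")
```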
[Chart: throughput (MBps, 0-180) vs. number of hosts (1-64) for sequential read, sequential write, random read, and random write; average bandwidth increase of 9.6%.]
http://blogs.vmware.com/performance/2011/07/analysis-of-storage-technologies-on-clusters-using-vmmark-21.html
Disk
Always use VMFS: superior functionality.
Align partitions: www.vmware.com/pdf/esx3_partition_align.pdf
If possible, schedule maintenance for off-peak hours.
[Chart: VMFS scalability. IOPS (0-8000) for VMFS, RDM (virtual), and RDM (physical) at 4K, 16K, and 64K IO sizes.]
pvscsi
Why not use? Could not boot off of pvscsi before U1; not optimized for low-IO workloads (<2000 IOPS). See http://kb.vmware.com/kb/1017652
Why use? Up to 50% less CPU usage at high IO rates; needed to exceed 30K IOPS on a single VM.
40
VMDirectPath
Why not use? Monopolizes access to the physical hardware by a single VM.
Why use? Added 10% to extreme network workloads (20 Gbps). See http://communities.vmware.com/docs/DOC-12103
Storage Optimization
[Diagram: a VMware ESX host with HBA1-HBA4 connected through FC switches to a storage array with two service processors, SP1 and SP2 (one passive/standby).]
Summary
Throughput (MB/sec)
Response time: investigate > 30 ms disk latency or > 2 ms kernel latency
Follow storage vendor best practices for block size and alignment
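The investigation thresholds above can be wrapped in a small check for scripted monitoring; the device names in the sample readings are hypothetical:

```python
# Flag devices whose esxtop latencies exceed the summary's thresholds:
# investigate > 30 ms device latency (DAVG) or > 2 ms kernel latency (KAVG).

def needs_attention(davg_ms: float, kavg_ms: float) -> bool:
    return davg_ms > 30.0 or kavg_ms > 2.0

readings = {"naa.600a": (5.1, 0.1), "naa.600b": (42.0, 6.3)}  # hypothetical
for dev, (davg, kavg) in readings.items():
    if needs_attention(davg, kavg):
        print(f"{dev}: investigate (DAVG={davg} ms, KAVG={kavg} ms)")
```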
Thank You !!