vSphere APIs for performance monitoring
London Workshop October 2010
Balaji Parimi, Staff Engineer, Ecosystem Performance, VMware, Inc.
Ravi Soundararajan, Senior Staff Engineer, Performance, VMware, Inc.
Motivation
To debug performance, why deal with this...?
Motivation
When you can deal with this instead?
More motivation
Why look at data like this?
Before memhog: no guest swapping
After memhog, guest swaps, but Host does not!
More motivation
When you can look at it like this?
Even more motivation

Why compare resource pool performance like this?
Even more motivation

When you can compare them like this?
Why?
vSphere gives you awesome, helpful charts
But you don't have to rely solely on these charts
Do you want to learn how to make your own charts?
Keep watching
Goal
Teach you how to use our APIs for performance monitoring
Agenda
What sorts of stats are useful?
How does vSphere retrieve them?
How can you get these stats and use them yourself?
Useful stats
Basics of performance monitoring in virtual infrastructure
  Find underperforming resources
  Find overcommitted resources
  Identify issues due to resource sharing among VMs
Resources we will look at

CPU Memory Disk Network
Resources that we often look at

CPU basics
[Figure: ESX scheduler diagram showing VM0 through VM6, physical CPU0 through CPU3, and the Run, Ready, and Wait/Idle states]
Run (accumulating used time)
Ready (wants to run, no physical CPU available)
Wait: blocked on I/O or voluntarily descheduled
Why is my VM slow?
CPU saturated? (cpu.usage.average)
Ready time? (cpu.ready.summation)
Latency to be swapped in? (cpu.swapwait.summation)
CPU saturation
2 vCPUs at 2.2GHz/CPU: ~4.4GHz used (look at left y-axis)
Small ready time
Ready time vCPU1: 150ms
Real-time chart: refresh 20s
150ms / 20s = 0.75% (no big deal)
Right y-axis is relevant
Now, turn on CPU burner on same host
CPU burner ~100% of 1 vCPU
And see what happens to original VM's ready time
SpecJBB ready time ~2000ms = 10% (P.S. SpecJBB performance dropped by 10%)
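The percentages on these two slides come from dividing the ready time (in ms) by the chart refresh interval (20s for real-time charts). A minimal Python sketch of that arithmetic follows; the function name is illustrative and not part of any VMware API.

```python
def ready_percent(ready_ms, interval_s=20):
    """Convert a ready-time sample (ms accumulated over one refresh
    interval) into a percentage of that interval."""
    # Multiply before dividing so the slide values stay exact.
    return ready_ms * 100.0 / (interval_s * 1000.0)


# Values taken from the slides above.
print(ready_percent(150))    # small ready time
print(ready_percent(2000))   # after the CPU burner is turned on
```

The same conversion applies to any summation counter reported in milliseconds per interval.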
Latency to load in VM: cpu.swapwait.summation

Sometimes there is a latency to load VM data from disk: cpu swapwait
CPU takes 20s to load in data before VM can run!
CPU issues: Summary

CPU saturated?
High Ready time
  Problematic if it is sustained for long periods
  Sample rule of thumb: > 20% per vCPU: investigate further
  Possible contention for CPU resources among VMs
    Workload variability? Fix with VMotion/DRS
    Resource limits on VMs? Check limits, reservations, and shares
    Actual overcommitment? Fix with VMotion/DRS/more CPUs
High SwapWait time
  Consider setting memory reservation (see next section, Memory)

Memory
ESX must balance memory usage
  Page sharing to reduce memory footprint of virtual machines
  Ballooning to relieve memory pressure in a graceful way
  Host swapping to relieve memory pressure when ballooning is insufficient
  Compression to relieve memory pressure without host-level swapping
ESX allows overcommitment of memory
  Sum of configured memory sizes of virtual machines can be greater than physical memory if working sets fit
Memory also has limits, shares, and reservations
Host swapping can cause performance degradation
Ballooning, compression, and swapping (1)

Ballooning: Memctl driver grabs pages and gives them to ESX
  Guest OS chooses pages to give to memctl (avoids hot pages if possible): either free pages or pages to swap
    Unused pages are given directly to memctl
    Pages to be swapped are first written to swap partition within guest OS and then given to memctl
[Figure: VM1 (with memctl and swap partition within guest OS), VM2, and ESX. 1. Balloon 2. Reclaim 3. Redistribute]
Ballooning, swapping, and compression (2)

Swapping: ESX reclaims pages forcibly
  Guest doesn't pick pages; ESX may inadvertently pick hot pages (possible VM performance implications)
  Pages written to VM swap file
[Figure: VM1, VM2, and ESX, with swap partition (within guest) and VSWP (external to guest). 1. Force Swap 2. Reclaim 3. Redistribute]
Ballooning, swapping and compression (3)

Compression: ESX reclaims pages, writes to in-memory cache
  Guest doesn't pick pages; ESX may inadvertently pick hot pages (possible VM performance implications)
  Pages written to in-memory cache: faster than host-level swapping
[Figure: VM1, VM2, ESX, swap partition (within guest), and compression cache. 1. Write to Compression Cache 2. Give pages to VM2]
Ballooning, swapping, and compression

Bottom line:
  Ballooning may occur even when there is no memory pressure, just to keep memory proportions under control
  Ballooning is preferable to compression and vastly preferable to swapping
    Guest can surrender unused/free pages
    With host swapping, ESX cannot tell which pages are unused or free and may accidentally pick hot pages
    Even if balloon driver has to swap to satisfy the balloon request, guest chooses what to swap
      Can avoid swapping hot pages within guest
    Compression: reading from compression cache is faster than reading from disk
Swapping in Guest != Swapping in Host

DVDstore benchmark: SQL DB benchmark uses lots of memory
About to start memory hogger program in guest
Force Guest swapping: No Host-level swapping
Before memhog: no guest swapping
After memhog, guest swaps, but Host does not!
Viewing Host-level swapping with performance charts
Setup: 2 VMs (one dvdstore, one memhog) competing for host memory
Host swaps out dvdstore VM memory to fulfill memhog VM requests
Host swaps in dvdstore VM memory to fulfill dvdstore VM requests
Using Swap Rate Counters: Remember CPU SwapWait?
cpu.swapwait.summation: CPU is waiting for memory to be swapped in
Absolute Swap Counters
Swapin, swapout (KB) show some activity but hard to detect
And Swap Rate Counters
SwapinRate, SwapoutRate (KBps) show activity much more clearly
Rule of thumb: host swapping > 1MBps is cause for concern
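To see why the rate counters are easier to read than the absolute ones, here is a small Python sketch that turns cumulative swap samples (KB) into per-interval rates (KBps) and applies the 1MBps rule of thumb. It assumes the absolute counter is cumulative, that samples are 20s apart, and that 1MBps means 1024KBps; the sample values are made up for illustration.

```python
def swap_rates_kbps(cumulative_kb, interval_s=20):
    """Difference consecutive cumulative samples (KB) to get KBps rates."""
    return [(later - earlier) / interval_s
            for earlier, later in zip(cumulative_kb, cumulative_kb[1:])]


def exceeds_rule_of_thumb(rates_kbps, threshold_kbps=1024):
    """Flag each interval whose swap rate is above ~1MBps (assumed 1024KBps)."""
    return [rate > threshold_kbps for rate in rates_kbps]


# Hypothetical cumulative swap-in samples, one every 20s.
samples_kb = [0, 4000, 50000, 50200]
rates = swap_rates_kbps(samples_kb)
print(rates)
print(exceeds_rule_of_thumb(rates))
```

Only the burst in the middle interval crosses the threshold, which is hard to spot when looking at the cumulative values alone.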

ESX storage stack

Different latencies for local disk vs. SAN (caching, switches, etc.)
Queuing within kernel and in hardware
vSphere shows
  Total Command Latency
  Kernel Latency
  Device Latency
  Bandwidth/IOPS
Disk performance problems 101

What should I look for to figure out if disk is an issue?
  Am I getting the IOPS I expect?
  Am I getting the bandwidth (read/write) I expect?
  Are the latencies higher than I expect?
  Where is time being spent?
What are some things I can do?
  Make sure devices are configured properly (caches, queue depths)
  Use multiple adapters and multipathing
  Check networking settings (for iSCSI/NAS)
Another disk example: Slow VM power on

Trying to power on a VM
  Sometimes, powering on VM would take 5 seconds
  Other times, powering on VM would take 5 minutes!
Where to begin?
  Powering on a VM requires disk activity on host
  Check disk metrics for host
Let's look at the vSphere client
Rule of thumb: latency > 20ms is bad. Here: 1,100ms is REALLY BAD!!!
Max disk latencies range from 100ms to 1100ms: very high! Why? (counter name: disk.maxTotalLatency.latest)
High disk latency: Mystery solved
Host events: disk has connectivity issues, hence high latencies!
Bottom line: monitor disk latencies; issues may not be related to virtualization!

Network performance problems 101

What should I look for to figure out if network is an issue?
  Am I getting the packet rate that I expect?
  Am I getting the bandwidth (read/write) I expect?
  Is all traffic on one NIC, or spread across many NICs?
  [more advanced, not available through counters]: out-of-order packets?
What are some things I can do?
  Check host networking settings
    Full-duplex/Half-duplex
    10Gig network vs 100Mb network?
    Firewall settings
  Check VM settings: all VMs on proper networks?
Network performance troubleshooting

Customer complains about slow network
  She's running netperf on a GigE link
  She sees only 200Mbps
  Why? "I bet it's that VMware stuff!!"
  Note to reader: Please don't blame VMware first
Where do we start?
All VMs using same NIC (VM network)
All VMs using VM Network and sharing 1 physical NIC
Where do we begin? Check VM bandwidth

Measure VM Bandwidth (net.transmitted.average)
200 Mb/s (screenshot from the vSphere client)
Check Host Bandwidth

Measure Host Bandwidth (net.transmitted.average)
Host sees around 900Mbps. Why is VM at 200Mbps?
Hmm... are we sharing this NIC with multiple VMs?
All traffic is going through one NIC!

Measure per-physical-NIC traffic
All traffic through one NIC on this host
Hmm... all VM traffic is going through 1 NIC
Let's split the VMs across NICs
Split VMs across multiple NICs. Bingo!
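The check performed in this walkthrough (is one physical NIC carrying nearly all of the host's traffic?) can be sketched in Python. The NIC names, the 90% threshold, and the traffic figures are assumptions for illustration; in practice the inputs would be per-NIC values of a counter such as net.transmitted.average.

```python
def nic_shares(mbps_by_nic):
    """Return each NIC's fraction of the total observed traffic."""
    total = sum(mbps_by_nic.values())
    if total == 0:
        return {nic: 0.0 for nic in mbps_by_nic}
    return {nic: mbps / total for nic, mbps in mbps_by_nic.items()}


def dominant_nic(mbps_by_nic, threshold=0.9):
    """Name the NIC carrying at least `threshold` of the traffic, else None."""
    if not mbps_by_nic:
        return None
    shares = nic_shares(mbps_by_nic)
    nic, share = max(shares.items(), key=lambda item: item[1])
    return nic if share >= threshold else None


# Hypothetical host: everything flows through vmnic0.
print(dominant_nic({"vmnic0": 900, "vmnic1": 0, "vmnic2": 0}))
# Hypothetical host after splitting VMs across NICs.
print(dominant_nic({"vmnic0": 450, "vmnic1": 450}))
```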
Network issues: Configuration woes
Network adapter set to autonegotiate: 90Mbps
Network adapter set to full duplex, 100 Mbps: < 0.1Mbps!
Specific combo of switch and adapter caused this performance degradation!
Lesson: Check specs & configuration!
Agenda
Stats infrastructure in vSphere

[Figure: ESX hosts and their VMs, vCenter Server (vpxd, tomcat), and DB]
1. Collect 20s and 5-min host and VM stats (ESX)
2. Send 5-min stats to vCenter
3. Send 5-min stats to DB
4. Rollups
Rollups
DB
1. Past-Day (5-minutes) -> Past-Week
2. Past-Week (30-minutes) -> Past-Month
3. Past-Month (2-hours) -> Past-Year
4. (Past-Year = 1 data point per day)
DB only archives historical data

Real-time (i.e., Past hour) NOT archived at DB
Past-day, Past-week, etc.: Stats Interval
Stats Levels ONLY APPLY TO HISTORICAL DATA
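As a rough illustration of what the sampling intervals mean for data volume, this Python sketch computes how many samples each window holds per counter per entity. The interval lengths come from the slides; the 30-day month and 365-day year are simplifying assumptions.

```python
DAY_S = 24 * 60 * 60

# (window length in seconds, sample interval in seconds)
WINDOWS = {
    "real-time (past hour)": (60 * 60, 20),
    "past day": (DAY_S, 5 * 60),
    "past week": (7 * DAY_S, 30 * 60),
    "past month": (30 * DAY_S, 2 * 60 * 60),   # assumes a 30-day month
    "past year": (365 * DAY_S, DAY_S),         # assumes a 365-day year
}


def samples_per_window(window_s, interval_s):
    """Number of whole samples that fit in the window."""
    return window_s // interval_s


for name, (window_s, interval_s) in WINDOWS.items():
    print(name, samples_per_window(window_s, interval_s))
```

The real-time figure works out to 180 samples for one hour at a 20s refresh.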
Anatomy of a stats query: Past-hour (RealTime) Stats
[Figure: Client, vCenter Server (vpxd, tomcat), and ESX hosts with their VMs]
1. Query
2. Get stats from host
3. Response
No calls to DB
Note: Same code path for past-day stats within last 30 minutes
Anatomy of a stats query: Archived stats
[Figure: Client, vCenter Server (vpxd, tomcat), and ESX hosts with their VMs]
1. Query
2. Get Stats
3. Response
No calls to ESX host (caveats apply)
Stats Level = Store this stat in the DB
Agenda
Phew! Ok, How do I get these stats?

You want a chart like this?
PowerCLI: CPU usage for a VM for the last hour:
  $vm = Get-VM -Name Foo
  Get-Stat -Entity $vm -Realtime -MaxSamples 180 -Stat cpu.usagemhz.average
Grab appropriate fields from output, use graphing program, etc.
Looks simple... What's going on behind the scenes?

To get stats, this is what is going on FOR EACH GET-STAT CALL:
  Retrieve PerformanceManager
  QueryPerfProviderSummary $vm: says what intervals are supported
  QueryAvailablePerfMetric $vm: describes available metrics
  QueryPerfCounter: verbose description of counters
  Create PerfQuerySpec: query specification to get the stats
  QueryPerf: get stats
Bottom line: The PowerCLI toolkit spares you the details. Easy to use!
PowerCLI is so easy... Why use Java / C#?

PowerCLI is great for scripting
  Stateless
  Hides details
But with Java / C#
  You can squeeze out more performance!
  Much higher scalability
Pseudo code
PowerCLI:
  for each Get-Stat {
    Get MOREF
    QueryAvailablePerfMetric();
    QueryPerfCounter();
    QueryPerfProviderSummary();
    create PerfQuerySpec();
    QueryPerf();
  }
Java:
  Get MOREF
  QueryAvailablePerfMetric();
  QueryPerfCounter();    (perfCounter property of PerformanceManager)
  QueryPerfProviderSummary();
  create PerfQuerySpec();
  for each Get-Stat {
    QueryPerf();
  }
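The difference between the two outlines is where the setup calls sit relative to the loop. The Python sketch below counts calls for each pattern using a stand-in recorder; it is not a VMware client, the count of four setup calls simply follows the outline above, and the totals say nothing about real timings.

```python
from collections import Counter

SETUP_CALLS = ["GetMoRef", "QueryAvailablePerfMetric",
               "QueryPerfCounter", "QueryPerfProviderSummary"]


def per_call_setup(num_queries):
    """Setup repeated inside every stats request (the PowerCLI outline)."""
    calls = Counter()
    for _ in range(num_queries):
        for name in SETUP_CALLS:
            calls[name] += 1
        calls["QueryPerf"] += 1
    return calls


def hoisted_setup(num_queries):
    """Setup done once, then only QueryPerf in the loop (the Java outline)."""
    calls = Counter(SETUP_CALLS)
    calls["QueryPerf"] += num_queries
    return calls


print(sum(per_call_setup(363).values()))
print(sum(hoisted_setup(363).values()))
```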
Performance implications: Need to write scalable scripts!

Entities (cpu.usagemhz.average)   PowerCLI (time in secs)   Java (time in secs)
1 VM                              9.2                       14
6 VMs                             11                        14.5
39 VMs                            101                       16
363 VMs                           2580 (43 minutes)         50
Highly-tuned Java Stats Collector
A naive script that works for small environments may not be suitable for large environments
Java provides opportunities for scalable, ongoing stats collection
Let's examine Java code in more detail
GetPerfStats Main method
  Get MOREF
  QueryAvailablePerfMetric();
  QueryPerfCounter();    (perfCounter)
  QueryPerfProviderSummary();
  create PerfQuerySpec();
  for each Get-Stat {
    QueryPerf();
  }
[Code screenshot callouts: Get MOREF, Get CounterIds, QueryAvailablePerfMetric, QueryProviderSummary, create PerfQuerySpec, QueryPerf]
GetPerfStats
Get MOREF
Get the entity MOREF
GetPerfStats
Get CounterIds
Get available counterIDs from perfCounter property of PerformanceManager
Map human-readable stat name to counterID (e.g., cpu.usagemhz.average -> 101)
QueryPerf() requires counterID
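A minimal Python sketch of building that name-to-ID map once and reusing it. The records below are plain dictionaries standing in for the entries of the perfCounter property (group, name, rollup type, and numeric key); apart from the 101 example from the slide, the IDs are made up.

```python
def build_counter_map(perf_counters):
    """Map 'group.name.rollup' strings to numeric counter IDs."""
    return {
        "{group}.{name}.{rollup}".format(**counter): counter["key"]
        for counter in perf_counters
    }


# Stand-in records; only the first ID (101) comes from the slide.
sample_counters = [
    {"group": "cpu", "name": "usagemhz", "rollup": "average", "key": 101},
    {"group": "cpu", "name": "ready", "rollup": "summation", "key": 102},
    {"group": "mem", "name": "swapinRate", "rollup": "average", "key": 103},
]

counter_ids = build_counter_map(sample_counters)
print(counter_ids["cpu.usagemhz.average"])
```

Because counter definitions are static data, the map only needs to be built once per session.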
GetPerfStats
QueryPerfProviderSummary
All VMs have same value
All Hosts have same value
etc.
Call once for a given entity type and store result
GetPerfStats
Create PerfQuerySpec
Use wild card
CSV output format
GetPerfStats
QueryPerf
So, what is Java / C# buying us?

Avoiding redundant work
More compact return format (CSV vs. objects)
Low-overhead tracking of ongoing inventory changes
Etc.
If we dig deeper, we can optimize even more
Digging deeper: The PerfQuerySpec architecture

To grab counters: QueryPerf(PerfQuerySpec[] querySpec)
PerfQuerySpec: Specifies which counters to grab
  Entity (host, VM)
  Format (CSV, normal)
  MetricId
  StartTime
  EndTime
  IntervalID (20s, 300s)
  maxSample
PerfQuerySpec[]: [pQs1, pQs2, pQs3, ...]
  Array of PerfQuerySpec objects pQs1, pQs2, pQs3
  Can grab multiple stats using single QueryPerf call
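To make the shape of the query array concrete, here is a Python stand-in for the fields listed above. It is only a data-structure sketch: the class name and field names are illustrative, metric IDs are shown as plain strings, and nothing here talks to vCenter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PerfQuerySpecSketch:
    """Illustrative stand-in mirroring the PerfQuerySpec fields on the slide."""
    entity: str
    metric_ids: List[str]
    interval_id: int = 20            # seconds; 20 for real-time, 300 for past day
    fmt: str = "csv"                 # "csv" or "normal"
    start_time: Optional[str] = None
    end_time: Optional[str] = None
    max_sample: Optional[int] = None


def build_query_specs(entities, metric_ids, max_sample=180):
    """One spec per entity, so a single query call can cover all of them."""
    return [
        PerfQuerySpecSketch(entity=e, metric_ids=list(metric_ids),
                            max_sample=max_sample)
        for e in entities
    ]


specs = build_query_specs(["VM1", "VM2", "H3"], ["cpu.*"])
print(len(specs), specs[0])
```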
Complexities of QueryPerf
How does vSphere process QueryPerf(querySpec[])?
1. vCenter receives QueryPerf request with querySpec[]
2. vCenter takes each querySpec one at a time
3. vCenter gets data for each querySpec before processing next one
Options for querySpec[]:
1. 1 entry: 1 stat or set of stats for a single entity (e.g., all CPU)
     VM1,cpu.*
2. Multiple entries. Examples:
     Each entry for a different entity: VM1,cpu.* VM2,cpu.* H3,mem.*
     Each entry for a different stat type, same entity: VM1,cpu.* VM1,net.* VM1,mem.*
Implications of QuerySpec
Format of QuerySpec allows multiple client options:
1. Grab each stat one at a time
2. Grab a group of stats per entity at once
3. Grab all stats for all entities at once
4. Grab stats for a subset of entities at once
Some tradeoffs:
1. Network processing (large result sets vs. small result sets)
2. Client aggregation overhead
3. vCenter processing (each QueryPerf handled in a single thread)
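Option 4 (a subset of entities per call) sits between the extremes of one call per stat and one giant call. A Python sketch of splitting an inventory into fixed-size batches follows; the batch size of 50 and the entity names are arbitrary assumptions, and the right size depends on the tradeoffs listed above.

```python
def batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical inventory of 363 VMs, queried 50 at a time.
vm_names = ["VM%d" % n for n in range(363)]
vm_batches = batches(vm_names, 50)
print(len(vm_batches), len(vm_batches[-1]))
```

Each batch would then become one querySpec[] array passed to a single QueryPerf call.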
What about in-guest stats?

Using VIX APIs:
  Create a script that can get whatever stats you are interested in.
  Make the script write the stats to a file.
  Copy file from the guest.
Session covering this topic: PPC-15 Guest Operations using VMware VIX APIs and Beyond
Back to the Future (1)

Now I know how to convert this (many metrics on different charts)

To This (CPU, Memory, Disk, and Network on the same chart)
Combining metrics across VMs & Hosts
Comparing resource pools
Use VIX API + vSphere counters to get RP performance data
What about VMs running on a Host?

Memory usage of VMs on a Host
Summary, Part 1: Some useful Counters to monitor

Resource | Metric | Host or VM? | Description
CPU | Usage | Both | CPU % used
CPU | Ready | VM | Ready to run, but limit or no available physical CPU
CPU | SwapWait | VM | CPU time spent waiting for host-level swap-in
Memory | Swapin, swapinrate | Both | Memory ESX host swaps in from disk (per VM, or cumulative over host)
Memory | Swapout, swapoutrate | Both | Memory ESX host swaps out to disk (per VM, or cumulative over host)
Disk | commands | Both | Operations done during stats refresh interval
Disk | totalLatency | Host | End-to-end disk latency (available for reads & writes)
Disk | Usage | Both | Disk bandwidth utilized (available for reads & writes)
Network | Packets received, transmitted | Both | Operations done during stats refresh interval
Network | Usage | Both | Network bandwidth used (available for reads & writes)
For completeness: VM memory metrics

Metric | Description
Memory Active (KB) | Physical pages touched recently by a virtual machine
Memory Usage (%) | Active memory / configured memory
Memory Consumed (KB) | Machine memory mapped to a virtual machine, including its portion of shared pages. Does NOT include overhead memory.
Memory Granted (KB) | VM physical pages backed by machine memory. May be less than configured memory. Includes shared pages. Does NOT include overhead memory.
Memory Shared (KB) | Physical pages shared with other virtual machines
Memory Balloon (KB) | Physical memory ballooned from a virtual machine
Memory Swapped (KB) | Physical memory in swap file (approx. swap out - swap in). Swap out and Swap in are cumulative.
Overhead Memory (KB) | Machine pages used for virtualization
Host memory metrics

Metric | Description
Memory Active (KB) | Physical pages touched recently by the host
Memory Usage (%)* | Active memory / configured memory
Memory Consumed (KB) | Total host physical memory - free memory on host. Includes Overhead and Service Console memory.
Memory Granted (KB) | Sum of memory granted to all running virtual machines. Does NOT include overhead memory.
Memory Shared (KB) | Sum of memory shared for all running VMs
Shared common (KB) | Total machine pages used by shared pages
Memory Balloon (KB) | Machine pages ballooned from virtual machines
Memory Swap Used (KB) | Physical memory in swap files (approx. swap out - swap in). Swap out and Swap in are cumulative.
Overhead Memory (KB) | Machine pages used for virtualization
*For a cluster, mem.usage.average = (consumed + overhead)/total mem
Summary, Part 2: Cheat sheet

Rules of Thumb
  Ready Time > 20% sustained is undesirable
  Host-level swapping is bad, > 1MBps is especially bad
  Disk latencies > 20 ms: BAD
Use IOmeter to assess disk bandwidth and latency
Network
run netperf to get network baselines
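The rules of thumb on this cheat sheet can be collected into one small check. The Python sketch below is illustrative only: the function name, message wording, and the reading of 1MBps as 1024KBps are assumptions, and "sustained" is left to the caller to decide.

```python
def check_rules_of_thumb(ready_pct=None, swap_rate_kbps=None,
                         disk_latency_ms=None):
    """Return a list of warnings for values that break the cheat-sheet rules."""
    warnings = []
    if ready_pct is not None and ready_pct > 20:
        warnings.append("ready time above 20%: investigate CPU contention")
    if swap_rate_kbps is not None and swap_rate_kbps > 0:
        if swap_rate_kbps > 1024:   # assumed meaning of "1MBps"
            warnings.append("host swapping above 1MBps: especially bad")
        else:
            warnings.append("host swapping present: bad")
    if disk_latency_ms is not None and disk_latency_ms > 20:
        warnings.append("disk latency above 20ms: check storage")
    return warnings


# Figures from the slow power-on example: tiny ready time, huge disk latency.
print(check_rules_of_thumb(ready_pct=0.75, swap_rate_kbps=0,
                           disk_latency_ms=1100))
```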
Summary, Part 3: SDK/API Tips and tricks

Collect static data once
  CounterIDs, metricIDs, MOREFs, etc.
  Use Views to keep this data up to date
Reuse PerfQuerySpec as much as possible
Use CSV format
  Reduces serialization cost and the size of metadata
Choose metrics and query intervals carefully
  Query the real-time stats at a slower rate than the refresh rate
  Choose correct stats levels
Use parallelism (multi-threaded clients)
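The last tip, client-side parallelism, can be sketched in Python with a thread pool that issues one query per batch of entities and merges the results. The query function below is a stand-in that fabricates placeholder strings; a real client would call QueryPerf there. The worker count of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_query_perf(batch):
    """Stand-in for a QueryPerf call covering one batch of entities."""
    return {entity: "stats-for-" + entity for entity in batch}


def query_in_parallel(entity_batches, query_fn, max_workers=4):
    """Run one query per batch on a thread pool and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves batch order in its results.
        for result in pool.map(query_fn, entity_batches):
            merged.update(result)
    return merged


results = query_in_parallel([["VM1", "VM2"], ["VM3"], ["H1"]], fake_query_perf)
print(sorted(results))
```

Since the deck notes that each QueryPerf is handled in a single thread on vCenter, several smaller concurrent requests may make better use of the server than one large request, at the cost of more client-side aggregation.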
Conclusion
vSphere gives a bunch of awesome charts
If you want to see the data differently, use the API
PowerCLI is great for simple scripts
When designing for scalability, consider Java / C#
Resources
Developer Support
  Dedicated support for your organization when building solutions using vSphere APIs, PowerCLI, vSphere Web Services SDKs and many more VMware SDKs
  http://vmware.com/go/sdksupport
PowerCLI Training
  2 day instructor led training, 40% lecture, 60% lab
  http://vmware.com/go/vsphereautomation
VMware Developer Community
  SDK Downloads, Documentation, Sample Code, Forums, Blogs
  http://developer.vmware.com
Technology Alliance Partner (TAP) Program
  Updated partner benefits
  http://www.vmware.com/partners/alliances/programs/
Disclaimer
This session may contain product features that are currently under development. This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.
These features are representative of feature areas under development. Feature commitments are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery.
Backup slides
What about VMs across resource pools?

To This (CPU, Memory, Disk, and Network on the same chart)
