Balaji Parimi, Staff Engineer, Ecosystem Performance, VMware, Inc.
Ravi Soundararajan, Senior Staff Engineer, Performance, VMware, Inc.
Motivation
To debug performance, why deal with this...?
Motivation
When you can deal with this instead?
More motivation
Why look at data like this?
More motivation
When you can look at it like this?
Why?
vSphere gives you awesome, helpful charts
But you don't have to rely solely on these charts
Do you want to learn how to make your own charts?
Keep watching
Goal
Teach you how to use our APIs for performance monitoring
Agenda
What sorts of stats are useful?
How does vSphere retrieve them?
How can you get these stats and use them yourself?
Useful stats
Basics of performance monitoring in virtual infrastructure
Find underperforming resources
Find overcommitted resources
Identify issues due to resource sharing among VMs
CPU basics
[Figure: an ESX host scheduling VM0-VM6 on CPU0-CPU3, with each VM in the Run, Ready, or Wait/Idle state]
Run: accumulating used time
Ready: wants to run, no physical CPU available
Wait: blocked on I/O or voluntarily descheduled
Why is my VM slow?
CPU saturated? (cpu.usage.average)
Ready time? (cpu.ready.summation)
Latency to be swapped in? (cpu.swapwait.summation)
CPU saturation
Ready time
vCPU1: 150ms
Real-time chart: refresh every 20s
150ms / 20s = 0.75% (no big deal)
Right y-axis is relevant
SpecJBB ready time: ~2000ms / 20s = 10% (P.S. SpecJBB performance dropped by 10%)
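The ready-time arithmetic above can be sketched in a few lines of Java. This is only an illustration of the percentage calculation: the class and method names are hypothetical, and it assumes the cpu.ready.summation value is in milliseconds for one vCPU over one sampling interval.

```java
/** Converts a ready-time sample into a percentage of its sampling interval. */
final class ReadyTime {
    private ReadyTime() {}

    /**
     * @param readyMillis     ready time accumulated during the interval, in milliseconds
     * @param intervalSeconds length of the sampling interval in seconds (20 for real-time charts)
     * @return ready time as a percentage of the interval
     */
    static double readyPercent(double readyMillis, double intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("intervalSeconds must be positive");
        }
        // Multiply first, then divide by the interval length in milliseconds.
        return readyMillis * 100.0 / (intervalSeconds * 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(readyPercent(150, 20));   // the vCPU1 example
        System.out.println(readyPercent(2000, 20));  // the SpecJBB example
    }
}
```

With the two inputs from the slides, this reproduces the 0.75% and 10% figures.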
High SwapWait time
Consider setting a memory reservation (see next section, Memory)
Memory
ESX must balance memory usage
Page sharing to reduce memory footprint of virtual machines
Ballooning to relieve memory pressure in a graceful way
Host swapping to relieve memory pressure when ballooning is insufficient
Compression to relieve memory pressure without host-level swapping
ESX allows overcommitment of memory
Sum of configured memory sizes of virtual machines can be greater than physical memory if working sets fit
Memory also has limits, shares, and reservations
Host swapping can cause performance degradation
Unused pages are given directly to memctl
Pages to be swapped are first written to the swap partition within the guest OS and then given to memctl
[Figures: VM1 and VM2 on an ESX host; the last diagram adds a compression cache]
Even if the balloon driver has to swap to satisfy the balloon request, the guest chooses what to swap
Can avoid swapping hot pages within guest
Compression: reading from compression cache is faster than reading from disk
Setup: 2 VMs (one dvdstore, one memhog) competing for host memory
Host swaps out dvdstore VM memory to fulfill memhog VM requests
Host swaps in dvdstore VM memory to fulfill dvdstore VM requests
SwapinRate, SwapoutRate (KBps) show activity much more clearly
Rule of thumb: host swapping > 1MBps is cause for concern
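A minimal sketch of applying this rule of thumb to sampled swap rates follows. The class name, the choice to flag either direction on its own, and the conversion of 1MBps to 1024 KBps are all assumptions made for illustration.

```java
/** Flags host swap activity above the 1 MBps rule of thumb. */
final class SwapRateCheck {
    private SwapRateCheck() {}

    /** 1 MBps expressed in KBps (assumes 1 MB = 1024 KB). */
    static final double THRESHOLD_KBPS = 1024.0;

    /**
     * @param swapInKBps  swap-in rate sample, in KBps
     * @param swapOutKBps swap-out rate sample, in KBps
     * @return true if either rate exceeds the threshold (an assumed policy)
     */
    static boolean isConcerning(double swapInKBps, double swapOutKBps) {
        return swapInKBps > THRESHOLD_KBPS || swapOutKBps > THRESHOLD_KBPS;
    }

    public static void main(String[] args) {
        System.out.println(isConcerning(100, 200));
        System.out.println(isConcerning(0, 2048));
    }
}
```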
What are some things I can do?
Make sure devices are configured properly (caches, queue depths)
Use multiple adapters and multipathing
Check networking settings (for iSCSI/NAS)
Rule of thumb: latency > 20ms is bad. Here: 1,100ms is REALLY BAD!!!
Max disk latencies range from 100ms to 1100ms: very high! Why? (counter name: disk.maxTotalLatency.latest)
High latencies!
Bottom line: monitor disk latencies; issues may not be related to virtualization!
What are some things I can do?
Check host networking settings
Full-duplex/Half-duplex
10Gig network vs 100Mb network?
Firewall settings
Where do we start?
Hmm... all VM traffic is going through 1 NIC
Let's split the VMs across NICs
Network adapter set to full duplex, 100 Mbps: < 0.1Mbps!
Specific combo of switch and adapter caused this performance degradation!
Lesson: Check specs & configuration!
Agenda
What sorts of stats are useful?
How does vSphere retrieve them?
How can you get these stats and use them yourself?
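[Figure: each ESX host collects stats for its VMs every 20 seconds; the stats are rolled up and stored in the DB]
Rollups
1. Real-time (20-second) samples -> Past-Day (5-minute intervals)
2. Past-Day (5-minute) -> Past-Week (30-minute intervals)
3. Past-Week (30-minute) -> Past-Month (2-hour intervals)
4. Past-Month (2-hour) -> Past-Year (1 data point per day)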
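To make the first rollup step concrete: a 5-minute interval spans 300s / 20s = 15 real-time samples. The sketch below averages fixed-size groups of samples. It is not vCenter's implementation; the class name is hypothetical, and averaging is an assumption that only suits counters whose rollup type is an average.

```java
import java.util.ArrayList;
import java.util.List;

/** Rolls fine-grained samples up into coarser intervals by averaging. */
final class Rollup {
    private Rollup() {}

    /**
     * @param samples          samples in time order (for example, one every 20 seconds)
     * @param samplesPerBucket samples per rolled-up interval (15 for 20s -> 5 minutes)
     * @return one average per complete bucket; a trailing partial bucket is dropped
     */
    static List<Double> averageRollup(List<Double> samples, int samplesPerBucket) {
        if (samplesPerBucket <= 0) {
            throw new IllegalArgumentException("samplesPerBucket must be positive");
        }
        List<Double> rolledUp = new ArrayList<>();
        for (int start = 0; start + samplesPerBucket <= samples.size(); start += samplesPerBucket) {
            double sum = 0.0;
            for (int i = start; i < start + samplesPerBucket; i++) {
                sum += samples.get(i);
            }
            rolledUp.add(sum / samplesPerBucket);
        }
        return rolledUp;
    }

    public static void main(String[] args) {
        List<Double> samples = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            samples.add((double) i); // 30 samples = 10 minutes of 20-second data
        }
        System.out.println(averageRollup(samples, 15));
    }
}
```

The same function covers the later steps with different group sizes, for example 6 five-minute values per 30-minute interval.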
[Figure: query answered by the ESX host. 1. Query: Client -> vCenter Server (vpxd, tomcat) -> ESX. 3. Response: back to the Client.]
No calls to DB
Note: Same code path for past-day stats within last 30 minutes
[Figure: query answered from the DB. 1. Query: Client -> vCenter Server (vpxd, tomcat). 2. Get Stats: vCenter Server -> DB. 3. Response: back to the Client.]
No calls to ESX host (caveats apply)
Stats Level = Store this stat in the DB
Agenda
What sorts of stats are useful?
How does vSphere retrieve them?
How can you get these stats and use them yourself?
PowerCLI
CPU usage for a VM for the last hour (180 samples x 20s):
    $vm = Get-VM -Name Foo
    Get-Stat -Entity $vm -Realtime -MaxSamples 180 -Stat cpu.usagemhz.average
Pseudo code
PowerCLI:
    Get MOREF
    for each Get-Stat {
        QueryAvailablePerfMetric();
        QueryPerfCounter();
        QueryPerfProviderSummary();
        create PerfQuerySpec();
        QueryPerf();
    }
Java:
    Get MOREF
    QueryAvailablePerfMetric();
    QueryPerfCounter();   (perfCounter property of PerformanceManager)
    QueryPerfProviderSummary();
    create PerfQuerySpec();
    for each Get-Stat {
        QueryPerf();
    }
Entities    PowerCLI (time in secs)    Java (time in secs)
1 VM        9.2                        14
6 VMs       11                         14.5
39 VMs      101                        16
363 VMs     (value missing)            50
A naive script that works for small environments may not be suitable for large environments
Java provides opportunities for scalable, ongoing stats collection
Let's examine Java code in more detail
Get MOREF
QueryAvailablePerfMetric();
QueryPerfCounter();   (perfCounter)
QueryPerfProviderSummary();
create PerfQuerySpec();
for each Get-Stat {
    QueryPerf();
}
[Figure: GetPerfStats broken into steps: Get MOREF, Get CounterIds, QueryAvailablePerfMetric, QueryProviderSummary, create PerfQuerySpec, QueryPerf]
Get MOREF
Get CounterIds
perfCounter property of PerformanceManager
Map human-readable stat name to counterID (e.g., cpu.usagemhz.average -> 101)
QueryPerf() requires counterID
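The name-to-ID mapping can be built once and reused for every query. The sketch below uses a hypothetical CounterInfo class as a simplified stand-in for the entries of the perfCounter property, so that it runs without the SDK. With the real SDK, the group, name, and rollup parts would come from each counter's descriptive fields, and the IDs should always be read from the server, not hard-coded.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Builds a lookup from "group.name.rollup" to numeric counter ID. */
final class CounterIndex {
    private CounterIndex() {}

    /** Hypothetical, simplified stand-in for one perfCounter entry. */
    static final class CounterInfo {
        final int id;
        final String group;
        final String name;
        final String rollup;

        CounterInfo(int id, String group, String name, String rollup) {
            this.id = id;
            this.group = group;
            this.name = name;
            this.rollup = rollup;
        }
    }

    /** Maps names such as "cpu.usagemhz.average" to their counter IDs. */
    static Map<String, Integer> byName(List<CounterInfo> counters) {
        Map<String, Integer> ids = new HashMap<>();
        for (CounterInfo c : counters) {
            ids.put(c.group + "." + c.name + "." + c.rollup, c.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // The ID 101 is the slide's example; the second entry is made up for illustration.
        List<CounterInfo> counters = Arrays.asList(
                new CounterInfo(101, "cpu", "usagemhz", "average"),
                new CounterInfo(102, "cpu", "ready", "summation"));
        System.out.println(byName(counters).get("cpu.usagemhz.average"));
    }
}
```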
QueryPerfProviderSummary
All VMs have the same value
All hosts have the same value
etc.
Call once for a given entity type and store the result
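"Call once and store the result" amounts to a small cache keyed by entity type. In this sketch the loader function stands in for the QueryPerfProviderSummary call; the class name, the string keys, and the use of a plain integer as the cached value are assumptions for illustration, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Caches one value per entity type so the expensive lookup runs only once per type. */
final class PerTypeCache<V> {
    private final Map<String, V> cache = new HashMap<>();
    private final Function<String, V> loader;
    private int loads = 0;

    /** @param loader hypothetical stand-in for the remote call, invoked on a cache miss */
    PerTypeCache(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for the entity type, loading it on first use. */
    V get(String entityType) {
        V value = cache.get(entityType);
        if (value == null) {
            value = loader.apply(entityType);
            loads++;
            cache.put(entityType, value);
        }
        return value;
    }

    /** Number of times the loader was actually invoked. */
    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        // The loader returns a made-up refresh interval of 20 seconds for every type.
        PerTypeCache<Integer> refreshRates = new PerTypeCache<>(type -> 20);
        refreshRates.get("VirtualMachine");
        refreshRates.get("VirtualMachine");
        refreshRates.get("HostSystem");
        System.out.println(refreshRates.loadCount());
    }
}
```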
Create PerfQuerySpec
QueryPerf
PerfQuerySpec[]: [pQs1, pQs2, pQs3, ...]
Array of PerfQuerySpec objects pQs1, pQs2, pQs3
Can grab multiple stats using a single QueryPerf call
Complexities of QueryPerf
How does vSphere process QueryPerf(querySpec[])?
1. vCenter receives queryPerf request with querySpec[]
2. vCenter takes each querySpec one at a time
3. vCenter gets data for each querySpec before processing the next one
Options for querySpec[]:
1. 1 entry: 1 stat or set of stats for a single entity (e.g., all CPU)
   [VM1,cpu.*]
2. Multiple entries. Examples:
   Each entry for a different entity: [VM1,cpu.*] [VM2,cpu.*] [H3,mem.*]
   Each entry for a different stat type, same entity: [VM1,cpu.*] [VM1,net.*] [VM1,mem.*]
Implications of QuerySpec
Format of QuerySpec allows multiple client options
1. Grab each stat one at a time
2. Grab a group of stats per entity at once
3. Grab all stats for all entities at once
4. Grab stats for a subset of entities at once
Some tradeoffs:
1. Network processing (large result sets vs. small result sets)
2. Client aggregation overhead
3. vCenter processing (each QueryPerf handled in a single thread)
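Option 4 (a subset of entities per call) can be sketched as splitting the full list of query specs into fixed-size batches, each of which would become one QueryPerf call. The helper below is generic and hypothetical; the strings stand in for PerfQuerySpec objects, and the batch size is a tuning knob that trades result-set size against the number of calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Splits a list of query specs into batches, one batch per query call. */
final class QueryBatcher {
    private QueryBatcher() {}

    /**
     * @param specs     all query specs to issue, in order
     * @param batchSize maximum number of specs per batch
     * @return consecutive batches; the last one may be smaller than batchSize
     */
    static <T> List<List<T>> partition(List<T> specs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < specs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, specs.size());
            batches.add(new ArrayList<>(specs.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Strings stand in for PerfQuerySpec objects in this sketch.
        List<String> specs = Arrays.asList("VM1,cpu.*", "VM2,cpu.*", "VM3,cpu.*",
                "VM4,cpu.*", "VM5,cpu.*", "VM6,cpu.*", "VM7,cpu.*");
        for (List<String> batch : partition(specs, 3)) {
            System.out.println(batch);
        }
    }
}
```

Because each QueryPerf is handled in a single thread on vCenter, a client could issue separate batches from separate threads; whether that helps depends on the environment and should be measured.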
Both
Description
Physical pages touched recently by a virtual machine
Active memory / configured memory
Machine memory mapped to a virtual machine, including its portion of shared pages. Does NOT include overhead memory.
VM physical pages backed by machine memory. May be less than configured memory. Includes shared pages. Does NOT include overhead memory.
Memory Shared (KB): Physical pages shared with other virtual machines
Memory Balloon (KB): Physical memory ballooned from a virtual machine
Memory Swapped (KB): Physical memory in swap file (approx. swap out - swap in). Swap out and Swap in are cumulative.
Overhead Memory (KB): Machine pages used for virtualization
Description
Physical pages touched recently by the host
Active memory / configured memory
Total host physical memory - free memory on host. Includes Overhead and Service Console memory.
Sum of memory granted to all running virtual machines. Does NOT include overhead memory.
Sum of memory shared for all running VMs
Total machine pages used by shared pages
Machine pages ballooned from virtual machines
Physical memory in swap files (approx. swap out - swap in). Swap out and Swap in are cumulative.
Machine pages used for virtualization
Network
Run netperf to get network baselines
Conclusion
vSphere gives a bunch of awesome charts
If you want to see the data differently, use the API
PowerCLI is great for simple scripts
When designing for scalability, consider Java / C#
Resources
Developer Support
Dedicated support for your organization when building solutions using vSphere APIs, PowerCLI, vSphere Web Services SDKs and many more VMware SDKs
http://vmware.com/go/sdksupport
PowerCLI Training
2-day instructor-led training, 40% lecture, 60% lab
http://vmware.com/go/vsphereautomation
VMware Developer Community
SDK Downloads, Documentation, Sample Code, Forums, Blogs
http://developer.vmware.com
Technology Alliance Partner (TAP) Program
Updated partner benefits
http://www.vmware.com/partners/alliances/programs/
Disclaimer
This session may contain product features that are currently under development. This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.
These features are representative of feature areas under development. Feature commitments are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery.
Backup slides