
• Installation of ESX servers, vSphere Client and vCenter Server
• Create the datastore for storing VMs and data
• Deploy virtual machines and use clones, snapshots and templates
• Health check for VMs and ESX hosts
• Present / assign LUNs to the ESX host (actually a storage admin task)

Errors

Always checks the system logs around the error timestamp to see whether the problem was caused by a
system error
For example: if IBM® InfoSphere® Master Data Management Server for Product Information
Management reports a system error while saving a file to docstore, the System Expert should check
the system logs to see whether it is an I/O problem, a full disk or a file system problem.

Monitor

Monitors all system errors and critical system messages


Checks for disk space getting full
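
The disk-space check above can be scripted. The following is a minimal sketch using only the Python standard library; the function names and the 90% threshold are illustrative assumptions, not values taken from any product documentation.

```python
import shutil


def disk_usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold_percent=90.0):
    """True when usage is at or above the (assumed) alert threshold."""
    return disk_usage_percent(path) >= threshold_percent
```

A scheduled job could call `is_nearly_full()` for each mount point of interest and raise an alert when it returns `True`.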

Performance

When performance problems arise, determines I/O bandwidth, memory, swapping, and CPU usage to
see if there is a bottleneck in the current hardware setup
Checks for the existence of zombie or defunct processes and determines the cause of a freeze
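
As a rough illustration of the zombie-process check, the sketch below parses text in the layout produced by `ps -eo pid,ppid,stat,comm` on a Linux system and reports entries whose state starts with `Z`. The sample output is made up for the example, and the function name is hypothetical.

```python
def find_zombies(ps_output):
    """Return (pid, ppid, command) for every process whose STAT begins with 'Z'."""
    zombies = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore malformed rows
        pid, ppid, stat, command = parts
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), command))
    return zombies


# Illustrative sample only; real output will differ.
SAMPLE_PS = """\
  PID  PPID STAT COMMAND
    1     0 Ss   init
  812     1 S    httpd
  940   812 Z    worker <defunct>
"""
```

The parent PID is returned because a zombie is normally cleared by fixing or restarting the parent that failed to reap it.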

Client PCs

Determines if different software packages on the PC might be causing a problem
If the cause cannot be determined, removes all non-standard packages to see if the problem goes away
Tracks modifications to all PC settings to determine if a problem was caused by configuration
changes
For example: Internet Explorer and network settings

Network

Configures and maintains load balancer (if applicable) and knows when it may be the cause of a
problem
For example: if unexpectedly logged out of the system, correlates time of the problem with load
balancer logs to see if cause was load balancer
Might bypass load balancer completely for a period of time to see if it fixes a problem
Configures and maintains proxy server
If using a proxy server, ensures all relevant proxy server HTTP caches are flushed when an IBM
InfoSphere Master Data Management Server for Product Information Management patch is installed
Monitors network bandwidth
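
The load balancer example above (correlating the time of an unexpected logout with load balancer logs) can be sketched as a simple time-window filter. This assumes, purely for illustration, that each log line starts with an ISO 8601 timestamp followed by a space; real load balancer log formats vary and would need their own parsing.

```python
from datetime import datetime, timedelta


def entries_near(log_lines, incident_time, window_minutes=5):
    """Return log lines whose timestamp falls within the window around the incident."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in log_lines:
        stamp, _, _message = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # line does not start with a parseable timestamp
        if abs(when - incident_time) <= window:
            hits.append(line)
    return hits


# Hypothetical log lines for the example.
SAMPLE_LOG = [
    "2024-01-15T09:40:00 health check ok",
    "2024-01-15T10:02:11 pool member web2 marked down",
    "not a timestamped line",
    "2024-01-15T10:30:00 health check ok",
]
```

Calling `entries_near(SAMPLE_LOG, datetime(2024, 1, 15, 10, 4))` returns only the "marked down" entry, which is the kind of event worth comparing against the user's report.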
1. General admin tasks - creation, deletion and migration of VMs.
2. Backups
3. Performance monitoring of physical hosts and VMs.
4. Patch management of hosts and VMs
5. Capacity planning - disk, CPU, network and memory.
6. Security and audit logs - who is doing what and when.

But maintenance isn't just about sustaining uptime. It ensures that you get the
most out of your investments. ESX, vSphere, vCenter, and Workstation are
well-established virtualization technologies, but they still require routine
maintenance.

Take the time to understand your infrastructure, and create a VMware
maintenance schedule to meet service-level agreements and minimize job
stress. The following guidelines cover some best practices, but every virtual
infrastructure is different, so determine what works best for you.

VMware maintenance tasks can be roughly broken down by frequency: daily,
weekly and monthly.

Daily VMware maintenance tasks


I carry out the following maintenance tasks on a daily basis:

• Monitor mailbox for alarms. Mailbox monitoring is ongoing, passive and,
once the appropriate alarms are configured, doesn't require much effort. As
you become more familiar with your infrastructure, you can differentiate
between alarms that require immediate attention and ones that are
indications of a gradual baseline shift.
If a particular database job triggers CPU usage alarms at the same time
each day, it's wise to adjust your alarm tolerance and frequency.
Remember Aesop's fable "The Boy Who Cried Wolf"? Don't let persistent
alarms lure you into ignoring important ones.

• Visit the server room. I have alerts configured, but I can tell exactly what
is happening after a quick visual check of a host's component displays -- which
is helpful, especially if I miss an alarm or alert.

Are all the fans running? Are the memory sticks error-free? Do the storage
area network (SAN) drives look good? I look at the hardware light-emitting
diodes for a basic status update. I also quickly check the uninterruptible
power supplies' voltage, as well as the current runtime. From that
information, I know how much time I have to power off equipment in case of
a disaster. In addition to these quick visuals, I often use Hewlett-Packard
Co.'s Integrated Lights-Out (iLO) ports on ESX hosts to check
the hardware status, including temperature levels -- which is mandatory if
you don't have physical access to the hosts or SAN.

• Look around vCenter/VirtualCenter. I check out any tasks that haven't
been completed, glance at the ESX host performance and get a feel for
how everything is working. When you are familiar with how the systems
perform on an average day, it's easier to isolate problems. And believe me,
problems will emerge at some point -- no matter how well your system is
tuned. This process is similar to knowing your resting heart rate, and
periodically checking it on a treadmill or bike.

Weekly VMware maintenance tasks


Each week, I perform the following activity:

• Back up vCenter/VirtualCenter database. My infrastructure does not
change often, so I perform weekly database dumps and full backups of my
management server. If your infrastructure is more dynamic, you should do
these tasks more frequently. It cannot be emphasized enough: In case you
need to rebuild, have good database backups.

Monthly VMware maintenance tasks


I carry out the following maintenance chores each month:

• Clean up storage. If there are any unnecessary snapshots, it's a good
idea to get rid of them. If you are unsure whether you have old snapshots,
you can find them with VMware's SiteSurvey.

• Revisit support agreements. Are your support agreements up to date? Is
it time to start writing purchase orders to make sure you have proper help?

• Envision future improvements. So far, the focus has been on keeping
the environment functioning. Take a step back, close your eyes and just
think about what you would like the environment to do, instead of the other
way around. How can it make your business better? Map out a way to get
there. It sounds like daydreaming, but it may be some of your most
productive time all day.
I agree with Ed that this is a general question and most responses will typically include one of
VMware's favorite answers to most questions: "it depends". That being said, here are a few of the
tasks to get you started that I believe are important to the daily health of the environment. Our
environment is over 90% virtualized, but I do wear multiple hats, so what I am "able" to do versus
what I think "should" be done is always an endless struggle.

Prerequisites:
Overall, you want to have a thorough knowledge of the virtual inventory you are supporting. You
will want to know the answers to the following questions. How many servers, physical and
virtual? What models? Hardware specs? Software versions? Have you (or your company)
decided on a specific ROI for virtualization (number of VMs/Host, etc)? A lot of this information can
be gathered through scripts that can be run on a scheduled basis to give you that 10,000-foot view
of your architecture. These reports will also probably identify some items that need to be
addressed, which will add further to your tasks. After that you will be better prepared to dig into the
minutiae.
• Check the health of all Hosts and VM objects in vCenter. Are there any active alarms in vCenter? Have you
set up any alarms in the first place? Do the alarms automatically trigger notification or any type of incident
tracking mechanism?
• Are all vCenter plug-ins functioning properly?
• Do you have any Host Hardware issues? Alarms, bad memory, power supply or capacity issues?
• Are all Hosts in compliance with Host Profiles?
• Are there any resource bottlenecks? Memory, CPU, Disk, Network? Do you have any, or need any, additional
tools to have a better handle on this?
• Are you running at your optimum resource levels? In other words, is the load properly distributed?
• Are you running out of resources anywhere? LUNs with low disk space, etc. Do you need to start looking at
budgeting for additional capacity?
• Check for Firmware updates on Host hardware
• Check for ESX Patches
• Check for VM Patches
• Check VMware Tools version
• Run scripts to identify the existence of VMs with snapshots and follow up to see if they are still needed.
• Have you schmoozed with your Storage Admins lately? A good idea since you cannot get very far without
them.
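
One way to approach the snapshot item in the list above is a file-system scan. The sketch below walks a directory tree and lists files that look like snapshot artifacts. It assumes the datastore is reachable as an ordinary directory and that snapshot files end in `-delta.vmdk` or `.vmsn`; both are assumptions for this example (newer datastore formats may use other suffixes), and the function name is hypothetical. A vendor API or CLI would be a more reliable source of truth.

```python
import os
import time

# Assumed snapshot file suffixes; adjust for your environment.
SNAPSHOT_SUFFIXES = ("-delta.vmdk", ".vmsn")


def find_snapshot_files(root, min_age_days=0):
    """Return sorted paths under `root` that look like snapshot files of at least the given age."""
    now = time.time()
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(SNAPSHOT_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            # Clamp at zero so clock granularity never yields a negative age.
            age_days = max(0.0, (now - os.path.getmtime(path)) / 86400)
            if age_days >= min_age_days:
                found.append(path)
    return sorted(found)
```

Running it with `min_age_days=30`, for example, would list only candidates that have existed for a month or more, which are the ones worth following up with the VM owner.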
Troubleshooting ESX/ESXi virtual machine performance issues (2001003)
Symptoms

• Services running in guest virtual machines respond slowly.
• Applications running in the guest virtual machines respond intermittently.
• The guest virtual machine may seem slow or unresponsive.

Resolution
This article provides information on isolating a performance issue on ESXi/ESX. Poor performance can originate in
several areas: CPU constraints, memory overcommitment, storage latency, or network latency. If one or
more of your virtual machines has a poor response time, consider each of these areas to find the bottleneck.

Each step below provides instructions and links to the appropriate documents.

The steps are ordered in the most appropriate sequence to isolate the issue and to identify the proper resolution.
They are also ordered in the most appropriate sequence to minimize data loss.
Note: After completing each step, determine whether the performance issue still exists. Work through each
troubleshooting step in order, and do not skip a step.

This article includes four main sections:

• CPU constraints
• Memory overcommitment
• Storage latency
• Network latency

CPU constraints
To determine whether the poor performance is due to a CPU constraint:

1. Use the esxtop command to determine if the ESXi/ESX server is being overloaded. For more information
about esxtop, see the Resource Management Guide for your version of ESXi/ESX:

o ESXi 6.0
o ESXi 5.5
o ESXi 5.1
o ESXi 5.0
o ESXi/ESX 4.1
o ESX 4.0
o ESX 3.5:
▪ Update 2 and later
▪ Initial Release and Update 1
o ESX 3.0
a. Examine the load average on the first line of the command output.

A load average of 1.00 means that the ESXi/ESX Server machine's physical CPUs are fully utilized,
and a load average of 0.5 means that they are half utilized. A load average of 2.00 means that the
system as a whole is overloaded.

b. Examine the %READY field for the percentage of time that the virtual machine was ready but could
not be scheduled to run on a physical CPU.

Under normal operating conditions, this value should remain under 5%. If the ready time values are
high on the virtual machines that experience bad performance, then check for CPU limiting:

▪ Make sure that the virtual machine is not constrained by a CPU limit set on itself.
▪ Make sure that the virtual machine is not constrained by its resource pool.

For more information, see Impact of virtual machine memory and CPU resource limits
(1033115).

If the load average is too high, and the ready time is not caused by CPU limiting, adjust the CPU load on the
host. To adjust the CPU load on the host, either:

o Increase the number of physical CPUs on the host

OR

o Decrease the number of virtual CPUs allocated to the host. To decrease the number of virtual
CPUs allocated to the host, either:
▪ Reduce the total number of CPUs allocated to all of the virtual machines running on the
ESX host. For more information, see Determining if multiple virtual CPUs are causing
performance issues (1005362).

OR

▪ Reduce the number of virtual machines running on the host.

2. If you are using ESX 3.5, determine whether IRQ sharing is an issue. For more information, see ESX has
performance issues due to IRQ sharing (1003710).
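
The load average and %READY guidance in step 1 can be expressed as a small helper. This is a sketch only: the function names are hypothetical, the values are assumed to have been read from esxtop by hand or by another tool, and treating load averages between 1.00 and 2.00 as "fully utilized" is an interpretation of the text above rather than something the article states.

```python
def classify_load_average(load_avg):
    """Map the esxtop load average onto the categories described in the article."""
    if load_avg >= 2.0:
        return "overloaded"
    if load_avg >= 1.0:
        return "fully utilized"
    return "headroom available"


def vms_with_high_ready(ready_percent_by_vm, threshold=5.0):
    """Return the names of VMs whose %READY is at or above the threshold (5% per the article)."""
    return sorted(
        name for name, ready in ready_percent_by_vm.items() if ready >= threshold
    )
```

For example, `vms_with_high_ready({"db01": 7.5, "web01": 1.2})` flags only `db01`, which is then the VM to check for a CPU limit on itself or on its resource pool.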

Memory overcommitment
To determine whether the poor performance is due to memory overcommitment:

1. Use the esxtop command to determine whether the ESXi/ESX server's memory is overcommitted. For
more information about esxtop, see the Resource Management Guide for your version of ESXi/ESX:

o ESXi 6.0
o ESXi 5.5
o ESXi 5.1
o ESXi 5.0
o ESXi/ESX 4.1
o ESX 4.0
o ESX 3.5:
▪ Update 2 and later
▪ Initial Release and Update 1
o ESX 3.0
a. Examine the MEM overcommit avg on the first line of the command output. This value reflects
the ratio of the requested memory to the available memory, minus 1.

Examples:

▪ If the virtual machines require 4 GB of RAM, and the host has 4 GB of RAM, then there is
a 1:1 ratio. After subtracting 1 (from 1/1), the MEM overcommit avg field reads 0.
There is no overcommitment and no extra RAM is required.
▪ If the virtual machines require 6 GB of RAM, and the host has 4 GB of RAM, then there is
a 1.5:1 ratio. After subtracting 1 (from 1.5/1), the MEM overcommit avg field reads
0.5. The RAM is overcommitted by 50%, meaning that 50% more than the available RAM
is required.

If the memory is being overcommitted, adjust the memory load on the host. To adjust the memory load,
either:

o Increase the amount of physical RAM on the host

OR

o Decrease the amount of RAM allocated to the virtual machines. To decrease the amount of
allocated RAM, either:
▪ Decrease the total amount of RAM allocated to all of the virtual machines on the host

OR

▪ Reduce the total number of virtual machines on the host.

2. Determine whether the virtual machines are ballooning and/or swapping.

To detect any ballooning or swapping:

a. Run esxtop.
b. Type m for memory.
c. Type f for fields.
d. Select the letter J for Memory Ballooning Statistics (MCTL).
e. Look at the MCTLSZ value.

MCTLSZ (MB) displays the amount of guest physical memory reclaimed by the balloon driver.

f. Type f for fields.
g. Select the letter for Memory Swap Statistics (SWAP STATS).
h. Look at the SWCUR value.

SWCUR (MB) displays the current swap usage.

To resolve this issue, ensure that the ballooning and/or swapping is not caused by the memory limit being
incorrectly set. If the memory limit is incorrectly set, reset it correctly. For more information, see:

o Impact of virtual machine memory and CPU resource limits (1033115)


o Balloon driver retains hold on memory causing virtual machine guest operating system performance issues
(1003470)
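
The MEM overcommit avg arithmetic from step 1 of this section, and the ballooning and swapping check from step 2, can be reproduced as follows. This is a worked sketch with hypothetical function names; the inputs are assumed to be values you have already read from esxtop.

```python
def mem_overcommit(requested_gb, available_gb):
    """Ratio of requested to available memory, minus 1 (as described for MEM overcommit avg)."""
    if available_gb <= 0:
        raise ValueError("available_gb must be positive")
    return requested_gb / available_gb - 1


def memory_pressure(mctlsz_mb, swcur_mb):
    """List which reclamation mechanisms are active, based on MCTLSZ and SWCUR."""
    active = []
    if mctlsz_mb > 0:
        active.append("ballooning")  # balloon driver has reclaimed guest memory
    if swcur_mb > 0:
        active.append("swapping")  # swap is currently in use
    return active
```

With the article's own numbers, `mem_overcommit(4, 4)` gives 0.0 and `mem_overcommit(6, 4)` gives 0.5, matching the two examples in step 1.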

Storage Latency
To determine whether the poor performance is due to storage latency:

1. Determine whether the problem is with the local storage. Migrate the virtual machines to a different storage
location.
2. Reduce the number of Virtual Machines per LUN.
3. Look for log entries in the Windows guests that look like this:

The device, \Device\ScsiPort0, did not respond within the timeout period.

4. Using esxtop, look for a high DAVG latency time. For more information, see Using esxtop to identify storage
performance issues (1008205).
5. Determine the maximum I/O throughput you can get with the iometer command. For more information,
see Testing virtual machine storage I/O performance for VMware ESXi and ESX (1006821).
6. Compare the iometer results for a VM to the results for a physical machine attached to the same storage.
7. Check for SCSI reservation conflicts. For more information, see Analyzing SCSI Reservation conflicts on
VMware Infrastructure 3.x and vSphere 4.x (1005009).
8. If you are using iSCSI storage and jumbo frames, ensure that everything is properly configured. For more
information, see:

o iSCSI and Jumbo Frames configuration on ESX/ESXi (1007654)


o Enabling IOAT and Jumbo frames (1003712)
o Enabling Jumbo Frames for VMkernel ports in a virtual distributed switch (1038827)
9. If you are using iSCSI storage and multipathing with the iSCSI software initiator, ensure that everything is
properly configured. For more information, see these sections of the iSCSI SAN Configuration Guide:

o Networking Configuration for Software iSCSI and Dependent Hardware iSCSI


o Bind iSCSI Ports to iSCSI Adapters

If you identify a storage-related issue:

1. Ensure that your hardware array and your HBA cards are certified for ESX/ESXi. For more information, see
the VMware Hardware Compatibility List.
2. Ensure that the BIOS of your physical server is up to date. For more information, see Checking your firmware
and BIOS levels to ensure compatibility with ESX/ESXi (1037257).
3. Ensure that the firmware of your HBA is up to date. For more information, see Slow performance caused by out
of date firmware on a RAID controller or HBA (1006696).
4. Ensure that the ESX host recognizes the correct mode and path policy for your storage array, that is, the
Storage Array Type Plugin (SATP) and Path Selection Plugin (PSP). For more information, see Verifying correct
storage settings on ESX 4.x, ESXi 4.x and ESXi 5.0 (1020100).

Network latency
Network performance can be highly affected by CPU performance. Rule out a CPU performance issue before
investigating network latency.

To determine whether the poor performance is due to network latency:

1. Test the maximum bandwidth from the virtual machine with the Iperf tool. This tool is available
from https://code.google.com/p/iperf/

Note: VMware does not endorse or recommend any particular third-party utility.

a. While using Iperf, change the TCP window size to 64 K. Performance also depends on this
value. To change the TCP window size:

1. On the server side, enter this command:

iperf -s

2. On the client side, enter this command:

iperf.exe -c sqlsed -P 1 -i 1 -p 5001 -w 64K -f m -t 10


900M

For more information, see http://openmaniak.com/iperf.php.

2. Run Iperf with a machine outside the ESXi/ESX host. Compare the results with what you expect you should
have, depending on your physical environment.
3. Run Iperf with another machine outside the ESXi/ESX host on the same VLAN on the same physical switch.
If the performance is good, and the issue can only be reproduced with a machine at another geographical
location, then the issue is related to your network environment.
4. Run Iperf between 2 VMs on the same ESX server/portgroup/vswitch. If the result is good, you can exclude
a CPU, memory or storage issue.
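
Comparing Iperf results against what you expect, as the steps above ask, can be semi-automated. The sketch below extracts Mbits/sec figures from Iperf text output and checks them against an expected rate. The sample line is illustrative rather than captured output, the function names are hypothetical, and the 80% tolerance is an arbitrary assumption for this example.

```python
import re

# Matches figures such as "900 Mbits/sec" or "941.5 Mbits/sec".
_BANDWIDTH = re.compile(r"(\d+(?:\.\d+)?)\s+Mbits/sec")


def parse_mbits(iperf_output):
    """Return every Mbits/sec figure found in the given Iperf text output."""
    return [float(value) for value in _BANDWIDTH.findall(iperf_output)]


def meets_expectation(measured_mbits, expected_mbits, tolerance=0.8):
    """True when the measured rate reaches the assumed fraction of the expected rate."""
    return measured_mbits >= expected_mbits * tolerance


# Illustrative line in the general shape of Iperf's summary output.
SAMPLE_IPERF = "[  3]  0.0-10.0 sec  1073 MBytes   900 Mbits/sec"
```

Running the same comparison for each of the test placements (outside the host, same VLAN and switch, two VMs on the same vSwitch) makes it easier to see at which hop the throughput drops.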

If you identify a bottleneck on the network:

1. Work through the steps in Troubleshooting network performance issues (1004087).


2. If you are using iSCSI storage and jumbo frames, ensure that everything is properly configured. For more
information, see:
o iSCSI and Jumbo Frames configuration on ESX/ESXi (1007654)
o Enabling IOAT and Jumbo frames (1003712)
o Enabling Jumbo Frames for VMkernel ports in a virtual distributed switch (1038827)

3. If you are using Network I/O Control, ensure that the shares and limits are properly configured for your
traffic. For more information, see Network I/O Resource Management in vSphere 4.1 with vDS (1022585).
4. Ensure that traffic shaping is correctly configured. For more information, see Traffic Shaping Policy in the
ESXi/ESX Configuration Guide.
