
More Fun with vSphere Alarms

Disclaimer
This document is provided as is. It is not part of official VMware product documentation.

Contents
Disclaimer
1. Alarm trigger types
   What's new in 4.1
2. Using default alarms
   Why you need to define actions for default alarms
   Moving alarms around
3. Event trigger details
4. Alarm actions
5. Putting it all together
6. How do I copy alarm definitions between vCenter servers?
7. How do I create an alarm that's based on a certain vCenter event?
8. How can vSphere alarms help with managing security and compliance?
9. Monitoring HA
10. What if I'm in Germany?
11. Conclusion
About me

Horst Mundt, Sr. Technical Account Manager VMware, 2010

When it comes to alarms, vCenter 4 has much more to offer than vCenter 2.5. A whole range of default alarms is available when you install vCenter 4, and they give you a very good starting point for monitoring your vSphere environment. If you've never wondered what exactly the default alarms mean, or how to tune them, that's fine. If you're interested in a bit more detail, read on. This document assumes that you are familiar with vSphere alarms in general; I won't explain every detail. There is also a great introduction to vSphere alarms at http://www.vmworld.com/docs/DOC-3766.

1. Alarm trigger types


vCenter 4 has three different types of alarm triggers: event triggers, condition triggers, and state triggers. Confused by "condition" vs. "state"? I was, since both can translate to the same word in my native language. So here's what they mean in vSphere:

- A condition trigger always refers to a numeric value exceeding a certain threshold. Example: CPU Usage in MHz > 500.
- A state trigger always corresponds to one element out of a discrete set of (non-numeric) possible states that a managed entity can have with regard to a given property. For instance, the possible states of the "Host connection state" property are "connected", "not connected", or "not responding".

You can actually combine condition and state triggers within a single vSphere alarm definition.
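To make the distinction concrete, here is a minimal PowerShell sketch of how the two trigger types evaluate. It does not talk to vCenter: the sample readings and variable names are made up for illustration, and only the 500 MHz threshold and the state names come from the text above.

```powershell
# Hypothetical sample readings; a real alarm gets these from vCenter.
$cpuUsageMHz     = 620
$connectionState = 'not responding'

# Condition trigger: a numeric value is compared against a threshold.
$conditionFires = $cpuUsageMHz -gt 500

# State trigger: the property holds exactly one value out of a fixed set.
$possibleStates = 'connected', 'not connected', 'not responding'
if ($possibleStates -notcontains $connectionState) {
    throw "Unknown connection state: $connectionState"
}
$stateFires = $connectionState -eq 'not responding'

# Both trigger types may live in the same alarm definition.
$anySatisfied = $conditionFires -or $stateFires    # "any" combination
$allSatisfied = $conditionFires -and $stateFires   # "all" combination
"any: $anySatisfied, all: $allSatisfied"
```

Changing `$cpuUsageMHz` to a value below 500 shows the difference between the two combinations: the "any" result stays true while the "all" result becomes false.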

The third trigger type, event triggers, cannot be combined with either of the other two types in a vSphere alarm definition. As the name implies, event triggers relate to events that happened in the vSphere environment: for example, a VM was powered off or an ESX host lost access to its storage. Sometimes this can be a little confusing. Should you be looking for a state trigger or an event trigger if you want to create a new alarm for a certain situation? The first place to look for this kind of information is the vSphere Basic System Administration guide (BSA) (http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_admin_guide.pdf). For vSphere 4.1 this has been moved to the Datacenter Administration Guide (http://www.vmware.com/pdf/vsphere4/r41/vsp_41_dc_admin_guide.pdf). It has a good section on alarm triggers. However, it does not give you the triggers at a glance, so I've created an Excel sheet that lists all the condition, state, and event triggers from the BSA and can be used as a planning sheet for setting up your vSphere alarms. That's the sheet called "Alarm Triggers from BSA" in the attached Excel workbook.

What's new in 4.1


There were not too many changes in vSphere 4.1. Here are the triggers that are listed in the 4.1 manual but not in the 4.0 manual:

Entity           Trigger Type  Trigger Name / Event Category  Description / Available Events
Virtual Machine  Event         Deployment                     VM created
Virtual Machine  Event         Deployment                     VM auto renamed
Virtual Machine  Event         Deployment                     VM being cloned
Virtual Machine  Event         Deployment                     VM being created
Virtual Machine  Event         Deployment                     VM deploying
Virtual Machine  Event         Deployment                     VM emigrating
Virtual Machine  Event         Deployment                     VM hot migrating
Virtual Machine  Event         Deployment                     VM migrating
Virtual Machine  Event         Deployment                     VM reconfigured
Virtual Machine  Event         Deployment                     VM registered
Virtual Machine  Event         Deployment                     VM removed
Virtual Machine  Event         Deployment                     VM renamed
Virtual Machine  Event         Deployment                     VM relocating
Virtual Machine  Event         Deployment                     VM upgrading
Virtual Machine  Event         Deployment                     Cannot complete clone
Virtual Machine  Event         Deployment                     Cannot migrate
Virtual Machine  Event         Deployment                     Cannot relocate
Virtual Machine  Event         Deployment                     Cannot upgrade
Virtual Machine  Event         HA                             Insufficient failover resources
Cluster          Event         HA                             Cluster overcommitted
Cluster          Event         HA                             Virtual Machine heart beat failed

The HA "Cluster overcommitted" trigger sounds quite useful.

Personally, I find that one of the most useful changes in 4.1 is that you can now get an alarm if an uplink on a distributed switch fails. This was not possible in 4.0. Note that the Basic System Administration Guide does not list every possible trigger. In fact, it does not even list all the triggers that are available in the vSphere Client. For a complete listing of the event triggers that are new in 4.1, please refer to the tab called "All_API_Events_41" in the attached Excel sheet. It has a column that allows you to filter for events that are (not) available in vSphere 4.0. In total, 4.1 has 117 new event triggers.

2. Using default alarms


The vSphere Basic System Administration Guide also lists most of the default alarms defined in vSphere. Again, I've copied them to an Excel sheet for easier use in planning. That's the sheet called "vSphere default alarms" in the attached Excel workbook.

Why you need to define actions for default alarms


The Datacenter Administration Guide says: "VMware provides preconfigured alarms for the vCenter Server system that trigger automatically when problems are detected. You only need to set up actions for these alarms." And indeed you should set up alarm actions for those default alarms. This is especially important for the so-called stateless alarms. Let's have a look at an example: the "Cannot connect to storage" alarm is probably quite a useful alarm to have. If we look at its definition, we see that the triggers have a status of "Unset".


The alarm has an action defined: it sends an SNMP trap.

So what's going to happen if I remove the storage from one of my hosts? Let's look at the vCenter events for that host:

Looking at the events from bottom up we see:

1. A "Lost connectivity to storage device" event is generated
2. The alarm state changed from gray to gray
3. It triggers an action
4. SNMP trap is sent

We also see that the affected host does not show any errors:


So what would have happened if I had not had any alarm action defined on that alarm? Well, nothing. I'd never notice that there was a problem with the host unless I took a close look at the vCenter events. Now let's try something different. We take a look at the "Network connectivity lost" alarm:

Unlike the storage alarm, it has a status setting of "Alert". I've removed the default "Send trap" action:

So let's see what happens if I remove a network connection from one of my ESX hosts.


As we see, the host turns red. I'd probably notice this in my vSphere client. I don't get a notification though, since I don't have an alarm action defined. Key takeaway: always define suitable actions for the default alarms. Otherwise they might be less useful than you'd expect.

Moving alarms around


If you have worked with the default alarms in vSphere, you will have noticed that they are defined at the top level, i.e. they apply to all objects that are managed by your vCenter server. One question I hear frequently is "Can I overwrite alarm settings on a lower level?" Why would you want to do that? Well, let's say your alarm action is set to send an email if an ESX host gets disconnected. All these emails eventually generate an SMS to your mobile phone. You have to react 24x7. You don't want to spend your weekends chasing alarms generated by some not-so-important hosts in your test environment. Fortunately, that's an easy one. You can disable alarm actions on any managed entity (datacenters, clusters, hosts, VMs, datastores, ...) by right-clicking the entity in the vSphere client and choosing "Disable alarm actions". No more SMS on Sundays.

However, this turns off alarm actions completely (alarms will still be shown in the vSphere client, but there will be no more emails, SNMP traps, script executions, or other actions). If you want to keep some of the default alarms on a given entity, but disable others and/or change the alarm actions on some, then there's unfortunately no built-in way to do this. But there's a great PowerCLI script by LucD that can help you achieve this. It basically copies (or moves, as you like) the alarm definitions to lower levels in the object hierarchy, where you can modify them. Check http://lucd.info/?p=1799 for details.
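The right-click action for disabling alarm actions on an entity can also be scripted. The following is a sketch only, not part of the LucD script: it assumes an existing Connect-VIServer session, a host called esx-test-01 (a placeholder name), and the EnableAlarmActions and AreAlarmActionsEnabled methods of the vSphere API alarm manager. Please check the API reference for your vCenter version before relying on it.

```powershell
# Assumes you are already connected with Connect-VIServer.
# 'esx-test-01' is a placeholder host name, not taken from this document.
$vmhost   = Get-VMHost -Name 'esx-test-01'
$hostView = $vmhost | Get-View
$alarmMgr = Get-View AlarmManager

# $false disables alarm actions on this entity; alarms are still displayed.
$alarmMgr.EnableAlarmActions($hostView.MoRef, $false)

# Check the result; pass $true above to switch the actions back on.
$alarmMgr.AreAlarmActionsEnabled($hostView.MoRef)
```

Like the menu item, this switches off all actions for the entity at once; it does not let you pick individual alarms.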

3. Event trigger details


We will now focus on event triggers. And of course the first question that comes to mind is "Where's the list of event triggers that are available in vSphere?"

And the short answer to that is "It's in the attached Excel sheet." That's the sheet called "All_API_events" in the attached Excel workbook. The long answer is, well, slightly longer: you can define event triggers on any event that's available in the vSphere API. The vSphere API reference is available at http://www.vmware.com/support/developer/vc-sdk/visdk400pubs/ReferenceGuide/index.html. Have fun. Not the answer you were looking for? OK, here's some more detail. Actually, the API provides a lot of information that can be gathered by querying your vCenter server. If you have a close look at the API reference, you'll notice that it does not list the event triggers. So I guess they can change between vCenter releases (at least slightly). Here's a PowerCLI scriptlet I used to query the event triggers from vCenter 4.0 U1.
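# Connect to your vCenter server first (add -Server <name> as needed)
Connect-VIServer
# Get the EventManager view and export the description of every known event type
$eventMan = Get-View EventManager
# The output file name is a placeholder; the original path is not preserved
$eventMan.get_Description() | Select-Object -ExpandProperty EventInfo | Export-Csv -Path .\events.csv -NoTypeInformation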

Easy, huh? Of course I manually formatted the resulting Excel sheet and also added some grouping. That's the first column in the sheet, and it's entirely mine, including any potential mistakes. The 422 different event triggers may be a bit overwhelming, so let's get back to the default alarms based on event triggers for the moment. Of course the alarm names in the vSphere client are pretty self-explanatory, but maybe you want to know exactly how the alarms work. Let's have a closer look at the anatomy of an event trigger. If you've worked with condition and state alarms, you may have noticed that the trigger conditions can be combined in an OR fashion or in an AND fashion. In the vSphere client this is called "Trigger if any conditions are satisfied" (OR fashion) or "Trigger if all of the conditions are satisfied" (AND fashion):

Now with event triggers you don't have that choice:


With event triggers, different trigger expressions are always combined in an OR fashion, i.e. the alarm will trigger if any of the events happens. But you can see that there are "advanced settings associated with this trigger". We'll get back to that in a moment (by the way, you should always read this as "advanced settings may be associated with this trigger"). So where are the exact definitions of the default event triggers in vSphere? Again, the short answer is "in the attached Excel sheet". That's the sheet called "Default_event_triggers" in the attached Excel workbook. And, again, the long answer is that they can be retrieved using the API. The PowerCLI code to get them is in the attached Get_Alarms2.ps1. It's a tad more complicated than the previous script.

If you look at the Excel sheet, you'll notice that some triggers have additional fields called "Comparisons". That's what they are called in the vSphere API. In the VI client they are called "Advanced Settings". Example:


Corresponds to:


4. Alarm actions
Now that we've covered the alarm triggers, let's have a short look at the actions that can be taken when an alarm is triggered. Here's what the vSphere client offers:

As you can see, the available options differ slightly, depending on whether you are setting an alarm on a host or on a VM. You should be used to the notification options from vCenter 2.5, but the other options are new in vSphere and they are pretty powerful. My favorite for testing is "Run a command". This will run a script on the vCenter server that can be used to process the alarm information and pass it on to any other monitoring tool of your choice. vCenter passes certain information on the alarm to the script by using environment variables. We'll see how that works in a moment.

5. Putting it all together


Here's an example. We define a custom alarm that will trigger when a VM is powered down, and execute a script (in a real environment you'd probably rather send an SNMP trap or an email, but let's do the script for educational purposes). Here's the script:

Pretty basic: it just writes the message "alarm triggered" into a file and appends the environment variables. Here's our alarm definition:


So our expectation is that this will run the script called C:\alarm.cmd whenever a VM gets powered off. And indeed it will:


Let's have a look at the file C:\alarm.txt (that's where our alarm action script wrote its output). We see that the script has indeed generated the message we expected, and that the environment variables contain useful information about the alarm that can be consumed by other tools.
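The script used above is a .cmd file that was only shown as a screenshot. As a rough PowerShell equivalent of the same idea, the sketch below writes the message and then appends only the alarm-related environment variables. It is not the author's script, and it assumes that the variables vCenter sets for an alarm have names starting with VMWARE_ALARM; how vCenter launches a PowerShell script is not covered here.

```powershell
# Sketch of an alarm action script; this is NOT the author's C:\alarm.cmd.
$logFile = 'C:\alarm.txt'   # same output file as in the example above

# Assumption: vCenter passes alarm details in variables named VMWARE_ALARM*.
$alarmVars = @(Get-ChildItem Env: |
    Where-Object { $_.Name -like 'VMWARE_ALARM*' } |
    Sort-Object Name)

Add-Content -Path $logFile -Value 'alarm triggered'
foreach ($var in $alarmVars) {
    # One NAME=value line per variable, easy to parse by other tools.
    Add-Content -Path $logFile -Value ('{0}={1}' -f $var.Name, $var.Value)
}
```

Filtering on the name prefix keeps the log free of unrelated system variables; drop the Where-Object filter to log everything, as the original script does.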

If you want to do some more advanced stuff, make sure to read http://blogs.vmware.com/vipowershell/2009/09/how-to-run-powercli-scripts-from-vcenter-alarms.html

Now the fun part starts. Take the same alarm definition that triggers when a VM is powered down, but change the alarm action from "Run a command" to "Power on VM".


What will this give us? It will give us an alarm that is triggered whenever a VM is powered down, and as an action it will immediately power on that VM. Go ahead and try it. The vSphere API for managing alarms is powerful, but it does not give away its treasures easily. For instance, if you try the reverse of the above example (an alarm that triggers when a VM is powered on, and then powers off the VM), you may find that it does not work. If you're running in a DRS cluster, try the "DRS VM powered on" trigger instead. Word of warning: don't try to apply this to your whole cluster if your vCenter server is running in a VM.

6. How do I copy alarm definitions between vCenter servers?


Imagine you've put a lot of time and effort into fine-tuning your alarm definitions in one of your vCenter servers. Now you want to have the exact same alarm definitions in another vCenter server. It's probably a bit more complicated to achieve this than you might imagine. Copying event-based alarm definitions is rather straightforward using the vSphere API. But alarms that are based on performance counters require some extra work. Let's have a look at an example. We define an alarm that triggers if the host disk utilization exceeds a certain threshold:

If we look at the alarm definition through the API, we see that the alarm refers to a certain performance metric that is identified by a counter ID:

In this case the counter ID is 101. So the alarm definition has no direct information about disk usage; it just refers to a performance counter. We can get the details on that performance counter by using the QueryPerfCounter method of the vSphere API performance manager. And indeed we'll find that this is the performance counter for average disk usage:


Now if we just created an alarm in the destination vCenter using the same counter ID, we could run into a situation where counter ID 101 means something completely different in the destination vCenter, especially if the destination vCenter is a different version. So we need to remember the semantics of the performance counters, not just the IDs. Attached is a sample script that shows the whole process of copying alarm definitions between vCenter servers.
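The core of the ID translation can be sketched without a vCenter connection. This is not the attached sample script: the two counter tables are made up (only ID 101 for average disk usage comes from the example above, the destination IDs are invented), and the function names are hypothetical. In a real script the tables would presumably come from the performance manager of each vCenter.

```powershell
# Turn a counter ID into a version-independent name such as "disk.usage.average".
function Get-CounterName {
    param([object[]]$Counters, [int]$Id)
    $c = $Counters | Where-Object { $_.Key -eq $Id } | Select-Object -First 1
    if (-not $c) { throw "Counter ID $Id not found" }
    '{0}.{1}.{2}' -f $c.Group, $c.Name, $c.Rollup
}

# Find the ID that carries the same name in another counter table.
function Get-CounterId {
    param([object[]]$Counters, [string]$FullName)
    $c = $Counters | Where-Object {
        ('{0}.{1}.{2}' -f $_.Group, $_.Name, $_.Rollup) -eq $FullName
    } | Select-Object -First 1
    if (-not $c) { throw "Counter '$FullName' not found" }
    $c.Key
}

# Made-up counter tables standing in for the source and destination vCenter.
$sourceCounters = @(
    New-Object PSObject -Property @{ Key = 101; Group = 'disk'; Name = 'usage'; Rollup = 'average' }
)
$destCounters = @(
    New-Object PSObject -Property @{ Key = 101; Group = 'mem';  Name = 'usage'; Rollup = 'average' }
    New-Object PSObject -Property @{ Key = 125; Group = 'disk'; Name = 'usage'; Rollup = 'average' }
)

$name   = Get-CounterName -Counters $sourceCounters -Id 101
$destId = Get-CounterId   -Counters $destCounters   -FullName $name
"Source counter 101 ($name) maps to destination counter $destId"
```

In this made-up destination table, ID 101 belongs to a memory counter, which is exactly the situation the name-based lookup protects against.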

7. How do I create an alarm that's based on a certain vCenter event?


Sometimes you see an event in vCenter and would like to create an alarm that triggers whenever that event happens. Let's say you want to raise an alarm every time someone changes a custom field on a virtual machine. In the vCenter events this shows up like this:

Unfortunately there's no such thing in the drop-down list for event triggers in vCenter. So the first step is to find out the internal (API) name of that specific event. The Excel sheet that comes with this document has two tabs called "All_API_Events" (one for 4.0, one for 4.1). We search one of these tabs for the string "Changed custom field":


This gives us the API name for the event: CustomFieldValueChangedEvent. Now we have two options. Either we google for a script that creates event-based alarms (here's a good one: http://www.lucd.info/2009/11/27/alarm-expressions-part-2-event-alarms/#more-1058) and modify it to suit our needs, or we use a simple trick. Have you ever noticed that the drop-down list for event triggers in the vSphere client isn't really what it seems to be? You can actually just type something instead of selecting one of the predefined options. So we type our CustomFieldValueChangedEvent (prefixed by "vim.event.").

Now we change the content of the custom fields on one of our VMs, et voilà, we get a nice alarm on that VM. It's not a very useful alarm, but you get the idea. Note that this trick is, strictly speaking, probably not supported, and there's no guarantee it will work in future vCenter releases, but I couldn't find any difference between an alarm that was generated using a script and an alarm generated using this GUI shortcut. If you have a closer look at the FullFormat column on one of the All_API_Events sheets, you'll notice that some of the event descriptions start with things like "esx.clear", "com.vmware", "esx.problem" or "vprob.net":


If you want to use these triggers, don't add the "vim.event." prefix.
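The naming rule can be written down as a small helper. This is only a sketch of the rule as described here: the function name is hypothetical, the prefix list contains just the four examples mentioned above (the real list may be longer), and the second sample name is made up.

```powershell
function Get-EventTriggerName {
    param([string]$ApiName)
    # Prefixes mentioned in the text; such names are used unchanged.
    $asIsPrefixes = 'esx.clear', 'com.vmware', 'esx.problem', 'vprob.net'
    foreach ($prefix in $asIsPrefixes) {
        if ($ApiName.StartsWith($prefix)) { return $ApiName }
    }
    # Everything else is assumed to be a plain API event class name.
    "vim.event.$ApiName"
}

Get-EventTriggerName 'CustomFieldValueChangedEvent'   # vim.event.CustomFieldValueChangedEvent
Get-EventTriggerName 'esx.problem.example.event'      # returned unchanged (made-up name)
```

The result is the string you would type into the event trigger field in the vSphere client.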

8. How can vSphere alarms help with managing security and compliance?
Here's an example that may be more useful in a real-life environment. If your environment has specific security or compliance requirements, you'll probably want to get notified if someone changes roles or permissions in vCenter. Like this:


9. Monitoring HA
You might want to get notified if HA restarts one of your VMs on another host because the original host failed. This is quite easy to do by defining an appropriate VM alarm:

However, you might be even more interested in knowing if HA failed to restart a VM. So let's look at the vCenter events for some failed restarts. We might encounter a situation where a surviving host in the HA cluster has insufficient resources to start the VM:

You'll also get an event like this if the VM is connected to a port group that does not exist on the host that's trying to restart it (which is a very broad interpretation of "not enough resources", if you ask me). We also might encounter a situation where a surviving host tries to restart a VM but fails. This is usually the case if the original host is isolated but still keeps a lock on the storage.

The interesting thing about this is that the events pop up in vCenter while the VM is disconnected:


After some time HA will give up, and we see an event telling us that HA has reached the maximum retry count for this VM [1]:

So let's try to create an alarm that goes to yellow (warning) state on a "Failover unsuccessful" event and goes red on the "Not enough resources" or "Reached maximum restart count" event. First we look up the "Not enough resources for failover" event:

Obviously the event is called NotEnoughResourcesToStartVMEvent. Nice. Now we look up the "Reached maximum restart count" event:

This one's called VmMaxRestartCountReached.
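The planned behaviour of the alarm can be summarized as a simple lookup from event name to alarm status. This is a sketch of the plan only, not an alarm definition: VmFailoverFailed is an assumed API name for the "Failover unsuccessful" event, the function name is hypothetical, and returning gray for any other event is a simplification made for this sketch.

```powershell
# Event name -> alarm status, following the plan described above.
# 'VmFailoverFailed' is an assumed API name for the "Failover unsuccessful" event.
$statusByEvent = @{
    'VmFailoverFailed'                 = 'yellow'
    'NotEnoughResourcesToStartVMEvent' = 'red'
    'VmMaxRestartCountReached'         = 'red'
}

function Get-AlarmStatus {
    param([string]$EventName)
    if ($statusByEvent.ContainsKey($EventName)) { return $statusByEvent[$EventName] }
    'gray'   # simplification: any other event leaves this alarm untriggered
}

Get-AlarmStatus 'VmMaxRestartCountReached'   # red
```

Since event triggers are always combined in an OR fashion, each entry in the table corresponds to one independent trigger line in the alarm definition.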


[1] This is five by default, determined by the parameter das.maxvmrestartcount. See VMware KB 1009625 for details.


It turns out there is a predefined item in the vSphere client for the unsuccessful failover event, so we just use that one. Here's the alarm:

We test the alarm by disconnecting all network interfaces from a host while keeping the connection to the storage. This will provoke an unsuccessful failover event, since I have the isolation response in my cluster set to "Leave powered on" and thus the original host will keep the locks on the VM files. By the way, if you want to try this for yourself and don't have a bunch of ESX hosts that you can use for testing, VMware Workstation 7 is a great tool for doing things like this. Just make sure you have enough RAM in your host, and include an iSCSI appliance for shared storage in your setup. Let's see what happens. First, after roughly a minute, we see a warning pop up on the VM:

We also see that a trap is sent (because I configured the alarm to do so):


If we are patient enough, we'll again see HA give up after some time:

and the VM has an Alert, as we would expect:

In a production environment, you'd probably want to send this trap or an email to a management system, so someone can react to it.

10. What if I'm in Germany?

vCenter server is available in localized versions in German, French, and Japanese. Now many vSphere admins prefer to have it in English, especially in large international companies. One thing that is particularly annoying about vCenter is the fact that it switches its messages to German as soon as it runs on a machine that has German regional settings. I guess it's similar in French and Japanese. The only way to prevent this is to replace the message files in the de (fr, jp) locale folder with the ones from the en folder. See VMware KB 1015646 for details.


11. Conclusion

I hope this has given you some ideas of what can be done with alarms in vSphere. I strongly recommend that you do any tests in an environment that is separate from your production systems. Have fun.

About me
I am a Senior Technical Account Manager for VMware in Germany. I have worked with customers who have fairly large VMware deployments since 2008. Monitoring the environment is a topic that is always good for discussions. Most of the content in this document has been inspired by discussions with customers and colleagues.
