
Troubleshooting Windows Azure Applications

Typically, on Windows Server applications, troubleshooting is done by turning on IIS logs and event logs. These logs survive restarts, and developers examine them when a problem occurs. The same process can be followed in a Windows Azure application if remote desktop is enabled: developers can connect to each instance and collect diagnostic data by simply copying it to a local machine. However, this process is time consuming, fails if the instance is reimaged, and becomes quite impractical when dealing with many instances.

Windows Azure Diagnostics

Windows Azure Diagnostics (WAD) provides functionality to collect diagnostic data from an application running on Windows Azure and store it in Windows Azure Storage.

Setup Collection

The easiest way to set up WAD is to import the Windows Azure Diagnostics module into the application's service definition and then configure the data sources for which diagnostic data is to be collected.

<ServiceDefinition name="TroubleShootingSample" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" schemaVersion="2012-05.1.7">
  <WorkerRole name="WorkerRole1" vmsize="Small">
    <Imports>
      <Import moduleName="Diagnostics" />
    </Imports>
  </WorkerRole>
</ServiceDefinition>


A role instance configured with the diagnostics module automatically starts the diagnostics monitor, which is responsible for collecting diagnostic data. Only some of the data sources are added to the diagnostics monitor by default; the rest must be added explicitly. The following table lists the types of diagnostic data that you can configure your application to collect.

Data Source | Description | Collected by Default | Associated Role Type
Windows Azure Logs | Trace messages sent to the trace listener DiagnosticMonitorTraceListener, which is added by default to web.config or app.config | Yes | Web and Worker
IIS 7.0 Logs | Information about IIS sites | Yes | Web
WAD Infrastructure Logs | Logs pertaining to the diagnostics infrastructure, RemoteAccess and RemoteForwarder modules | Yes | Web and Worker
Failed Request Logs | Information about failed requests to an IIS site or application | No | Web
Windows Event Logs | Events typically used for troubleshooting application and driver software | No | Web and Worker
Performance Counters | Performance counter metrics | No | Web and Worker
Crash Dumps | Mini/full crash dumps of the application | No | Web and Worker
Custom Error Logs | Custom data logged to local storage, which is then transferred to Windows Azure Storage | No | Web and Worker

Collect IIS failed request logs

Collection of IIS failed request logs can be enabled by adding the following to the web.config of the associated web role, under the system.webServer section:

<tracing>
  <traceFailedRequests>
    <add path="*">
      <traceAreas>
        <add provider="ASP" verbosity="Verbose" />
        <add provider="ASPNET" areas="Infrastructure,Module,Page,AppServices" verbosity="Verbose" />
        <add provider="ISAPI Extension" verbosity="Verbose" />
        <add provider="WWW Server" areas="Authentication, Security, Filter, StaticFile, CGI, Compression, Cache, RequestNotifications, Module" verbosity="Verbose" />
      </traceAreas>
      <failureDefinitions statusCodes="400-599" />
    </add>
  </traceFailedRequests>
</tracing>


Collect Windows Event Logs

Collection of Windows event logs has to be enabled programmatically in code. This is done by calling the GetDefaultInitialConfiguration method of DiagnosticMonitor, adding the WindowsEventLog data source, and then calling the Start method of DiagnosticMonitor with the changed configuration. This is typically done within the OnStart method of the role. The diagnostics connection string defaults to UseDevelopmentStorage=true, which should be used when running the application on the compute emulator. When using a Windows Azure Storage account, the protocol should always be set to https to ensure security, as logs more often than not contain sensitive information.

DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();
config.WindowsEventLog.DataSources.Add("System!*");
DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);
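Event log data sources are specified as XPath expressions against an event channel. As an illustrative, hedged variant of the snippet above (the filter expression is an assumption based on the Windows event log query syntax), error-level entries from the Application log could also be collected:

// Hedged example: also collect only error entries (Level=2) from the
// Application channel, using the Windows event log XPath query format.
config.WindowsEventLog.DataSources.Add("Application!*[System[(Level=2)]]");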

Collect crash dumps

Collection of crash dumps has to be enabled programmatically in code; however, it does not require changing the diagnostic configuration. This is done by calling the EnableCollection method of CrashDumps. The Boolean parameter specifies whether a full dump (true) or a mini dump (false) is to be collected.
CrashDumps.EnableCollection(false);
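For completeness, a minimal sketch of enabling full dumps from within a role's OnStart method could look like the following (the OnStart override shown here is illustrative):

public override bool OnStart()
{
    // Pass true to collect full dumps; pass false (as above) for mini dumps.
    CrashDumps.EnableCollection(true);
    return base.OnStart();
}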

Setup Transfer

Diagnostic data collected by WAD is not permanently stored unless it is transferred to Windows Azure Storage. Once transferred, the diagnostic data can be viewed with one of several available tools such as Cloud Storage Studio, Azure Storage Explorer, etc.

Specify Storage Account

Upon importing the diagnostics module, a configuration setting named Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString gets automatically associated with the role. This setting specifies the Windows Azure Storage account to which the diagnostic data will be transferred.
<?xml version="1.0" encoding="utf-8"?>
<ServiceConfiguration serviceName="TroubleShootingSample" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*" schemaVersion="2012-05.1.7">
  <Role name="WorkerRole1">
    <Instances count="1" />
    <ConfigurationSettings>
      <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName=AccountName;AccountKey=AccountKey" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>


Schedule Transfer

We need to change the WAD configuration in code to schedule the transfer of diagnostic data. Each data source added for collection has an associated data buffer (local disk storage). A transfer is scheduled by calling the GetDefaultInitialConfiguration method of DiagnosticMonitor, choosing the configuration property corresponding to the data buffer, assigning a TimeSpan to the ScheduledTransferPeriod property of that data buffer's configuration property, and then calling the Start method of DiagnosticMonitor with the changed configuration. The following snippet schedules the transfer of file-based logs (IIS logs, etc.) every 10 minutes:
DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();
config.Directories.ScheduledTransferPeriod = TimeSpan.FromMinutes(10);
DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

On-Demand Transfer

Diagnostic data can also be transferred on demand, either from within the role or from an outside application running on-premise or on Windows Azure. When performing an on-demand transfer from an outside application, we need to obtain the deployment identifier, role name and role instance name for which diagnostic data is to be transferred. This can be done by performing the following steps:

1. Log on to the Azure developer portal.
2. Click Compute Services, and then expand the node for your application.
3. Click the deployment for the application. Record the ID value from the Properties pane. This is the deployment identifier of your hosted service.
4. Expand the deployment node, and then click the node for the role from which you want to collect diagnostic data. Record the Name value from the Properties pane. This is the name of the role.
5. Expand the role node, and then click the node for the role instance. Record the Name value from the Properties pane. This is the identifier of the role instance.

The on-demand transfer is performed by using the role instance diagnostic manager, choosing the data buffer associated with the data source whose logs are to be transferred, and specifying the time interval of those logs.

CloudStorageAccount storageAccount = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=<AccountName>;AccountKey=<AccountKey>");
DeploymentDiagnosticManager diagManager = new DeploymentDiagnosticManager(storageAccount, "<DeploymentID>");
RoleInstanceDiagnosticManager roleInstDiagMgr = diagManager.GetRoleInstanceDiagnosticManager("<RoleName>", "<RoleInstanceID>");
DataBufferName dataBuffersToTransfer = DataBufferName.Directories;
OnDemandTransferOptions transferOptions = new OnDemandTransferOptions();
transferOptions.NotificationQueueName = "wad-on-demand-transfers";
TimeSpan timeInterval = new TimeSpan(3, 0, 0);
transferOptions.From = DateTime.UtcNow.Subtract(timeInterval);
transferOptions.To = DateTime.UtcNow;
Guid requestID = roleInstDiagMgr.BeginOnDemandTransfer(dataBuffersToTransfer, transferOptions);
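When the transfer completes, a notification message is placed on the queue named above. The following is a rough, hedged sketch (it assumes the roleInstDiagMgr variable from the snippet above and the SDK's diagnostics management API) of how the registered transfers can be inspected and an outstanding transfer for the Directories buffer cancelled if it is no longer needed:

// List the on-demand transfers currently registered for this role instance.
foreach (var activeTransfer in roleInstDiagMgr.GetActiveTransfers())
{
    Trace.TraceInformation("Active on-demand transfer: {0}", activeTransfer.Key);
}

// Cancel any transfer still registered for the Directories data buffer so
// that a new on-demand transfer can be requested later.
roleInstDiagMgr.CancelOnDemandTransfers(DataBufferName.Directories);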


Tracing

Tracing is a good way to monitor the execution of the application while it is running and also comes in handy for troubleshooting. WAD integrates well with the .NET tracing system, and collection of trace statements can be enabled in a Windows Azure application by simply adding the DiagnosticMonitorTraceListener to the configuration. (It gets added by default when creating projects in Visual Studio.)
<configuration>
  <system.diagnostics>
    <trace>
      <listeners>
        <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.7.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">
          <filter type="" />
        </add>
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>
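With this listener in place, ordinary System.Diagnostics trace calls made from role code are picked up by the diagnostic monitor. The sketch below is illustrative only; the class shape and messages are made up for the example:

using System;
using System.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override void Run()
    {
        // Trace calls are routed to the DiagnosticMonitorTraceListener and
        // end up in the Windows Azure Logs data buffer (WADLogsTable).
        Trace.TraceInformation("WorkerRole1 entry point called");
        try
        {
            // ... application work ...
        }
        catch (Exception ex)
        {
            Trace.TraceError("Unhandled error in Run: {0}", ex);
            throw;
        }
    }
}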

The trace statements are collected as Windows Azure Logs by the diagnostic monitor.

Viewing diagnostic data

The diagnostic data transferred to Windows Azure Storage is stored in either blobs or tables.

Tables
WADLogsTable - Windows Azure Logs.
WADDiagnosticInfrastructureLogsTable - WAD Infrastructure Logs.
WADDirectoriesTable - Contains information about directories that the diagnostic monitor is monitoring. This includes IIS logs, IIS failed request logs, and custom directories. The location of the blob log file is specified in the Container field and the name of the blob is in the RelativePath field. The AbsolutePath field indicates the location and name of the file as it existed on the Windows Azure virtual machine.
WADPerformanceCountersTable - Performance Counters.
WADWindowsEventLogsTable - Windows Event Logs.

Blobs
wad-iis-failedreqlogfiles - Failed Request Logs.
wad-iis-logfiles - IIS Logs.
<custom> - Custom Error Logs.
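As a rough sketch of how this table data can be read programmatically (assuming the SDK 1.x StorageClient table service API; the WadLogEntry class and DumpRecentLogs method are hypothetical helpers, and only a few of the WADLogsTable columns are mapped):

using System;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Maps a subset of the WADLogsTable columns.
public class WadLogEntry : TableServiceEntity
{
    public string Role { get; set; }
    public string RoleInstance { get; set; }
    public int Level { get; set; }
    public string Message { get; set; }
}

public static class WadLogReader
{
    public static void DumpRecentLogs(CloudStorageAccount account)
    {
        CloudTableClient tableClient = account.CreateCloudTableClient();
        TableServiceContext context = tableClient.GetDataServiceContext();

        // WADLogsTable partition keys are "0" followed by a tick count, so a
        // lexical comparison against a recent tick value returns recent rows.
        string fromPartitionKey = "0" + DateTime.UtcNow.AddHours(-1).Ticks;
        var recentLogs = context.CreateQuery<WadLogEntry>("WADLogsTable")
            .Where(e => e.PartitionKey.CompareTo(fromPartitionKey) >= 0);

        foreach (WadLogEntry entry in recentLogs)
        {
            Console.WriteLine("{0} [{1}] {2}", entry.RoleInstance, entry.Level, entry.Message);
        }
    }
}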


Manage WAD Configuration

The configuration information is stored in an XML file in Windows Azure blob storage under wad-control-container/<deploymentID>/<rolename>/<roleinstance>. The diagnostic monitor periodically polls the configuration XML file for changes and applies them to the running instances. We can change the configuration either remotely, from code running outside the Windows Azure application, or from within the application itself. Following is a sample configuration XML file.

<?xml version="1.0"?> <ConfigRequest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchemainstance"> <DataSources> <OverallQuotaInMB>8192</OverallQuotaInMB> <Logs> <BufferQuotaInMB>1024</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>1</ScheduledTransferPeriodInMinutes> <ScheduledTransferLogLevelFilter>Information</ScheduledTransferLogLevelFilter> </Logs> <DiagnosticInfrastructureLogs> <BufferQuotaInMB>1024</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>0</ScheduledTransferPeriodInMinutes> <ScheduledTransferLogLevelFilter>Information</ScheduledTransferLogLevelFilter> </DiagnosticInfrastructureLogs> <PerformanceCounters> <BufferQuotaInMB>1024</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>0</ScheduledTransferPeriodInMinutes> <Subscriptions /> </PerformanceCounters> <WindowsEventLog> <BufferQuotaInMB>1024</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>1</ScheduledTransferPeriodInMinutes> <Subscriptions> <string>Application!*</string> </Subscriptions> <ScheduledTransferLogLevelFilter>Information</ScheduledTransferLogLevelFilter> </WindowsEventLog> <Directories> <BufferQuotaInMB>0</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>1</ScheduledTransferPeriodInMinutes> <Subscriptions> <DirectoryConfiguration> <Path>C:\Users\Administrator\AppData\Local\dftmp\Resources\bf046678-2437-4a71-9a65a363c826b5b3\directory\DiagnosticStore\CrashDumps</Path> <Container>wad-crash-dumps</Container> <DirectoryQuotaInMB>1024</DirectoryQuotaInMB> </DirectoryConfiguration> </Subscriptions> </Directories> </DataSources> <IsDefault>false</IsDefault> </ConfigRequest>


To update the configuration remotely, we need to obtain the deployment identifier, role name and role instance name as explained in the On-Demand Transfer section. RoleInstanceDiagnosticManager can then be used to retrieve the current configuration and update it as required. The following code snippet shows how to add collection of a performance counter to the configuration and enable its transfer to Windows Azure Storage.

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString"));
DeploymentDiagnosticManager deploymentDiagManager = new DeploymentDiagnosticManager(storageAccount, "<DeploymentID>");
IEnumerable<RoleInstanceDiagnosticManager> roleDiagManagers = deploymentDiagManager.GetRoleInstanceDiagnosticManagersForRole("<Role Name>");

PerformanceCounterConfiguration perfCounterConfig = new PerformanceCounterConfiguration();
perfCounterConfig.CounterSpecifier = @"\Processor(_Total)\% Processor Time";
perfCounterConfig.SampleRate = TimeSpan.FromSeconds(5);

foreach (RoleInstanceDiagnosticManager roleDiagManager in roleDiagManagers)
{
    DiagnosticMonitorConfiguration currentConfiguration = roleDiagManager.GetCurrentConfiguration();
    currentConfiguration.PerformanceCounters.DataSources.Add(perfCounterConfig);
    currentConfiguration.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(10);

    // Update the configuration
    roleDiagManager.SetCurrentConfiguration(currentConfiguration);
}

Following is the updated sample configuration XML file after running the above code snippet.
<?xml version="1.0"?> <ConfigRequest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <DataSources> <OverallQuotaInMB>8192</OverallQuotaInMB> <Logs> <BufferQuotaInMB>1024</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>1</ScheduledTransferPeriodInMinutes> <ScheduledTransferLogLevelFilter>Information</ScheduledTransferLogLevelFilter> </Logs> <DiagnosticInfrastructureLogs> <BufferQuotaInMB>1024</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>0</ScheduledTransferPeriodInMinutes> <ScheduledTransferLogLevelFilter>Information</ScheduledTransferLogLevelFilter> </DiagnosticInfrastructureLogs> <PerformanceCounters> <BufferQuotaInMB>1024</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>10</ScheduledTransferPeriodInMinutes> <Subscriptions> <PerformanceCounterConfiguration> <CounterSpecifier>\Processor(_Total)\% Processor Time</CounterSpecifier> <SampleRateInSeconds>5</SampleRateInSeconds> </PerformanceCounterConfiguration> </Subscriptions> </PerformanceCounters> <WindowsEventLog>

www.aditi.com

Following is the updated sample configuration XML file on running the above code snippet.

<BufferQuotaInMB>1024</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>1</ScheduledTransferPeriodInMinutes> <Subscriptions> <string>Application!*</string> </Subscriptions> <ScheduledTransferLogLevelFilter>Information</ScheduledTransferLogLevelFilter> </WindowsEventLog> <Directories> <BufferQuotaInMB>0</BufferQuotaInMB> <ScheduledTransferPeriodInMinutes>1</ScheduledTransferPeriodInMinutes> <Subscriptions> <DirectoryConfiguration> <Path>C:\Users\Administrator\AppData\Local\dftmp\Resources\f3c856df-f9d6-4bff-995a89b93640b6ce\directory\DiagnosticStore\CrashDumps</Path> <Container>wad-crash-dumps</Container> <DirectoryQuotaInMB>1024</DirectoryQuotaInMB> </DirectoryConfiguration> </Subscriptions> </Directories> </DataSources> <IsDefault>false</IsDefault> </ConfigRequest>

Analyzing Crash Dumps using WinDbg

The crash dumps are collected under a blob container named wad-crash-dumps, in which a virtual blob directory hierarchy is created for each role and the instances that crashed.

Mini-Dump Setup
Perform the following steps to prepare the environment for analyzing the crash dump:

1. Download the dump that is to be analyzed to the local machine using one of the tools like Azure Storage Explorer, Cloud Storage Studio, etc.
2. Install WinDbg from http://msdn.microsoft.com/en-US/windows/hardware/gg463009/ (select the OS version appropriately). Make sure to select the Debugging Tools for Windows feature in the installation wizard.

Analyze
To begin analysis, start WinDbg and open the crash dump file (File -> Open Crash Dump, or press Ctrl + D) downloaded from Windows Azure Storage.

Note: The example being showcased is for a mini-dump.


The first step to perform after loading the crash dump is to load SOS.dll, which ships with the .NET Framework and helps in debugging managed programs. This can be done with the command .loadby SOS clr.

The next step is to view the managed stack trace at the time of crash. This is done by the command !CLRStack -a.



As shown in the screenshots above, the line of code responsible for the exception is correctly identified from the stack trace (CrashTask.cs, line 15). To print the exception details, use the command !pe.


Note: Unhandled exceptions will also show up in the Windows Event Logs.

Full Dump

Analyzing a full dump is similar to a mini-dump, the difference being access to the memory heap. Follow the setup steps mentioned for the mini-dump. (Notice the size of the dump file in the screenshot below; it has grown to 198 MB from 16 MB in the case of the mini-dump.)


Run the command !clrstack -a as we did for the mini-dump. However, since we now have access to the memory heap, we can drill down into the objects pertaining to the parameters.

The region marked in red corresponds to the parameter being passed to the InvalidOperationException (see the source code snapshot in previous section). We can find out the contents of the parameter object using the command !dumpobj 00000000029718c0.


Similarly, to look at the CrashTask object that was used for calling Execute, we can use the command !dumpobj 0x000000000296d2c0 (the object address retrieved from the LOCALS section of the WorkerRole1.WorkerRole.Run method call in the previous screenshot). A list of all the available commands can be found at http://msdn.microsoft.com/en-us/library/bb190764.aspx.

Best Practices

Optimal WAD configuration

It is quite important to have a suitable WAD configuration to prevent excess cost as well as adverse impact on the application's performance. The following points should be given due consideration:

The WAD storage account used to transfer the logs should be in the same data center as the application. This prevents incurring costs for the data transfer.

Use a configuration setting for the transfer schedule period. This setting can then be set appropriately for different environments by making use of the multiple service configuration files feature. For example, ServiceConfiguration.Local.cscfg could have the following setting:
<Setting name="ScheduleTransferPeriodInMinutes" value="1"/>

while ServiceConfiguration.Cloud.cscfg could have the setting:


<Setting name="ScheduleTransferPeriodInMinutes" value="1"/>

The WAD configuration code in the role can then be modified as shown below to make use of the configuration setting.
DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();
var scheduledTransferPeriodInMins = RoleEnvironment.GetConfigurationSettingValue("ScheduleTransferPeriodInMinutes");
TimeSpan scheduledTransferPeriod = TimeSpan.FromMinutes(Convert.ToDouble(scheduledTransferPeriodInMins));
config.Logs.ScheduledTransferPeriod = scheduledTransferPeriod;
DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);


This helps developers access the logs quickly when debugging the application in the local environment, while preventing cost and performance overhead when the application is deployed to the cloud.

The transfer of Windows Azure Logs and Windows Event Logs must be regulated by appropriately setting the filter for the log level. The different log levels are: Critical, Error, Warning, Information and Verbose. The level is cumulative, i.e. if the filter is set to Warning, then both Critical and Error are also included. We can use a configuration setting to specify the logging level (as done above for the transfer schedule period).

ServiceConfiguration.Cloud.cscfg
<Setting name="LogLevelFilter" value="Error"/>

ServiceConfiguration.Local.cscfg
<Setting name="LogLevelFilter" value="Information"/> DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration(); var logLevelFilterVal = RoleEnvironment.GetConfigurationSettingValue("LogLevelFilter"); LogLevel logLevelFilter = (LogLevel)Enum.Parse(typeof(LogLevel), logLevelFilterVal); config.Logs.ScheduledTransferLogLevelFilter = logLevelFilter; config.WindowsEventLog.ScheduledTransferLogLevelFilter = logLevelFilter; config.DiagnosticInfrastructureLogs.ScheduledTransferLogLevelFilter = logLevelFilter; DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

Correctly Configure Data Buffer Sizes

We need a rough estimate of the total storage required for the data sources that have been configured for collection by WAD. The maximum value is capped by the size of the disk pertaining to the VM instance and by the OverallQuotaInMB property of the DiagnosticMonitorConfiguration class. For example, when using a Small instance the maximum size of local storage available is 165 GB. By default, OverallQuotaInMB is set to 4 GB, so you are left with about 161 GB for the application. OverallQuotaInMB sets the rewritable, wraparound buffer for all the diagnostic data collected from all the configured data sources.

The default value of OverallQuotaInMB may not always be sufficient. If you have configured a lot of data sources for collection, there is a risk of the collected data getting overwritten (the oldest data is deleted as new data is added) before it is transferred to Windows Azure Storage. (The deletion of the oldest data occurs after transfer, too.) To go beyond the default value, add a <LocalStorage> element for DiagnosticStore, with the sizeInMB attribute set to the new size, to the ServiceDefinition.csdef file and change the OverallQuotaInMB value accordingly.

It is also important to remember that OverallQuotaInMB is shared amongst all the data sources, and that each corresponding data buffer can be individually configured to have its own maximum value by setting the BufferQuotaInMB property. Care must therefore be taken while setting the individual data buffer sizes so that the aggregate value does not exceed OverallQuotaInMB. If it does, WAD will fail, and the only way to see the error is to attach a debugger or use a try-catch block. The default BufferQuotaInMB is zero, which means the buffer is limited only by OverallQuotaInMB; it can also be set explicitly.


Note: By setting the cleanOnRoleRecycle attribute to false, we ensure that the data is not wiped out when the role recycles. However, this does not guarantee that the data will remain if the instance is moved (hardware problem, etc.).
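A hedged sketch of the corresponding ServiceDefinition.csdef entry is shown below; the 8192 MB size matches the quota used in the snippet that follows, the role name is illustrative, and other role elements are omitted:

<WorkerRole name="WorkerRole1" vmsize="Small">
  <LocalResources>
    <!-- Reserve 8 GB for the diagnostics store and keep it across role recycles. -->
    <LocalStorage name="DiagnosticStore" sizeInMB="8192" cleanOnRoleRecycle="false" />
  </LocalResources>
</WorkerRole>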

DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();

// Set an overall quota of 8 GB.
config.OverallQuotaInMB = 8192;

// Set the individual data buffers explicitly and make sure the total is less than the OverallQuotaInMB set above.
config.Logs.BufferQuotaInMB = 1024;
config.Directories.BufferQuotaInMB = 0; // Use the rest of the storage here
config.WindowsEventLog.BufferQuotaInMB = 1024;
config.PerformanceCounters.BufferQuotaInMB = 1024;
config.DiagnosticInfrastructureLogs.BufferQuotaInMB = 1024;

DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

Use mini-dumps instead of full dumps

Full dump files contain the process's memory at the time of the crash. Since Windows Azure VM instances run a 64-bit version of Windows, full dump files can be quite large. For example, the full crash dump of an ExtraLarge VM instance can go up to 14 GB (worst-case scenario). It is therefore better to enable collection of mini-dumps unless a full dump is absolutely needed, for example to examine memory leaks, analyze object structures, etc.

Brihadish Kaushik is a Technical Lead at Aditi Technologies. He has been involved in developing applications using the various offerings of the .NET technology ecosystem. He has had the opportunity to get his hands dirty with WCF, WPF, WWF and ASP.NET, and the programming language he is proficient in is C#. Over the past 4 years he has been focusing on designing Windows Azure based solutions.

About Aditi
Aditi helps product companies, web businesses and enterprises leverage the power of cloud, e-social and mobile to drive competitive advantage. We are one of the top 3 Platform-as-a-Service solution providers globally and one of the top 5 Microsoft technology partners in the US. We are passionate about emerging technologies and are focused on custom development. We provide innovative solutions in 4 domains:

Digital Marketing solutions that enable online businesses to increase customer acquisition
Cloud Solutions that help companies build for traffic and computation surge
Enterprise Social that enables enterprises to enhance collaboration and productivity
Product Engineering services that help ISVs accelerate time-to-market

www.aditi.com
https://www.facebook.com/AditiTechnologies
http://www.linkedin.com/company/aditi-technologies
http://adititechnologiesblog.blogspot.in/
https://twitter.com/WeAreAditi

