
Microsoft Application Center 2000 Resource Kit

Chapter 10 - Working with Performance Counters

This chapter focuses on using the Microsoft Application Center 2000 (Application Center)
performance monitoring feature; however, because performance monitoring and capacity
planning are such complex and inter-related topics, it was necessary to provide additional
background information. As a result, this chapter is divided into four major parts that:

 Provide an overview of performance tuning and capacity planning. 
 Present high-level guidelines for performance testing and tuning. 
 Illustrate how to work with the default Application Center performance counters and
create new ones. 
 Provide performance-monitoring examples by using different cluster topologies. 

There are numerous elements to consider when dealing with performance, but whether you're
dealing with a single server or several servers, you can divide these elements into the following
broad categories:

 The hardware 
 The applications (including Microsoft Internet Information Services 5.0 [IIS] and Active
Server Pages [ASP] runtime) 
 The database 
 The network 
 The operating system (as manifested through the processor, memory, and disks) 

These are the parts of a cluster environment that you have to deal with when trying to achieve
and maintain high levels of performance on an Application Center cluster.

Performance Management

When dealing with performance management, we're talking about the continuous process of
evaluating a server to determine whether or not it can deliver the level of performance that's
required, which is to say, the server's ability to handle a certain load of concurrent users.
Performance management is closely linked to capacity planning; the difference is that
performance management involves tuning the current system so that it can perform better,
thereby enabling it to support more users. Capacity planning, on the other hand, focuses on how
many users a site can support and how to scale the site so it can support more users.

While confronting the myriad of elements that make up a production system, you also have to
balance the goals and priorities of two viewpoints—that of the user and that of the administrator.

The User's Perspective

For most users, performance equates to speed—the perceived response time of the system they're
using. When they activate a hyperlink and the requested page is retrieved and displayed quickly
—typically in less than 10 seconds—their perception of performance is favorable. (It's
interesting to note that it's not uncommon for a user to think that a page takes longer to retrieve
and display than it actually does.)

From a user's perspective, the definition of performance and the primary goal of performance
tuning are the same—make it fast. This speed-based viewpoint encompasses the following:

 Initialization 
 Shut down 
 Page retrieval and rendering 
 Reasonable time-outs 

The Administrator's Perspective

From an administrator's viewpoint, performance is a measure of how system resources are
utilized by all the running programs. The scope of resource usage ranges from the lowest-level
programs (drivers, for example) up to and including the applications that are hosted on a server.

In terms of performance tuning, the administrator's primary goal is to make the system satisfy
client requests quickly and without errors or interrupts. The secondary tuning goals are:

 Conserving bandwidth 
 Conserving CPU and RAM resources 

Unlike the user, who deals primarily with perception, the administrator can quantify resource
utilization through the collection, observation, and analysis of performance data (see Figure 10.1
on page 327).

You can use performance data to:

 Observe changes and trends in resource usage and workload distribution. 
 Quantify the relationship between the workload and its effect on system resources. 
 Test configuration changes or tuning efforts by monitoring the results. 

Regardless of the perspective you take, you have to approach tuning systematically and employ a
methodology for implementing and testing system configuration changes.

The Business Perspective

The business perspective also plays a significant role in performance management. In this
context, someone has to determine how much hardware is required, how to make provisions
for peak loads, how to balance out spikes with low overall load, and how to determine or satisfy
service-level agreements.

It's often necessary to make price and performance trade-offs—it may be too expensive to have
enough servers for maintaining low processor utilization at all times, so low average utilization
with spikes becomes acceptable.

An Overview of Performance Tuning

Tuning consists of finding and eliminating bottlenecks—a condition that occurs, and is revealed, when a
piece of hardware or software in a server approaches the limits of its capacity.

Before starting the performance tuning cycle illustrated in Figure 10.1, you have to establish
the framework for ongoing performance tuning activities. You should:

 Identify constraints—A site's business case determines priorities, which in turn establish
boundaries. Constraints, such as maintainability and budget limits, are factors that cannot
be altered in search of higher performance. You have to focus performance work on
factors that are not constrained. 
 Specify the load—This involves determining what services the site's clients require and
the level of demand for those services. The most common metrics for specifying load are
the number of clients, client think time (the delay between when a client receives one
reply and when it submits the next request), and load distribution (steady or fluctuating,
average, and peak load). 
 Set performance goals—Performance goals have to be explicit, which involves
identifying the metrics that will be used for tuning as well as their corresponding
benchmark values. Total system throughput and response time are two common metrics
that are used to measure performance. After identifying the performance metrics, you
have to establish quantifiable and reasonable benchmark values for each one. 

The Tuning Cycle


The four phases of the tuning cycle shown in Figure 10.1 are repeated until you achieve the
performance goals that you established prior to starting the tuning process. (Figure 10.1 shows
the cycle: Collecting, Analyzing, Configuring, Testing, and then back to Collecting.) Let's
examine each phase, starting with Collecting.

Collecting

The Collecting phase is the starting point of any tuning exercise. During this phase you're simply
gathering data with the collection of performance counters that you've chosen for a specific part
of the system. These counters could be for the network, the server, or the back-end database.

Regardless of what part of the system you're tuning, you require a baseline against which to
measure performance changes. You need to establish a pattern of system behavior when the system
is idling as well as when specific tasks are executed (for example, adding a member to the cluster
and synchronizing it to the controller). Therefore, your first data-gathering pass is used to
establish a baseline set of values for the system's behavior. The baseline establishes the typical
counter values that you'd expect to see when the system is behaving satisfactorily.

Note Baseline performance is a subjective standard—you have to set a baseline that's appropriate
for your work environment and that best reflects your system's workload and service demands.

Once you've established your baseline, you can apply load to the system by using a tool such as
Web Application Stress (WAS) to simulate user load.

Analyzing

After you've collected the performance data that you require for tuning the part of the system that
you're working on, you need to analyze the data to determine where the bottleneck is.
Remember, a performance number is only an indicator—it doesn't necessarily identify the actual
bottleneck because a performance problem can be traced back to multiple sources. It's also not
uncommon for problems in one system component to be the result of problems in another
component (a memory shortage is the best example of this; it's indicated by increased disk and
processor use).

The following points, taken from the Microsoft Windows 2000 Resource Kit, provide guidelines
for interpreting counter values and eliminating false or misleading data that might cause you to
set inappropriate target values for tuning.

 Monitoring processes of the same name—Watch for unusually large values for one
instance and not the other. Sometimes, the System Monitor misrepresents data for
separate instances of processes of the same name by reporting the combined values of the
instances as the value of a single instance. You can work around this by tracking
processes by the process identifier. 
 Monitoring several threads—When you are monitoring several threads and one of them
stops, the data for one thread might appear to be reported for another. This is because of
the way threads are numbered. You can get around this by including the thread identifiers
of the process's threads in your log or display. Use the Thread\Thread ID counter for
this purpose. 
 Intermittent spikes in data values—Don't give too much weight to occasional spikes in
data. These might be due to the startup of a process and are not an accurate reflection of
counter values for that process over time. Counters that average, in particular, can cause
the effect of spikes to linger over time. 
 Monitoring over an extended period of time—We recommend using graphs instead of
reports or histograms because the latter views only show the last values and averages. As
a result, you might not get an accurate picture of values when you're looking for spikes. 
 Excluding start-up events—Unless you have a specific reason for including start-up
events in your data, exclude these events because the temporarily high values they
produce tend to skew overall performance results. 
 Zero values or missing data—Investigate all occurrences of zero values or missing data.
These can hamper your ability to establish a meaningful baseline. 
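Several of these screening guidelines (excluding start-up events, discounting intermittent spikes, flagging zero or missing values) can be sketched as a simple filter over logged counter samples. This is an illustrative Python sketch with hypothetical data, not part of the toolset described in this chapter:

```python
from statistics import mean, median

def screen_samples(samples, startup_skip=3, spike_factor=3.0):
    """Screen raw counter samples per the guidelines above.

    - Drops the first `startup_skip` samples (start-up events).
    - Flags zero or missing (None) values for investigation.
    - Uses the median rather than the mean as the baseline, so a few
      spikes don't linger in the result the way averaging counters do.
    """
    trimmed = samples[startup_skip:]
    suspect = [i for i, v in enumerate(trimmed) if not v]   # zero or None
    clean = [v for v in trimmed if v]
    baseline = median(clean)                                # robust center
    spikes = [v for v in clean if v > spike_factor * baseline]
    return {"baseline": baseline, "mean": mean(clean),
            "spikes": spikes, "suspect_indices": suspect}

# Hypothetical % Processor Time log: a start-up burst, one gap, one spike.
report = screen_samples([95, 90, 88, 20, 22, None, 21, 75, 19, 23])
print(report["baseline"], report["spikes"], report["suspect_indices"])
```

Note how the start-up burst (95, 90, 88) never enters the baseline, and the single spike (75) is reported separately instead of inflating the average.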

Configuring

After you've collected your data and completed the analysis of the results, you can determine which
part of the system is the best candidate for a configuration change and implement this change.

The cardinal rule for implementing changes is only implement one configuration change at a time. A
problem that appears to be related to a single component might be the result of bottlenecks involving
several components. For this reason it's important to address problems individually. If you make
multiple changes simultaneously, it may be impossible to accurately assess the impact of each change.

Testing

After implementing a configuration change, you'll have to complete the appropriate level of
testing to determine the impact of the change on the system that you're tuning. At this point, it's a
matter of determining whether or not the change:

 Improved performance. Did the change improve performance, and if so, by how much? 
 Degraded performance. Did the change cause a bottleneck somewhere else? 
 Had no impact on performance. Did the change have any noticeable impact at all on
performance? 

Testing do's 

 Check the correctness and performance of the application that you're using for testing by
looking for memory leaks and inordinate delays in response to client requests. 
 Ensure that all the tests are working correctly. 
 Make sure that all the tests can be repeated by using the same transaction mix and the
same clients generating the same load. (See "The Web Application Stress Tool," later in
this chapter.) 
 Document changes and results. 

Tip You can obtain the monitoring results of your testing from monitoring log files—which can
be exported to Microsoft Excel—and the Event log.

An Overview of Capacity Planning

In order to realize the full potential of a site, you have
to satisfy the demands of your users, which typically consist of quality of service, quality of
content, and speedy access to the site's content and services. (The latter is, for most of your users,
the key contributing factor to a positive user experience.) Capacity planning is the process of
determining the most cost efficient method of increasing a Web site's performance and
scalability, while at the same time predicting the point at which a resource will cause a
bottleneck on the Web site.

The starting point for capacity planning is determining a site's capacity, which depends on:

 The number of users it can handle before performance falls off 
 The server's ability to handle increased load, either due to an increased number of users
or increased content complexity 
 The nature of the site's content, which is to say, the complexity of its applications 

Note: Capacity is influenced indirectly by performance; a well-tuned site can increase capacity
by making better use of existing resources and, in some cases, free up resources. At some point,
regardless of how well tuned your site is, the site cannot handle more traffic without degrading
performance. This is the point at which you either have to scale up by upgrading/replacing the
existing servers or scale out by increasing the size of your cluster.

Ideally, you will have done some capacity planning that establishes acceptable performance
benchmarks and resource usage limits, and you will have either scaled or have a plan in place to
scale your system before performance degrades.

The key factors for successful capacity planning are:

 Understanding the nature of the site's content. Different types of content (for example,
static HTML pages and ASP pages) have a different—and often dramatic—impact on
system resources. Your capacity planning has to take into account how the existing
content types affect capacity, as well as how a change in the content mix could affect
resource usage. 
 Understanding the site's users. You have to be able to understand site usage patterns in
order to predict traffic growth and accommodate short-term usage spikes. 

Once again, you have to gather baseline data before you can determine when and how to increase
the capacity of your system.

We recommend that you use the "Capacity Planning" white paper (Microsoft TechNet) as a
guide for your capacity planning activities. The Microsoft Internet Information Server Resource
Kit (Microsoft Press, 1998) for IIS 4.0 and the Microsoft Internet Information Services Resource Guide
(Microsoft Press, 2000) for IIS 5.0 also provide useful information about capacity planning for
Web sites.

White Paper: "Capacity Planning"

The "Capacity Planning" white paper, produced by Microsoft Enterprise Services, is available
from the TechNet Web site at the following URL:
http://www.microsoft.com/technet/archive/itsolutions/ecommerce/default.mspx.

"Capacity Planning" is a part of a series about applying Microsoft Enterprise Services


frameworks to e-commerce solutions, and although it deals with capacity planning for sites
running Windows 2000, IIS, Microsoft Site Server version 3.0, and Microsoft SQL Server
version 7.0, it's methodology is not limited to these products or this particular business solution.

The following topics and subtopics are covered in the white paper:

 "Introduction to Capacity Planning" 

The introduction covers the why, when, and how of capacity planning and introduces the
capacity planning equation: Number of supported users = Hardware capacity/load on
hardware per user 

 "Analyzing Your Site" 

"Dynamic Content Analysis" 


"Site Server 3.0 Commerce Edition" 

"Transaction Cost Analysis" 

"Predicting Site Traffic" 

"Analyzing a Typical User" 

"Acceptable Operation Parameters" 

"A Detailed Test Methodology" 

"User Cost Calculations" (for CPU, memory, disk, and network) 

 "Deriving Site Capacity" 

In this section, you'll learn how to calculate hardware needs and how to plan site
topology scalability—both vertically and horizontally. 
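The capacity planning equation quoted above (number of supported users = hardware capacity / load on hardware per user) is simple enough to apply directly. The following sketch plugs in hypothetical numbers; the request rates are invented for illustration only:

```python
def supported_users(hardware_capacity, load_per_user):
    """The white paper's capacity planning equation:
    number of supported users = hardware capacity / load on hardware per user.
    """
    return hardware_capacity / load_per_user

# Hypothetical example: a server that can sustain 400 ASP requests/sec,
# where an average user session generates 0.8 requests/sec.
print(supported_users(400, 0.8))  # 500.0 users
```

The hard part in practice is measuring the two inputs, which is what the "Transaction Cost Analysis" and "User Cost Calculations" topics address.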


Testing and Tuning the Infrastructure

When monitoring and tuning your infrastructure, you have to remember that the various elements
must be treated as a whole as well as individually. Before delving into these elements, it's
necessary to understand the two primary performance metrics that are used for both performance
tuning and capacity analysis—throughput and response time.

Throughput

Throughput is a measure that describes the rate at which a server can process requests. The
higher the throughput is, the better your servers can accommodate spikes in the load.

Typically, throughput is expressed in terms of requests per second or requests per day. Because
of browser behavior (for example, a single HTML page request might be coupled with separate
requests for embedded images or frames), it can help to think of throughput in terms of page hits.

Note Some administrators estimate throughput on their sites by dividing the number of clients by
their think time, or the number of seconds that a user takes to read a page before clicking a link.
Using this approach, if you had 1000 users with an average think time of 10 seconds, the
throughput would be 100 requests per second. However, throughput is really a function of how
quickly requests arrive at the server and how quickly the server can respond to these requests.
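The note's estimate can be expressed as a one-line calculation; the figures below reproduce the 1000-user example:

```python
def estimated_throughput(concurrent_users, avg_think_time_sec):
    """Rough throughput estimate described in the note: number of clients
    divided by their think time. Real throughput also depends on how
    quickly requests arrive and how quickly the server can respond."""
    return concurrent_users / avg_think_time_sec

# The example from the note: 1000 users with a 10-second average think time.
print(estimated_throughput(1000, 10))  # 100.0 requests per second
```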

The following factors can diminish throughput:

 Bandwidth 
 Page size 
 Application complexity 

You can measure throughput by using two performance counters that provide instant and
historical values. For static HTML pages, you should use Web Service(_Total)\Get
Requests/sec, and for ASP pages, you should use Active Server Pages\Requests/sec. After
obtaining baseline data, you should apply stress to your server to determine how the increased
load affects throughput and system resources.
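The */Requests/sec counters are themselves rates that System Monitor derives from raw cumulative counts sampled at two points in time. As a rough illustration of that derivation (the sample numbers below are hypothetical):

```python
def rate_per_sec(count_start, count_end, t_start, t_end):
    """How a */Requests/sec style counter value is derived: the change in
    a cumulative request count divided by the elapsed sample interval."""
    return (count_end - count_start) / (t_end - t_start)

# Hypothetical raw samples: total GET requests at two poll times, 60 s apart.
print(rate_per_sec(120_000, 126_000, 0.0, 60.0))  # 100.0 Get Requests/sec
```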

Response Time

The determining factors in response time are network latency, the time a request spends in the
server request queue, and request execution time.

Network latency is the measure of how long a data packet takes to travel between two points. In
today's network environments, there are many factors that have an impact on latency, including
network congestion, link quality and bandwidth, the physical distance between the two points,
and the hop count between the two points.

Don't forget that network latency also affects the time it takes for a request to return from the
server to the client.

Note Even in a hypothetical network scenario with zero latency, a request can still spend time in
a server queue (request queue time) before it is processed. The number of outstanding requests in
the server queue determines this queue time; typically, server queue length is proportional to the
server load.
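One rough way to estimate request queue time from queue length, assuming a steady load and a known service rate, is a Little's-law style calculation. This is an illustrative sketch, not a formula from this chapter, and the numbers are hypothetical:

```python
def queue_time(queue_length, service_rate_per_sec):
    """Approximate time a request spends in the server queue under steady
    load: outstanding requests divided by the rate at which the server
    drains the queue (a Little's-law style estimate)."""
    return queue_length / service_rate_per_sec

# 20 outstanding requests, server completing 200 requests/sec:
print(queue_time(20, 200))  # 0.1 seconds in the queue
```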

Two important response-time measures are the time-to-first-byte (TTFB) and the time-to-last-
byte (TTLB) values. These values are provided whenever you run test scripts by using the WAS
tool, which is documented in "The Web Application Stress Tool," later in this chapter.

Note TTFB and TTLB are calculated by using the time that a page is first requested and the
times that the first and last bytes of data are received on the client.
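That client-side calculation amounts to two subtractions. A minimal sketch, with hypothetical timestamps in seconds:

```python
def ttfb_ttlb(t_request, t_first_byte, t_last_byte):
    """TTFB/TTLB as described in the note: measured on the client from the
    time the page is requested to the times the first and last bytes of
    data are received."""
    return t_first_byte - t_request, t_last_byte - t_request

# Hypothetical client-side timestamps for one page request.
ttfb, ttlb = ttfb_ttlb(t_request=0.00, t_first_byte=0.12, t_last_byte=0.87)
print(ttfb, ttlb)
```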

The second factor in response time is request execution time. In addition to adding to response
time, long execution times contribute to throughput degradation. Tuning your applications is the
primary method for reducing execution time and improving throughput, and is covered in "Testing
and Tuning Applications," later in this chapter.

Now let's examine the counters that you can use for measuring network and server performance.

The Network and Server

The Server Operations Guide in the Microsoft Windows 2000 Server Resource Kit identifies
numerous counters that you can use to monitor your system's hardware resources. The biggest
challenge that you're going to face is determining which resources to monitor and which counters
are appropriate for each resource.

However, you can use the suggested thresholds for the selected counters in Table 10.1 as a
guideline for evaluating server performance. If your system consistently reports these values, it's
quite likely that a bottleneck exists on the system and you should take the appropriate steps—
tune or upgrade the affected resource.

Once you understand the baseline or average values for your site, you can then use Microsoft
Health Monitor 2.1 to track deviations and alert you to potential problems.

Table 10.1 Suggested Counter Thresholds for a Server 

Resource | Object\Counter | Threshold | Comments
Disk | PhysicalDisk\% Disk Time | 90% | 
Disk | PhysicalDisk\Disk Reads/sec and PhysicalDisk\Disk Writes/sec | Depends on manufacturer's specifications | Check the disk's specified transfer rate to verify that the logged rate doesn't exceed specifications.(1)
Disk | PhysicalDisk\Current Disk Queue Length | Number of spindles plus 2 | This is an instantaneous counter; observe its value over several intervals. For an average over time, use PhysicalDisk\Avg. Disk Queue Length.
Memory | Memory\Available Bytes | Less than 4 MB | Research memory usage, and then add memory, if needed.
Memory | Memory\Pages/sec | 20 | Research paging activity, the activity that occurs when data is swapped out of memory and stored on disk when memory is low.
Network | Network Segment\% Net Utilization | Depends on network type | For Ethernet networks, the recommended threshold is 30 percent.
Paging File | Paging File\% Usage | 99% | Review this value in conjunction with Available Bytes and Pages/sec to understand paging activity on your system.
Processor | Processor\% Processor Time | 85% | Isolate the process that is using a high percentage of processor time. Upgrade to a faster processor, or install an additional processor.
Processor | Processor\Interrupts/sec | Depends on the processor | A dramatic increase in this counter without a corresponding increase in system activity indicates a hardware problem. Identify the network adapter that is causing the interrupts.
Server | Server\Bytes Total/sec | None specified | If the sum of Bytes Total/sec for all servers is roughly equal to the maximum transfer rates for your network, you might need to segment the network.
Server | Server\Work Item Shortages | 3 | If this value reaches the threshold, consider tuning InitWorkItems or MaxWorkItems in the registry.
Server | Server\Pool Paged Peak | Amount of physical RAM | This value indicates the maximum paging file size and the amount of physical memory.
Server | Server Work Queues\Queue Length | 4 | If this value reaches the threshold, there may be a processor bottleneck. This is an instantaneous counter; observe it over several intervals.
Multiple Processors | System\Processor Queue Length | 2 | This is an instantaneous counter; observe it over several intervals.

(1) To monitor Logical and Physical Disk object counters, you have to activate them first by
typing diskperf -yv at the command prompt. They will be enabled after you restart the system.

Note Deciding whether or not server performance is acceptable is, of course, highly subjective
and should reflect the baseline values that you establish for your own environment.
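A handful of the Table 10.1 thresholds can be encoded as simple predicates and checked against sampled values. The subset below is hypothetical and abbreviated; real thresholds should reflect the baseline you establish for your own environment:

```python
# Hypothetical subset of the Table 10.1 thresholds; each predicate is True
# when a sampled value crosses the suggested threshold.
THRESHOLDS = {
    r"PhysicalDisk\% Disk Time":      lambda v: v >= 90,
    r"Memory\Available Bytes":        lambda v: v < 4 * 1024 * 1024,  # < 4 MB
    r"Memory\Pages/sec":              lambda v: v >= 20,
    r"Processor\% Processor Time":    lambda v: v >= 85,
    r"System\Processor Queue Length": lambda v: v >= 2,
}

def flag_bottlenecks(samples):
    """Return the counters whose sampled values cross their thresholds."""
    return [name for name, value in samples.items()
            if name in THRESHOLDS and THRESHOLDS[name](value)]

# Hypothetical sampled values for one server.
flags = flag_bottlenecks({
    r"Processor\% Processor Time": 92,    # sustained high CPU
    r"Memory\Pages/sec": 5,               # paging looks fine
    r"System\Processor Queue Length": 3,  # queue building up
})
print(flags)
```

In practice you would feed this kind of check from logged counter data, which is roughly what Health Monitor's threshold rules do for you.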

The Web Server

The main elements that you have to consider when tuning your Web servers are:

 Memory 
 Processor capacity 
 Network capacity, latency, and bandwidth 
 Disks 
 Security features 

The following sections, which provide guidelines for handling each of these elements, are taken
from Appendix C, "The Art and Science of Web Server Tuning with Internet Information
Services 5.0."

Caution Remember, in keeping with the Application Center homogenous server philosophy,
virtually all of the IIS configuration settings on the cluster controller are replicated to every
cluster member. Therefore, you can't maintain unique IIS settings for each member. When you're
tuning IIS on the controller, you have to take a holistic approach and consider what impact the
Web server settings on the controller will have on the rest of the cluster members.

Memory

Monitor memory first to ensure that your server has enough before moving on to other
components. Because the IIS file cache is set up to use up to one-half of the available memory by
default, the more memory you have, the larger the cache can be—up to its limit of 4 GB. Lack of
memory is the number one performance bottleneck on Web sites.

Note Adding more memory doesn't guarantee that all your performance problems will be solved
—you should also monitor how the IIS cache settings are affecting performance.

Table 10.2 summarizes the key memory counters.

Table 10.2 Memory Counters 

Counter(s) | Comments
Memory: Available Bytes | Indicates available memory. At least 10 percent of memory should be available for peak use.
Memory: Page Faults/sec, Memory: Pages Input/sec, and Memory: Page Reads/sec | Use the first counter to determine the overall rate at which the system is handling hard and soft page faults. Memory: Pages Input/sec, which should be greater than or equal to Memory: Page Reads/sec, indicates the hard page fault rate. If these numbers are high, it's likely that too much memory is dedicated to the caches.
Memory: Cache Bytes, Internet Information Services Global: File Cache Hits %, Internet Information Services Global: File Cache Hits, and Internet Information Services Global: File Cache Flushes | Because IIS automatically trims the file system cache if it is running out of memory, you can use the File Cache Hits % counter trend to monitor memory availability. Use the second counter to see how well IIS is using the file cache. On a site made up mostly of static files, this value should be 80 percent or higher. You can compare Internet Information Services Global: File Cache Hits and Internet Information Services Global: File Cache Flushes to determine whether objects are flushed too quickly (more often than they need to be) or too slowly (thus wasting memory).
Page File Bytes: Total | Indicates the size of the paging file. The paging file on the system drive should be at least twice the size of physical memory. You can improve performance by striping the paging file across multiple disks.
Memory: Pool Paged Bytes, Memory: Pool Nonpaged Bytes, Process: Pool Paged Bytes: Inetinfo, Process: Pool Nonpaged Bytes: Inetinfo, Process: Pool Paged Bytes: dllhost#n, and Process: Pool Nonpaged Bytes: dllhost#n | Use these counters to monitor the pool space for all of the processes on the server as well as those used directly by IIS, either by the Inetinfo or Dllhost processes.

Tip Besides adding more memory, you can enhance memory performance by:

 Improving data organization on the disk. 
 Implementing disk mirroring or striping. 
 Replacing Common Gateway Interface (CGI) applications with ISAPI or ASP
applications. 
 Increasing paging file size. 
 Retuning the IIS file cache. 
 Eliminating unnecessary features. 
 Changing the balance of the file system cache to the IIS working set. 

Processor Capacity

Bottlenecks occur in the processor when one or more processes consume practically all of the
processor time. This forces process threads that are ready to be executed to wait in a queue.
Adding more hardware to overcome a processor bottleneck usually isn't effective and often
makes the situation worse. In a site that hosts primarily static content, a two-processor computer
is sufficient. With sites that host dynamic content, a four-processor system can handle the load.

Tip Before implementing a hardware change, such as adding another processor, rule out memory
problems and then monitor the processor activity.

Table 10.3 summarizes the key processor counters.

Table 10.3 Processor Counters 

Counter(s) | Comments
System: Processor Queue Length | Use to flag a bottleneck. If this counter has a sustained value of two or more threads, there is likely a bottleneck.
Processor: % Processor Time | Use to flag a bottleneck. A bottleneck is indicated by a high Processor: % Processor Time value together with network adapter and disk I/O values that are well below capacity.
Thread: Context Switches/sec: Dllhost#n => Thread#, Thread: Context Switches/sec: Inetinfo => Thread#, and System: Context Switches/sec | Use to determine whether to increase the size of the thread pool.
Processor: Interrupts/sec and Processor: % DPC Time | Use to determine how much time the processor is spending on interrupts and deferred procedure calls (DPCs). Client requests can be a major source of each type of load on the processor.

If the counters in Table 10.3 indicate a processor bottleneck, you have to determine if the current
workload is significantly CPU-intensive. If it is, it's unlikely that a single system will be able to
keep up with processing requests, even if it has multiple CPUs. The only remedy in this scenario
is to add another server.

Network Capacity, Latency, and Bandwidth


The time it takes for client requests to be satisfied by a server response—latency—is one of the
largest limiting factors in a user's perception of system performance. This request-response cycle
time is for the most part out of your direct control as a system administrator. For example, there's
nothing you can do about a slow router on the network. Network bandwidth is the most likely
source of a performance bottleneck on a site that's serving primarily static content. You can
monitor the network and mitigate some of these issues by tuning your connection to the network
and maximizing your effective bandwidth as best you can.

You can measure effective bandwidth by determining the rate at which your server sends and
receives data. There are several performance counters that measure data transmission for the
various network service components available on the server. These include counters for the Web,
FTP, and SMTP services, the TCP object, the IP object, and the Network Interface object.

Table 10.4 summarizes the key network-related counters.

Table 10.4 Network-Related Counters 

Counter(s) | Comments
Network Interface: Bytes Total/sec | Use to determine if your network connection is creating a bottleneck. Compare this counter to the total bandwidth of your network adapter. You should be using no more than 50 percent of the network adapter's capacity.
Web Service: Maximum Connections and Web Service: Total Connection Attempts | If you are running other services that use the network connection, you should monitor these counters to ensure that the Web server can use as much of the connection as it needs.

Note Remember to check memory and processor usage. If these numbers are high, the network
might not be the problem.

Disk Optimization

Generally speaking, if there is high disk activity other than logging, this means that other areas of
your system need tuning. However, the type of site you host can have a significant impact on the
frequency of disk seeks. For example:

 There is a very large file set that's accessed randomly. 
 The files on the site tend to be very large. 
 A database is running on the same server, and clients are making dissimilar requests. 
 Intensive logging routines are running. 

Table 10.5 summarizes the key disk-related counters.

Table 10.5 Disk-Related Counters 

Counter(s) | Comments
Processor: % Processor Time, Network Interface Connection: Bytes Total/sec, and PhysicalDisk: % Disk Time | If all three of these counters have high values, the hard disk is not causing a bottleneck. However, if % Disk Time is high and the other two counters are low, the disk might be the bottleneck.

Security Overhead

There are performance costs associated with all security techniques. Because the Windows 2000
and IIS security services are integrated into several of the operating system services, you cannot
monitor security features separately from these services. The best way to measure security
overhead is to run tests against the Web server with the security feature turned off and then run
them again with the security feature turned on. Make sure that you run these tests against a fixed
server configuration with a fixed workload to ensure that the only variable is the security feature.
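Once you have the two test runs, quantifying the overhead is a simple comparison. A sketch, assuming throughput in requests per second is the metric you compared; the numbers in the example are hypothetical:

```python
def security_overhead(baseline_rps, secured_rps):
    """Percentage drop in throughput after enabling a security feature.

    Both measurements must come from the same server configuration and
    the same workload, so the feature is the only variable.
    """
    return 100.0 * (baseline_rps - secured_rps) / baseline_rps

# Hypothetical results from two runs of the same test script:
# 420 requests/sec with SSL off, 300 requests/sec with SSL on.
print(round(security_overhead(420.0, 300.0), 1))  # relative cost, in percent
```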

Table 10.6 summarizes the key security-related items to monitor.

Table 10.6 Security-Related Items 

Processor Activity and the Processor Queue
Authentication, IP address checking, Secure Sockets Layer (SSL) protocol, and encryption schemes require significant processing. You will see increased processor activity (in privileged and user mode) and an increase in context switches and interrupts. If the processors aren't adequate for the load, you'll see queues develop.

Physical Memory
The system has to store and retrieve more user information. In addition, SSL uses long keys (up to 1,024 bits) for encrypting and decrypting information.

Network Traffic
You will see an increase in network traffic between the Web server and the domain controller that is used for authenticating logon information and verifying IP addresses.

Latency and Delays
The most visible performance degradation is the result of encryption and decryption, both of which use a significant number of processor cycles. Downloading files from servers by using SSL can be anywhere from 10 to 100 times slower than from servers that are not using SSL.

Tuning and Troubleshooting Suggestions

If your investigations lead you to believe that you need to address specific hardware-related
performance issues, consider the alternatives listed in Table 10.7, which are based on a single
Web server scenario.

Table 10.7 Tuning and Troubleshooting Your Web Server 

Upgrade to larger L2 caches.
If you need to add or upgrade processors, select processors with a large secondary (L2) cache. Server applications, such as IIS, benefit from large processor caches (2 MB or more if the cache is external, up to the maximum available if it is on the CPU).

Upgrade to faster CPUs.
Web applications, in particular, benefit from faster processors.

Set aggressive connection time-outs.
Aggressive time-outs help combat latency because open connections degrade performance. The default time-out setting in the metabase is 15 minutes.

Use Expires headers.
Set Expires headers on static and dynamic pages to allow content to be stored in the client's cache. This improves response time and reduces the load on the server as well as network traffic.

Enable ASP buffering.
Buffering allows all application output to be collected in a buffer before it's transmitted across the network. This cuts down on network response times. Although buffering reduces overall response time, it can create the impression that a page is slower and less interactive, because no output appears until the page is complete; you can compensate for this by using Response.Flush. ASP buffering is enabled by default after a clean installation of Windows 2000, but it might not be enabled after an upgrade.

Lengthen connection queues and use HTTP keep-alives.
Longer connection queues reduce overhead by enabling the server to maintain more connection requests. HTTP keep-alives maintain a client's connection to the server even after the initial request is complete. This feature reduces latency and CPU processing. Both of these techniques can help make better use of the available bandwidth.

Reduce file sizes.
Reduced file sizes generally improve performance. You can use compressed formats for image files and limit the number of images and other large files. You can also reduce file sizes by tightening up HTML and ASP code and by removing redundant blocks of code in ASP files.

Store log files on separate disks and remove nonessential information.
Disk writes for the separate log files that are maintained for each site can cause bottlenecks. Store these files on a separate partition or disk from your Web server. You can also avoid logging non-vital information. For example, you could place image files in a separate virtual directory and disable logging for that directory.

Use RAID and striping.
Use RAID and striped disk sets to improve disk access. Another option is using a controller with a large RAM cache. If the site uses frequent database access, make sure that the database is on a different server than the Web server.

Use CPU throttling, if necessary.
Use process accounting, which logs the CPU and other resources used by a Web site, to determine whether process throttling should be implemented. Process throttling limits the amount of resources that a site can use. Both of these features work for CGI applications and for applications that are run out of process. Take care to monitor your system carefully after implementing process throttling: it can backfire on you. Because the throttled Dllhost process runs at a lower priority, it won't respond quickly to requests from the Inetinfo process, which means that several I/O threads can be tied up, thereby degrading server responsiveness.
Testing and Tuning Applications

One of the benefits of running your Web servers in a cluster is that the impact of a slow-running application is alleviated. Unfortunately, this only hides an application performance issue; it doesn't fix the problem.

Before you deploy an application to a production server, it should be tested, not only for bugs
and memory leaks, but for performance as well.

Anticipating Application Load

To properly test an ASP application you have to determine what type of load is anticipated for
the application. We recommend that you break this load down as follows:

 Total number of unique application users—You can use the total number of hits per month or, for more granularity, the total number of hits per hour. 
 Total number of concurrent users—You should base this number on peak time usage. 
 Peak request rate—You should determine how many pages need to be served per
second in a worst-case scenario. 
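One way to turn these figures into a testable target is to derive the peak request rate from the concurrent-user count and an assumed session profile. The session model below (pages per visit, visit length) is an illustrative assumption, not a formula from this chapter.

```python
def peak_requests_per_sec(concurrent_users, pages_per_session, session_secs):
    """Estimate the page rate the server must sustain at peak.

    concurrent_users:   peak-time concurrent users
    pages_per_session:  average pages requested per user visit
    session_secs:       average visit length in seconds
    """
    # Each user requests pages_per_session pages spread over session_secs,
    # so the aggregate rate is users * (pages per second per user).
    return concurrent_users * pages_per_session / session_secs

# Example: 500 concurrent users, each viewing 12 pages per 10-minute visit.
target = peak_requests_per_sec(500, 12, 600)
```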

Determining the Total Number of Users

In a production environment, it may be difficult to determine the total number of users and
concurrent users for the application. For Internet sites, you should:

 Break down the IIS server logs to segment usage data. 
 Take a best guess at how much traffic the site is likely to attract. 
 Project a worst-case usage scenario. 

If your site is primarily for intranet use, you should:

 Break down the IIS server logs to segment usage data. 
 Try to determine who is using the site. Is it everyone, or a selected group of users? Calculate how many computers are on the corporate network, and try to identify usage peaks. 
 Project a worst-case scenario. 

Stress Testing the Application

After you've established a context for testing, you can use the Web Application Stress (WAS) tool to run test scripts against the server. This tool enables you to simulate different types and degrees of user load and collect performance data.
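To make the idea concrete, here is a minimal multi-client load generator in the spirit of WAS, not a substitute for it. The request function is injected so the sketch stays self-contained; in a real run it might wrap an HTTP call such as urllib.request.urlopen.

```python
import threading
import time

def stress(request_fn, clients=8, seconds=2.0):
    """Drive request_fn from several threads and return requests/sec.

    request_fn issues one simulated request; each thread plays the role
    of one client, looping until the deadline expires.
    """
    counts = [0] * clients
    deadline = time.monotonic() + seconds

    def worker(i):
        while time.monotonic() < deadline:
            request_fn()
            counts[i] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Aggregate throughput across all simulated clients.
    return sum(counts) / seconds

# Example with a stand-in for a 10 ms page request:
rps = stress(lambda: time.sleep(0.01), clients=4, seconds=1.0)
```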

You can download the newest version of the tool from the WAS Tool site at
http://www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx. While
you're at the site, you should also download the white paper "Web Application Stress Test and
Data Analysis." Prepared by the Unisys Consulting Service, this paper documents the work they
did for an enterprise customer who wanted them to assess and analyze the scalability and
performance of a Web application that made extensive use of SQL Server 7.0 stored procedures.
The customer's goals included determining the appropriate hardware platform for hosting the
application, addressing potential performance bottlenecks, and estimating the typical response
time that the application's users could expect.

We also recommend the following print-based resources for optimizing, testing, and tuning your
ASP applications:

 Reilly and Gibbs, Chapter 26, "Optimizing ASP Performance," Professional Active
Server Pages 3.0, WROX Press, October 1999. 
 Appendix A, "ASP Best Practices," in the Internet Information Services 5.0 Resource
Guide, Microsoft Press, 2000. 

Table 10.8 lists several additional online resources that deal with Web application performance
and tuning.

Table 10.8 Application Performance Resources 

15 ASP Tips to Improve Performance and Style: http://msdn.microsoft.com/workshop/server/asp/asptips.asp
Server Performance and Scalability Killers, by George Reilly: http://msdn.microsoft.com/workshop/server/iis/tencom.asp
Maximizing the Performance of Your Active Server Pages, by Nancy Winnick Cluts: http://msdn.microsoft.com/workshop/server/asp/maxperf.asp
Got Any Cache?, by Nancy Winnick Cluts: http://msdn.microsoft.com/workshop/server/feature/cache.asp
Tips to Improve ASP Application Performance, by Srinivasa Sivakumar: http://www.15seconds.com/issue/000106.htm
Timing the Execution Time of Your ASP Scripts, by Mike Shaffer: http://www.4guysfromrolla.com/webtech/122799-1.shtml
Testing the Performance of Your Web Application, by Matt Odhner: http://www.microsoft.com/technet/iis/wastip.asp
Improve the Performance of Your MDAC Application, by Suresh Kannan: http://www.microsoft.com/data/impperf.htm

The Web Application Stress Tool

We used the Web Application Stress (WAS) tool extensively when creating and testing our
sample clusters for functionality, performance, load balancing adjustment, and monitor testing.
Because this tool realistically simulates multiple browsers requesting pages from a Web
application, you can gather meaningful performance metrics.

You can create the scripts that the WAS tool uses in several ways: manually, by recording
browser activity, by pointing to an IIS log file, by pointing to the content tree, or by importing a
script. The many benefits of using Web Application Stress include:

 Support for multiple user names and passwords, so you can gain access to test sites that use the most common forms of authentication and encryption, including Distributed Password Authentication (DPA), NTLM, and SSL. 
 Support for dynamic cookies that maintain a relationship with the WAS clients, which enables realistic personalized test scenarios and session support. 
 The ability to run a test script by using any number of clients, all of which can be controlled from a single centralized WAS manager. 
 Configurable bandwidth throttling to simulate modem throughput. 
 A custom query-string editor that allows you to save name-value pair combinations as templates and then use these templates across multiple tests. 
 Summary reports with extensive performance data, including percentiles that remove outliers. In addition to the performance data gathered by the test targets, WAS allows you to specify performance counters that you run against the targets, which can be used to provide a validity check on performance data. 
 Support for page groups, which allows you to logically group files and control script flow execution. 
 Configurable time delays between requests (socket level) and script item requests, which enables you to produce exact time sequences for testing trace conditions. 

Figure 10.2 shows the configuration options that are available in WAS. (Note: The last option,
which is not fully visible, is Name resolution. You can enable this option so that network
lookups on remote clients are supported.)
Figure 10.2 The WAS tool configuration window 

In addition to the configuration options shown in Figure 10.2, you can configure individual pages
that are used in your test scripts. Table 10.9 summarizes the main configuration settings that you
can use at the page level.

Table 10.9 Page-Level WAS Configuration Options 

HTTP Verb
Specify the GET, POST, HEAD, or PUT method for handling the page.

Querystring
Specify formatting; provide name, distribution, and value. Import ASP or HTML fields.

Post data
Specify custom POST data in text or binary format.

Custom headers
Use default header information or provide custom HTTP headers. Headers can be static or dynamic.

SSL
Enable SSL for a page.

Remote Data Services (RDS)
Enable Remote Data Services (RDS) and convert the query to RDS format.
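The page-level settings above map naturally onto an HTTP request object. As a rough illustration (this is not the WAS object model), here's how such settings might combine, using Python's standard urllib; the URL, field names, and parameter names are made up for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_page_request(url, verb="GET", query=None, post_data=None, headers=None):
    """Assemble one scripted page request from WAS-style page settings.

    The parameters loosely mirror Table 10.9: verb (HTTP Verb),
    query (Querystring), post_data (Post data), headers (Custom headers).
    """
    if query:
        # Append the query string to the URL, encoding name-value pairs.
        url = url + "?" + urlencode(query)
    return Request(url, data=post_data, headers=headers or {}, method=verb)

# Hypothetical page from a test script:
req = build_page_request(
    "http://server/app/page.asp",
    verb="POST",
    query={"user": "test01"},
    post_data=b"field=value",
    headers={"Accept-Language": "en-us"},
)
```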

Figure 10.3 shows the WAS reporting interface and the sample report that was generated after
we ran one of our test scripts against a test cluster consisting of two Web servers.

Figure 10.3 Performance data generated by the WAS tool 


Note If you want to test loads for Win32-based clients, download the Windows DNA Performance Kit Beta from http://www.microsoft.com/com/resources/windnaperf.asp.

Using WAS to Test NLB Web Clusters

Because a WAS stress test uses a small, limited set of client IP addresses and ports, the Network
Load Balancing (NLB) assumption of wide distribution in client numbers is invalidated. As a
result, you may observe uneven traffic across the cluster.

The following factors will affect the distribution of traffic in WAS testing for an NLB cluster:

 Load balancing affinity—For best results, the cluster should be configured for No
affinity. If Single IP or Class C affinity is used, be sure to use several WAS clients, with
different Class C addresses in the latter case. No affinity is often the most practical
choice. 
 The number of WAS clients—Each WAS client uses a single IP address for all HTTP
connections. The more clients that are used, the more diversity there is in client IP
numbers. 

Note Adding multiple IP addresses to a single client will not affect WAS behavior
because only one IP is ever used per computer. 

At the socket level, WAS uses an implicit bind when making a request. This means that
the operating system supplies the client IP address and port. Microsoft Windows NT
behavior is to always provide the interface address from its routing table. This interface
address is unique, so adding additional IP addresses to a network adapter does not
provide more diversity to the WAS client address space. 

 HTTP keep-alives—When HTTP keep-alives are enabled, all items in a page group are
requested over a single socket. Because this socket uses a common IP address and client
port, NLB sends all requests in that page group to the same Web server. Disabling keep-
alives forces a different client port for each item in the page group. This means that each
item can be served from a different Web server. 

Note With Single or Class C affinity, the keep-alive feature will not affect load
balancing. Disabling keep-alives applies only to the No affinity setting. 

Windows NT uses incremental local port numbers in the 1500 through 4000 range,
looping back to 1500 after exceeding the upper boundary. This provides excellent
diversity in port numbers; however, keep-alives must be avoided in order for large page
groups to take advantage of this. 
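The interaction between keep-alives and client-port diversity can be simulated. This sketch uses Python's built-in hash as a stand-in for NLB's actual mapping function (an assumption; the real algorithm differs), but the effect of port reuse on distribution is the same as described above.

```python
def pick_member(client_ip, client_port, cluster_size):
    """Stand-in for NLB's No-affinity mapping of (IP, port) to a member."""
    return hash((client_ip, client_port)) % cluster_size

def serve_page_group(client_ip, first_port, items, cluster_size, keep_alive):
    """Return the cluster member that serves each item in a page group."""
    members = []
    for i in range(items):
        # With keep-alives, every item reuses one socket (one client port);
        # without them, each item opens a new socket on the next port.
        port = first_port if keep_alive else first_port + i
        members.append(pick_member(client_ip, port, cluster_size))
    return members

# One WAS client requesting a 20-item page group from a 4-member cluster.
with_ka = serve_page_group("10.0.0.5", 1500, 20, 4, keep_alive=True)
without_ka = serve_page_group("10.0.0.5", 1500, 20, 4, keep_alive=False)
# With keep-alives, every item maps to the same member.
```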


Performance Counters

As we noted in Chapter 7, "Monitoring," Application Center enables a default set of performance
counters that are used to capture performance data on every cluster member and logs this data to
the Application Center Events and Performance Logging database. As soon as you create a
cluster on a server, or add a server to a cluster, counter logging is initiated and counter data is
written to the local instance of the ACLog database.

Note The default counters are defined in the file Perflogconsumer.mof, which is used to create
the Windows Management Instrumentation (WMI) counter instances that the Application Center
Events and Performance Logging database uses. In turn, a WMI performance-logging consumer
uses an agent to write counter information to the database. In order to display this counter list in
the user interface, Application Center queries the database with a query component.

Each of the installed counters can be enabled for graphing on the performance chart that's available for the cluster or for individual members. You enable a counter by using a Web page dialog box that you can launch from any performance chart that's displayed in the details pane of the snap-in. (See "Enabling Counter Graphing," later in this chapter.)

The Default Performance Counters

The cross-section of counters selected as the Application Center default performance counters is listed in Table 10.10. Based on the feedback provided by Microsoft Consulting Services, product teams, early adopters, and beta testers, these counters were determined to be the ones most likely to be used on a regular basis by system administrators. These counters should
meet most of your normal operational performance monitoring requirements. You'll notice that
most of these counters have already been identified in earlier sections of the chapter that dealt
with monitoring the different aspects of a Web server environment. You can, of course, add
additional counters, which we'll cover later in this section.

In addition to listing the Application Center default performance counters alphabetically by name, Table 10.10 also provides a short description for each counter, identifies the counter's unit of measurement, and identifies the scope of the data. Scope describes what the data represents: the present value, an accumulated value, an average, or data collected over a period of time.

Table 10.10 Application Center Performance Counters 

Available Bytes (Memory). Units: Bytes. Scope: Present value.
The amount of physical memory that is available to processes running on the computer. It is calculated by summing space on the Zeroed, Free, and Standby memory lists. This figure should be at least 5 percent of total memory at all times.(1)

Bytes Total/sec (Web Service). Units: Integer. Scope: Data per time period.
The sum of Bytes Sent/sec and Bytes Received/sec. This is the total rate of bytes that are transferred by the Web service.

Connections active (TCP). Units: Integer. Scope: Present value.
The number of times TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state.

Context Switches/sec (System). Units: Integer. Scope: Data per time period.
This value can indicate excessive locking in code, perhaps creating contention for resources. If it is too high, add another server or check with Microsoft for the latest patches.

Current Connections (Web Service). Units: Integer. Scope: Present value.
The number of current client connections to the Web service.

Current Disk Queue Length (PhysicalDisk). Units: Integer. Scope: Present value.
The number of requests outstanding on the disk at the time the performance data is collected. It includes requests in service at the time of the reading. Multi-spindle disk devices can have multiple requests active at one time, but other concurrent requests are awaiting service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely that this will be consistently high. Requests are experiencing delays proportional to the length of this queue minus the number of spindles on the disks.(2)

Errors per second (ASP). Units: Integer. Scope: Data per time period.
The number of errors generated by ASP applications, per second.

Get Requests/sec (Web Service). Units: Integer. Scope: Data per time period.
The number of HTTP requests that use the GET method, per second. The GET method is the most common method used on the Web.

ISAPI extension requests/sec (Web Service). Units: Integer. Scope: Data per time period.
The number of ISAPI extension requests that are simultaneously being processed by the Web service, per second.

Page faults/sec (Memory). Units: Integer. Scope: Data per time period.
The number of times, per second, that the server reads the page file on the disk or from memory that is not assigned to the working set. Most CPUs can handle a large number of page faults without consequence; however, if disk reads are high, there might be performance degradation.

Private Bytes (Process: Inetinfo). Units: Bytes. Scope: Present value.
The number of bytes of memory that are taken up by a particular process (in this case, Inetinfo, which is part of IIS).

% Privileged Time (CPU). Units: Percentage. Scope: Average of accumulated values.
The percentage of non-idle processor time spent in privileged mode. (Privileged mode is a processing mode designed for operating system components and hardware-manipulating drivers. It allows direct access to hardware and all memory. The alternative, user mode, is a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The operating system switches application threads to privileged mode to access operating system services.) % Privileged Time includes time servicing interrupts and deferred procedure calls (DPCs). A high rate of privileged time might be attributable to a large number of interrupts that are being generated by a failing device. This counter displays the average busy time as a percentage of the sample time.

Processor Utilization (CPU). Units: Percentage. Scope: Average of accumulated values.
The percentage of time that the processor is executing a non-idle thread. This counter was designed as a primary indicator of processor activity. It is calculated by measuring the time that the processor spends executing the thread of the Idle process in each sample interval, and subtracting that value from 100 percent. Processor bottlenecks are characterized by high Processor: % Processor Time numbers while the network adapter remains well below capacity.(3)

% User Time (CPU). Units: Percentage. Scope: Average of accumulated values.
The percentage of non-idle processor time spent in user mode. (User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The alternative, privileged mode, is designed for operating system components and allows direct access to hardware and all memory. The operating system switches application threads to privileged mode to access operating system services.) This counter displays the average busy time as a percentage of the sample time.

Request execution time (ASP). Units: Milliseconds. Scope: Last value.
The number of milliseconds that it took the most recent ASP request to complete.

Requests per second (ASP). Units: Integer. Scope: Data per time period.
The number of requests executed, per second.

Requests Queued (ASP). Units: Integer. Scope: Present value.
The number of requests waiting for service from the queue. This number should be small, except during heavy traffic periods. A large number of queued requests indicates that there is a performance bottleneck somewhere in your server.

Request wait time (ASP). Units: Milliseconds. Scope: Last value.
The amount of time that the most recent ASP request was waiting in the queue.

Total Server Memory (SQL Server: Memory Manager). Units: Bytes. Scope: Present value.
The total amount of dynamic memory the server is currently consuming.

(1) This value should be greater than 20 MB.

(2) This difference should average less than 2 for good performance.

(3) Processor utilization does occasionally peak at fairly high levels, but this level should not be
sustained for a long period.

The System Test team's favorite counters 

The Application Center System Test team identified the following counters as their favorites for
isolating performance bottlenecks and identifying memory leaks:

 Active Server Pages: Requests/sec 
 Active Server Pages: Errors/sec 
 Active Server Pages: Transactions/sec 
 Distributed Transaction Coordinator: Response Time -- Average 
 Distributed Transaction Coordinator: Transactions/sec 
 Memory: Available MBytes 
 Network Interface: Bytes Total/sec 
 Processor: % Processor Time 

You can obtain a current list of the installed counters on a server running Application Center by
using one of several techniques. The first method, of course, is via the Application Center user
interface:

 In the Application Center snap-in, in the performance chart view, click Add. 

The Add a Counter dialog box, which displays all the counters that are currently
installed on the system, appears. 

Note It is possible to get two different counter lists depending on the way you query for them. The Add a Counter dialog box queries data from the Application Center Events and Performance Logging database; all other methods query the WMI repository. If there are counters that are not enabled for logging, the two lists will differ, with the one retrieved from the database being shorter. You can retrieve old data for counters that are no longer being collected.

The second method involves using the WMI Tester (Wbemtest.exe) or WMI Common Information Model (CIM) Studio and running one of these against the member from which you want to obtain counter information. Follow these steps:

1. Connect to the namespace root\MicrosoftApplicationCenter. 
2. Enumerate the instances of the class MicrosoftAC_CapacityCounterConfig. 

Finally, for the third method, you can run the Counters.vbs script that's provided on the
Application Center CD. In addition to obtaining a list of the installed counters, you can use this
script to "delete" a counter. In the context of the Counters.vbs script, "delete" means to stop
collecting data from the counter. It does not remove the counter from the ACLog database.

Caution You should be extremely cautious when writing any scripts that access ACLog and
remove counters. If done incorrectly, you can easily affect data integrity and corrupt the
database.

To run this script:

1. In Windows 2000, open a command prompt. 
2. At the command prompt, type Counters.vbs and then press ENTER. 

Run without parameters, the script displays help for the two parameters that are available, /list
and /delete. Use the /list parameter to list the installed counters and the /delete parameter,
accompanied by a counter name enclosed in quotation marks, to delete the specified counter.

Here is the Counters.vbs script:

' Connect to the local computer.
strComputerName = "."

set args = wscript.arguments
cmd = ""
if args.Count > 0 then
    cmd = args(0)
end if

select case cmd
    case "/list"
        listCounters
    case "/delete"
        deleteCounter args(1)
    case else
        showHelp
end select

function e(str)
    wscript.echo(str)
end function

' Display help if the script is executed without parameters
function showHelp()
    e("/list to display installed counters")
    e("/delete <counter name> to stop collecting a counter")
end function

' List the counters
function listCounters()
    Set wbemLocator = CreateObject("WbemScripting.SWbemLocator")
    wbemLocator.Security_.ImpersonationLevel = 3
    Set wbemService = wbemLocator.ConnectServer(strComputerName, _
        "root\MicrosoftApplicationCenter")
    Set counterInstances = _
        wbemService.InstancesOf("MicrosoftAC_CapacityCounterConfig")
    For Each counter in counterInstances
        counterName = counter.Name
        e(counterName)
    Next
end function

' Stop logging data from the specified counter
function deleteCounter(counterName)
    Set wbemLocator = CreateObject("WbemScripting.SWbemLocator")
    wbemLocator.Security_.ImpersonationLevel = 3
    Set wbemService = wbemLocator.ConnectServer(strComputerName, _
        "root\MicrosoftApplicationCenter")
    wbemService.Delete "MicrosoftAC_CapacityCounterConfig.Name=""" & _
        counterName & """"
    e("Deleted counter: " & counterName)
end function

Adding Additional Performance Counters

If the counters that are provided don't completely meet your monitoring requirements, you can
load additional counters into the Application Center namespace. Creating new counters isn't
difficult; however, you should determine whether or not new counters are needed to meet an
ongoing operational requirement.

When to Create New Counters

We recommend that you only create new cluster-wide counters if you intend to gather data on an
ongoing basis with the intention of accumulating historical data for reporting and planning
purposes. In this case, you would create the counter on the cluster controller so that the updated
counter collection is replicated to all the cluster members the next time there's a full
synchronization—which you can force manually after you create the new counter(s).

In situations where you require additional monitoring capability for a short period of time, such
as performance tuning on a single member, you can add performance counters to that member.
Remember to take the member out of the synchronization loop before creating the new counter
so that the local counter collection isn't overwritten by the counter definitions on the controller.
After you've finished collecting performance data, you can bring the member back into the
synchronization loop; the next time a full synchronization occurs, the counter collection will be
restored to its original state. If a new counter is added on a member, you need to connect directly
to that member—in the Connect to server dialog box, click Manage this server only—in order
to see the counter on the member. If you don't do this, you will see only the counter list for the
cluster controller.

An alternative to creating a new counter is to use the available operating system tools, such as Performance Monitor and Network Monitor, to perform in-depth monitoring of the server in question. With these tools, you can log the data you need for ongoing analysis without changing the structure of the ACLog database, and in general it will be easier to isolate the information you require for tuning the server or an application.

Creating a New Counter

You create a new counter by writing a counter definition and saving it as a Managed Object Format (MOF) file, or by modifying the sample counters file that's provided on the Application Center CD.

The following code illustrates a typical counter definition that defines a counter for the
Application Center namespace:

// Specifies the WMI namespace for the instance
#pragma namespace("\\root\\MicrosoftApplicationCenter")

//
// Counter consumer class definition
//
instance of MicrosoftAC_CapacityCounterConfig
{
    Name = "CPU 0 Interrupts/sec";
    CounterPath = "\\Processor(0)\\Interrupts/sec";
    CounterType = 1;
    Units = "";
    AggregationMethod = 1;
    ClusterAggregation = 1;
    DefaultScale = 0;
};

After you run Mofcomp against this script, the new counter is created as an instance of the
MicrosoftAC_CapacityCounterConfig class. After the performance log consumer retrieves
this information and logs it, a stored procedure detects the counter identifier and then writes an
entry to the counter metadata table. Data integrity is enforced through this process.

Let's analyze the preceding sample in more detail and then create a new counter definition that
defines a new counter for the Application Center counter collection.

The required properties for a counter are as follows:

 Name—The counter name, which is used to identify the counter in the Application
Center Events and Performance Logging database and the Application Center user
interface. The name must be unique among all the counters that are being logged. 
 CounterPath—The counter path, which must be specified by using Performance Data Helper (PDH) syntax with English or the default system names: \\PerfObject(ParentInstance/ObjectInstance#InstanceIndex)\\Counter 
 CounterType—The counter type is 1 by default. This is an internal property. Do
not change it. 
 Units—A string value that is used to specify the units of the counter that are displayed in
the user interface. 
 AggregationMethod—Specifies the aggregation method that will be used to do server-
wide rollup calculations. AggregationMethod determines how counter values are rolled
up from one time interval to another, for example, from two hours to one day. You
should not aggregate any counter that collects state, such as On or Off. 

The following values can be used to specify an aggregation method for the counter:

o 0 = None—no aggregation is used; the existing value is rolled up. 
o 1 = Average—when the counter value is rolled up on the server from one interval to another, the source values are averaged. 
o 2 = Sum—all the values for the recording period are totaled. 
o 3 = Last—the last value recorded by the counter is used. 
o 4 = Min—the minimum value for the recording period is used. 
o 5 = Max—the maximum value for the recording period is used. 
 ClusterAggregation—Specifies the method that is used to roll up server values to
provide a cluster-wide aggregated value. 

Warning Do not use the Min or Max aggregation methods for ClusterAggregation when a
counter specifies Sum—a cumulative counter—for server aggregation. The results are
unpredictable, not useful, and not supported. In addition, a ClusterAggregation value of 0
indicates no aggregation; as a result, the counter will not be displayed in the cluster-wide view.
An example of this is Thread\ID Process. ID Process is the unique identifier of a process, and
because ID Process numbers are reused, they identify a process only for that process's lifetime.

Let's say, for example, that we want to add two counters to determine whether client requests
are causing a processor bottleneck. The two counters are Processor:Interrupts/sec
and Processor:% DPC Time. The first counter tells us how often the processor is handling
hardware interrupts, and the second tells us how much processor time is spent on deferred
procedure calls.

The easiest way to obtain the counter information that is required for the counter definition is as
follows:

1. On the server, in the Microsoft Management Console (MMC), open the Performance
Monitor snap-in, and then click the Add button (the plus sign). 

The Add Counters dialog box appears. 

2. Click the down arrow to the right of the Performance object box, and then click
Processor, which is the object that you want to monitor. 
3. Scroll down the list of counters for the object, and select the one that you want to use. 
Figure 10.4 shows the Performance snap-in with the % DPC Time counter selected.
Note also that the _Total instance is selected by default.

Figure 10.4 The Performance snap-in and the Add Counters dialog box 

Using the information provided in the Add Counters dialog box, we can start building our MOF
file to add the new counters. For the counter path, we have:

 \\PerfObject = Processor 
 (ParentInstance/ObjectInstance#InstanceIndex) = _Total 
 \\Counter = % DPC Time 
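Putting those three pieces together yields the full PDH path. The small helper below is hypothetical (it is not part of Application Center) and simply shows the assembly:

```python
# Build a PDH counter path from its parts, following the
# \\PerfObject(ParentInstance/ObjectInstance#InstanceIndex)\\Counter
# syntax described earlier. Illustrative helper only.

def pdh_path(perf_object, counter, instance=None):
    inst = "(%s)" % instance if instance else ""
    return "\\%s%s\\%s" % (perf_object, inst, counter)

print(pdh_path("Processor", "% DPC Time", "_Total"))
# \Processor(_Total)\% DPC Time
```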

The next code sample contains our new counter definition for the %DPC Time counter:

// Specifies the WMI namespace for the instance
#pragma namespace("\\root\\MicrosoftApplicationCenter")
//
// DPC counter consumer class definition
//
instance of MicrosoftAC_CapacityCounterConfig
{
Name = "Processor % DPC Time";
CounterPath = "\\Processor(_Total)\\% DPC Time";
CounterType = 1;
Units = "Percent";
//
// Use averaging for cluster aggregation because summing this value across
// the cluster does not provide meaningful results
//
AggregationMethod = 1;
ClusterAggregation = 1;
DefaultScale = 0;
};

We can repeat the preceding steps to obtain information about the Interrupts/sec counter so
that we can add it to the preceding code. When all of the necessary coding is finished, we'll save
the file—as a text file with a .mof file name extension—on the server where we want to add the
counter. Next, we'll open the command-line window, and run Mofcomp against the file to add it
to the Application Center counter collection. Finally, to verify that the counters were
successfully added, from the command line, we'll run Counter.vbs /list to obtain a list of the
currently active counters. This list verifies that the WMI class instances were successfully stored
in the WMI repository. To verify that the counter is available for logging in the Performance
view, open the Add a counter dialog box, and then confirm that the counter name is listed. If the
counter isn't listed, check the Events view to see if any error events were generated from running
Mofcomp to add the counter.
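For reference, the companion definition for the Interrupts/sec counter might look like the following sketch, which mirrors the % DPC Time instance shown earlier. The Name, Units, and scale values here are our assumptions, not taken from the product documentation:

```
// Hypothetical definition for the Processor Interrupts/sec counter,
// mirroring the % DPC Time instance shown earlier.
#pragma namespace("\\root\\MicrosoftApplicationCenter")

instance of MicrosoftAC_CapacityCounterConfig
{
Name = "Processor Interrupts/sec";
CounterPath = "\\Processor(_Total)\\Interrupts/sec";
CounterType = 1;
Units = "Interrupts/sec";
// Average this rate for server-wide and cluster-wide rollups
AggregationMethod = 1;
ClusterAggregation = 1;
DefaultScale = 0;
};
```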

Note You should add new counters on the cluster controller. Because counters are a replicated
property, any new counter information is replicated to all the cluster members. In addition, the
list of cluster-wide counters that is displayed in the Application Center snap-in is retrieved from
the controller.

Enabling Counter Graphing

Through the Application Center user interface, you can enable counter graphing on a per-
member basis or across the cluster. This provides flexibility in managing your members,
particularly when some, such as ACDW802AS in the test cluster we set up, do not have the same
performance capabilities as the other members.

The steps for enabling counter graphing in a performance chart are as follows:

1. In the console tree (on a member or the controller), click membername to display its
status page in the details pane. In addition to member status, the details pane also displays
an area where counter graphs are plotted. 
2. Click Add to activate the Add a Counter dialog box. 
3. In the Counters list, click the counter(s) you want, and then click Add. 
4. Click Close when you've finished adding counters. 

Figure 10.5 illustrates the user interface for enabling a counter.


Cluster-wide performance graphs are displayed when you select the cluster node view. Server
counter graphs are automatically rolled up to the cluster view—in accordance with the counter
aggregation settings—when the same counter is enabled on every member. (See Figure 10.6,
later in this chapter, for an illustration of cluster-wide counter displays.)

Figure 10.5 Using the Add a counter dialog box to enable graphing for a counter 


Performance Monitoring Samples


This collection of samples illustrates how, with a minimal set of performance counters, you
can monitor a cluster and its members. It also demonstrates how you can test a cluster and its
applications by applying a load to the cluster with the Web Application Stress (WAS) tool.

Before proceeding further with our monitoring examples, there are two items that need to be
highlighted: the test configuration we're using for our examples and the counter graphs.

Cluster Test Configuration

It's important to note that the test server configurations we used for working with cluster
scenarios in this book are not representative of typical production servers. You should not infer
any performance expectations from these tests.

If you examine the configuration summary provided in Table 10.11, you'll see that our test
servers are by no means capable of delivering the same levels of performance as the servers that
most of you use in a production environment. Keep this in mind when looking at the
performance results provided later in this chapter. View these results as conceptual illustrations
in the context of our test computers; don't use the results as performance metrics for your own
equipment.

Table 10.11 Computer Configurations Used in Test Clusters 

Server name Cluster type and role CPU Memory Bus Speed
ACDW516AS Web, controller 1xP6-550 256 MB 66 MHz
ACDW802AS Web, member 1xP6-366 256 MB 66 MHz
ACDW518AS Web, member 1xP6-550 256 MB 66 MHz
ACDW522AS COM+, controller 1xP6-366 256 MB 66 MHz
ACDW811AS COM+, member 1xP6-233 256 MB 66 MHz
ACDW822AS Web, stager 1xP6-266 128 MB 66 MHz

Counter Graphs

When you're graphing different counters, you should be aware of how the individual counter
values are rolled up at the server and cluster level. Table 10.12 lists the default counters and the
aggregation method that is used at the server and cluster levels. Figure 10.6 illustrates how three
counters (Processor Utilization, Web Service GET Requests per second, and ASP Requests
per second) are rolled up to the cluster level.

Table 10.12 Counter Aggregation at the Server and Cluster Levels 

Counter name Server aggregation Cluster aggregation
ASP Errors per second Average Sum
ASP Requests Queued Average Sum
ASP Requests Queued Max value Max value
ASP Requests per second Average Sum
ASP Request Execution Time Average Average
ASP Request Wait Time Average Average
Memory Available Bytes Average Average
Memory Page Faults per second Average Average
Physical Disk Queue Length Average Sum
Inetinfo Private Bytes Average Average
Processor Utilization Average Average
Processor User Time Average Average
Processor Privileged Time Average Average
Log Database Total Memory Average Average
Context Switches per second Average Average
TCP Connections Established Average Sum
Web Service Current Connections Average Sum
Web Service GET Requests per second Average Sum
Web Service Bytes Total per second Average Sum
Web Service ISAPI Requests per second Average Sum
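The difference between the Average and Sum cluster aggregations in Table 10.12 is easy to see with a toy calculation; the per-member values below are hypothetical:

```python
# Toy server-to-cluster rollup for two members, using aggregation methods
# from Table 10.12. The per-member values are hypothetical.

members = {
    "ACDW516AS": {"Processor Utilization": 70.0,
                  "Web Service GET Requests per second": 40.0},
    "ACDW802AS": {"Processor Utilization": 50.0,
                  "Web Service GET Requests per second": 24.0},
}

def cluster_value(counter, method):
    values = [m[counter] for m in members.values()]
    return sum(values) / len(values) if method == "Average" else sum(values)

# Utilization is averaged across members; request rates are summed.
print(cluster_value("Processor Utilization", "Average"))            # 60.0
print(cluster_value("Web Service GET Requests per second", "Sum"))  # 64.0
```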

In Figure 10.6, the graph plots are denoted as follows:

 A: Processor Utilization 
 B: Web Service GET Requests per second 
 C: ASP Requests per second 

Note This labeling convention is used for all the sample performance graphs in the balance of
this chapter.
Figure 10.6 Performance graph for a Web cluster with two load-balanced nodes 

Referring to Figure 10.6, note that the Processor Utilization is averaged, whereas Web Service
GET Requests and ASP Requests are summed.

Note Application Center performance charts exhibit the same behavior as the Windows
2000 Performance Monitor. The values used for the counter graph appear out of
synchronization with the numeric values (for example, Min, Max, and Average) that appear
below the graph. This is because the graph uses the values for the specified period (for example,
15 minutes), but the numeric display uses all of the values that are accumulated during the
session—the session context is defined by when the Application Center snap-in is first activated.

Let's move on and work with some performance monitoring examples that employ the servers
and applications that we described in Chapter 8, "Creating Clusters and Deploying
Applications."

The Base Environment

Our test environment uses the same application, clusters, and members that we described in
Chapter 8, "Creating Clusters and Deploying Applications." We started testing by using the
following configuration, and as we tested, we changed this topology by scaling out the front-end
and back-end clusters. Only a few of the performance graphs produced by our testing are shown
in this section. However, the entire collection of performance graphs for the various cluster
topologies is included in Appendix E, "Sample Performance Charts."

Initial Topology and Cluster Configuration

The following cluster topology was used for performance testing:

 A front-end Web cluster (RKWebCluster) that consists of a single member, the cluster
controller, ACDW516AS 
 A back-end COM+ application cluster (RKCOMCluster) that also consists of one
member, ACDW522AS, the cluster controller 

The Web cluster was configured as follows:

 Web request forwarding disabled 
 NLB client affinity set to custom (none) 
 HTTP keep-alives disabled 
 Load balancing weight equal for all members 

Application

We used the Pre-Flight Check application for testing and distributed it across two tiers, as
described in Chapter 8, "Creating Clusters and Deploying Applications." The HTML and ASP
pages are hosted on the Web tier, and the COM+ applications, AC_PF_VB and AC_PF_VC, are
hosted on the COM+ application tier. Component Load Balancing (CLB) was enabled by
configuring the AC_PF_VB and AC_PF_VC components to support dynamic load balancing,
and ACDW522AS was identified as the member for handling component requests.

Performance Counters

Before applying a test load to the controller, we added three counters to the performance graph
for the controller: Processor Utilization, Web Service GET Requests/second, and ASP
Requests/second. We also added the Processor Utilization counter to the performance chart for
the COM+ server. Because we're not doing any in-depth performance tuning or capacity
planning, these counters are sufficient to give us a good indication of cluster performance under
load and illustrate the effect of scaling out a cluster and adjusting server load balancing weights.

WAS Configuration

We used the ACPreflight script, which is included on the Resource Kit CD, for our tests and
retained the script's default settings for HTTP verbs, page groups, users, and cookies.

Four WAS clients were used for testing, and the following settings were changed from their
default configurations:

 Stress level (threads)—88 
 Use random delay—0 to 1500 milliseconds 
 Suspend and Warmup—5 minutes 

Scenario: Single-Node Web Cluster and Single-Node COM+ Application Cluster

In this first scenario, we wanted to push processor utilization up fairly high, which is why we
reduced the amount of random delay that is used for TCP connections. Figure 10.7 shows the
results we achieved.

Figure 10.7 Performance results on a single-node Web cluster 

The next scenario illustrates the effect that adding another server has on the performance
indicators shown in Figure 10.7.

Scenario: Two-Node Web Cluster and Single-Node COM+ Application Cluster

For this scenario, we added the server ACDW802AS, which, as you may recall from Table
10.11, is a less robust computer than the cluster controller. Even so, this server had a
significant impact on the controller's performance. Figure 10.8 shows the effect that this server
had on the controller's processor utilization.
Figure 10.8 Cluster controller (ACDW516AS) performance after creating a two-node
cluster 

If you compare the new Processor Utilization (A), Web GET Requests (B), and ASP Requests
(C) indicators with those for the same server in Figure 10.7, you can see a noticeable difference
in resource usage after adding another cluster member. Throughput decreases as well, but only
on the cluster controller. The graph lines labeled D and E show the cluster total throughput for
HTML and ASP pages. As you can see, total throughput is higher than on a one-node cluster.

Scenario: Three-Node Web Cluster and Single-Node COM+ Application Cluster

For this scenario, we added a third member, ACDW518AS, to the Web cluster. As expected,
resource utilization decreased on the two original cluster members, but as you will note in Figure
10.9, the new member is underutilized in comparison to the other members.

The graph labeled ACDW802AS in Figure 10.9 includes Processor Utilization (A1) for the
cluster controller as well as the member. For clarity, the controller's Web GETs and ASP
requests aren't displayed, but their performance is at the same level as the member. If you look at
the performance graph (ACDW518AS in Figure 10.9) for the new member, you'll see that
although ACDW518AS has the same level of throughput as the other members—indicating that
the load is well-distributed on our test cluster—Processor Utilization is significantly lower on
this member.
Figure 10.9 Resource utilization and throughput on ACDW518AS 

In the next scenario, we'll adjust the load balancing weight to reduce resource usage on the
controller (ACDW516AS) and ACDW802AS.

Adjusted Load Balancing Weight

In order to take advantage of the lower processor utilization on ACDW518AS, we decided to
increase the load balancing weight on this member. In the membername Properties dialog box,
we set the server weight at the midway mark between the Average load and More load
indicators.

Note The impact of the amount of load added to a single member is more pronounced as the
number of clients increases. Nonetheless, the graphs in Figure 10.10 serve to illustrate how a
load balancing weight adjustment affects server performance.
Figure 10.10 Resource utilization on ACDW518AS after adjusting the load-balancing
weight 

Referring to Figure 10.10, the graph labeled ACDW516AS shows our base performance metrics
for Processor Utilization (A), Web GET Requests (B), and ASP Requests (C) after load
balancing was adjusted. Note the following:

 Both graphs show a spike: throughput drops and processor utilization increases just
before the first date/time indicator on the chart. This is the point where the weight was
adjusted on ACDW518AS and convergence took place. 
 In the ACDW518AS graph, you can see where throughput and processor utilization
increased after convergence. 
 The white plot line (A1) in the ACDW516AS graph shows processor utilization for the
cluster controller, taken from a previous test that was run before the load balancing
adjustment. As you can see, processor utilization did decrease on the controller, as did
throughput. Similar performance results were experienced on the ACDW802AS cluster
member. 
Note Even though the controller is the same class of server as ACDW518AS, higher processor
utilization on the controller was expected. There are two reasons for this. First, there is a
performance cost associated with the controller role; and second, there is the monitoring cost.
For our tests, we used the cluster controller for all the performance monitoring displays that were
generated.

Our final scenario demonstrates the effect of scaling out the COM+ application cluster.

Scenario: Three-Node Web Cluster and Two-Node COM+ Application Cluster

Up to this point, we've been using a single server on the back-end component server tier to
handle all the COM+ requests coming from the Web cluster. Processor utilization on this server
typically ranged from 50 percent to 65 percent during our tests. Figure 10.11 provides two
graphs. The first, labeled ACDW522AS/ACDW811AS, shows the processor utilization for the
component servers in RKCOMCluster, the COM+ application cluster. The second graph, labeled
RKWebCluster, provides a cluster-wide performance view.
Figure 10.11 Resource utilization after scaling out the COM+ application cluster 

Let's examine the graphs shown in Figure 10.11 in more detail, starting with the component
servers. The cluster controller's processor utilization graph is labeled A; the new member's graph
is A1. As you can see, the point where the new member is brought online is noticeable
(approximate time 2:53 P.M.) and the reduction in the controller's processor load is significant.
After the cluster is scaled out, processor utilization for both servers is evenly matched, which
indicates that component requests are being well distributed between the two servers.

The RKWebCluster graph provides a cluster-wide performance view, with Web GET
Requests and ASP Requests aggregated as totals for all the members.

Note The throughput levels indicated in the RKWebCluster graph remained consistent in all of
our test scenarios. There were, of course, occasional drops when members were added, load-
balancing weights were adjusted, or cluster synchronization took place.

Two processor utilization graphs (A and A1) are shown. The dark line is the graph for processor
utilization after the COM+ application cluster is scaled out; the white line shows processor
utilization when there was only one COM+ application server in RKCOMCluster. As these
graphs indicate, scaling out the COM+ application cluster resulted in reduced processor
utilization across the Web tier.

The testing we did for this chapter is by no means exhaustive, but it gives you an idea of the
performance monitoring capability that is at your disposal. To summarize, the Application
Center performance monitoring interface:

 Provides a commonly used collection of pre-installed performance counters 
 Supports the creation of additional counters on a per-member or cluster-wide basis 
 Supports a single console view for monitoring an individual member, an entire cluster,
and multiple clusters 
