
Performance Monitoring in Windows 2000: Best Practices

Microsoft Corporation
1/31/2003

Microsoft® Windows® 2000 provides a rich set of performance counters for both the
operating system and applications. This paper introduces best practices for using these
counters to conduct effective performance monitoring of Windows 2000 servers. It also
covers key performance thresholds to assist in troubleshooting some of the more common
performance issues on Windows 2000 systems.
1. Introduction
Monitoring performance is a critical process in the management of computer systems. Through monitoring,
system administrators obtain performance data that can be analyzed in real time or collected for processing at
a later point in time. The data is used to locate possible performance issues as well as to plan for growth in
demand for system resources. The steps and procedures for monitoring, however, vary widely depending on
the target environment.
Performance monitoring can generally be classified into three sets of activities: regular monitoring,
troubleshooting, and resource planning. Regular performance monitoring ensures that administrators always
have up-to-date information about how their systems are operating. Performance data for a specific system over
a range of activities and loads can be used to establish a baseline—a range of measurements that represent
acceptable performance under normal operating conditions. When administrators troubleshoot performance-
related system issues, monitoring and collecting performance data and comparing it against the baseline
give administrators important information about system resource utilization at the time the problem
occurred. Finally, monitoring system performance provides the data with which to project future growth and to
plan for changes in system configurations.
The Microsoft Windows 2000 operating system is one of the leading operating systems in use for many
different types of servers. As documented in [1], the Windows 2000 system is well instrumented and provides
abundant utilities for performance monitoring and analysis. In this paper, we present our guidelines for taking
advantage of these features and the practices required for managing real-life environments. In doing so, we
categorize server environments and point out useful features that apply to different configurations.
The target audience of this paper is system engineers and administrators (“administrators” hereafter) who are
interested in monitoring the performance of small to medium size server environments. For these administrators,
managing performance is part of managing the system itself. For larger environments, performance experts with
detailed knowledge of monitoring and analysis features would need to be appointed.
The organization of this paper is as follows. Section 2 gives an overview of the monitoring architecture and brief
descriptions of its sub-components. Section 3 presents practices that we suggest for planning, configuring, and
using performance features on Windows 2000. A summary of important counters and their usage is also
given in Section 3. Section 4 concludes the paper.

2. Windows 2000 Performance Subsystem


Windows 2000 collects statistics about the usage of system resources such as disks, memory, processors, and
network components through the use of performance counters. In addition, applications and services running
on the system can expose performance information relating to their functionality through performance
counters. In Windows 2000, these counters can be accessed and viewed through the Performance Monitor
tool that ships out of the box, via other performance monitoring applications, or by using low-level
programming interfaces. Figure 1 shows an overview of the Windows performance subsystem.
[Figure 1 – The Windows 2000 performance subsystem. Monitoring tools (the Performance Logs and Alerts service, the System Monitor MMC snap-in, and custom performance tools) retrieve counter data through PDH.DLL, which draws on either the RegQueryValueEx() registry interface or WMI; both paths ultimately query Perflib and the performance extension DLLs.]
The operating system exposes performance data for system resources through the system performance
libraries. Counter values are obtained from these libraries via the RegQueryValueEx() interface. This low-level
registry interface to the performance subsystem allows a user to programmatically retrieve raw (unformatted)
performance data at any time. Counter descriptions and help text are also available through the registry
interface. The registry interface is optimized for fast data retrieval and therefore favors returning raw performance
data instead of formatted performance data such as rates or percentages. Although this minimizes the
performance overhead of collecting the data, applications that wish to access and view this data are required to
understand and parse the format of the raw returned data.
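As an illustration, the following C sketch retrieves the raw “Global” performance data block through the registry interface. It is a minimal sketch, assuming a simple grow-and-retry buffer strategy; parsing of the returned PERF_DATA_BLOCK is omitted and error handling is minimal.

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch: fetch the raw performance data block for all counter objects.
       The buffer returned under HKEY_PERFORMANCE_DATA starts with a
       PERF_DATA_BLOCK structure (see winperf.h) that the caller must parse. */
    int main(void)
    {
        DWORD alloc = 64 * 1024;          /* initial guess; grown on demand */
        BYTE *buf = malloc(alloc);
        DWORD size;
        LONG rc;

        for (;;) {
            size = alloc;
            rc = RegQueryValueExA(HKEY_PERFORMANCE_DATA, "Global",
                                  NULL, NULL, buf, &size);
            if (rc != ERROR_MORE_DATA)
                break;
            alloc += 64 * 1024;           /* buffer too small: grow and retry */
            buf = realloc(buf, alloc);
        }
        if (rc == ERROR_SUCCESS)
            printf("Retrieved %lu bytes of raw performance data\n", size);

        RegCloseKey(HKEY_PERFORMANCE_DATA);  /* required after performance reads */
        free(buf);
        return 0;
    }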
To complement this low-level interface and to offer more convenient retrieval functions, Windows 2000
provides the Performance Data Helper (PDH) set of programming interfaces. These functions provide the ability to
set up queries that specify a list of the performance counter names to be retrieved. PDH functions also include
capabilities for writing the retrieved data to a file in binary or text format for post-processing. Also introduced in
Windows 2000 is the ability to retrieve raw performance data using Windows Management Instrumentation (WMI)
interfaces, Microsoft’s implementation of the Distributed Management Task Force’s Web-Based Enterprise
Management (WBEM) architecture. As depicted in Figure 1, PDH offers an option to obtain performance counter
data either through the registry interface or through WMI.
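As a hedged example of the PDH interface, the sketch below opens a query for a single counter and prints one formatted sample; the counter path and the one-second sample spacing are illustrative choices. Link against pdh.lib.

    #include <windows.h>
    #include <pdh.h>
    #include <stdio.h>

    /* Minimal PDH query: rate counters such as % Processor Time need two
       samples before a formatted value can be computed. */
    int main(void)
    {
        PDH_HQUERY query;
        PDH_HCOUNTER counter;
        PDH_FMT_COUNTERVALUE value;

        PdhOpenQueryA(NULL, 0, &query);
        PdhAddCounterA(query, "\\Processor(_Total)\\% Processor Time", 0, &counter);

        PdhCollectQueryData(query);       /* first sample */
        Sleep(1000);                      /* illustrative sample interval */
        PdhCollectQueryData(query);       /* second sample */

        PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value);
        printf("%% Processor Time: %.2f\n", value.doubleValue);

        PdhCloseQuery(query);
        return 0;
    }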
2.1 Performance Objects and Counters
When administrators use performance tools to retrieve performance data from the operating system, services,
or applications, the system collects the data from the appropriate system object managers. These object
managers include, for example, the Memory Manager, the input/output (I/O) subsystem, and the performance
dynamic-link libraries (DLLs) that are provided in conjunction with a service or application. Windows 2000 (and
Windows® NT) defines the performance data that it collects in terms of objects, counters, and instances. Objects
relate to the performance data that a component manages; for example, the Memory Manager manages memory
objects. Counters for an object describe the performance characteristics specific to that object, for example, the
amount of physical memory available to processes running on the system. Instances can exist for certain objects;
for example, the Process object has multiple instances, each representing a different process currently running on
the system.
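To make the object/counter/instance terminology concrete, the hedged C sketch below expands a wildcard counter path into one concrete path per running process instance; counter paths follow the \Object(Instance)\Counter convention, and error handling is omitted.

    #include <windows.h>
    #include <pdh.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Expand "\Process(*)\% Processor Time" into one path per instance,
       illustrating the object/counter/instance model. */
    int main(void)
    {
        DWORD len = 0;
        CHAR *paths, *p;

        /* First call reports the required buffer length. */
        PdhExpandCounterPathA("\\Process(*)\\% Processor Time", NULL, &len);
        paths = malloc(len);
        PdhExpandCounterPathA("\\Process(*)\\% Processor Time", paths, &len);

        /* The result is a double-NUL-terminated list of counter paths. */
        for (p = paths; *p != '\0'; p += strlen(p) + 1)
            printf("%s\n", p);

        free(paths);
        return 0;
    }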
By default, Windows 2000 installs numerous performance objects corresponding to hardware components
and other resources in the system. The following table shows a summary of the default number of performance
counters installed by the Windows 2000 family of operating systems. The numbers shown in the table may
increase if applications that provide their own performance counters are subsequently installed on the system.
For more detailed information about the Windows 2000 performance subsystem, readers should refer to [2].
Table 1 – Performance counters available in Windows 2000

  Operating system               No. of counters        No. of counters
                                 excluding instances    including instances
  Windows 2000 Professional        587                    3,797
  Windows 2000 Server              924                    6,832
  Windows 2000 Advanced Server   1,039                   10,211

3. Monitoring Practices
As shown in Table 1, there are a large number of counters and counter instances available on Windows 2000
systems. So how does one determine which counter data to monitor and collect? The answer depends on
the target environment and its intended functionality. Devising general guidelines for selecting appropriate
counters and monitoring policies for a wide range of systems is a difficult task. In this paper, we start by
categorizing the workloads for common configurations.
System configurations and server usages vary dramatically based on the requirements of the business,
ranging from a small number of servers supporting a small business to thousands of servers monitored and
collected from in an enterprise environment. Regardless, based on the Windows architecture and the
performance subsystem, there are key considerations that can assist greatly in determining the
best practices for monitoring a particular environment. Ultimately, the key is to monitor the system resources
being consumed by the application or service of interest. Therefore, understanding the workload of the target
environment should precede all other steps.

3.1. Understanding workload


Generally, a server’s resource usage is characterized by the workload it supports. Additionally, services —
software programs or processes that perform specific system functions to support other programs, particularly at a
low level — also place demands on system resources as they run. In addition to monitoring the resource
requirements defined by the system, it is also important to monitor service activity to gauge how well the
services are functioning. Many services running under Windows 2000 Server also provide performance counters
that can be used to measure service activity and resource utilization.
Table 2 – Objects to focus monitoring on, based on server workload

  Application servers
    Heaviest resource usage: Disk, Memory, Process, Processor
    Objects to monitor: Cache, Disk, Memory, Process, Redirector, Processor, and System

  Servers used for backups
    Heaviest resource usage: Disk, Processor, Memory, Process, Network
    Objects to monitor: Memory, Disk, System, Server, Process, Processor, and Network Interface

  Database servers
    Heaviest resource usage: Disks, Processor, and Network
    Objects to monitor: Physical Disk, Logical Disk, Processor, Network Interface, and System.
    If using Microsoft SQL Server™, see the product documentation for information about
    installed performance objects.

  Domain controllers
    Heaviest resource usage: Memory, Processor, Network, and Disk
    Objects to monitor: Memory, Processor, System, Network Interface, protocol counters
    (TCP, UDP, ICMP, IP, NBT Connection, NetBEUI, NetBEUI Resource, NetBIOS), Physical Disk,
    and Logical Disk. For Active Directory, also monitor the NTDS and Site Server LDAP Service
    objects; for Windows 2000 servers, monitor the Browser object. As applicable, monitor DNS
    or WINS objects.

  File and print servers
    Heaviest resource usage: Memory, Cache, Disk, and Network
    Objects to monitor: Memory, Physical Disk, Logical Disk, Network Interface, Server, and
    Server Work Queues. For print servers, use the Print Queue object for monitoring queue
    activity.

  Mail/messaging servers
    Heaviest resource usage: Processor, Disk, and Memory
    Objects to monitor: Memory, Cache, Processor, System, Physical Disk, and Logical Disk.
    If using Microsoft Exchange, see the product documentation for information about installed
    performance objects.

  Web servers
    Heaviest resource usage: Disk, Cache, Processor, Memory, and Network
    Objects to monitor: Cache, Memory, Processor, Physical Disk, Logical Disk, and Network
    Interface. If using Internet Information Services, see the product documentation for
    information about installed performance objects.
3.2. Selecting counters to monitor
Table 2 lists typical server workloads, the resources affected, and the performance objects recommended
for monitoring. In addition to the resources listed in Table 2, monitoring the activity of the Memory, Processor, and
Disk components is a good practice in all server configurations. Table 3 shows the minimum counters
recommended for general server monitoring. Later in this paper is a thorough discussion of what values can
generally be considered acceptable. These values are offered as general guidance and are no substitute for good
baseline data gathered in your environment.
Table 3 – Counters for monitoring servers

  Disk (usage)
    Physical Disk\Avg. Disk sec/Read
    Physical Disk\Avg. Disk sec/Write
    Physical Disk\Disk Reads/sec
    Physical Disk\Disk Writes/sec
    Physical Disk\Avg. Disk Read Queue Length
    Physical Disk\Avg. Disk Write Queue Length
    Physical Disk\% Idle Time
    Logical Disk\% Free Space
    Interpret the % Disk Time counter carefully. Because the _Total instance of this counter
    may not accurately reflect utilization on multiple-disk systems, it is important to use the
    % Idle Time counter instead.

  Disk (bottlenecks)
    Physical Disk\(all counters)
    System\File Control Operations/sec
    System\File Data Operations/sec

  Memory (usage)
    Memory\Available Bytes
    Memory\Cache Bytes
    Memory\% Committed Bytes In Use
    Memory\Pages Input/sec or Page Reads/sec

  Memory (bottlenecks or leaks)
    Memory\Pages/sec
    Memory\Pages Input/sec or Page Reads/sec
    Memory\Pages Output/sec or Page Writes/sec
    Memory\Transition Faults/sec
    Memory\Pool Paged Bytes
    Memory\Pool Paged Resident Bytes
    Memory\Pool Nonpaged Bytes
    Although not specifically Memory object counters, the following are also useful for
    memory analysis:
    Paging File\% Usage (all instances)
    Cache\Data Map Hits %

  Processor (usage)
    Processor\% Processor Time (all instances)
    Processor\% DPC Time
    Processor\% Interrupt Time
    Processor\% Privileged Time
    Processor\% User Time

  Processor (bottlenecks)
    Processor\% Processor Time (all instances)
    Processor\% DPC Time
    Processor\% Interrupt Time
    Processor\% Privileged Time
    Processor\% User Time
    System\Processor Queue Length
    Processor\Interrupts/sec
    System\Context Switches/sec
    System\System Calls/sec
3.3. Collection-monitoring policy
Depending on the configuration, collecting performance data can be done in two ways.
Centralized data collection (that is, collecting performance data from remote systems into a
centralized repository) is simple to implement because only one logging service needs to be
running, on the system hosting the centralized repository. However, this scheme may be
constrained by available memory on the logging system. Furthermore, frequent updating can add
undesired network traffic. Hence, centralized monitoring is useful for a small number of
servers (25 or fewer).
Distributed data collection (that is, data collection that occurs locally on the individual
computers) does not incur the memory and network traffic problems of the centralized
scheme. However, it does result in delayed availability of the data, requiring that the collected
data be transferred separately to the administrator's system for review. This type of monitoring is
useful if the network is likely to be part of the problem, because it isolates the computers from the
network during data collection. Note, however, that local monitoring creates additional
disk traffic on each monitored computer.

3.4. Sampling interval


The sampling interval has a considerable impact on monitoring overhead (in terms of
both server performance and network traffic) and on data file size. Administrators should
determine the appropriate sampling interval based on the purpose and duration of the monitoring
process, the number of counters being monitored, and the network and server constraints of the
target environment.
For routine monitoring, 15-minute logging intervals are recommended. When administrators
are looking to resolve a specific issue, the sample interval should be reduced to a time interval
that highlights the problem. For monitoring the activity of a specific process at a specific time, a
short sampling interval is preferable for detailed measurements; however, for monitoring a
problem that manifests itself slowly, such as a memory leak, a longer sample interval should be
used.
Administrators should also consider the overall length of time to be monitored when choosing
the sampling interval. Sampling every 15 seconds is reasonable if the collection period lasts
no more than four hours. If monitoring is to continue for eight hours or more, an interval shorter
than 300 seconds (five minutes) may result in a file too large to be analyzed by an application in a
reasonable amount of time. In addition, setting the sample interval to a frequent rate (a value
under 5 seconds) not only causes the system to generate a large amount of data but also
increases the overhead of running the collection process, such as the Windows Performance
Logs and Alerts service.
Monitoring a large number of objects and counters may also generate large amounts of data
and consume considerable disk space. It is important to maintain a balance between the number
of objects monitored and the sampling interval to keep the expected log file size within
manageable limits. On the other hand, if a long sampling interval is used during logging, it may
not be possible to determine from the collected log file the data fluctuations that occur between
samples. Two techniques used to address these considerations are to use a circular log to
collect data, or to set the log to close at a certain size and automatically start a new one.
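As a rough sizing aid, the sketch below estimates daily log growth from the counter count and sampling interval; the 100-bytes-per-counter-sample figure is purely an illustrative assumption, not a measured value.

    #include <stdio.h>

    /* Rough log-size estimate: bytes/day = counters x samples/day x bytes per
       counter sample. The per-sample size is assumed for illustration only;
       measure your own logs to calibrate it. */
    int main(void)
    {
        const double bytes_per_counter_sample = 100.0;  /* assumption */
        const int counters = 100;
        const int interval_sec = 15;

        double samples_per_day = 86400.0 / interval_sec;   /* 5,760 at 15 s */
        double mb_per_day = counters * samples_per_day * bytes_per_counter_sample
                            / (1024.0 * 1024.0);

        printf("Estimated log growth: %.1f MB/day\n", mb_per_day);
        return 0;
    }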
In previous Windows NT® versions, disk counters are not available by default; the utility
diskperf needs to be run with the -y or -ye option to enable them. In Windows 2000,
Physical Disk counters are enabled by default, whereas Logical Disk counters are not. To obtain
performance counter data for logical drives or storage volumes, run diskperf -yv at the command
prompt. This causes the disk performance statistics driver used for collecting disk
performance data to report data for logical drives or storage volumes. By default, Windows 2000
uses the diskperf -yd setting to obtain physical drive data. For more information about
the diskperf command, type diskperf -? at the command prompt. Note that in
Windows XP and the forthcoming Windows .NET Server releases this command is no longer
necessary, as physical and logical disk counters are enabled by default.

3.5. Establishing a Baseline


As mentioned in the introduction, a baseline is a set of measurements that are derived from
the collection of data over an extended period during varying but typical types of workloads and
user connections. The baseline is an indicator of how individual system resources or a group of
resources are used during periods of normal activity.
When determining the baseline, it is very important to know the types of workload and the
days and times of operation. This will help associate workload with resource usage and
determine the “reasonableness” of performance in those intervals. For example, if performance
degrades for a brief period at a given time of day, and at that time many users are logging on or
off, it may be considered an acceptable slowdown. Similarly, if performance is poor every evening
at a certain time and it is known that that time coincides with nightly backups, again that
performance loss may be expected. Still, such conclusions should only be drawn when the
degree of performance losses and their causes are known.
If the collected performance data over a reasonably long period of time reflects periods of
low, average and peak usage, a subjective decision should be made as to what constitutes
acceptable performance for the system being monitored. That performance is then the baseline.
Then, the baseline can be used later to detect performance bottlenecks or to watch for long-term
changes in usage patterns that require an increase in capacity.

3.6. Storing the performance data


The logged data must be stored and retained for possible analysis and comparison at a later
time. Storing the data in a database is preferable in that the information can then be queried and
used in reports with ease. Using database analysis tools, results can be analyzed and examined
in detail with respect to a variety of parameters. Trend analysis and capacity planning based on
the performance data can then be performed.

3.7. Analyzing and troubleshooting performance issues


The baseline establishes the typical counter values one would expect to see when the system
is performing under normal operating conditions. This section provides guidelines to interpret the
counter values and disregard false or misleading data.
Administrators should not give too much weight to occasional spikes in the data. These may be
due to the startup of a process and are not an accurate reflection of counter values for that
specific process over time. Administrators should allow for a “warming-up” period before
measuring for the baseline, especially if applications and services are being started or the
operating system is booting. Because of its temporarily high values, this “warming-up” period
tends to skew overall performance results.
When analyzing performance data collected over a period of time, view the entire collection
and compare to the baseline. This process will highlight deviations from the baseline which can
then be investigated. Even without the baseline being present, comparing all activity just before
and after a performance issue has manifested itself can isolate the problem quickly in certain
cases.
Identify the processes that consume the most resources by performing a Pareto (80-20 rule)
analysis, as sketched below, and determine whether they fall within normal operating conditions.
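A minimal sketch of such a Pareto pass over per-process CPU figures follows; the process names and percentages are invented for illustration.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hedged sketch of a Pareto (80-20) pass over per-process CPU samples:
       sort descending and report the processes covering 80% of the load. */
    struct proc { const char *name; double cpu; };

    static int by_cpu_desc(const void *a, const void *b)
    {
        double d = ((const struct proc *)b)->cpu - ((const struct proc *)a)->cpu;
        return (d > 0) - (d < 0);
    }

    int main(void)
    {
        struct proc p[] = { {"sqlservr", 41.0}, {"inetinfo", 22.0},
                            {"lsass", 9.0}, {"services", 4.0}, {"notepad", 1.0} };
        size_t n = sizeof p / sizeof p[0];
        size_t i;
        double total = 0.0, running = 0.0;

        for (i = 0; i < n; i++) total += p[i].cpu;
        qsort(p, n, sizeof p[0], by_cpu_desc);

        /* Print the heaviest consumers until 80% of the load is covered. */
        for (i = 0; i < n && running < 0.8 * total; i++) {
            running += p[i].cpu;
            printf("%-10s %5.1f%% CPU\n", p[i].name, p[i].cpu);
        }
        return 0;
    }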
Deviations from the baseline are the typical indicator of performance problems. All counter
deviations from normal operating ranges should be investigated. As a secondary
reference, the following section describes recommended thresholds for the key Memory, Disk,
Network, Processor, and System counters. The figures below represent problems in a general
environment; when problems occur, checking for these conditions may allow easy
identification of a performance problem. These rules should be treated as guidelines and be
taken in context with other measurements. Please excuse the format of the observations below;
they are in note form.
4. Summary
This paper has covered some of the basic fundamentals in developing best practices for
performance monitoring of Windows 2000 systems. It has provided the reader with an insight into
the Windows 2000 performance subsystem and offered simple practices for monitoring and
troubleshooting performances issues on windows 2000 systems. Further reading on performance
monitoring can be obtained from the material sited in the references.
References
[1] Windows 2000 Server Resource Kit. Microsoft Press.
[2] Windows 2000 Platform SDK. Microsoft Developer Network (MSDN).
[3] Jee Fung Pang and Melur Raghuraman, “Understanding the Windows NT Disk I/O
Subsystem,” CMG ’98 Proceedings.
[4] David A. Solomon, Inside Windows NT, Second Edition. Microsoft Press, 1998.
[5] Jeffrey Richter, Advanced Windows, Third Edition. Microsoft Press, 1997.

© 2001 Microsoft Corporation. All rights reserved. Microsoft, Windows and Windows NT
are either registered trademarks or trademarks of Microsoft Corporation in the United States
and/or other countries. Other product and company names mentioned herein may be the
trademarks of their respective owners.
NOTES:

First, NEVER look at an NT 4.0 performance monitor log with the Windows 2000 System Monitor
(sysmon.exe). Use System Monitor from XP or Server 2003. There are also clear benefits in
using the latest version of System Monitor to read even Windows 2000 logs, so use XP or Server
2003 for all displaying of log data, no matter what the format.
Also, it is assumed that all of the concepts and fundamentals discussed in this troubleshooter are
known to the systems professional. For further in-depth background knowledge, please consult
Inside Windows NT [4] (a third edition is also available). While the entire book is highly
recommended, the following chapters are the most relevant to this document:

• Processes and Threads


• Memory Management
• Cache Manager
• The I/O System

It is very important to note that while many of the old troubleshooting techniques remain valid on
Windows 2000, many of the old threshold values observed on NT 4.0 are too low
to indicate any issue when applied to Windows 2000. The most effective way of developing
accurate threshold values is to develop your own baseline data. The threshold values
reported here were developed by observing internal testing of Windows 2000. Baseline
information is important, so always keep the logs of well-performing systems as a baseline.
Second, System Monitor is an external view of the machine; it does not allow any insight into
kernel operation, although it does allow us to see some information concerning each user
process. Simplifying the state of the machine will make troubleshooting an easier task. The
simple method of replacing or running without specific drivers may lead to a quicker resolution
than trying to interpret several logs. IIS and content indexing are examples of services that
should be disabled if not needed. Other third-party components that should be disabled are
screen savers, virus scanners, file replication agents, and any other file system or network
services whose function you do not understand or cannot establish through research.

Third, a System Monitor log is an overview of the internal state of the machine. It does not inform
the engineer what the end users observe. Without a detailed explanation of exactly what the end
users see, and what the administrators observe directly in event or error logs, it is difficult to reach
any conclusion. Example: the end user’s complaint is that the machine is slow. Without other
information, it is impractical to tell whether the CPU, memory, or disk is the source of the bottleneck.
Without an idea of exactly which actions are slow and exactly when this slowness occurs, the
engineer will not know what to look for. All one can do is look for obvious bottlenecks in the
system, such as the ones outlined in the overview below.

• Memory
o % Committed Bytes In Use
- This value should be stable in long-term use over the course of a day
(except on terminal servers).
- Any value over 80% is something to look into, especially if the Commit
Limit changes even slightly.
• If the Commit Limit changes at all, the system has run out of
page file space and has attempted to expand the page file.
However, the system is generally too slow to react to the
expansion.
o Available Bytes
- Any value less than 4 MB is an issue. Very short periods (around 10
seconds) no lower than 2.5 MB are usually acceptable but still need to be
investigated. Extended periods under 4 MB generally mean that the
system is out of physical memory; extended periods under 2.5 MB
almost certainly mean that the system is out of physical memory.
- Do not blindly add memory as a solution. Further investigation is
needed to determine what is consuming physical memory.
- Remember that extensive paging will occur at this point and the system
will slow down. Paging is the result, not the cause, of this activity.
- The engineer will need to next investigate the Cache, paged pool,
nonpaged pool, and then EACH process, in that order.
- We tend to cache everything, so on a server with many megabytes of free
memory, this memory will actually be used by the cache.
o Cache Bytes
- We use this counter not to look at memory issues but to look at disk and
process issues. The system will rob pages from the cache to service other
memory requests, but high numbers here usually mean a disk bottleneck.
It is important to think of this as an available pool the system can rob
from if necessary.
- Low numbers are 30 to 100 MB; high numbers are over 400 MB. High
numbers raise a flag, which will need to be investigated.
- The limit is 960 MB in Windows 2000, except in some configurations
where the limit is 512 MB:
• Terminal server, the /3GB boot flag, or over 16 GB of physical memory.
• /PAE will allow the use of almost ALL memory; 6 GB caches have
been reported.
- This counter NEVER indicates a memory leak. On file servers we would
expect this counter to rise when the CPU counter goes up and decrease
when the CPU goes down.
o Commit Limit
- This counter should never change; any change indicates a page file that
is too small and has been extended. If this happens, investigate by
looking at all of the process memory counters.
- Many processes do not wait for page file extension to occur and do not
gracefully handle the rejection of a memory request.
- Disk space is cheap: recommend a major expansion of the page file and
re-log the data. Note that this is completely opposite of what is
recommended when the engineer sees limits in the physical memory
case.
o Committed Bytes
- This counter is the sum of each process's total written bytes in virtual
space. As such, it is the total bytes committed to the disk system and
should be relatively stable on a multi-day log. It should go up each day
and down each night. If the counter continues to rise day after day, start
looking for memory leaks. Another possible explanation is new user
sessions being connected each day while old ones do not go
away. Possible causes of inflated numbers are bad system or
application design as well as memory leaks.
- The next step is to investigate each process by using the Process object
in a detailed chart.
o Free System Page Table Entries
- Totally ignore unless this drops below 10,000; consult Inside Windows
NT for further explanation. Usually exhaustion of free system page
table entries will generate a blue screen, but it could generate a
mysterious hang.
o Generally, the system will not run out of system pages
due to a major enhancement of the available system
pages in Windows 2000. Please note that this enhancement
applies only to systems with more than 256 MB of
memory; systems with less than 256 MB will exhibit
exactly the same behavior found on Windows NT 4.0.
o You also get NT 4.0 behavior with the /3GB switch.
o Page Faults/sec
- It is important not to confuse this counter with the following two counters.
This is a soft fault counter and generally not an issue for the system. Soft
faults are merely memory references to another page already in memory.
These memory references are quite fast and do not result in any
performance penalty. Remember, high numbers need to be investigated
but usually do not mean anything unless Pages/sec or Pages
Input/sec is also high. Page faults can range all over the spectrum with
normal application behavior; values from 0 to 1,000 per second can be
normal. This is where a normal baseline is essential in determining the
expected behavior. The event logs are also useful.
- Look at Context Switches/sec for supporting behavior. If this counter
is high, look for a specific process demonstrating high CPU or other
unusual behavior.
o Pages/sec
- Investigate if over 40 pages per second on a system with a slow disk;
even 200 pages per second on a system with a fast disk
subsystem may not be an issue. Please note that the values of 5 to 20
pages that appear in many other sources of documentation are out of
date.
- Always break this counter into pages output and pages input separately
if the counter is above 40 per second.
- Pages/sec is the number of pages read from the disk or written to the
disk to resolve memory references to pages that were not in memory at
the time of the reference. It is the sum of Pages Input/sec and Pages
Output/sec. This counter includes paging traffic on behalf of the system
cache to access file data for applications. This is the primary counter to
observe if you are concerned about excessive memory pressure (that is,
thrashing) and the excessive paging that may result. This counter,
however, also accounts for such activity as the sequential reading of
memory-mapped files, whether cached or not. The typical indication of
this is a high number for Memory\Pages/sec, a "normal"
(average, relative to the system being monitored) or high number for
Memory\Available Bytes, and a normal or small value for Paging File\
% Usage. In the case of a non-cached memory-mapped file, you also
see normal or low cache (cache fault) activity.
- Pages Output/sec is an issue only when disk load becomes an
issue. Remember, these are pages written to by an application that
need to be backed out to the page file. This is not resource intensive,
and as long as disk write time for the logical partition does not exceed
30%, you should not see any system impact. The correct method of
observing the disk's write time is by looking at its inverse counter:
in this case, the disk idle time should be 70% or greater.
o Pages Input/sec
- Separate Pages Output/sec from Pages Input/sec. Pages output will
stress the system, but the application will not see this. An application will
only wait on an input page, and the engineer will need to know what the
application's tolerance for waiting on input pages is. For example,
SQL Server and most applications will tolerate very few pages input,
while Exchange will do much better. Again, you will need a good baseline
to compare against.
- If you suspect paging is the issue, the best threshold value to use is disk
read time on the logical disk holding the page file. Values of disk read
time below 15% and transfer times below 20 msec tend to indicate that
paging is not an issue (see the sketch following this item).
- If you have no other information, use the general rule that paging of 20
pages input/sec per spindle will not slow down most applications.
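The following hedged C sketch simply restates the three rules of thumb above as a predicate; the threshold constants come straight from the text, and the function name is invented.

    #include <stdio.h>

    /* Rules of thumb from the text: paging is probably not the bottleneck when
       the page-file disk shows < 15% read time and < 20 ms transfers, and when
       input paging stays near 20 pages/sec per spindle. */
    static int paging_likely_bottleneck(double disk_read_time_pct,
                                        double avg_sec_per_transfer,
                                        double pages_input_per_sec,
                                        int spindles)
    {
        if (disk_read_time_pct < 15.0 && avg_sec_per_transfer < 0.020)
            return 0;                 /* disk is absorbing the load comfortably */
        return pages_input_per_sec > 20.0 * spindles;
    }

    int main(void)
    {
        printf("%s\n", paging_likely_bottleneck(25.0, 0.030, 120.0, 2)
                           ? "investigate paging" : "paging OK");
        return 0;
    }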
o Pool Nonpaged Bytes
- Here the engineer is looking for two separate behaviors. The first is
memory-leaking behavior, i.e. steadily increasing nonpaged pool usage.
The assumption is that nonpaged pool memory should reach a stable
value after some operating time, generally two or three hours. Generally
this is true, but not always. For example, adding many new users on a file
server will generate an increase in nonpaged pool usage, but this is cyclic
behavior and we would expect this memory usage to drop as the users
log off.
- Be very suspicious of any deltas (sudden changes or spikes) in either
pool counter. This could be normal behavior, but usually it is not. Try to
find any process, or thread in the System process, that increases CPU
when these deltas occur.
- Also, compare the deltas against process memory usage, I/O usage, and
handle usage.
- Pool usage much higher than 30 MB needs to be investigated. Typical
pool usage is 3 to 30 MB, except for terminal servers or streaming video
or audio servers.
- If usage is excessive, look at drivers, services, and then the System
process as sources of the problem.
- The limit is 256 MB, although even a debug will show larger values.
- /3GB and Terminal Server limit nonpaged pool to 128 MB.
o Pool Paged Bytes
- Look for the same behaviors as above, but remember that applications
as well as services and drivers use paged pool.
- Although the maximum amount of paged pool is approximately 335 MB
by default, the system will allocate only half as a maximum, i.e.,
approximately 165 MB. There is a KB article, Q247094, as well as a public
website on memory tuning for terminal servers, that apply to any machine
with over 256 MB of memory. You will double the available paged pool
bytes by setting the paged pool memory to FFFFFFFF (approximately
335 MB) in the registry. If you wish to get the maximum paged pool for
your system, set paged pool memory to FFFFFFFF.
- The limit is 192 MB with /3GB.
- The largest single user of paged pool is the registry, which can be quite
large on domain controllers.
- The next prime users of paged pool are file servers, especially when
serving out profiles. Also, terminal servers with many connections and
print servers with tens of thousands of print jobs pending are major users
of system pool. Print queues use little space; it is the printing that takes
the space.
- Expect to see 10 to 30 MB used, plus whatever is needed for the above
tasks.

• Network Interface or other NIC stack counters

o Bytes Total/sec
- Sanity check to verify whether the system is under any load and whether
the load changes with observed memory or CPU behavior.
o Current Bandwidth
- Check for saturation: 37% for random I/O, especially SMB, but utilization
can reach 70% for dedicated Winsock applications.
o Packets Received Discarded
- Sanity check: hardware problems.
o Packets Received Errors
- Sanity check: hardware problems.

• Physical Disk (for each disk)

- One has to investigate the data points closely. This is done by moving
the time windows CLOSER together in order to display each of the disk
counter's data points. When the display shows fewer than 100 points and
the graph does not reach the right margin, the engineer knows that every
point has been displayed. At that point, one moves both time windows
across the graph until the engineer has had the opportunity to observe
each data point. This is tedious, so one does this only if one suspects an
issue with the disk while looking at an overall trend chart that covers
the entire time period. If one suspects a disk bottleneck, the engineer
will have to investigate all the data points in order to determine the
behavior of the disk subsystem over short time intervals. This is
important because many systems will not recover from disk bottlenecks
10 to 20 seconds long. If the engineer does not investigate each disk
data point, he may miss the disk issue as the cause of some other
observable problem. However, this observation does not really apply to
terminal servers, since their applications are used to waiting on the disk,
although user response time may suffer.
- Note that the counter values explained below, like all thresholds, are just
rules of thumb. This is especially true with disk drive subsystems.
- An example of how the time interval affects the apparent disk activity
follows.
• [Graph using 20-second time intervals on the log.]
• [Graph using 4-second time intervals on the log.]
• Please note that there is a wider variation in the second log. That graph shows many
points where the system is hitting its throughput limit. Both of these logs were made of
the same system undergoing the same workload.
o % Idle Time
- This counter accurately determines the saturation of the disk subsystem.
Some installations (especially Exchange servers and IIS servers) are
designed to make the disk subsystem the bottleneck, which is the most
cost-effective way to build a large system. But in general, we are looking
for some idle time to be present. Disk idle time, not disk read or write
time, is the counter that accurately describes how busy the disk
subsystem is. The engineer cannot use the disk read or write time
counters to determine how busy the system is, because they will
commonly read more than 100%. Essentially, write times overlap, so disk
write times of 200% to 2000% are common on high-performance disk
drive subsystems.
- % Idle Time is independent of the number of spindles and is independent
of the number of simultaneous I/Os. This is a very important note: it
vastly simplifies our analysis.
- Thresholds depend on server role:
• Application servers like 70% idle time.
• File and print servers like 50% idle time.
• Batch servers (like Exchange) may show < 30% idle time.
o Avg. Disk Queue Length
- Again, on these two queue-length counters, there are a lot of outdated
values in existing documentation.
- Again, RAID "write caches" require the engineer to evaluate reads and
writes separately.
- Less than 2 plus the number of spindles is an excellent value.
- Less than double the number of spindles is a good value.
• This requires further investigation of the disk transfer time in
order to see whether the disk queue length would actually impact
the system.
- Less than triple the number of spindles is a fair value.

o Current Queue Length

- Again, on these two counters, there are a lot of outdated values in
existing documentation.
- Again, RAID "write caches" require the engineer to evaluate reads and
writes separately, but this counter does not allow the engineer to split
writes from reads, so the counter is used to support conclusions
determined from the other counters.
- This counter is an instantaneous counter, not an average. If the
engineer suspects dynamic variation in disk loads, this is the counter
(s)he homes in on in the evaluation.
- Less than 2 plus the number of spindles is an excellent value.
- Less than double the number of spindles is a good value.
• This requires further investigation of the disk transfer time in
order to see whether the disk queue length would actually impact
the system.
- Less than triple the number of spindles is a fair value (a short sketch of
these spindle-based rules follows this item).
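A compact, hedged restatement of the spindle-based queue-length thresholds above; the function name is invented for illustration.

    #include <stdio.h>

    /* Rules of thumb from the text: queue < spindles + 2 is excellent,
       < 2 x spindles is good, < 3 x spindles is fair, more needs investigation. */
    static const char *rate_disk_queue(double queue_length, int spindles)
    {
        if (queue_length < spindles + 2.0) return "excellent";
        if (queue_length < 2.0 * spindles) return "good";
        if (queue_length < 3.0 * spindles) return "fair";
        return "investigate";
    }

    int main(void)
    {
        printf("Queue of 6 on a 4-spindle set: %s\n", rate_disk_queue(6.0, 4));
        return 0;
    }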
o Avg. Disk sec/Transfer
- This is perhaps the most important counter, because it is what the
application actually sees.
- Caching controllers require evaluation of reads separately from writes.
- The following disk transfer times are rules of thumb:
• Excellent: < 12 msec
• Good: < 20 msec
• Fair: < 30 msec
• Poor: < 40 msec
• Cached/Excellent: < 1 msec
• Cached/Good: < 2 msec
• Cached/Fair: < 4 msec

o Disk Transfers/sec
- Current disk drive technology shows the following limits:
• 180 sequential transfers per 10,000 RPM of disk drive.
o Some drives with good predictive read-ahead will reach
180 sequential transfers per 10,000 RPM.
• 60 random transfers per 10,000 RPM of disk drive.
o It is necessary to know the disk speed and the type of
I/O in order to determine the maximum throughput.
o This is hard and fast for database work.
• 60-80 transfers a second is a general rule of thumb, as most I/O
is pseudo-random.
• For database applications, be VERY conservative.
• Caching disk controllers nullify this for writes only, and
yield a gain of 4x to 10x in write transfers only.
• The above limits are per spindle, not an overall limit for a
RAID set. Due to RAID set design, the limit on RAID set
throughput is somewhat difficult to calculate. Below is a
summary of the disk I/Os per second generated by each type of
RAID configuration for a given number of reads and writes
per second (a short sketch follows this list).
o RAID 0: READS + WRITES = I/Os per second
o RAID 1: READS + (2 * WRITES) = I/Os per second
o RAID 5: READS + (4 * WRITES) = I/Os per second
o RAID 0+1: READS + (2 * WRITES) = I/Os per second
o See the whitepapers from your preferred
hardware RAID vendor for a detailed explanation of how
to observe disk bottlenecks and calculate disk I/O
limits.
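As a small, hedged illustration of the back-end I/O formulas above (the workload numbers and function name are invented):

    #include <stdio.h>

    /* Back-end disk I/Os per second implied by a host workload, using the
       rules above: RAID 0 = R+W, RAID 1 and 0+1 = R+2W, RAID 5 = R+4W. */
    static double raid_backend_ios(double reads, double writes, int write_multiplier)
    {
        return reads + write_multiplier * writes;
    }

    int main(void)
    {
        double reads = 400.0, writes = 100.0;   /* illustrative workload */

        printf("RAID 0:   %.0f I/Os/sec\n", raid_backend_ios(reads, writes, 1));
        printf("RAID 1:   %.0f I/Os/sec\n", raid_backend_ios(reads, writes, 2));
        printf("RAID 5:   %.0f I/Os/sec\n", raid_backend_ios(reads, writes, 4));
        printf("RAID 0+1: %.0f I/Os/sec\n", raid_backend_ios(reads, writes, 2));
        return 0;
    }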
o Split I/Os
- Should be close to ZERO. If not:
• The stripe size for the RAID set is TOO small.
• For example, statistical variation dictates that a 12 KB average
I/O is the maximum size a 16 KB stripe set can take.
• Or the disk is heavily fragmented or free space is too low.
o % Free Space
- Never less than 15% free.
- Try to maintain at least 25% free.
- You will need 30%-40% free on highly dynamic systems.

• Process (for the _Total instance only)

o _TOTAL INSTANCE
- Handle Count
• References to objects used by a process. A VERY important
counter to correlate with memory for all memory issues.
• Correlation with pool leaks is very helpful. Major increases are
caused by handle leaks.
- Page Faults/sec
• For performance issues, one needs to look at pages input per
second, but these counters are not available at the per-process
level, so we have to correlate the process page fault counter
with the overall page fault counter and determine the process's
pages input and output by looking at the overall pages input
and output.
- Private Bytes
• Private bytes are not related to pool bytes in any way, but very
commonly code paths within an application that leak private
bytes will leak pool bytes as well. This counter is key in
looking for the source of pool leaks.
• Remember that this is a virtual byte counter and does not directly
relate to the actual physical memory used by the application.
This is memory allocated and written to by an application, so this
counter does not directly indicate to the engineer any physical
memory stress. You should use the working set counter for
physical memory stress, but you will need to look at private bytes
in order to determine whether physical memory stress is expected
behavior. For example, if private bytes hold steady and the working
set goes up, then the application is not requesting more memory;
the application is just touching more memory. This is not a
memory leak; the working set is going up simply because the
application is getting busier. You can confirm this by looking at
the CPU utilization for that application.
- Virtual Bytes
• Memory reserved by the application for use. ALWAYS review
this; it is possible to see what looks like an application memory
leak when it is nothing more than an application behaving badly.
Understand when and how virtual bytes go up, and then determine
whether this is a "leak" or not.
- Working Set bytes
• The portion of private bytes resident in physical memory that is
owned by one application.
• Will go up and down in normal usage; the secret is determining
why. You can only do this with additional application
information, a problem description, etc.
- Pool Nonpaged Bytes
• This may correlate with system pool usage; if it does, then you
have found the process that is leaking memory. If no process
correlates to the overall system pool usage, that does not mean
that one of the applications is not causing the problem.
- Pool Paged Bytes
• This may correlate with system pool usage; if it does, then you
have found the process that is leaking memory. If no process
correlates to the overall system pool usage, that does not mean
that one of the applications is not causing the problem.
o DETAIL
- Add all Process\% Processor Time counters.
• Remember the thresholds mentioned for the processor
CPU counters.
- Add thread counts.
• See handle leaks; bad application programming practice can
cause this as well.
• Processor (for EACH processor, and overall if numbers are high)
o % DPC Time
- A very important counter. This counter, along with the counter below,
shows the amount of time sustained doing I/O.
- Time required to complete an I/O.
- This counter is rolled up into the System process counter but will not
show up in any of the System process's thread counters, so look for
confirmation there (System process CPU vs. System threads CPU).
- Look for variations in this counter; sometimes a hardware device will get
blocked, yielding a delta increase in this counter.
- The threshold is 15%.
o % Interrupt Time
- A very important counter. This counter, along with the counter above,
shows the amount of time sustained doing I/O.
- Time required to set up an I/O request.
- This counter is rolled up into the System process counter but will not
show up in any of the System process's thread counters, so look for
confirmation there (System process CPU vs. System threads CPU).
- Look for variations in this counter; sometimes a hardware device will get
blocked, yielding a delta increase in this counter.
- The threshold is 10%.
o % Privileged Time
- The time the operating system kernel is doing work.
- Usually the threshold is less than 30%.
- 45% is a better value for pure print and file servers.
o % Processor Time
- The overall processor time can go over 100%. The limit is usually 100%
times the number of processors in the box.
- Some systems are by nature CPU bound. A CPU-bound system is not
necessarily a bad thing.
- Any sustained CPU time over 90% is effectively the same as 100%.
o % User Time
- The time the system is doing useful work on behalf of a process or a
user.
- 80% is good; 60% is only fair.

• System
o Context Switches/sec
- A context switch occurs when a processor switches from running one
thread to another thread.
- Do not forget to divide by the number of processors.
- 1,000 – 3,000 per processor is the range from excellent to fair.
- 6,000 or greater is considered poor. The upper limit is about 20,000 at
90% CPU.
- 3,000 – 9,000 per processor is allowed for Terminal Server due to the
blocking that naturally occurs.
- Abnormally high rates can be caused by page cache faults due to memory
starvation.
- Abnormally high rates are usually caused by an application memory
issue around heap memory allocations, or by another resource being
blocked. Further determination of the cause of the issue requires a good
baseline to compare against.
o Processes
- Sanity check on the number of processes, to determine whether it is
increasing or decreasing.
o Processor Queue Length
- There is only one queue for all processors.
- On standard servers with long quantums (see the Knowledge Base
article on quantums in Windows 2000):
• 4 or fewer per CPU is excellent.
• < 8 per CPU is good.
• < 12 per CPU is fair.
- On terminal servers, which have short quantums:
• 10 or fewer per CPU is excellent.
• < 15 per CPU is good.
• < 20 per CPU is fair.
o Threads
- Sanity check on the number of threads, especially to determine whether
it is increasing or decreasing.
Appendix – Definition of counters referenced in this paper
The information in this appendix is a subset of that provided in the Windows 2000 Server
Resource Kit: Supplement 1 Performance Counters Reference guide. It is given here to
provide context for the topics discussed in the paper.

Process Object

The Process performance object consists of counters that monitor running application programs
and system processes. All the threads in a process share the same address space and have
access to the same data.

Counter Name      Description

Private Bytes     Shows the current number of bytes that this process has allocated that
                  cannot be shared with other processes.

Handle Count      Shows the total number of handles currently open by this process. This
                  number is equal to the sum of the handles currently open by each thread
                  in this process.

Virtual Bytes     Shows the current size, in bytes, of the virtual address space that the
                  process is using. Use of virtual address space does not necessarily imply
                  corresponding use of either disk or main memory pages. Virtual space is
                  finite, and by using too much, the process can limit its ability to load
                  libraries.

Virtual Bytes     Shows the maximum size, in bytes, of virtual address space the process
Peak              has used at any one time. Use of virtual address space does not
                  necessarily imply corresponding use of either disk or main memory
                  pages. However, virtual space is finite, and the process might limit its
                  ability to load libraries by using too much.

Processor Object

The Processor performance object consists of counters that measure aspects of processor
activity. The processor is the part of the computer that performs arithmetic and logical
computations, initiates operations on peripherals, and runs the threads of processes. A computer
can have multiple processors. The processor object represents each processor as an instance of
the object.

Counter Name      Description

% DPC Time        Shows the percentage of time that the processor spent receiving and
                  servicing deferred procedure calls (DPCs) during the sample interval.
                  DPCs are interrupts that run at a lower priority than standard interrupts.
                  % DPC Time is a component of % Privileged Time because DPCs are
                  executed in privileged mode. They are counted separately and are not a
                  component of the interrupt counters.

% Interrupt Time  Shows the percentage of time that the processor spent receiving and
                  servicing hardware interrupts during the sample interval. This value is an
                  indirect indicator of the activity of devices that generate interrupts, such
                  as the system clock, the mouse, disk drivers, data communication lines,
                  network interface cards, and other peripheral devices. These devices
                  normally interrupt the processor when they have completed a task or
                  require attention. Normal thread execution is suspended during
                  interrupts. Most system clocks interrupt the processor every 10
                  milliseconds, creating a background of interrupt activity.

% Privileged Time Shows the percentage of non-idle processor time spent in privileged
                  mode. Privileged mode is a processing mode designed for operating
                  system components and hardware-manipulating drivers. It allows direct
                  access to hardware and all memory. The alternative, user mode, is a
                  restricted processing mode designed for applications, environment
                  subsystems, and integral subsystems. The operating system switches
                  application threads to privileged mode to obtain operating system
                  services. % Privileged Time includes time spent servicing interrupts and
                  DPCs. A high rate of privileged time might be attributable to a large
                  number of interrupts generated by a failing device.

% Processor Time  Shows the percentage of time that the processor is executing application
                  or operating system processes other than Idle. This counter is a primary
                  indicator of processor activity. It is calculated by measuring the time that
                  the processor spends executing the thread of the Idle process in each
                  sample interval, and subtracting that value from 100%. Each processor
                  has an Idle thread which consumes cycles when no other threads are
                  ready to run.

Interrupts/sec    Shows the average number of hardware interrupts that the processor is
                  receiving and servicing per second. It does not include DPCs, which are
                  counted separately. This value is an indirect indicator of the activity of
                  devices that generate interrupts, such as the system clock, the mouse,
                  disk drivers, data communication lines, network interface cards, and
                  other peripheral devices. These devices normally interrupt the processor
                  when they have completed a task or require attention. Normal thread
                  execution is suspended during interrupts. Most system clocks interrupt
                  the processor every 10 milliseconds, creating a background of interrupt
                  activity.

Memory Object

The Memory performance object consists of counters that describe the behavior of physical and
virtual memory on the computer. Physical memory is the amount of random-access memory
(RAM) on the computer. Virtual memory consists of space in physical memory and on disk. Many
of the memory counters monitor paging, which is the movement of pages of code and data
between disk and physical memory. Excessive paging, a symptom of a memory shortage, can
cause delays which interfere with all system processes.

Counter Name      Description

% Committed       Shows the ratio of Memory\Committed Bytes to the Memory\Commit
Bytes In Use      Limit. Committed memory is physical memory in use for which space has
                  been reserved in the paging file so that it can be written to disk. The
                  commit limit is determined by the size of the paging file. If the paging file
                  is enlarged, the commit limit increases, and the ratio is reduced.

Available Bytes   Shows the amount of physical memory, in bytes, available to processes
                  running on the computer. It is calculated by adding the amount of space
                  on the zeroed, free, and standby memory lists. Free memory is ready for
                  use; zeroed memory consists of pages of memory filled with zeros to
                  prevent later processes from seeing data used by a previous process;
                  standby memory is memory that has been removed from a process's
                  working set (its physical memory) en route to disk but is still available to
                  be recalled.

Cache Bytes       Shows the sum of the values of System Cache Resident Bytes, System
                  Driver Resident Bytes, System Code Resident Bytes, and Pool Paged
                  Resident Bytes.

Page Reads/sec    Shows the rate at which the disk is read to resolve hard page faults. It
                  shows the number of read operations, without regard to the number of
                  pages retrieved in each operation. Hard page faults occur when a
                  process references a page in virtual memory that is not in its working set
                  or elsewhere in physical memory, and must be retrieved from disk. This
                  counter is a primary indicator of the kinds of faults that cause system-
                  wide delays. It includes read operations to satisfy faults in the file system
                  cache (usually requested by applications) and in non-cached mapped
                  memory files. Compare the value of Page Reads/sec to the value of
                  Pages Input/sec to find an average of how many pages were read during
                  each read operation.

Pages Input/sec   Shows the rate at which pages are read from disk to resolve hard page
                  faults. Hard page faults occur when a process refers to a page in virtual
                  memory that is not in its working set or elsewhere in physical memory,
                  and must be retrieved from disk. When a page is faulted, the system tries
                  to read multiple contiguous pages into memory to maximize the benefit
                  of the read operation. Compare Pages Input/sec to Page Reads/sec to
                  find the average number of pages read into memory during each read
                  operation.

Pages/sec         Shows the rate at which pages are read from or written to disk to resolve
                  hard page faults. This counter is a primary indicator of the kinds of faults
                  that cause system-wide delays. It is the sum of Memory\Pages Input/sec
                  and Memory\Pages Output/sec. It is counted in numbers of pages, so it
                  can be compared to other counts of pages, such as Memory\Page
                  Faults/sec, without conversion. It includes pages retrieved to satisfy
                  faults in the file system cache (usually requested by applications) and
                  non-cached mapped memory files.

Pool Nonpaged     Shows the size, in bytes, of the nonpaged pool. Memory\Pool Nonpaged
Bytes             Bytes is calculated differently than Process\Pool Nonpaged Bytes, so it
                  might not equal Process(_Total)\Pool Nonpaged Bytes.

Pool Paged Bytes  Shows the size, in bytes, of the paged pool. Memory\Pool Paged Bytes is
                  calculated differently than Process\Pool Paged Bytes, so it might not
                  equal Process(_Total)\Pool Paged Bytes.

Transition        Shows the rate at which page faults are resolved by recovering pages
Faults/sec        that were being used by another process sharing the page, or were on
                  the modified page list or the standby list, or were being written to disk at
                  the time of the page fault. The pages were recovered without additional
                  disk activity. Transition faults are counted in numbers of faults; because
                  only one page is faulted in each operation, it is also equal to the number
                  of pages faulted.

Physical Disk Object

The Physical Disk performance object consists of counters that monitor hard or fixed disk drives
on a computer. Disks are used to store file, program, and paging data, are read to retrieve these
items, and are written to record changes to them. The values of physical disk counters are sums
of the values of the logical disks (or partitions) into which they are divided.

Counter Name           Description

Avg. Disk Queue        Shows the average number of both read and write requests that were
Length                 queued for the selected disk during the sample interval.

Avg. Disk              Shows the average time, in seconds, of a disk transfer.
sec/Transfer

Disk Reads/sec         Shows the rate of read operations on the disk.

Disk Transfers/sec     Shows the rate of read and write operations on the disk.

Disk Writes/sec        Shows the rate of write operations on the disk.

% Idle Time            Shows the percentage of time that the disk was idle during the sample
                       interval.

System Object

The System performance object consists of counters that apply to more than one component of
the computer.

Counter Name       Description

Processor Queue    Shows the number of threads in the processor queue. There is a single
Length             queue for processor time even on computers with multiple processors.
                   Therefore, users may need to divide this value by the number of
                   processors servicing the workload. Unlike the disk counters, this counter
                   shows ready threads only, not threads that are running. A sustained
                   processor queue of greater than two threads generally indicates
                   processor congestion.

Context            Shows the combined rate at which all processors on the computer are
Switches/sec       switched from one thread to another. Context switches occur when a
                   running thread voluntarily relinquishes the processor, is preempted by a
                   higher priority ready thread, or switches between user mode and
                   privileged (kernel) mode to use an Executive or subsystem service. It is
                   the sum of the values of Thread\Context Switches/sec for each thread
                   running on all processors on the computer and is measured in numbers
                   of switches. There are context switch counters on both the System and
                   Thread objects.
