Performance

Automating a System Performance Check Using the checkperf Utility
Victor Feng, October 2008
Introduction
This tech tip provides two scripts: checkperf and runqueue.d. The checkperf script
works on systems that run the Solaris 9 or Solaris 10 Operating System. The runqueue.d
script works on systems that run the Solaris 10 OS.
• The source code for the checkperf script is provided as a .txt file. Rename the file so
that it has a .sh extension.
• The source code for the runqueue.d script is provided as a .txt file. Rename the file so
that it has a .d extension.
The checkperf utility checks system performance in terms of CPU, memory, I/O, and TCP
networking. The default warning threshold for each of these items can be changed.
Whenever one of the thresholds is reached, checkperf sends a warning email to a
specified recipient. The email might include suggestions about how to improve system
performance.
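The check-and-notify cycle can be sketched in shell as follows. This is a minimal sketch under stated assumptions, not code taken from checkperf: the function names warn_if_over and send_warnings and the mail subject are hypothetical, and it assumes the LOG and RECEIVER variables described later in this tip and the Solaris mailx command.

```sh
# Sketch only: warn_if_over and send_warnings are hypothetical names.
# POSIX sh + awk; on Solaris 9/10 run under ksh and use nawk for awk.
# Assumes LOG (warning file) and RECEIVER (email address) are set.

# Append "LABEL: VALUE%(>THRESHOLD%)" to $LOG when VALUE > THRESHOLD.
# awk does the comparison so that decimal values such as 2.1 also work.
warn_if_over() {
    if awk -v v="$2" -v t="$3" 'BEGIN { exit !(v + 0 > t + 0) }'; then
        echo "$1: $2%(>$3%)" >> "$LOG"
    fi
}

# Mail the log only when at least one warning was written to it.
send_warnings() {
    [ -s "$LOG" ] || return 0
    mailx -s "checkperf warning from $(uname -n)" "$RECEIVER" < "$LOG"
}
```

Appending every warning to one file and mailing that file once keeps a run with several problems down to a single email.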
You can schedule checkperf with cron, for example to run during business hours, so that
you don't have to go to each server and check its performance manually.
checkperf adds little load to the system: by default, it uses sar to collect statistics
every 5 seconds for 5 minutes.
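A crontab entry and the sampling arithmetic might look like the following sketch. The install path and the business-hours schedule are assumptions; adjust them to your site.

```sh
# Sketch; the path and schedule are assumptions. POSIX sh (ksh on Solaris).
CHECKPERF='/home/<username>/bin/checkperf.sh'

# Every 30 minutes from 08:00 to 17:30, Monday to Friday. The minutes are
# listed explicitly because older Solaris cron may not accept "*/30".
ENTRY="0,30 8-17 * * 1-5 $CHECKPERF >/dev/null 2>&1"
echo "$ENTRY"

# To install it, append it to the current user's crontab:
#   (crontab -l 2>/dev/null; echo "$ENTRY") | crontab

# Default sampling: one sample every 5 seconds for 5 minutes.
INTERVAL=5
COUNT=$((5 * 60 / INTERVAL))
echo "samples: $COUNT (for example, sar -u $INTERVAL $COUNT)"
```

With a 5-second interval, 5 minutes of sampling is 300 / 5 = 60 samples.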
The minimum interval at which sar can collect statistics is one second. If a system has
many processes that each run for only a few milliseconds, sar might not see them in the
run queue. Therefore, if DTrace is available on the system (for example, if the system
runs the Solaris 10 OS), checkperf calls runqueue.d, which collects run queue
information every millisecond for 30 seconds.
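The DTrace check could be sketched like this. The helper name maybe_run_runqueue is hypothetical; it simply tests for a dtrace binary and hands runqueue.d to dtrace -s.

```sh
# Hypothetical helper (POSIX sh): run runqueue.d only where DTrace exists.
# Usage: maybe_run_runqueue DTRACE_BINARY D_SCRIPT
maybe_run_runqueue() {
    if [ -x "$1" ] && [ -f "$2" ]; then
        # dtrace -s compiles and runs the D program in the named file.
        "$1" -s "$2"
    else
        echo "runqueue.d skipped: $1 or $2 not found" >&2
        return 1
    fi
}

# On a Solaris 10 host, typically as root:
#   maybe_run_runqueue /usr/sbin/dtrace "$DIR/runqueue.d"
# Over 30 seconds, 1 ms sampling gives up to 30,000 observations,
# compared with 30 for sar at its 1-second minimum interval.
```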
The remaining sections of this tech tip demonstrate how checkperf reacts when a system
has various performance issues. Before we continue, you need to set a few variables in
checkperf:
• DIR: Specifies the directory where checkperf and runqueue.d are located (for
example, /home/<username>/bin)
• LOG: Specifies the file that will contain generated warning messages (for example,
/home/<username>/bin/perf_msg)
• RECEIVER: Specifies the email address of the person who should receive warning
messages (for example, <username@domain.com>)
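With the example values above, the settings might look like this. The exact form of these assignments in checkperf may differ, and the runqueue.d check at the end is a hypothetical addition.

```sh
# Example values; <username> and the address are placeholders.
DIR='/home/<username>/bin'      # where checkperf and runqueue.d live
LOG="$DIR/perf_msg"             # file that collects warning messages
RECEIVER='username@domain.com'  # recipient of the warning email

# Hypothetical sanity check, not part of the original script.
[ -f "$DIR/runqueue.d" ] || echo "note: $DIR/runqueue.d not found" >&2
```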
Note: My system has 32 CPUs. For testing purposes, I turned off 30 of them using
psradm -f 2-31.
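Because checkperf reports offline CPUs, it helps to be able to count them yourself. The following sketch counts off-line processors in psrinfo output; the sample lines (including their timestamps) and the function name count_offline are illustrative.

```sh
# Count processors that psrinfo reports as off-line (POSIX sh + awk).
# Usage: psrinfo | count_offline
count_offline() {
    awk '$2 == "off-line" { n++ } END { print n + 0 }'
}

# Illustrative psrinfo output; the timestamps are made up.
sample='0       on-line   since 10/20/2008 09:00:00
1       on-line   since 10/20/2008 09:00:00
2       off-line  since 10/20/2008 15:00:00
3       off-line  since 10/20/2008 15:00:00'

printf '%s\n' "$sample" | count_offline    # prints 2

# To bring the processors from the note back online after testing:
#   psradm -n 2-31
```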
CPU Performance Warning
The checkperf parameter for CPU performance is CPU_UTIL_WARN, the CPU utilization
warning threshold, which is set to 80 (percent) by default.
If CPU utilization exceeds 80%, checkperf checks the threads in the run queue, checks
whether the system has CPUs offline, and sends out an email.
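The utilization test can be sketched as follows. It assumes utilization is computed as 100 minus %idle from the Average line of sar -u; checkperf may instead add %usr and %sys, which gives the same 100% for the run below. The function name cpu_util is hypothetical.

```sh
# CPU utilization as 100 - %idle from the first "Average" line of sar -u.
# The formula is an assumption, not taken from checkperf. POSIX sh + awk.
# Usage: sar -u 5 60 | cpu_util
cpu_util() {
    awk '$1 == "Average" { print 100 - $5; exit }'
}

CPU_UTIL_WARN=80
util=$(printf '15:21:25 %%usr %%sys %%wio %%idle\nAverage 69 31 0 0\n' | cpu_util)
if [ "$util" -gt "$CPU_UTIL_WARN" ]; then
    echo "CPU average utilization: ${util}%(>${CPU_UTIL_WARN}%)"
fi
```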
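To consume CPU resources, we can start four background copies of dd if=/dev/zero of=/dev/null &. With only two CPUs online, this leaves about two threads waiting in the run queue: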
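root@host # dd if=/dev/zero of=/dev/null &
[1] 1571
root@host # dd if=/dev/zero of=/dev/null &
[2] 1572
root@host # dd if=/dev/zero of=/dev/null &
[3] 1573
root@host # dd if=/dev/zero of=/dev/null &
[4] 1574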
root@host # sar -uq 5 3
15:21:25    %usr    %sys    %wio   %idle
         runq-sz %runocc swpq-sz %swpocc
Average       69      31       0       0
Average      2.0      99     0.0       0
root@host # ./checkperf
root@host # more perf_msg
CPU average utilization: 100%(>80%)
There are 30 CPUs offline and use psradm to bring them online
Threads (per second) waiting for CPU to run: 2.0.
Recommend adding 2.0 CPUs to your system. Use prstat -L to see
if running processes have multiple threads so that you may switch to
thread-based-processor machine, such as the Sun Fire T2000 server.
The accurate threads waiting for CPU: 2.1
The last line, "The accurate threads waiting for CPU: 2.1", is generated by runqueue.d,
which samples the run queue every millisecond and therefore provides more accurate
run queue information than sar.
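The CPU recommendation in the perf_msg output above appears to equal sar's runq-sz, the average number of runnable threads waiting for a CPU. The sketch below assumes that rule; runq_size and recommend_cpus are hypothetical names, and the input is the sar -q Average line from the run above.

```sh
# Extract runq-sz from the "Average" line of sar -q output (sar -q alone,
# not combined with -u). POSIX sh + awk; on Solaris use nawk for awk.
# Usage: sar -q 5 60 | runq_size
runq_size() {
    awk '$1 == "Average" { print $2; exit }'
}

# Assumed rule: recommend as many extra CPUs as there are waiting threads.
recommend_cpus() {
    awk -v q="$1" 'BEGIN {
        if (q + 0 > 0) printf "Recommend adding %.1f CPUs to your system.\n", q
    }'
}

q=$(printf 'runq-sz %%runocc swpq-sz %%swpocc\nAverage 2.0 99 0.0 0\n' | runq_size)
recommend_cpus "$q"     # prints: Recommend adding 2.0 CPUs to your system.
```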
Memory Shortage Warning
There are two parameters in checkperf for checking memory:
• MEM_FREEPHY_WARN_PERCENT: The warning threshold, in percent, for available physical
memory. The default is 20.
• MEM_FREESWAP_WARN_PERCENT: The warning threshold, in percent, for available swap
space (virtual memory). The default is 20.
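The percentage thresholds translate into MB as in this sketch. The 16384 MB total is an illustrative value, not the test system's, and the function names are hypothetical.

```sh
# Warning threshold in MB from a total size in MB and a percentage such as
# MEM_FREEPHY_WARN_PERCENT (integer MB, truncated). POSIX sh (ksh on Solaris).
# Usage: threshold_mb TOTAL_MB PERCENT
threshold_mb() {
    echo $(( $1 * $2 / 100 ))
}

# Succeeds (exit status 0) when FREE_MB is below the threshold.
# Usage: is_low FREE_MB TOTAL_MB PERCENT
is_low() {
    [ "$1" -lt "$(threshold_mb "$2" "$3")" ]
}

# Illustrative values. On Solaris, total physical memory in MB can be read with:
#   prtconf | awk '/^Memory size/ { print $3 }'
MEM_FREEPHY_WARN_PERCENT=20
total_mb=16384
free_mb=937
if is_low "$free_mb" "$total_mb" "$MEM_FREEPHY_WARN_PERCENT"; then
    # prints: Available physical memory: 937 MB(<3276 MB)
    echo "Available physical memory: $free_mb MB(<$(threshold_mb "$total_mb" "$MEM_FREEPHY_WARN_PERCENT") MB)"
fi
```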
If the available swap space is less than 20%, checkperf also checks whether the total
size of physical swap devices is less than 1.5 times the size of physical memory. As I
demonstrated in a previous article, "Impact of Swap Space on System Performance for the
Solaris 9 and 10 OS," a lack of physical swap space degrades system performance when a
system is low on physical memory.
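The 1.5-times rule can be sketched by summing the swap devices listed by swap -l, whose blocks column is in 512-byte units. The device name and sizes below are made up for illustration, and the function names are hypothetical.

```sh
# Total size of physical swap devices in MB from swap -l output.
# Field 4 ("blocks") is in 512-byte blocks; 2048 blocks = 1 MB. POSIX sh + awk.
# Usage: swap -l | swap_devices_mb
swap_devices_mb() {
    awk 'NR > 1 { blocks += $4 } END { printf "%d\n", blocks / 2048 }'
}

# Extra swap (MB) needed so that swap devices total 1.5 x physical memory.
# Usage: swap_to_add_mb PHYSICAL_MB SWAP_DEVICES_MB
swap_to_add_mb() {
    need=$(( $1 * 3 / 2 - $2 ))
    if [ "$need" -gt 0 ]; then echo "$need"; else echo 0; fi
}

# Illustrative values: one 4096 MB swap slice on a 16384 MB host.
sample='swapfile             dev  swaplo blocks   free
/dev/dsk/c1t0d0s1   32,1      16 8388608 8388608'
swap_mb=$(printf '%s\n' "$sample" | swap_devices_mb)
echo "Recommend adding $(swap_to_add_mb 16384 "$swap_mb") MB swap device."
```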
Here we will use the myfilltmp.sh script (which is shown in the previous article) to
consume memory:
root@host # ./myfilltmp.sh
root@host # sar -r 5 3
15:34:39  freemem freeswap
Average    122536  6180453
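sar -r reports freemem in pages (8 Kbytes each on this system) and freeswap in 512-byte
disk blocks. So, free memory is 122536*8/1024, or about 957 Mbytes, and free swap space
is 6180453*512/1024/1024, or about 3017 Mbytes. checkperf collects its own samples, so
the values it writes to perf_msg differ slightly: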
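The same conversion can be done on the sar -r Average line with awk. PAGE_KB and sar_r_to_mb are hypothetical names; PAGE_KB defaults to 8 to match the 8-Kbyte pages assumed above, and the pagesize command reports the actual page size in bytes.

```sh
# Convert the "Average" line of sar -r to MB (integer, truncated).
# freemem is in pages of PAGE_KB Kbytes; freeswap is in 512-byte blocks.
# POSIX sh + awk; on Solaris use nawk for awk.
sar_r_to_mb() {
    awk -v pkb="${PAGE_KB:-8}" '$1 == "Average" {
        printf "%d %d\n", $2 * pkb / 1024, $3 * 512 / 1024 / 1024
    }'
}

echo "Average 122536 6180453" | sar_r_to_mb    # prints: 957 3017
```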
root@host # more perf_msg
Available physical memory: 937 MB(<3275 MB)
Available swap space: 2956 MB(<3552 MB)
Recommend adding 20465 MB swap device. The total size
of physical swap devices should be 1.5 times physical memory.
I/O Performance Warning
The checkperf parameter for I/O device utilization is IO_UTIL_WARN, the I/O utilization
warning threshold, which is set to 80 (percent) by default.
Let's generate some heavy I/O load:
root@host # cp myusr.tar myusr.tar2
root@host # sar -d 5 5
Average    nfs1     0   0.0     0      0   0.0   0.0
           sd1     99   6.8   134  80513   0.0  51.0
root@host # more perf_msg
IO utilization on sd1: 100%(>80%)
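The per-device test can be sketched by reading the Average section of sar -d output, where the second column is %busy. The function name busy_devices is hypothetical, and the sample lines are the ones shown above.

```sh
# Report devices whose average %busy exceeds a threshold in sar -d output.
# POSIX sh + awk; on Solaris use nawk for awk.
# Usage: sar -d 5 60 | busy_devices IO_UTIL_WARN
busy_devices() {
    awk -v warn="$1" '
        NF == 0 { in_avg = 0; next }
        # Drop the "Average" label so every row starts with the device name.
        $1 == "Average" { in_avg = 1; $1 = ""; $0 = $0 }
        in_avg && NF == 7 && $2 + 0 > warn + 0 {
            printf "IO utilization on %s: %d%%(>%d%%)\n", $1, $2, warn
        }
    '
}

sample='Average    nfs1     0   0.0     0      0   0.0   0.0
           sd1     99   6.8   134  80513   0.0  51.0'
printf '%s\n' "$sample" | busy_devices 80
```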

Network Performance Warning
The following factors degrade TCP performance:
• Retransmission: Messages that are lost must be retransmitted.
• Duplicate packets: The local host might receive duplicate packets if it times out
on the original request, issues another request, and then receives the original
packet.
• Listen queues: A listen queue grows as the arrival rate of client requests to a
server exceeds the server's processing rate.
In checkperf, the warning thresholds are 15% for the retransmission rate, 15% for the
duplicate packet rate, and 100 for listen queue drops. Because the test server had no
retransmitted messages or duplicate packets, and its listen queue drops did not exceed
100, the perf_msg file is empty.
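The following sketch shows one way to derive these figures from netstat -s -P tcp on Solaris. It assumes the retransmission rate is retransmitted bytes divided by data bytes sent, which may not be exactly how checkperf computes it; the counter values are made up, and the function names are hypothetical.

```sh
# Read one counter from Solaris "netstat -s -P tcp" output, where counters
# appear as "name = value" pairs; large values can touch the "=", so each
# "=" is padded before fields are split. POSIX sh + awk (nawk on Solaris).
# Usage: netstat -s -P tcp | tcp_counter tcpRetransBytes
tcp_counter() {
    awk -v name="$1" '
        { gsub(/=/, " = ") }
        {
            for (i = 1; i + 2 <= NF; i++)
                if ($i == name && $(i + 1) == "=") print $(i + 2)
        }
    '
}

# Assumed definition: retransmitted bytes / data bytes sent * 100.
# Usage: retrans_pct RETRANS_BYTES OUT_DATA_BYTES
retrans_pct() {
    awk -v r="$1" -v s="$2" 'BEGIN {
        if (s + 0 > 0) printf "%.1f\n", r * 100 / s; else print "0.0"
    }'
}

# Illustrative counters, not taken from the test server.
sample='TCP     tcpOutDataSegs      = 10000     tcpOutDataBytes     =1000000
        tcpRetransSegs      =   200     tcpRetransBytes     = 20000
        tcpListenDrop       =     0     tcpListenDropQ0     =     0'

r=$(printf '%s\n' "$sample" | tcp_counter tcpRetransBytes)
s=$(printf '%s\n' "$sample" | tcp_counter tcpOutDataBytes)
echo "retransmission rate: $(retrans_pct "$r" "$s")%"  # prints: retransmission rate: 2.0%

ld=$(printf '%s\n' "$sample" | tcp_counter tcpListenDrop)
if [ "$ld" -gt 100 ]; then
    echo "listen queue drops: $ld (>100)"
fi
```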
Putting It All Together
Finally, let's perform the CPU, memory, and I/O performance checks all together:
root@host # dd if=/dev/zero of=/dev/null &
root@host # ./myfilltmp.sh
root@host # cp myusr.tar myusr.tar2
root@host # more perf_msg
CPU average utilization: 100%(>80%)
There are 30 CPUs offline and use psradm to bring them online
Threads (per second) waiting for CPU to run: 3.1.
Recommend to add 3.1 CPUs to your system. Use prstat -L to see
if running processes have multiple threads so that you may switch to
thread-based-processor machine, such as Sun Fire T2000 server.
The accurate threads waiting for CPU: 3.1
Available physical memory: 778 MB(<3275 MB)
Available swap space: 2821 MB(<3517 MB)
Recommend to add 20465 MB swap device. The total size of physical
swap devices should be 1.5 times physical memory.
IO utilization on sd1: 51%(>30%)

Because the CPUs were saturated and memory was short, the average disk utilization could
not reach 80%, so I decreased IO_UTIL_WARN to 30 for this run. This example shows that
CPU and memory shortages can affect I/O performance too.
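Instead of editing the script for a one-off run like this, the thresholds could be made overridable from the environment. This is a hypothetical variation; this tip describes changing the variable in checkperf itself, and the defaults below are the ones given earlier.

```sh
# Hypothetical variation (POSIX sh): each threshold falls back to the
# default described in this tip unless it is already set in the environment.
CPU_UTIL_WARN=${CPU_UTIL_WARN:-80}
MEM_FREEPHY_WARN_PERCENT=${MEM_FREEPHY_WARN_PERCENT:-20}
MEM_FREESWAP_WARN_PERCENT=${MEM_FREESWAP_WARN_PERCENT:-20}
IO_UTIL_WARN=${IO_UTIL_WARN:-80}
echo "thresholds: CPU ${CPU_UTIL_WARN}%, IO ${IO_UTIL_WARN}%"

# With a script built this way, the combined test could be rerun as:
#   IO_UTIL_WARN=30 ./checkperf.sh
```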
