
Collectl, an all-in-one tool for collecting Linux statistical data

Collectl ("collect" for Linux) is a single tool that integrates the functions of various tools: sar, iostat, mpstat, top, slabtop, netstat, nfsstat, ps, and more.
- Supported: Linux
- Requirement: Perl
Collectl features:
- run from the command line or run as a daemon
- various output formats: raw, gnuplot, gexpr (ganglia), sexpr, lexpr, csv (--sep ,)
- send data to other programs (e.g. ganglia) remotely via a socket instead of writing to a file
- IPMI monitoring for fans and temperature sensors
- support modules (Perl scripts) for customized checks
- monitor per-process disk reads/writes, to find the top processes keeping the disk busy
The last one is the most impressive feature; I haven't found any other Linux tool that can do it. (DTrace can on Solaris.)
collectl examples
#help, all options
$ collectl -x
#-s: what to monitor, e.g. c = CPU, d = disk; see collectl --showsubsys for the full list
#-c5: collect 5 samples and exit
#-oT: preface output with time only; see collectl --showoptions
$collectl -sc -c5 -i2 --verbose -oT
waiting for 2 second sample...
# CPU SUMMARY (INTR, CTXSW & PROC /sec)
#Time     User Nice  Sys Wait IRQ Soft Steal Idle CPUs Intr Ctxsw Proc RunQ  Run Avg1 Avg5 Avg15
12:39:34     0    0    0    0   0    1     0   97    1 1082    23    0   76    1 0.42 0.42  0.44
12:39:36     0    0    0    0   0    1     0   97    1 1088    24    0   76    1 0.42 0.42  0.44

The following demonstrates how collectl identifies the process reading/writing the most data to disk.
#Hammer the disk by writing 50MB of data with dd
$ dd if=/dev/urandom of=test bs=1k count=50000
#collectl identifies the dd process
#in top mode, sort by iokb (total I/O KB); see collectl --showtopopts
$ collectl -i2 --top iokb
TOP PROCESSES sorted by iokb (counters are /sec) 12:50:31
# PID  User  PR  PPID THRD S  VSZ   RSS CP  SysT  UsrT Pct AccuTime  RKB  WKB MajF MinF Command
 6861  root  18  6784    0 R   3M  572K  0  0.91  0.00  45  0:00.91    0 3680    0   97 dd
    1  root  15     0    0 S   2M  632K  0  0.00  0.00   0  0:28.21    0    0    0    0 init
    2  root  RT     1    0 S    0     0  0  0.00  0.00   0  0:00.00    0    0    0    0 migration/0
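
Collectl can also record samples to a file and play them back later, which is handy for capturing an overnight problem. A minimal sketch using the documented -f (record) and -p (playback) flags; the recorded file name follows collectl's HOST-DATE convention and is illustrative here:
#record CPU and disk stats every 10 seconds to files under /var/log/collectl
$ collectl -scd -i10 -f /var/log/collectl
#play a recorded file back later with the same display options as live mode
$ collectl -p /var/log/collectl/myhost-20111012.raw.gz -sc -oT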


Labels: Linux, Performance, Troubleshooting

Tuesday, October 11, 2011


Understanding Red Hat Linux recovery runlevels

If a Linux system can boot but hangs while starting a service, booting to a recovery runlevel can skip the service and give you a shell to troubleshoot.
If a Linux system can't boot at all, boot from the rescue CD (the first installation media) and type linux rescue to get a shell.
Red Hat Linux boot order
The BIOS ->MBR->Boot Loader->Kernel->/sbin/init->
/etc/inittab->
/etc/rc.d/rc.sysinit->
/etc/rc.d/rcX.d/ #where X is the run level set in /etc/inittab
runs the K* (kill) scripts, then the S* (start) scripts

Recovery runlevels
- runlevel 1
Executes up to /etc/rc.d/rc.sysinit and /etc/rc.d/rc1.d/.
Runlevel 1 ends up identical to single-user mode: it switches to single-user mode in the last step, with just a few trivial scripts executed before that.
$ ls /etc/rc.d/rc1.d/S*
/etc/rc.d/rc1.d/S02lvm2-monitor  /etc/rc.d/rc1.d/S13cpuspeed  /etc/rc.d/rc1.d/S99single
- single
Executes up to /etc/rc.d/rc.sysinit only.
- emergency
Does not execute /etc/rc.d/rc.sysinit.
Because rc.sysinit is not executed, the root file system is mounted read-only. You need to run mount -o rw,remount / to remount it read-write.
The emergency runlevel is a Red Hat term; it is effectively identical to init=/bin/sh on any Linux distribution.
How to go to a runlevel
In the grub menu, press a and append one of the following options to the boot line; an example session is sketched below the list.
1
single
emergency
init=/bin/sh
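
For example, a typical emergency-runlevel repair session looks like the following sketch (the root= value is illustrative; yours will differ):
#kernel line after pressing 'a' and appending the option:
#  ro root=/dev/VolGroup00/LogVol00 emergency
#once the shell comes up, the root file system is read-only:
mount -o rw,remount /
#fix the offending service or config file, then reboot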
See also:
- When Centos hung on starting up boot services, how to get to shell without rescue CD
- RHCE Notes - Troubleshooting booting issue
Labels: Linux, Troubleshooting

Thursday, September 15, 2011


Recover corrupted RPM database

The RPM database consists of a number of Berkeley DB files in /var/lib/rpm. The exception is the __db.* files, which act like cache files: they are updated on every rpm operation and can be safely deleted.
#tested in Centos 5.5
$ ls /var/lib/rpm
Basenames     __db.001  __db.003  Filemd5s  Installtid  Packages     Provideversion  Requireversion  Sigmd5
Conflictname  __db.002  Dirnames  Group     Name        Providename  Pubkeys         Requirename     Sha1header
Triggername
$ file /var/lib/rpm/Packages
/var/lib/rpm/Packages: Berkeley DB (Hash, version 8, native byte-order)

If one of the DB files is partially corrupted but still readable by /usr/lib/rpm/rpmdb_dump, you can dump and reload the DB file, then rebuild the database.
$cd /var/lib/rpm
$rm -f __db*
$mv Packages Packages.orig
$/usr/lib/rpm/rpmdb_dump Packages.orig | /usr/lib/rpm/rpmdb_load Packages
$/usr/lib/rpm/rpmdb_verify Packages
#if you get this error: db_verify: PANIC: fatal region error detected; run recovery
#make sure the /var/lib/rpm/__db.* files have been removed
#rpm --rebuilddb is unlikely to succeed while rpmdb_verify still fails
$ rpm -v --rebuilddb
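
The whole dump/reload/verify sequence can be wrapped in a small script. A sketch, assuming the standard CentOS 5 paths used above:
#!/bin/bash
# rebuild a partially corrupted RPM Packages database
cd /var/lib/rpm || exit 1
rm -f __db.*                  # stale cache files can block recovery
mv Packages Packages.orig
/usr/lib/rpm/rpmdb_dump Packages.orig | /usr/lib/rpm/rpmdb_load Packages
/usr/lib/rpm/rpmdb_verify Packages && rpm -v --rebuilddb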

If one of the DB files is completely corrupted and is not readable by rpmdb_dump, you have to restore it from backup.
$cd /var/lib/rpm
$cp Packages Packages.bak
#simulate a damaged RPM DB file
$ >Packages
$ cp Packages.bak Packages
# Simply restoring from a backup file won't work:
#file verification succeeds
$ /usr/lib/rpm/rpmdb_verify Packages
#but any rpm operation fails
$rpm -qa
error: rpmdbNextIterator: skipping h#     294 Header V3 DSA signature: BAD, key ID e8562897
#Even rpm --rebuilddb fails
$ rm -f __db.*
$ rpm --rebuilddb
error: rpmdbNextIterator: skipping h#     294 Header V3 DSA signature: BAD, key ID e8562897
#Notice the signature: BAD error? The Pubkeys file has to be cleaned as well.
$ mv Pubkeys Pubkeys.bak
#all good after removing the Pubkeys file; a new Pubkeys is generated automatically by rpm --rebuilddb
$ rm -f __db.*
$ rpm --rebuilddb
$ rpm -qa | head -2
man-pages-2.39-15.el5_4
bash-3.2-24.el5



Labels: Linux, Troubleshooting

Tuesday, March 1, 2011


When Centos hung on starting up boot services, how to get to shell without rescue CD
Centos 5.5 hung on starting up the udev service. My first instinct was to go to interactive startup mode to skip udev, as the boot message hints: press 'I' to enter interactive startup.
I later discovered that interactive startup mode is almost useless: firstly, it is hard to activate by pressing the I key; secondly, not all services observe the mode. A flag file, /var/run/confirm, is created when the I key (case insensitive) is pressed, and the network service seems to be the only one that checks it:
[root@centos64 init.d]# grep -C 2 /var/run/confirm /etc/init.d/*
/etc/init.d/network-    fi
/etc/init.d/network-    # If we're in confirmation mode, get user confirmation.
/etc/init.d/network:    if [ -f /var/run/confirm ]; then
/etc/init.d/network-        confirm $i
/etc/init.d/network-        test $? = 1 && continue

So how can you gain shell access without a rescue CD? The answer is to append init=/bin/sh to the kernel line in the grub boot loader.
Let's review the Linux boot order:
The BIOS ->MBR->Boot Loader->Kernel->/sbin/init->
/etc/inittab->
/etc/rc.d/rc.sysinit->
/etc/rc.d/rcX.d/ #where X is the run level set in /etc/inittab
runs the K* (kill) scripts, then the S* (start) scripts

By default init=/sbin/init, which transfers control in the above order.
If you set init=/bin/sh, the boot stops there and gives you a login shell.
Booting to single-user mode won't fix a udev startup issue, because udev starts before single-user mode (udev is started from /etc/rc.d/rc.sysinit, while single-user mode is /etc/rc.d/rc1.d).
Instructions:
In the Grub menu, select the kernel, press a to edit the boot options, append init=/bin/sh, then press Enter to boot.
After gaining the login shell, the file system is most likely in a read-only state; remount the partitions read-write with mount -o rw,remount /.
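
A sketch of the whole session at the init=/bin/sh prompt (CentOS 5 paths):
#the root fs comes up read-only; make it writable first
mount -o rw,remount /
#udev is started from rc.sysinit, not from a runlevel script, so inspect it there
grep -n udev /etc/rc.d/rc.sysinit | head
#after fixing the cause, flush writes and force a reboot ('reboot' may not work without init)
sync
/sbin/reboot -f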
Labels: Linux, Troubleshooting

Monday, February 28, 2011


Graphing sar output

In Linux, the sysstat package installs tools such as sar and iostat, and also sets up a cron job to run sar periodically. The sar binary output files live in /var/log/sa or /var/log/sysstat.
These files are very useful for troubleshooting performance issues if you don't have a monitoring solution in place.
To visualize the data as graphs, you can use a generic plotting tool, gnuplot, or a tool designed specifically for sar: ksar.
Visualize sar output with gnuplot
gnuplot can be installed directly from the package repositories of most Linux distributions.
The file saved by the sar cron job is binary; convert it to ASCII format first. The following example outputs CPU usage:
$ LC_ALL=C; sar -u -f /var/log/sa/sa27 | egrep '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' | sed '1s/^/#/' > sar-cpu.log

LC_ALL=C ensures the time format is H:M:S.


sed adds a comment marker to the first line (the header).
Create a gnuplot script to plot user CPU (3rd column) and system CPU (5th column) usage:
$cat cpu.p
set title 'HOST CPU usage'
set xdata time
set timefmt '%H:%M:%S'
set xlabel 'time'
set ylabel ' CPU Usage'
set style data lines
plot 'sar-cpu.log' using 1:3 title 'User' ,\
'sar-cpu.log' using 1:5 title 'System'

Type gnuplot to enter the interactive shell, then load the script:
gnuplot> load 'cpu.p'
or
$ gnuplot -persist cpu.p

#Other advanced operations


#Zoom in, display a set period of data only
gnuplot> set xrange ['01:51:01':'03:51:01']
gnuplot> replot
#Save the output to image
gnuplot> set terminal png
gnuplot> set output "cpu.png"
gnuplot> replot

Visualize sar output with ksar
The generic graphing tool gnuplot can process any data, but it is not designed for sar; as a trade-off, it needs a lot of customization. ksar is designed specifically for sar and understands Linux, Mac and Solaris sar output.
ksar can be downloaded at http://sourceforge.net/projects/ksar/. It is written in Java, so a Java runtime is a prerequisite.

#Convert sar binary output to ascii for ksar, -A means include all counters
$LC_ALL=C;sar -A -f /var/log/sa/sa27 >sar-all.log

It is very easy to view any counter once the sar output file is imported into ksar.
[root@ kSar-5.0.6]$./run.sh -help
[root@ kSar-5.0.6]$./run.sh -input /tmp/sar-all.log
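
To prepare a whole month of sar files for ksar in one go, a loop like the following works (a sketch; adjust the path if your distribution uses /var/log/sysstat):
for f in /var/log/sa/sa[0-9][0-9]; do
    LC_ALL=C sar -A -f "$f" > /tmp/sar-all-$(basename "$f").log
done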



Labels: Linux, Performance, Troubleshooting

Monday, December 20, 2010


Why 32-bit Linux only sees 3GB of memory on a machine with 4GB RAM
The Symptom:
The OS detected 4194304k, but only 3105024k is available.
[32bit Linux]$ dmesg | grep -i mem
3200MB HIGHMEM available.
896MB LOWMEM available.
HighMem zone: 819200 pages, LIFO batch:31
ACPI: SRAT (v002 VMWARE MEMPLUG 0x06040000 VMW 0x00000001) @ 0xbfeef3e1
Memory: 3105024k/4194304k available (1584k kernel code, 39372k reserved, 622k data, 168k init,
2228096k highmem)
[32bit Linux]$ free
             total       used       free     shared    buffers     cached
Mem:       3108956      73076    3035880          0       8000      37932
-/+ buffers/cache:      27144    3081812
Swap:       779112          0     779112

The Theory:
A hugemem/PAE-enabled kernel is ONLY needed for RAM above 4GB (up to 64GB); the generic 32-bit Linux kernel can see 4GB without a hugemem/PAE kernel.
The Cause:
So why can't 32-bit Linux sometimes see all of the 4GB of RAM? The answer is in the physical RAM map provided by the BIOS.
The Analysis:
[32bit Linux]$dmesg | less
Linux version 2.6.16.60-0.21-default (geeko@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1
Tue May 6 12:41:02 UTC 2008
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
BIOS-e820: 00000000000dc000 - 00000000000e4000 (reserved)
BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bfee0000 (usable)
BIOS-e820: 00000000bfee0000 - 00000000bfeff000 (ACPI data)
BIOS-e820: 00000000bfeff000 - 00000000bff00000 (ACPI NVS)
BIOS-e820: 00000000bff00000 - 00000000c0000000 (usable)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)

BIOS-e820: 0000000100000000 - 0000000140000000 (usable)


Warning only 4GB will be used.
Use a PAE enabled kernel.
3200MB HIGHMEM available.
896MB LOWMEM available.

The interesting part of the physical RAM map is the last line, and the interesting number is 0000000100000000, which is the hex value of 4GB.
Convert the values to decimal to make them more readable:
[32bit Linux]$ dmesg | awk --re-interval --non-decimal-data '/[0-9a-z]{16}/ \
  { x1=sprintf("%d","0x"$2); x2=sprintf("%d","0x"$4); \
    printf "%d %s %d %s %d %s\n", x1/1024/1024, " - ", x2/1024/1024, "=", (x1-x2)/1024/1024, $NF }'
0 - 0 = -0 (usable)
0 - 0 = -0 (reserved)
0 - 0 = -0 (reserved)
0 - 0 = -0 (reserved)
0 - 1 = -0 (reserved)
1 - 3070 = -3069 (usable)
3070 - 3070 = -0 data)
3070 - 3071 = -0 NVS)
3071 - 3072 = -1 (usable)
3584 - 3840 = -256 (reserved)
4076 - 4076 = -0 (reserved)
4078 - 4078 = -0 (reserved)
4095 - 4096 = -0 (reserved)
4096 - 5120 = -1024 (usable)

It is clear that the BIOS reserves too many regions in the lower address space, which pushes a 1GB chunk of usable memory above the 4GB address boundary.
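
Shell arithmetic is a quick way to check such boundaries, for example confirming that 0x100000000 is the 4GB mark:
$ echo $(( 0x100000000 ))                 # 4294967296 bytes
$ echo $(( 0x100000000 / 1024 / 1024 ))   # 4096 MB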
The Solution:
1) Release some reserved space and bring all usable regions below 4GB by disabling unused devices in the BIOS. This is the best option, but it might not be achievable; consult your hardware vendor.
2) Reinstall the system with a 64-bit kernel.
3) Install a hugemem/PAE kernel on the current 32-bit system. This is the last resort, because a hugemem/PAE kernel hurts performance due to the dynamic remapping of the three-level paging model.
Labels: Linux, Troubleshooting

Friday, August 27, 2010


Fix out of order network interfaces in Linux

I removed two old NICs and assigned two new NICs to a VMware VM (SLES 10), expecting the interfaces to be named eth0 and eth1, but they appeared as eth2 and eth3. The dmesg output revealed that eth0 was renamed to eth2 and eth1 to eth3 at some stage; it turned out udev rules renamed them.
Why?
30-net_persistent_names.rules had four entries: the first two recorded the MAC addresses of the two old NICs; the last two recorded the MAC addresses of the current NICs. Upon matching the current MAC addresses, the udev rules renamed the interfaces to eth2 and eth3.

$cat /etc/udev/rules.d/30-net_persistent_names.rules
# This rules are autogenerated from /lib/udev/rename_netiface.
# But you can modify them, but make sure that you don't use an interface name
# twice. Also add such interface name rules only in this rules file. Otherwise
# rename_netiface will create wrong rules for new interfaces.
# It is safe to delete a rule, as long as you did not disable automatic rule
# generation. Only if all interfaces get a rule the renaming will work
# flawlessly. See also /etc/udev/rules.d/31-net_create_names.rules.
#
# Read /usr/share/doc/packages/sysconfig/README.Persistent_Interface_Names for
# further information.
#
# Use only a-z, A-Z and 0-9 for interface names!
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:6d:df",
IMPORT="/lib/udev/rename_netiface %k eth0"
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:0b:2c",
IMPORT="/lib/udev/rename_netiface %k eth1"
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:1a:26",
IMPORT="/lib/udev/rename_netiface %k eth2"
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:14:a6",
IMPORT="/lib/udev/rename_netiface %k eth3"

How to fix it?
The fix is easy: delete all four entries, then reboot; the file will be repopulated with correct entries automatically. If you don't want to reboot, edit the file with the correct entries and then run /lib/udev/rename_netiface oldname newname manually.
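
For this VM, the corrected file would keep only the two current MAC addresses, mapped to the desired names; a sketch based on the entries above:
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:1a:26",
IMPORT="/lib/udev/rename_netiface %k eth0"
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:14:a6",
IMPORT="/lib/udev/rename_netiface %k eth1"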
Further discussion
udev rules make device naming very easy. You can ensure interfaces are named according to PCI order; for example, name the onboard NIC eth0 and the PCI NIC eth1 (sometimes the order is reversed). Check whether the NIC names follow PCI order:

$ ls -l /sys/class/net/eth*/device
lrwxrwxrwx 1 root root 0 2010-08-26 09:46 /sys/class/net/eth0/device -> ../../../devices/pci0000:00/0000:00:11.0/0000:02:00.0
lrwxrwxrwx 1 root root 0 2010-08-26 09:46 /sys/class/net/eth1/device -> ../../../devices/pci0000:00/0000:00:11.0/0000:02:01.0



Labels: Linux, Troubleshooting, Virtualization

Thursday, August 5, 2010


Tune max open files parameter in Linux
A busy web server serving thousands of connections may encounter an error like java.net.SocketException: Too many open files. That is because the default open-files limit per process in Linux is 1024, and each connection consumes one file handle.
An open file may be a regular file, a directory, a block special file, a character special file, an executing text reference, a library, a stream or a network file (Internet socket, NFS file or UNIX domain socket).
Linux has a global setting and a per-process setting to control the maximum number of file descriptors.
Global setting:
The maximum number of file handles for the whole system; this value varies with memory size.
$ sysctl fs.file-max
fs.file-max = 65535

Per-process setting:
This value is per process (values in child processes don't count towards the parent process). The default value is 1024, which doesn't seem to vary with memory size.
$ ulimit -a | grep "open files"
open files                      (-n) 1024

The value can be changed with the ulimit -n command, but it is only effective in the current shell session. To impose the limit on each new shell session, enable the PAM module pam_limits.so.
There are many ways to start a new shell session: login, sudo, and su. Each needs pam_limits enabled in its PAM config file: /etc/pam.d/login, /etc/pam.d/sudo, or /etc/pam.d/su.
/etc/security/limits.conf is the configuration file from which pam_limits.so reads its values.
e.g. Increase the max number of open files from 1024 to 4096 for the Apache web server, which runs as user apache:
apache    -    nofile    4096
pam_limits.so is a session PAM module, so the change becomes effective for new sessions; no reboot is required.
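
To verify the new limit is picked up, start a fresh session as the apache user (a sketch; this relies on su being PAM-enabled as described above):
# su -s /bin/bash - apache -c 'ulimit -n'   # should now print 4096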

Count the number of open files for a process:
ls -l /proc/PID/fd | wc -l
or use lsof to count open files, excluding memory-mapped files (mem):
sudo lsof -n -p PID | awk '$4 != "mem" {print}' | wc -l
lsof is slow, but it can count across all processes belonging to a user: lsof -n -u username
Count the number of open files for the whole system.
The first column of the fs.file-nr output is the current number of open files:
$ sysctl fs.file-nr
fs.file-nr = 1530    0    65535

Test the ulimit.
You will be disappointed if you try to test the open-files limit directly in the shell with commands like tail -f, because the limit is imposed per process, and each tail -f starts a new process. The following Perl script opens 10 files in a single process:
#!/usr/bin/perl -w
foreach $i (1..10) {
    $FH = "FH${i}";
    open ($FH, '>', "/tmp/Test${i}.log") || die "$!";
    print $FH "$i\n";
}

The open-files limit has been set to 8 with: ulimit -n 8
$ ulimit -a | grep files
open files                      (-n) 8

The 'Too many open files' error appears halfway through creating the files:
$ ./testnfiles.pl
Too many open files at ./testnfiles.pl line 4



Labels: Linux, Troubleshooting

Friday, July 16, 2010


Debugging issues with strace in Linux.
strace runs the specified command until it exits, intercepting and recording the system calls. The -T option shows the time spent in each system call; it is particularly useful for troubleshooting slow-response issues, because you can pinpoint the step taking the longest time. (The -r option prints a relative timestamp upon entry to each system call, which is effectively the time spent in the previous system call; it is easier to read than the -T output because it appears in the first column.)
#The following telnet command took 20 seconds to respond. Was it an issue with DNS or the web server? (It is an obvious DNS issue, but it demonstrates how strace can pinpoint the problem.)

$ time strace -f -F -i -r -t -T -v -o /tmp/trace.log telnet www.google.com 80
telnet: could not resolve www.google.com/80: Name or service not known

real    0m20.080s
user    0m0.010s
sys     0m0.050s

#List line numbers and the time spent, sorted by time

$awk '{ print "LINE#"NR, $1}' /tmp/trace.log | sort -nk2 | tail -5


LINE#55 0.010000
LINE#140 5.000075
LINE#154 5.000075
LINE#136 5.000076
LINE#150 5.000076

#Print out the lines in question. It is clear that DNS timed out waiting for a response from DNS server 100.0.2.3; it tried four times (the remaining three timeouts are not shown here), each taking 5 seconds.

$ awk '{ if ( NR > 125 && NR <= 136 ) {print "LINE#"NR, $0 } }' /tmp/trace.log
LINE#126 0.000000 [b7e601d1] stat64("/etc/resolv.conf", {st_dev=makedev(117, 0), st_ino=50235, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=83, st_atime=2010/07/16-09:47:25, st_mtime=2010/07/16-09:45:02, st_ctime=2010/07/16-09:45:02}) = 0 <0.000000>
LINE#127 0.000000 [b7e2a0f1] gettimeofday({1279237645, 625155}, NULL) = 0 <0.000000>
LINE#128 0.000000 [b7e72402] socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4 <0.000000>
LINE#129 0.000000 [b7e71f0c] connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("100.0.2.3")}, 28) = 0 <0.000000>
LINE#130 0.000000 [b7e61e88] fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR) <0.000000>
LINE#131 0.000000 [b7e61e88] fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000000>
LINE#132 0.000000 [b7e2a0f1] gettimeofday({1279237645, 625155}, NULL) = 0 <0.000000>
LINE#133 0.000000 [b7e67296] poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) <0.000000>
LINE#134 0.000000 [b7e7220c] send(4, "B\262\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1"..., 32, MSG_NOSIGNAL) = 32 <0.000000>
LINE#135 0.000000 [b7e67296] poll([{fd=4, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.000076>
LINE#136 5.000076 [b7e2a0f1] gettimeofday({1279237650, 625231}, NULL) = 0 <0.000000>
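
If you only need totals rather than a full trace, strace -c prints a per-syscall summary of time, calls and errors after the command exits, which would also have pointed at the slow poll calls. A sketch:
$ strace -c -f telnet www.google.com 80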



Labels: Linux, Troubleshooting

Thursday, June 10, 2010


How to generate core dump for an application in Linux
In order to troubleshoot an application issue, the in-memory content of a process can be dumped to a file, and the file can then be analysed with a debug tool, such as gdb on Linux.
Another way of doing this is to send the QUIT signal to the PID (kill -3 PID), but the thread dump is directed to stdout, which can be viewed with cat /proc/PID/fd/1 | tee /tmp/dump.log; messages keep being directed to /proc/PID/fd/1 until the process is stopped, so this is useful for real-time debugging.
The following Java application example uses the gcore command in gdb (gcore and kill -3 do not stop the process).
The Linux default core file size is 0, which means core dumps are disabled; it needs to be changed to unlimited.
#ulimit -a | grep core
core file size          (blocks, -c) 0
#ulimit -c unlimited
#ulimit -a | grep core
core file size          (blocks, -c) unlimited

First, find the virtual memory size of the process; the PID is 10008 in the following example.
# ps aux | egrep 'VSZ| 10008'
USER      PID %CPU %MEM     VSZ    RSS TTY   STAT START  TIME COMMAND
root    10008  0.4  8.3 2231312 660100 ?     Sl   Jun03 43:57 /opt/sunjdk6/bin/java

The process VSZ is 2.2GB, which will be the size of the core file. Go to a dump directory that has more than 2.2GB of free space.
$cd /var/tmp
Attach to the running process by PID:
$ gdb --pid=10008
At the gdb prompt, enter the gcore command:
gdb>gcore
Wait a few minutes for the core file to be generated, then type quit to exit gdb and answer yes to detach from the process.
gdb>quit
core file is generated
$ ls -lh /var/tmp/core.10008
-rw-r--r-- 1 root root 2.2G Jun 10 11:59 /var/tmp/core.10008

The file command reveals the source program name:

$ file /var/tmp/core.10008
/var/tmp/core.10008: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV),
SVR4-style, from 'java'
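
The same dump can be taken non-interactively, which is easier to script; a sketch using standard gdb/gcore invocations:
#batch-mode gdb: attach, dump, detach
$ gdb --batch --pid=10008 -ex 'gcore /var/tmp/core.10008' -ex detach
#or the gcore wrapper script shipped with gdb
$ gcore -o /var/tmp/core 10008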




Labels: Linux, Troubleshooting

Wednesday, January 13, 2010


Troubleshooting a high system CPU usage issue on Linux/Solaris
A Linux server had high %system CPU usage. The following are the steps to find the root cause of the issue and resolve it.

vmstat shows %system (sy) CPU usage is high:

# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 191420   8688  35780    0    0     0     0 1006   31  0  4 96  0  0
 1  0      0 124468   9208  98020    0    0 15626  2074 1195  188  0 76  0 24  0
 0  0      0 110716   9316 110996    0    0  3268  4144 1366   84  0 94  0  6  0
 0  0      0  97048   9416 122272    0    0  2818 11855 1314  109  1 80  0 20  0
 0  0      0  80476   9544 137888    0    0  3908  2786 1272  172  0 54  0 46  0
 2  0      0  72860   9612 145848    0    0  1930     0 1193  141  0 42  0 58  0
 0  0      0  74300   9620 145860    0    0     6     0 1208   67  0 38  0 62  0
 0  0      0  75680   9620 145860    0    0     0  6929 1364  101  0 70  6 24  0

Let's run mpstat to show more detailed CPU usage; it showed the CPU was busy with interrupts (%irq and %soft):

# mpstat 2
Linux 2.6.18-92.el5 (centos-ks)         01/14/2010

02:03:50 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
02:04:04 AM  all    1.33    0.00   41.78    0.00    0.44    3.56    0.00   52.89   1015.56
02:04:06 AM  all    0.00    0.00    8.04   38.69   29.65   23.62    0.00    0.00   1326.63
02:04:08 AM  all    0.00    0.00    8.70   30.43   27.54   28.50    0.00    4.83   1327.54
02:04:10 AM  all    0.00    0.00    5.47   46.77   27.36   20.40    0.00    0.00   1280.10
02:04:12 AM  all    0.50    0.00    6.47   63.18   19.40   10.45    0.00    0.00   1183.08
02:04:14 AM  all    1.01    0.00    6.53   62.31   21.11    9.05    0.00    0.00   1190.95
02:04:16 AM  all    0.00    0.00    8.04   26.63   43.72   21.61    0.00    0.00   1365.83
02:04:18 AM  all    0.00    0.00    1.50    0.00    0.00    0.50    0.00   98.00   1006.50

Use sar to find out which interrupt number is the culprit. #9 was the highest, excluding the system timer interrupt #0.
# sar -I XALL 2 10
02:07:10 AM      INTR     intr/s
02:07:12 AM         0     992.57
02:07:12 AM         1       0.00
02:07:12 AM         2       0.00
02:07:12 AM         3       0.00
02:07:12 AM         4       0.00
02:07:12 AM         5       0.00
02:07:12 AM         6       0.00
02:07:12 AM         7       0.00
02:07:12 AM         8       0.00
02:07:12 AM         9     350.50

[ Solaris equivalent command ]
Solaris# intrstat 2
      device |      cpu0 %tim       cpu1 %tim
-------------+------------------------------
       bge#0 |         0  0.0        128  0.6
   cpqary3#0 |         0  0.0         14  0.0

# cat /proc/interrupts
           CPU0
  0:  702980    XT-PIC  timer
  1:     439    XT-PIC  i8042
  2:       0    XT-PIC  cascade
  6:       2    XT-PIC  floppy
  8:       1    XT-PIC  rtc
  9:   14464    XT-PIC  acpi, eth2
 11:      12    XT-PIC  eth0
 12:     400    XT-PIC  i8042
 14:    6091    XT-PIC  ide0
 15:      22    XT-PIC  ide1
NMI:       0
LOC:  700623
ERR:       0
MIS:       0

[ OpenSolaris equivalent command ]
Solaris# echo ::interrupts | mdb -k
On native Solaris you have to search for the interrupt in the output of prtconf -v.

Solution:
When the card transmits or receives a frame, the system must be notified of the event. If the card interrupted the system for every transmitted and received frame, the result would be a high degree of processor overhead. To prevent that, Gigabit Ethernet provides a feature called Interrupt Coalescence; effective use of this feature can reduce system overhead and improve performance.
Interrupt Coalescence essentially means that the card interrupts the system only after sending or receiving a batch of frames.
You can enable adaptive moderation (the Adaptive RX/TX setting in the output below) to let the system choose values automatically, or set individual values manually.
An interrupt is generated by the card when either the frame counter or the timer counter is reached; a value of 0 means disabled.
RX, for example:
Timer counter in microseconds: rx-usecs/rx-usecs-irq
Frame counter: rx-frames/rx-frames-irq
# A sample output with the default values:
# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 18
rx-frames: 6
rx-usecs-irq: 18
rx-frames-irq: 6
tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 80
tx-frames-irq: 20
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
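
To actually change the coalescing values, ethtool -C is the counterpart of -c; a sketch (supported parameters vary by driver):
#interrupt at most every 100us or every 50 frames on receive
# ethtool -C eth1 rx-usecs 100 rx-frames 50
#or let the driver adapt automatically, if supported
# ethtool -C eth1 adaptive-rx on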

[ Solaris equivalent command ]
The parameters vary by driver; find out the driver's capability:
Solaris# ndd -get /dev/e1000g0 \? | egrep 'interrupt|intr'
The value should be set in the driver's conf file:
/platform/`uname -m`/kernel/drv/*.conf

Alternative workaround:
I couldn't configure Interrupt Coalescence because the virtual machine NIC didn't support it, but as a workaround, increasing the MTU can also decrease interrupts: ifconfig eth2 mtu 9000 resolved the issue. The MTU needs to be set on both peer hosts; if they are not directly connected, make sure the switch supports jumbo frames.
You don't need to care about Interrupt Coalescence if CPU resources are abundant, but for heavily loaded NFS/CIFS/iSCSI NAS servers it is very useful.



Labels: Linux, Performance, Solaris, Troubleshooting

Tuesday, March 31, 2009


Solaris/Linux: find port number for a program and vice-versa

#== Find the port number for a program
- lsof (platform independent)
$ lsof -n -c sshd | grep TCP
sshd 1962 root 3u IPv6 6137 TCP *:ssh (LISTEN)
sshd 2104 root 3u IPv6 7425 TCP 172.16.31.3:ssh->172.16.31.2:cs-services (ESTABLISHED)
- Linux
$netstat -anp |grep sshd

tcp 0 0 :::22 :::* LISTEN 1962/sshd


- Solaris
$ pfiles 16976
...
sockname: AF_INET 172.18.126.148 port: 22
..

#== Find the program name for a port number
- lsof (platform independent)
$lsof -i TCP:22
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
sshd 1962 root 3u IPv6 6137 TCP *:ssh (LISTEN)
sshd 2104 root 3u IPv6 7425 TCP 172.16.31.3:ssh->172.16.31.2:cs-services (ESTABLISHED)
- Linux
$ netstat -anp | grep 22
tcp 0 0 :::22 :::* LISTEN 1962/sshd

- Solaris
List open files for all processes, then search the output for "port: 22":
$ ps -e -o pid | xargs pfiles > /tmp/pfiles.log
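
Then locate the owning process by searching backwards from the matching line; pfiles prints the PID and command on each process header line. A sketch (the PID shown is from the earlier example):
$ grep -B 20 'port: 22' /tmp/pfiles.log | egrep '^[0-9]+:' | tail -1
16976:  /usr/lib/ssh/sshd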
