
Analyzing Server Crashes and Hangs

© 2002 BEA Systems, Inc. — Company Confidential


Agenda

• Crashes versus hangs.
• All about server crashes.
• All about server hangs: analyzing thread dumps.
• Analysis of thread dump samples.
• Resources.



Crash Versus Hang

• Distinction between crashes and hangs.
• A crash implies that the WebLogic Server Java process no longer exists.
• A hang implies that the WebLogic Server Java process still exists but is not responding.
• Customers tend to use these terms interchangeably.



All about crashes
• Determine all potential sources of native code used by WebLogic Server:
– nativeIO.
– Type 2 JDBC driver.
– Native libraries accessed with JNI calls.
– SSL native libraries.
– The JVM itself. Most of the time the crash originates in the JVM.
• Sometimes the JVM produces a small log file (hs_err_pid*.log) that may contain useful information about which library the crash originated from.



Debugging with hs_err_pid.log

The current thread's stack trace in hs_err_pid.log shows where the crash occurred. Depending on that stack, the issue can be debugged further:
• If the current thread shows a stack from nativeIO (performance pack):
Workaround: Disable nativeIO.
Fix: File a bug with CCE.
• If the current thread shows a stack from a native call in a type 2 driver:
Workaround: Switch to a pure Java type 4 driver instead of the type 2 driver.
Fix: Work with the vendor of the database driver.



Debugging with hs_err_pid.log

• If the current thread shows a stack from a JNI call made by application code:
Fix: Tell the customer that it is an application bug that needs to be fixed in their code.
• If the current thread shows a stack from native code in WebLogic SSL:
Workaround: Use the pure Java version of SSL instead of the native version.
• If the current thread indicates a crash in compiled/optimized code:
Workaround: Turn off compilation, and hence optimization, by running in interpreted mode (-Xint).
Java code -> bytecode -> compilation -> optimization (hot spots)
Fix: Work with JVM vendor support.



Debugging with hs_err_pid.log

• If the current thread indicates a crash in the threading library (applicable to Solaris):
Workaround: Switch to the alternate thread library.
The default thread library on Solaris 8 and below is:
/usr/lib/libthread.so.1
This can be switched to the following library, which is the default from Solaris 9:
/usr/lib/lwp/libthread.so.1
Add /usr/lib/lwp to your LD_LIBRARY_PATH and start the JVM with -XX:+OverrideDefaultLibthread.



Crashes without core

• Most crashes will cause a core dump. However, sometimes the core file may not be available. Possible reasons:
– Running out of disk space or quota to write the file.
– Not having the correct access permissions to create or write a file in the directory.
– The prior presence of a core dump of the same name that is read-only or write-protected.



Crashes without core

• Check "ulimit -c" (set it to unlimited).
• Use coreadm on Solaris ($ coreadm).
• Also check the following parameter, which on Solaris lives in the /etc/system file and can be used to disable core files: set sys:coredumpsize=0
• On Linux, core dumps are turned off by default on many systems. On Red Hat Advanced Server 2.1, look under /etc/security for the file limits.conf and search for the word "core". If it is set to "0", core dumps are disabled.



Crashes with core

• The core file is available.
• A core file is a memory image of the process, and it saves the state of the application at the time of its termination.
• A core file depends on the exact shared libraries and OS.
• The core file *must* be analyzed on the customer's machine.



Crashes with core
• If a debugger is not available:
– Solaris 8, 9: use pstack and pmap.
/usr/proc/bin/pstack core > pstack.txt
/usr/proc/bin/pmap core > pmap.txt
Analyze pstack.txt and pmap.txt to understand which library caused the crash.
http://support.bea.com/application?namespace=askbea&origin=ask_bea_answer.jsp&event=link.view_answer_page_solution&answerpage=solution&page=wls/S-16147.html



Crashes with core
• Gather information from the core.
• Use a debugger.
– The debugger differs between operating systems.
– The methodology is the same: check to see what the current thread is.
• More info is available at
http://supportlab.bea.com:8000/spWiki/attach?page=SystemCorePattern%2FCorePattern.html



Crash on Windows
• Get the Windows debugging tools from
http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx
• Start up WebLogic.
• cd into <c:\Program Files\Debugging Tools for Windows> and run
<adplus.vbs -crash -p JAVA_PID>
(ignore messages saying that _NT_SYMBOL_PATH is not set).
• Wait until the <java.exe> process dies. When it does, a directory <Crash_Mode...date> is created with dump and log files.
• Open a case with Sun Support and send the dmp file.
(If we have the symbols, we can run the debugger against the dmp file by opening it in the Windows debugger GUI.)



All about hangs

• The process still exists.
• The process is not responding.
• No response is sent to clients.
• The java weblogic.Admin PING command doesn’t return a normal response.
• Take multiple thread dumps (kill -3 <pid> on Unix platforms, Ctrl+Break on Windows).
• On Linux, use ps -efHl | grep 'java' to identify the root pid.



All about hangs

• Thread dumps for the Sun JVM are sent to stdout.
• If you are using nohup, thread dumps are directed to nohup.out.
• For beasvc, use
-log:"d:\bea\user_projects\domains\myWLSdomain\myWLSserver-stdout.txt"
• Use beasvc -dump -svcname:service-name
• You can also use the java weblogic.Admin THREAD_DUMP command.
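When no signal or console access is available, a rough equivalent of a thread dump can also be produced from inside the JVM. The following is a minimal sketch, assuming a Java 5 or later JVM (Thread.getAllStackTraces() is newer than the 1.4.2 JVM discussed in this deck). The class name InProcessThreadDump and the output layout are illustrative only and do not match the exact format of a kill -3 dump.

```java
import java.util.Map;

class InProcessThreadDump {
    /** Builds a text dump of all live threads with their stack frames. */
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line loosely modelled on the dumps shown in these slides.
            out.append('"').append(t.getName()).append('"')
               .append(t.isDaemon() ? " daemon" : "")
               .append(" prio=").append(t.getPriority())
               .append(" state=").append(t.getState())
               .append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Unlike kill -3, this only works while the JVM can still schedule the calling thread, so it complements the signal-based dump and does not replace it.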



Not able to take thread dumps

• The -Xrs JVM option reduces the JVM's use of OS signals, so the JVM no longer produces a thread dump on SIGQUIT.
(The Sun JVM uses SIGQUIT to perform thread dumps.)
If a process started without -Xrs does not respond to kill -3 <PID>, then it’s a JVM bug.



All about hangs

There are scenarios where the process appears to be hung (non-responsive) even though free threads are available:
• The process runs out of memory. If the Java heap is full, the server process appears to be hung and does not accept requests, because each request needs heap memory to allocate objects.
• The process runs out of file descriptors. The server cannot accept further requests because sockets cannot be created.
• GC takes a long time (more than 20 seconds). This looks like a hang to end users.
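The heap-exhaustion case can be checked from inside the process with the standard Runtime API. This is a minimal sketch: the class name HeapHeadroom and the 0.9 threshold used in main are illustrative assumptions, not WebLogic settings, and the sketch does not cover file descriptors or GC pause times.

```java
class HeapHeadroom {
    /** Fraction of the maximum heap (-Xmx) that is currently in use. */
    static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /** True when heap usage is at or above the given fraction of the maximum. */
    static boolean nearlyFull(double threshold) {
        return usedFraction() >= threshold;
    }

    public static void main(String[] args) {
        System.out.println("heap used fraction: " + usedFraction());
        // 0.9 is an arbitrary illustrative threshold.
        System.out.println("nearly full: " + nearlyFull(0.9));
    }
}
```

A single reading includes garbage that has not been collected yet, so a high value matters mainly when it persists right after a full GC.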



Thread queues and Threads

• weblogic.kernel.Default – Worker threads that serve external client requests.
• weblogic.kernel.System – Internal system work such as RJVM heartbeats, HTTP state dumps for JNDI updates in a cluster, etc.
• weblogic.socket.Muxer – Defaults to 3 threads on Unix systems and 2 on Windows. Used for socket reads.
• weblogic.admin.RMI – Handles OA&M requests such as application deployment, the application poller, etc.
• weblogic.admin.HTTP – Only on the admin server, to handle console requests.
• Core health monitor – Runtime health of the server.
• JmsDispatcher, JMS.TimerTreePool, JMS.TimerClientPool – For JMS.



Analyzing Thread Dumps

Common thread states in a thread dump:
• Runnable [marked as R in some VMs]:
The thread is either running currently or is ready to run the next time the OS thread scheduler schedules it.
• Object.wait() [marked as CW in some VMs]:
The thread is waiting on an object using Object.wait(). It progresses further either upon notify() by another thread or when the condition for its wait() is fulfilled, e.g. when the timeout of wait(long timeout) expires.
• Waiting for monitor entry [marked as MW in some VMs]:
The thread is waiting to enter a synchronized block.
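These dump states map onto the java.lang.Thread.State values that a Java 5 or later JVM reports: a thread "in Object.wait()" without a timeout is WAITING, and a thread "waiting for monitor entry" is BLOCKED. The sketch below reproduces both situations. The class and method names are hypothetical, and it polls for the state because thread start-up timing is not deterministic.

```java
class ThreadStateDemo {
    /** Polls until the thread reaches the wanted state or about 5 seconds pass. */
    static Thread.State awaitState(Thread t, Thread.State wanted) {
        for (int i = 0; i < 500 && t.getState() != wanted; i++) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return t.getState();
    }

    /** State of a thread parked in Object.wait() with no timeout. */
    static Thread.State stateInObjectWait() {
        final Object monitor = new Object();
        Thread t = new Thread(() -> {
            synchronized (monitor) {
                try {
                    monitor.wait();          // nobody ever calls notify()
                } catch (InterruptedException ignored) {
                    // interrupted by the caller below so the thread can end
                }
            }
        }, "demo-waiter");
        t.setDaemon(true);
        t.start();
        Thread.State seen = awaitState(t, Thread.State.WAITING);
        t.interrupt();
        return seen;
    }

    /** State of a thread that cannot enter a synchronized block. */
    static Thread.State stateWaitingForMonitorEntry() {
        final Object monitor = new Object();
        Thread t = new Thread(() -> {
            synchronized (monitor) {
                // entered only after the caller releases the monitor
            }
        }, "demo-blocked");
        t.setDaemon(true);
        synchronized (monitor) {             // the caller holds the monitor first
            t.start();
            return awaitState(t, Thread.State.BLOCKED);
        }
    }

    public static void main(String[] args) {
        System.out.println("in Object.wait():          " + stateInObjectWait());
        System.out.println("waiting for monitor entry: " + stateWaitingForMonitorEntry());
    }
}
```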



Analyzing Thread Dumps

• Analyze thread dumps for the following scenarios:
– Java deadlock: two or more threads each waiting for a lock that another of them holds.
– Threads blocked during network I/O: the database or a remote process is not responding.
– Infinite looping in the code.
• Taking multiple thread dumps a few seconds apart helps debug slow response times.



Analyzing thread dumps
• Classic deadlock
Look for the threads waiting for monitor entry.

For example:
"ExecuteThread: '95' for queue: 'default'" daemon prio=5 tid=0x411cf8 nid=0x6c waiting for monitor entry [0xd0f80000..0xd0f819d8]
at weblogic.common.internal.ResourceAllocator.release(ResourceAllocator.java:766)
at weblogic.jdbc.common.internal.ConnectionEnv.destroy(ConnectionEnv.java:590)
The above thread is waiting to acquire the lock on the ResourceAllocator object.
The next step is to identify the thread that is holding the lock on the ResourceAllocator object:
"ExecuteThread: '0' for queue: '__weblogic_admin_rmi_queue'" daemon prio=5 tid=0x41b978 nid=0x77 waiting for monitor entry [0xd0480000..0xd04819d8]
at weblogic.jdbc.common.internal.ConnectionEnv.getPrepStmtCacheHits(ConnectionEnv.java:174)
at weblogic.common.internal.ResourceAllocator.getPrepStmtCacheHitCount(ResourceAllocator.java:1525)
This thread is holding the lock on the ResourceAllocator object but is waiting for the ConnectionEnv object, which the first thread apparently locked in ConnectionEnv.destroy(). Each thread waits for a lock the other holds: this is a classic deadlock.
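The lock-ordering pattern in the dump above can be reproduced and detected programmatically. The following is a minimal sketch, assuming a Java 5 or later JVM for java.lang.management. The two plain Object locks are hypothetical stand-ins for the ResourceAllocator and ConnectionEnv monitors and do not reproduce WebLogic's code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockDemo {
    /** Starts a daemon thread that locks 'first', waits for its peer to do the same, then tries 'second'. */
    static Thread lockInOrder(String name, Object first, Object second, CountDownLatch bothLocked) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothLocked.countDown();
                try {
                    bothLocked.await();      // make sure the peer holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the peer holds 'second' forever
                }
            }
        }, name);
        t.setDaemon(true);                   // lets the JVM exit despite the deadlock
        t.start();
        return t;
    }

    static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    /** Creates two threads that take two locks in opposite order and reports whether the JVM sees the cycle. */
    static boolean detectDeadlock() {
        Object resourceAllocator = new Object();   // hypothetical stand-in
        Object connectionEnv = new Object();       // hypothetical stand-in
        CountDownLatch bothLocked = new CountDownLatch(2);
        Thread a = lockInOrder("demo-destroy", connectionEnv, resourceAllocator, bothLocked);
        Thread b = lockInOrder("demo-stats", resourceAllocator, connectionEnv, bothLocked);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && contains(ids, a.getId()) && contains(ids, b.getId())) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                                + " held by " + info.getLockOwnerName());
                    }
                }
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + detectDeadlock());
    }
}
```

The printed "waits for ... held by ..." pairs carry the same information an analyst reads off the two stack traces by hand.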



Analyzing Thread dumps

• Threads in wait()
A sample dump:
"ExecuteThread: '10' for queue: 'SERV_EJB_QUEUE'" daemon prio=5 tid=0x005607f0 nid=0x30 in Object.wait() [83300000..83301998]
at java.lang.Object.wait(Native Method)
- waiting on <0xc357bf18> (a weblogic.ejb20.pool.StatelessSessionPool)
at weblogic.ejb20.pool.StatelessSessionPool.waitForBean(StatelessSessionPool.java:222)
The above thread comes out of wait() under one of two conditions (depending on application logic):
1) Another thread from the execute queue pool calls notify() on this object when an instance becomes available. If the wait() is indefinite, the thread hangs forever if the server never calls notify() on this object.
2) If the timeout expires, the thread throws an exception and returns to the execute queue thread pool.
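The two exit conditions can be illustrated with a small pool that waits with a timeout. This is a minimal sketch of the general wait()/notify pattern. TimedPoolDemo and its methods are hypothetical and do not reproduce the StatelessSessionPool implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TimedPoolDemo {
    private final Deque<String> free = new ArrayDeque<>();

    /** Waits up to timeoutMillis for a free instance and throws if none arrives in time. */
    synchronized String waitForBean(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (free.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                // Condition 2: the timeout expired without a notify.
                throw new IllegalStateException("timed out waiting for a free instance");
            }
            try {
                wait(remaining);             // shows up as "in Object.wait()" in a dump
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return free.pop();
    }

    /** Condition 1: another thread returns an instance and wakes the waiters. */
    synchronized void release(String bean) {
        free.push(bean);
        notifyAll();
    }

    /** Helper for demonstration: true when an empty pool times out. */
    static boolean timesOutWhenEmpty() {
        try {
            new TimedPoolDemo().waitForBean(50);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("empty pool times out: " + timesOutWhenEmpty());
        TimedPoolDemo pool = new TimedPoolDemo();
        pool.release("bean-1");
        System.out.println("got: " + pool.waitForBean(50));
    }
}
```

Looping on the condition and recomputing the remaining time guards against spurious wake-ups. A wait() with no timeout has no such escape and depends entirely on release() being called.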



Analyzing Thread dumps

• Threads waiting for monitor entry while the culprit thread is stuck on a remote call.
This is most often observed when a thread acquires the lock on a synchronized object and then hangs on a database call (for example, because the database is not responding), while the rest of the threads that need the synchronized object wait for monitor entry.
• There are scenarios where the thread holding the lock is not apparent. In most of these cases the lock is held at the native layer, which is a JVM bug. In those cases, taking a pstack is the first step.



Tool for analyzing thread dump

• Samurai
http://yusuke.homeip.net/samurai/?english#content_1_0

Please read these links for self-study of the scenarios in detail:
• http://supportlab.bea.com:8000/spWiki/Wiki.jsp?page=PatternWorkAssignments
• http://www.ftponline.com/weblogicpro/2005_03/magazine/columns/troubleshootersdiary/



Performance Tuning Overview



J2EE Tuning Zones



Platform (OS) Tuning
Key Tuning Parameters

• TCP Parameters
– tcp_time_wait_interval
– tcp_keepalive_interval
– ndd -set /dev/tcp <parameter> <value>

• File Descriptors
– /etc/system
• set rlim_fd_cur=8192 (Soft Limit)
• set rlim_fd_max=8192 (Hard Limit)



Platform (OS) Tuning
Key Tuning Parameters

• Prior to Solaris 2.7, the tcp_time_wait_interval parameter was called tcp_close_wait_interval. This parameter determines the time interval that a TCP socket is kept alive after issuing a close call. The default value of this parameter on Solaris is four minutes. When many clients connect for a short period of time, holding these socket resources can have a significant negative impact on performance. Setting this parameter to a value of 60000 (60 seconds) has shown a significant throughput enhancement when running benchmark JSP tests on Solaris. You might want to reduce this setting further if the server gets backed up with a queue of half-opened connections.

• Tip: Use the netstat -s -P tcp command to view all available TCP parameters.



Platform (OS) Tuning
Key Tuning Parameters

• Hard limits are a kernel-configurable item, and users can't exceed them. Soft limits are the user defaults, and users can change them with the ulimit program or the limit/unlimit shell builtins. See man setrlimit(2).
• Basically, soft limits can be changed to anything up to the hard limit.
• With disk quotas, think of soft limits as the warning barrier. When a user reaches the soft limit they get a warning message but are still allowed to use more space up to the hard limit. You can also configure the system to set expiration times for users who have exceeded their soft limit.
• Just remember that the maximum number of file descriptors is 1024.



JVM Tuning
Options

• JVM vendor and version.
– Use certified versions.
• JVM Heap Size Parameters.
• Garbage Collection Schemes (Sun 1.4.2 JVM)
– Generational Collector (Default, Stop the world)
– Throughput Collector
– Concurrent Low Pause Collector
– Incremental Low Pause Collector
• Unix Threading Model
– export LD_LIBRARY_PATH=/usr/lib/lwp
– One to One mapping between Java and O/S thread



JVM Tuning
Heap Sizing Parameters

• Heap Size
– -Xms, -Xmx
• Young Generation Space
– -XX:NewRatio, -XX:NewSize, -XX:MaxNewSize
• Survivor Space
– -XX:SurvivorRatio
• Permanent Generation
– -XX:PermSize & -XX:MaxPermSize
• Aggressive Heap
– -XX:+AggressiveHeap
• For more information and self-learning, see http://www.petefreitag.com/articles/gctuning/



WebLogic Core Tuning
Options

• “NativeIO” Performance Packs.
• Tuning Default ExecuteQueue.
• Thread usage control.
• StuckThreadDetection.
• Connection Backlog Buffering.



WebLogic Core Tuning
Performance Packs

• Uses a platform-optimized, native socket multiplexor.
• Uses its own socket reader threads and frees up WebLogic threads.
• Available for most platforms
– Solaris, Linux, HP-UX, AIX, Windows
• Can be configured using the WebLogic Admin Console.



WebLogic Core Tuning
Performance Packs

• Benchmarks show major performance improvements
when you use native performance packs on machines
that host WebLogic Server instances. Performance
packs use a platform-optimized, native socket
multiplexor to improve server performance. For
example, the native socket reader multiplexor threads
have their own execute queue and do not borrow
threads from the default execute queue, which frees up
default execute threads to do application work.
• However, if you must use the pure-Java socket reader
implementation for host machines, you can still
improve the performance of socket communication by
configuring the proper number of socket reader threads
for each server instance and client machine.



WebLogic Core Tuning
Default Execute Thread Tuning

• Number of simultaneous operations that can be performed by applications.
– Production mode default: 25
• Tuning criteria.
– Request turnaround time.
– Number of CPUs.
• Percentage of execute threads used as socket readers (default 33%).
• In 8.1, an execute queue can be tuned for the overflow condition.
– Increases the thread count dynamically.



WebLogic Core Tuning
Thread usage Control

• Thread usage can be controlled by creating additional execute queues:
– Performance optimization for a critical application.
– Throttle performance.
– Protect the application from deadlock.
• It can have a negative impact on overall performance.



WebLogic Core Tuning
StuckThreadDetection & Connection Backlog Buffering.

• StuckThread Detection
– Detects when execute thread can not complete
work or accept new work.
– Warning purpose only, doesn’t change
behaviour/state of the thread.
– Stuck Thread Max Time , Stuck Thread Timer
Interval
• Connection Backlog Buffering
– The number of backlogged TCP connection
requests.



WebLogic Core Tuning
Guidelines

• NativeIO gives better performance.
– Consider Java IO if NativeIO is not stable.
• A high number of threads can have a negative impact on performance.
– More threads does not imply that you can process more work.
• Avoid application designs that require creating new threads.



JDBC Connection Pool Tuning
Options

• Connection Pool Sizing and Testing.
• Caching Statements.
• Connection Pool Request Timeouts.
• Recovering Leaked Connection.
• PinnedToThread.



JDBC Connection Pool Tuning
Connection Pool Sizing and Testing

• Sizing
– Initial capacity and Maximum capacity.
– Shrink Frequency.
• Testing
– Test Frequency.
– Test Reserved/ Released Connections
– Maximum Connections Made Unavailable
– Test Table Name



JDBC Connection Pool Tuning
Caching Statements.

• Reuses callable and prepared statements held in the cache.
• Reduces CPU usage on the database side and improves performance.
• Cache algorithms
– LRU (Least Recently Used)
– Fixed
• Statement cache size
– Configured per connection pool.
– It is the cache size for each connection in the pool.
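The LRU algorithm named above can be sketched with java.util.LinkedHashMap in access order. This only illustrates the eviction policy. LruStatementCache is a hypothetical class, not WebLogic's statement cache, and a real cache would also close the evicted statement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Keeps at most 'capacity' entries, evicting the least recently used one. */
class LruStatementCache<V> extends LinkedHashMap<String, V> {
    private final int capacity;

    LruStatementCache(int capacity) {
        super(16, 0.75f, true);              // true = order entries by access, not insertion
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after every put; returning true drops the least recently used entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruStatementCache<String> cache = new LruStatementCache<>(2);
        cache.put("SELECT 1", "stmt-1");
        cache.put("SELECT 2", "stmt-2");
        cache.get("SELECT 1");               // touch: "SELECT 2" is now the eldest
        cache.put("SELECT 3", "stmt-3");     // evicts "SELECT 2"
        System.out.println(cache.keySet());
    }
}
```

Keys here are SQL strings and values are placeholders. Because the cache size applies per connection, each pooled connection would own one such map.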



JDBC Connection Pool Tuning
Recovering Leaked Connection.
Connection Request Timeout

• Leaked Connection
– Forcibly reclaims unused connections.
– Inactive Connection Timeout.
• Connection Request Timeout.
– Connection Reserve Timeout.
– Maximum number of requests that can wait for a connection.
• PinnedToThread
– Pins a connection to an execute thread.
– Connection.close() doesn’t return the connection to the pool.



JDBC Connection Pool Tuning
Guidelines

• Configure initial capacity = maximum capacity.
• In most cases, the maximum number of connections used does not exceed the number of execute threads.
• Configure connection refreshing if database calls fail because of stale connections.
• Try to avoid PinnedToThread if database resources are limited.



Common Performance Problems
Memory Leak

• java.lang.OutOfMemoryError is a symptom of a memory leak, but it is not proof of one.
• Turn on -verbose:gc for GC logs, e.g.
– [Full GC 154K->99K(32576K), 0.0085354 secs]
• Analyze the GC logs for the following scenarios:
• A full garbage collection does not get a chance to run before OutOfMemory is thrown.
• OutOfMemory is thrown even though memory usage has not reached the upper limit of the heap.
• OutOfMemory is thrown during the load test ramp-up.
– Tune -XX:MaxPermSize, -Xms, -Xmx, -XX:NewSize, -XX:MaxNewSize, and -XX:SurvivorRatio to resolve the OOM.
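A GC log line in the format shown on this slide can be parsed with a regular expression to extract heap usage before and after the collection, the heap capacity, and the pause time. This is a minimal sketch that assumes exactly that simple -verbose:gc layout. Other JVM vendors, versions, and logging flags produce different formats, and GcLineParser is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLineParser {
    // Matches e.g. "[Full GC 154K->99K(32576K), 0.0085354 secs]" and "[GC ...]".
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]");

    /** Returns {usedBeforeKb, usedAfterKb, heapCapacityKb}, or null if the line does not match. */
    static long[] parseKb(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)),
            Long.parseLong(m.group(4))
        };
    }

    /** True when the line records a full collection. */
    static boolean isFullGc(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() && m.group(1).equals("Full GC");
    }

    /** Pause time in seconds, or -1 if the line does not match. */
    static double pauseSeconds(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(5)) : -1.0;
    }

    public static void main(String[] args) {
        String line = "[Full GC 154K->99K(32576K), 0.0085354 secs]";
        long[] kb = parseKb(line);
        System.out.println("before=" + kb[0] + "K after=" + kb[1] + "K capacity=" + kb[2] + "K"
                + " full=" + isFullGc(line) + " pause=" + pauseSeconds(line) + "s");
    }
}
```

Comparing the "after" figure with the capacity shows whether an OutOfMemoryError occurred while the heap still had headroom, which is one of the scenarios listed above.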



Common Performance Problems
Memory Leak

• Heap memory usage that grows after each full GC at the steady-state condition of the load test indicates a potential memory leak.
• Check for the more common leaking objects.
– Caching in the application, e.g. EJB pool objects, HTTP session objects, JMS messages.
• Use a memory profiler to pinpoint the leaking code, e.g. JProbe and OptimizeIt.
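The growth check described on this slide can be expressed as a simple trend test over the heap usage measured after each full GC. This is a minimal sketch: LeakTrend, the strict-growth rule, and the minimum of three samples are illustrative assumptions, and a real analysis would tolerate noise and look at a longer steady-state window.

```java
class LeakTrend {
    /**
     * True when every post-full-GC heap reading is higher than the previous one.
     * Fewer than three samples are treated as inconclusive.
     */
    static boolean growsAfterEveryFullGc(long[] usedAfterFullGcKb) {
        if (usedAfterFullGcKb.length < 3) {
            return false;
        }
        for (int i = 1; i < usedAfterFullGcKb.length; i++) {
            if (usedAfterFullGcKb[i] <= usedAfterFullGcKb[i - 1]) {
                return false;                // usage dropped or stayed flat at least once
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample readings in KB, for illustration only.
        System.out.println(growsAfterEveryFullGc(new long[] {100, 120, 150, 190}));
        System.out.println(growsAfterEveryFullGc(new long[] {100, 120, 110, 115}));
    }
}
```

A positive result is only a hint of a potential leak. The profilers named above are still needed to find the objects that are being retained.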



Performance Standards and Tools

• Standards
– ECPerf
• J2EE Benchmark for Application Servers
– SPECjAppServer2001
• Benchmark to measure Application Server performance
– SPEC JBB2000
• Server side JVM performance benchmark.
• http://www.spec.org/jbb2000/
• Tools
– OptimizeIt, JProbe, PerformaSure.
– Mercury LoadRunner, WebLoad, Grinder(OpenSource)



www.bea.com

