Tuesday, 1 April 2014

Troubleshooting Out of Memory errors in WebSphere

Effects of running Out of Memory
1.     The garbage collection (GC) process struggles to free memory, so GC runs almost continuously.
2.     The long, back-to-back GC cycles cause high CPU usage in the application server.
3.     The application server cannot process requests as fast as they arrive, which causes queuing in the Web and Application Servers.
4.     The JVM eventually stops responding and crashes. Requests fail over to the next server.

When an Out of Memory error occurs, three primary pieces of evidence are left at the scene.
1.     Verbose garbage collection log (how it happened)
2.     Heapdump (what was in memory when it happened)
3.     Javacore (what was running when it happened)


Tools used to analyse the three pieces of evidence
1.     Garbage collection log - IBM Support Assistant (ISA) provides the Garbage Collection and Memory Visualizer tool to open the verboseGC log file
2.     Java heapdump (heapdump.phd) - Use the Memory Analyzer Tool (MAT) in IBM Support Assistant (ISA)
3.     Javacore - IBM Thread and Monitor Dump Analyzer for Java


Categories of Out of Memory problems
1.     Java heap exhaustion - The JVM cannot allocate an object because the heap is full and the garbage collector cannot free any more memory.
2.     Large object allocation - The application requests a very large object that Java cannot accommodate in the heap.
3.     Native memory allocation failure - The memory space of the operating system process that corresponds to Java has two main areas:
               - The Java heap, which contains the instances of Java objects and is maintained by garbage collection
               - The native heap, which contains, among other things, compiled JIT code and malloc allocations made by application JNI code
       A native memory allocation failure occurs when the native heap, rather than the Java heap, runs out of space.

Common scenarios that could lead to Out of Memory

1. Typically, heap exhaustion is caused by
               - Large categories and lack of pagination or filtering (“show all”)
               - Improperly sized cache (in-memory cache is too large)
               - Unbounded search
               - Scheduler processing a large job
               - Processing large backend messages
               - Improperly sized Java heap (too small)
2. Typically, Out of Memory due to a large object allocation is caused by
               - A 3rd-party catalog integration returning all products at once
               - Inbound web service receiving large messages
3. Typically, a native memory error is caused by
               - Improperly sized Java heap (too big)



Useful links:
1. IBM SDK - Diagnosis documentation http://www.ibm.com/developerworks/java/jdk/diagnosis/
2. https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014821664


WebSphere Application Server Administrator Interview Questions, Part 1



 Where would you enable Verbose Garbage Collection? 

A.
From the Admin Console:
        Application Servers -> ServerName -> Java and Process Management -> Process Definition -> Java Virtual Machine, then select the Verbose garbage collection check box.
 

What is Garbage Collection? 

A. 
Garbage collection is a process of automatically freeing objects that are no longer referenced by the program. 

How to find the admin console port? 

A.
  Navigate to the DMGR home and run the command below to find the Admin console port (9060 is the default):
     grep -R "9060" * --exclude='*.log'

In the output, look for lines like these:

properties/portdef.props:WC_adminhost=9060
properties/firststepsport.props:9060

or

Open the portdef.props file in the directory $WAS_HOME/profiles/Dmgr/properties
and search for WC_adminhost.
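
The grep above only finds the port if it is still the default 9060. As a rough sketch, the lookup in portdef.props can be done by key instead, so it works whatever the port value is. The function name get_port is hypothetical, and the example path simply reuses the one given above; adjust the profile name to match your installation.

```shell
# Print the value stored for KEY in a portdef.props style file.
# Usage: get_port KEY FILE
# (get_port is an illustrative helper, not part of WebSphere.)
get_port() {
    key=$1
    file=$2
    # Keep only lines of the form KEY=value, drop the "KEY=" prefix,
    # strip any Windows carriage return, and print the first match.
    sed -n "s/^${key}=//p" "$file" | tr -d '\r' | head -n 1
}

# Example:
# get_port WC_adminhost "$WAS_HOME/profiles/Dmgr/properties/portdef.props"
```

The same helper could read other keys from the file, assuming your portdef.props lists them.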

How to check WAS Version / Build Level?
A. 
$WAS_HOME/bin/versionInfo.sh

What is the Default SOAP port number?        
A.
 8879 for the deployment manager; a standalone application server defaults to 8880.



What are the different ways to capture thread dumps and heap dumps for a WebSphere JVM?
When and how should they be generated, and how are they analysed?

Thread Dumps

If you get unexplained server hangs under WebSphere, you can obtain, from the WebSphere server, a thread dump to help diagnose the problem.

In the case of a server hang, you can force an application to create a thread dump.

On Unix/Linux machines, find the process ID (PID) of the hung JVM and issue kill -3 PID. Look for an output file in the installation root directory with a name like javacore.date.time.id.txt.
Alternatively, from the wsadmin prompt, get a handle to the server's JVM:
wsadmin>set jvm [$AdminControl completeObjectName type=JVM,process=server1,*]
then execute:
wsadmin>$AdminControl invoke $jvm dumpThreads

If an application server dies spontaneously, look for a file that the JVM creates in the product directory structure, with a name like javacore[number].txt.

Download the thread analyzer from the IBM website to analyze the generated thread dumps (http://www.alphaworks.ibm.com/tech/jca).
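
A single thread dump can be hard to interpret for a hang, so it is common to take a few dumps some seconds apart and compare them. The sketch below wraps the kill -3 step in a loop. The function name, the default count of 3, the default interval of 30 seconds, and the KILL_CMD override are all assumptions made for illustration, not WebSphere settings.

```shell
# Send SIGQUIT to a JVM several times so successive javacores can be compared.
# Usage: take_thread_dumps PID [COUNT] [INTERVAL_SECONDS]
# KILL_CMD is a hypothetical override for dry runs, e.g. KILL_CMD=echo.
take_thread_dumps() {
    pid=$1
    count=${2:-3}       # assumed default: three dumps
    interval=${3:-30}   # assumed default: 30 seconds apart
    kill_cmd=${KILL_CMD:-kill}
    i=1
    while [ "$i" -le "$count" ]; do
        # Signal 3 (SIGQUIT) asks the JVM for a thread dump; stop on failure.
        "$kill_cmd" -3 "$pid" || return 1
        echo "requested thread dump $i of $count for PID $pid"
        # Wait between dumps, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Example: take_thread_dumps 12345 3 60
```

Running it first with KILL_CMD=echo prints the commands that would be issued without signalling the process.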

Heap Dumps

A heapdump is a snapshot of JVM memory – it shows the live objects on the heap along with references between objects. It is used to determine memory usage patterns and memory leak suspects.

To enable automated heap dump generation, perform the following steps in the administrative console (a heap dump will then be generated when an Out of Memory exception occurs):

1. Click Servers > Application servers in the administrative console navigation tree.
2. Click server_name > Runtime Performance Advisor Configuration.
3. Click the Runtime tab.
4. Select the Enable automatic heap dump collection check box.
5. Click OK.

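Generate Heap Dump manually
A. Use kill -3 PID on Unix/Linux machines. On IBM JVMs this signal produces a javacore by default; a heap dump is written as well only if the JVM has been configured for it, for example by setting the environment variable IBM_HEAPDUMP=true.
From the wsadmin prompt, the JVM handle obtained for thread dumps can also be used:
wsadmin>$AdminControl invoke $jvm generateHeapDump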
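
After requesting a dump, the next step is usually to find the files that were written. The following is a minimal sketch that lists heapdump and javacore files in a directory, newest first. The function name list_dumps is hypothetical, and the file patterns are assumptions based on the names mentioned in this post (heapdump.phd and javacore.date.time.id.txt); the directory to search depends on your installation.

```shell
# List heapdump and javacore files in DIR, newest first.
# Usage: list_dumps [DIR]
# (list_dumps is an illustrative helper; the patterns are assumptions.)
list_dumps() {
    dir=${1:-.}
    # ls -t sorts by modification time, newest first.
    # Errors for patterns that match nothing are discarded.
    ls -1t "$dir"/heapdump*.phd "$dir"/javacore*.txt 2>/dev/null
}

# Example: list_dumps /path/to/dump/directory | head -n 5
```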


What are FFDC logs?

A. FFDC stands for First Failure Data Capture. These are logs that IBM Support asks for when a PMR (Problem Management Record) is opened with them.



Drill Down Memory usage in RHEL

The commands below drill down into the memory used by a process ID in Linux.

1. Find the process IDs of the top memory consumers with the command below (columns: PID, %CPU, %MEM, COMMAND; sorted by %MEM)

     ps axu | awk '{print $2, $3, $4, $11}' | head -1 && ps axu | awk 'NR>1 {print $2, $3, $4, $11}' | sort -k3 -nr | head -5

2. Drill down into each process ID with the command below, which converts the Kbytes column to MB and sorts the mappings by size (head -20 can be changed according to the output)


    pmap -x <PID> | awk '{print $1, $2*0.000976563, $3, $4,$5,$6,$7,$8,$9}' | sort -k2 -nr | head -20
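
The pmap listing has one row per mapping, so a library or anonymous region can appear many times. As a sketch, the rows can be grouped by mapping name and their resident size (RSS) totalled. The function name rss_by_mapping is hypothetical, and the column layout (Address, Kbytes, RSS, Dirty, Mode, Mapping) is an assumption about the pmap -x output on your system; check it against the header line before relying on the numbers.

```shell
# Total the RSS column of `pmap -x` output (read from stdin) per mapping name.
# Output: one line per mapping, "<MB> MB<TAB><mapping>", largest first.
# (rss_by_mapping is an illustrative helper; column positions are assumed.)
rss_by_mapping() {
    awk '
        # Data rows start with a hexadecimal address; this skips the
        # PID line, the header, the dashed separator, and the total row.
        $1 ~ /^[0-9a-fA-F]+$/ && NF >= 6 {
            name = $6
            # Mapping names such as "[ anon ]" span several fields.
            for (i = 7; i <= NF; i++) name = name " " $i
            rss[name] += $3
        }
        END {
            for (n in rss) printf "%.1f MB\t%s\n", rss[n] / 1024, n
        }
    ' | sort -nr
}

# Example: pmap -x <PID> | rss_by_mapping | head -20
```

Grouping by name makes it easier to see whether anonymous memory, which includes the Java heap, or a particular native library dominates the process footprint.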