When we face a high CPU issue, the first step is to identify the application and the thread or threads causing it. This article explains how to do exactly that.
To begin, identify the PID of the offending process. This can be done with a simple "top" command: if you see a PID constantly using more than 70-100% CPU, that is the process to investigate.
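If you prefer a non-interactive way to spot the top CPU consumers, a quick sketch using "ps" (standard procps options, sorted by CPU usage) is:
ps -eo pid,user,pcpu,comm --sort=-pcpu | head -n 10
Once the PID is identified, run the following command against it: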
top -H -b -n1 -p <PID>
The command above will show us all the threads associated with that specific PID. The following is an example:
[kafka@c489-node2 conf]$ top -H -b -n1 -p 25910
top - 14:21:54 up 84 days, 12:32, 1 user, load average: 6.79, 6.62, 6.69
Threads: 145 total, 0 running, 145 sleeping, 0 stopped, 0 zombie
%Cpu(s): 18.3 us, 4.0 sy, 0.0 ni, 77.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 20971520 total, 16874368 free, 1725732 used, 2371420 buff/cache
KiB Swap: 5242872 total, 5242872 free, 0 used. 17874368 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25910 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.04 java
26259 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:02.96 java
26260 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26261 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26262 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.26 java
26263 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.26 java
26264 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.24 java
26265 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26266 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.26 java
26267 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.26 java
26268 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26269 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.27 java
26270 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26271 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.26 java
26272 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26273 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.26 java
26274 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.23 java
26275 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.27 java
26276 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26277 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26278 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.24 java
26279 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26280 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.25 java
26281 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.23 java
26282 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.24 java
26283 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.53 java
26284 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.00 java
26285 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.00 java
26286 kafka 20 0 13.6g 680608 22108 S 0.0 3.2 0:00.00 java
From the above output, check the %CPU column: the thread IDs using 70% or more are the ones we are looking for. Run the command above 5 to 10 times, every 5 to 10 seconds, while the issue is happening. This helps reveal a pattern, because a thread that was using 70% can drop to 0 or 10% ten seconds later; we are interested in the threads that are constantly consuming most of the CPU. If one or more threads are at 70% or more in all the snapshots, we have identified our high CPU thread(s). At the same time, take thread dumps with the same timing (5 to 10 times, 5 to 10 seconds apart); each dump will print all the threads associated with the PID.
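As a quick sketch for pulling only the busy threads out of a single snapshot, the %CPU column (field 9 in the sample output above) can be filtered with awk; the 70% threshold is just the rule of thumb used in this article:
top -H -b -n1 -p <PID> | awk '$1 ~ /^[0-9]+$/ && $9 > 70 {print $1, $9}'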
Thread dumps can be taken as follows:
1. Switch to the user that owns the PID.
2. Run <JAVA_HOME>/bin/jstack -l <PID>
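For example, if the process runs as the kafka user (as in the sample above) and you have sudo rights, the dump could be collected like this (the output path is only an illustration):
sudo -u kafka /usr/jdk64/jdk1.8.0_112/bin/jstack -l 25910 > /tmp/jstack-$(date +%s).txt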
The output is very large; each thread appears in it as follows:
[kafka@c489-node2 conf]$ /usr/jdk64/jdk1.8.0_112/bin/jstack -l 25910
2020-01-20 14:29:45
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.112-b15 mixed mode):
"Attach Listener" #386 daemon prio=9 os_prio=0 tid=0x00007fd524001000 nid=0xad96 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
....
more lines here
....
"kafka-network-thread-1003-ListenerName(PLAINTEXT)-PLAINTEXT-2" #95 prio=5 os_prio=0 tid=0x00007fd629b43000 nid=0x672a runnable [0x00007fd3a23da000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000c94d9e70> (a sun.nio.ch.Util$3)
- locked <0x00000000c94d9e60> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c94d9d48> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.kafka.common.network.Selector.select(Selector.java:686)
at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
at kafka.network.Processor.poll(SocketServer.scala:665)
at kafka.network.Processor.run(SocketServer.scala:582)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
From the above output, you can see that the thread has an identifier, nid=0x672a, which is a hexadecimal value.
Then, from the earlier "top -H -b -n1 -p <PID>" output, once we have identified the thread or threads constantly using more than 70% CPU, we have a decimal thread ID; let's say 25910, which is the first entry in the thread list shown earlier in this article. We have to convert this value from decimal to hexadecimal. For example:
25910 in decimal is 6536 in hexadecimal.
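The conversion can be done directly in the shell with printf:
printf '%x\n' 25910
6536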
After that, we have to search for nid=0x6536 in our thread dumps to identify the thread causing the high CPU issue. In some cases, you may see that the identified thread is something like the following:
"Gang worker#6 (Parallel GC Threads)" os_prio=0 tid=0x00007fd62803f800 nid=0x669a runnable
Seeing threads like the one above at high CPU usually means that the memory available to the application is not enough and more memory is required.
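Whether the hot thread is an application thread or a GC worker, it can be located in the collected dumps by its nid; a grep sketch, assuming the dumps were written under /tmp, is:
grep -A 15 "nid=0x6536" /tmp/jstack-*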
In summary, the steps to identify a high CPU issue are:
1. Identify the PID of the process using high CPU with "top".
2. Capture "top -H -b -n1 -p <PID>" and a jstack thread dump 5 to 10 times, 5 to 10 seconds apart.
3. Note the thread IDs that stay at 70% CPU or more across the snapshots.
4. Convert those thread IDs from decimal to hexadecimal.
5. Search the thread dumps for the matching nid=0x... value to identify the thread.
There is a one-liner we can run to automate the collection, that is, taking the thread dumps and the per-thread CPU output:
for i in {1..10}; do echo $i ; top -H -b -n1 -p <PID> > /tmp/cpu-$(date +%s)-$i.log; <JAVA_HOME>/bin/jstack -l <PID> > /tmp/jstack-$(date +%s)-$i ;sleep 2; done
For example, with actual values (this one collects only the thread dumps):
for i in {1..10}; do echo $i ; /usr/jdk64/jdk1.8.0_112/bin/jstack -l 293558 > /tmp/kafka/jstack-$(date +%s)-$i ;sleep 2; done
I have also created a simple script to identify the threads causing the issue; feel free to modify it as needed:
#!/bin/bash
# Execution: ./highCpu.sh <argument1 = PID> <argument2 = sleep in seconds>
echo "=== Script to identify threads for High CPU issues ==="
read -p "Select the path to store jstacks - i.e. /tmp/highCPU: " jPath
read -p "Select the JVM path - i.e. /usr/jdk64/jdk1.8.0_112/bin: " JVMPath
mkdir -p "$jPath"
# Collect 10 per-thread CPU snapshots and thread dumps, waiting <argument2> seconds between samples
for i in {1..10}; do echo "Collecting thread dumps and cpu: $i of 10"; top -H -b -n1 -p "$1" > "$jPath"/cpu-$(date +%s)-$i.log; "$JVMPath"/jstack -l "$1" > "$jPath"/jstack-$(date +%s)-$i; sleep "$2"; done
echo
echo "PID %CPU"
# Skip the top header lines and print the threads above the 80% CPU threshold (adjust as needed)
tail -n +6 "$jPath"/cpu-* | awk '{ if ($9 > 80) { print $1 " " $9 } }'
column1=$(tail -n +6 "$jPath"/cpu-* | awk '{ if ($9 > 80) { print $1 } }')
# Convert the decimal thread IDs to hexadecimal so they can be searched as nid=0x... in the jstack output
hexValues=$(printf '%x\n' $column1)
echo
echo "Printing HEX values of high usage threads: " $hexValues