
Apache Hive: Max JVM thread time waiting continuously increasing

Explorer

Hello,

 

I'm using HDP 3.0.1 and I'm hitting an issue with Apache Hive LLAP. When the Max JVM thread time waiting reaches ~10K, performance degrades and queries become very slow.

 

Restarting the LLAP daemon releases the thread time waiting, but it then keeps increasing again.

 

Can anyone help me?

Screenshot_1.png

Thanks!


Super Mentor

@sontt 

Can you please collect a thread dump of that process and then get the list of threads? That will tell us what kind of activity those threads are performing and what kind of threads they are (e.g. SSL-related threads, etc.).

 

1. Find the PID of that process

# ps -ef | grep llap

 

2. Collect a couple of thread dumps, a few seconds apart (a combined sketch follows step 3)

# $JAVA_HOME/bin/jstack -l $PID  >> /tmp/llap_thread_dumps.txt 

 

3. Get the unique thread names. If possible, can you please attach the thread dump here? Or at least post the names of the threads found in the file '/tmp/llap_thread_dumps.txt'

# cat /tmp/llap_thread_dumps.txt  | grep 'nid' 
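
For reference, a minimal sketch combining steps 2 and 3 (assuming $PID holds the LLAP daemon PID from step 1; the dump count and 10-second interval are only illustrative):

# for i in 1 2 3; do $JAVA_HOME/bin/jstack -l $PID >> /tmp/llap_thread_dumps.txt; sleep 10; done

# grep 'java.lang.Thread.State' /tmp/llap_thread_dumps.txt | sort | uniq -c | sort -rn

The second command summarizes how many threads are RUNNABLE / WAITING / TIMED_WAITING across the dumps, which quickly shows whether the waiting population keeps growing between dumps.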

 

 

Explorer

Hi,

 

Thanks.

 

I attached a thread dump from the Hive interactive process.

 

https://drive.google.com/file/d/1zThuKcczyp_33tKmsOqySG5Bu8rbRTbp/view

 

If you need anything else, please let me know.

Super Mentor

@sontt 

Thank you for sharing the Thread dump.
I see that many threads (around 1550+) are performing the "ShuffleManager$RunShuffleCallable.callInternal" operation and around 35 are in "ShuffleManager.getNextInput".

 

 

# grep 'org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager$RunShuffleCallable.callInternal' threaddump.log | wc -l

3300
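
To see which thread pools those threads belong to, the dump can also be grouped by thread name; the trailing-digit strip below is just a rough way to collapse numbered pool threads (e.g. fetchers) into one bucket:

# grep '^"' threaddump.log | awk -F'"' '{print $2}' | sed 's/[0-9]*$//' | sort | uniq -c | sort -rn | head -20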

 

Looks like some Tez-level tuning could be useful. Have you already tuned your Tez memory and other parameters?
Here is a good community article on Tez tuning which might be helpful here:
https://community.cloudera.com/t5/Community-Articles/Demystify-Apache-Tez-Memory-Tuning-Step-by-Step...
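
As a starting point, the current values of the usual memory-related parameters can be printed from Beeline against the HiveServer2 Interactive JDBC URL ($HIVE_JDBC_URL below is just a placeholder; on HDP these settings are managed from Ambari, so any change should be made there):

# beeline -u "$HIVE_JDBC_URL" -e "SET hive.tez.container.size;"

# beeline -u "$HIVE_JDBC_URL" -e "SET tez.am.resource.memory.mb;"

# beeline -u "$HIVE_JDBC_URL" -e "SET tez.runtime.io.sort.mb;"

# beeline -u "$HIVE_JDBC_URL" -e "SET hive.llap.daemon.memory.per.instance.mb;"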


Generic possible causes may be:
1. The Tez AM / container is under high memory pressure, causing GC. You may need to tune/increase the memory ("tez.am.resource.memory.mb") if memory pressure is confirmed (a quick check follows this list).

2. The AM/application log might be helpful to review.
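
For point 1, one quick way to look for GC/memory pressure is jstat against the daemon/AM PID; for point 2, the aggregated AM log can be pulled with the YARN CLI (<application_id> is a placeholder for the Tez/LLAP application ID shown in the ResourceManager UI):

# $JAVA_HOME/bin/jstat -gcutil $PID 5000 6

# yarn logs -applicationId <application_id> > /tmp/tez_am_logs.txt

jstat -gcutil prints heap-region utilization plus GC counts/times every 5 seconds; steadily climbing FGC/FGCT values would indicate the memory pressure described in point 1.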

Explorer

Hi,

Thanks @jsensharma

Generic possible causes may be:
1. The Tez AM / container is under high memory pressure, causing GC. You may need to tune/increase the memory ("tez.am.resource.memory.mb") if memory pressure is confirmed.

2. The AM/application log might be helpful to review.

 

1/ How can I check or verify this on HDP?

2/ I saw this log in a container of the Tez Application Master. Is this what you mentioned?

Screenshot_1.png
