Created on 08-21-2019 02:13 AM - last edited on 08-21-2019 03:43 AM by VidyaSargur
Hello,
I'm using HDP 3.0.1 and I'm hitting an issue with Apache Hive LLAP. When the max JVM thread time reaches ~10K, performance degrades and queries become very slow.
Restarting the LLAP daemon releases the waiting thread time, but it keeps increasing again.
Can anyone help?
Thanks!
Created 08-21-2019 02:33 AM
Can you please collect a thread dump of that process and then get the list of threads? That will tell us what kind of activities those threads are performing and what kind of threads they are (e.g. SSL-related threads, etc.).
1. Find the PID of that process
# ps -ef | grep llap
2. Collect a couple of thread dumps a few seconds apart
# $JAVA_HOME/bin/jstack -l $PID >> /tmp/llap_thread_dumps.txt
3. Get the unique thread names (a one-liner for summarizing them follows these steps). If possible, can you please attach the thread dump here? Or at least post the names of the threads found in the file '/tmp/llap_thread_dumps.txt'
# cat /tmp/llap_thread_dumps.txt | grep 'nid'
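As a rough sketch for that summary, assuming the standard HotSpot jstack output where each thread header line contains the quoted thread name and 'nid=' (the sed strips trailing numeric suffixes so threads from the same pool group together):
# grep 'nid=' /tmp/llap_thread_dumps.txt | awk -F'"' '{print $2}' | sed 's/[ -]*[0-9]*$//' | sort | uniq -c | sort -rn | head -20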
Created 08-21-2019 03:01 AM
Hi,
Thanks.
I've attached a thread dump from the Hive interactive process.
https://drive.google.com/file/d/1zThuKcczyp_33tKmsOqySG5Bu8rbRTbp/view
If you need anything else, please let me know.
Created on 08-21-2019 04:44 AM - edited 08-21-2019 05:10 AM
Thank you for sharing the Thread dump.
I see that many threads (around 1550+) are performing the "ShuffleManager$RunShuffleCallable.callInternal" operation and around 35 are in "ShuffleManager.getNextInput".
# grep 'org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager$RunShuffleCallable.callInternal' threaddump.log | wc -l
3300
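Note that these 3300 matches presumably span the multiple dumps in the file; a quick awk sketch to count them per dump, assuming each dump begins with the standard jstack 'Full thread dump' header line:
# awk '/Full thread dump/{n++} /RunShuffleCallable.callInternal/{c[n]++} END{for (i in c) print "dump " i ": " c[i]}' threaddump.log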
Looks like some Tez-level tuning could be useful. Have you already tuned your Tez memory and other parameters?
Here is a good community article on Tez tuning which might be helpful here:
https://community.cloudera.com/t5/Community-Articles/Demystify-Apache-Tez-Memory-Tuning-Step-by-Step...
Generic possible causes may be:
1. The Tez AM / container is under high memory pressure, causing GC. The memory may need to be tuned/increased ("tez.am.resource.memory.mb") if we see memory pressure (see the verification commands below).
2. The AM/application logs might be helpful to review.
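As a rough sketch for checking point 1 (the application ID and AM PID below are placeholders to be replaced with your actual values):
1. Find the running Tez/LLAP application ID
# yarn application -list -appStates RUNNING
2. Scan the application logs for GC or out-of-memory messages
# yarn logs -applicationId <application_id> | grep -iE 'GC overhead|OutOfMemoryError|pause'
3. Watch live GC utilization of the AM JVM (5 samples, 5 seconds apart)
# $JAVA_HOME/bin/jstat -gcutil <AM_PID> 5000 5
If the old-generation column (O) stays near 100% or the full GC count (FGC) keeps climbing, increasing "tez.am.resource.memory.mb" would be the next thing to try.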
Created 08-23-2019 03:56 AM
Hi,
Thanks, @jsensharma.
Regarding the two possible causes you mentioned (1: Tez AM/container memory pressure causing GC, 2: reviewing the AM/application logs):
1/ How can I check or verify the memory-pressure case with HDP?
2/ I saw this log from a container of the Tez application master. Is this what you mentioned?