I'm using HDP 3.0.1 and have an issue with Apache Hive LLAP. When the Max JVM thread time reaches about 10K, performance degrades and queries become very slow.
Restarting the LLAP daemon resets the thread waiting time, but it then keeps increasing again.
Can anyone help me?
Could you please collect a thread dump of that process and then get the list of threads? That will show what those threads are doing and what kind of threads they are (for example, SSL-related threads).
1. Find the PID of that process
# ps -ef | grep llap
2. Collect a couple of thread dumps a few seconds apart
# $JAVA_HOME/bin/jstack -l $PID >> /tmp/llap_thread_dumps.txt
3. Get the unique thread names. If possible, could you please attach the thread dump here? Or at least post the names of the threads found in the file '/tmp/llap_thread_dumps.txt'
# cat /tmp/llap_thread_dumps.txt | grep 'nid'
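The three steps above could be scripted roughly as follows. This is only a sketch: `LLAP_PID` is a hypothetical variable you would set to the PID found with `ps`, the function names are made up for illustration, and the dump count and interval are arbitrary. Digits in thread names are replaced with `N` so that numbered pool threads fall into one group.

```shell
#!/bin/sh
# Sketch: collect several thread dumps, then group the thread names.
# Assumption: LLAP_PID holds the LLAP daemon PID (from: ps -ef | grep llap).

# Take COUNT dumps of PID, INTERVAL seconds apart, appending them to OUT.
collect_dumps() {
  pid=$1; out=$2; count=${3:-3}; interval=${4:-5}
  i=1
  while [ "$i" -le "$count" ]; do
    "${JAVA_HOME}/bin/jstack" -l "$pid" >> "$out"
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}

# Print "count name" per thread-name group, most frequent first.
# Thread header lines are the ones that carry nid=...; the name is the
# text between the first pair of double quotes.
summarize_thread_names() {
  grep 'nid=' "$1" \
    | sed -E 's/^"([^"]*)".*/\1/' \
    | sed -E 's/[0-9]+/N/g' \
    | sort | uniq -c | sort -rn
}

# Only touch a live process when LLAP_PID was provided.
if [ -n "${LLAP_PID:-}" ]; then
  collect_dumps "$LLAP_PID" /tmp/llap_thread_dumps.txt 3 5
  summarize_thread_names /tmp/llap_thread_dumps.txt
fi
```

Because the dumps are appended to one file, the counts are totals across all dumps taken, not per dump.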
Thank you for sharing the Thread dump.
I see that many threads (around 1550+) are performing the "ShuffleManager$RunShuffleCallable.callInternal" operation and around 35 are in "ShuffleManager.getNextInput".
# grep 'org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager$RunShuffleCallable.callInternal' threaddump.log | wc -l
3300
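Note that `grep ... | wc -l` counts matching lines, which can differ from the number of threads if the file holds more than one dump or the frame appears more than once in a stack. A possible way to count each thread only once is sketched below; `count_threads_with_frame` is a made-up helper name, and it assumes the usual jstack layout where every thread block starts with a line beginning with a double quote.

```shell
#!/bin/sh
# Sketch: count threads whose stack contains a given frame, once per thread.
# $1 = thread dump file, $2 = literal text of the frame to look for.
count_threads_with_frame() {
  awk -v frame="$2" '
    /^"/ { seen = 0 }                              # new thread header: reset the flag
    index($0, frame) && !seen { n++; seen = 1 }    # literal match, counted once per thread
    END { print n + 0 }
  ' "$1"
}

# Example call when a dump file is present (file name taken from the thread above).
if [ -f threaddump.log ]; then
  count_threads_with_frame threaddump.log 'ShuffleManager$RunShuffleCallable.callInternal'
fi
```

If the file contains several dumps, the same thread is still counted once per dump, so divide by the number of dumps or split the file first.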
It looks like some Tez-level tuning could be useful. Have you already tuned your Tez memory and other parameters?
Here is a good community article on Tez tuning which might be helpful:
Generic possible causes may be:
1. Either the Tez AM or a container is under high memory pressure, causing GC. If you see memory pressure, you may need to tune/increase the memory ("tez.am.resource.memory.mb").
2. The AM/application log might be helpful to review.
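One possible way to look for memory pressure on a running JVM is `jstat -gcutil`, which prints heap occupancy percentages and GC counts at a fixed interval. The sketch below is only an illustration: `JVM_PID` is a hypothetical variable for the PID of the JVM you want to check (for example the Tez AM on the node where it runs), `gc_summary` is a made-up helper, and the 5000 ms interval and 6 samples are arbitrary. Columns are looked up by header name because the column set differs between Java versions.

```shell
#!/bin/sh
# Sketch: summarize old-generation occupancy and full-GC count
# from `jstat -gcutil` output read on stdin.
gc_summary() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to column numbers
    { printf "old_gen_pct=%s full_gc_count=%s\n", $(col["O"]), $(col["FGC"]) }
  '
}

# Assumption: JVM_PID is set by the caller; nothing runs otherwise.
if [ -n "${JVM_PID:-}" ]; then
  "${JAVA_HOME}/bin/jstat" -gcutil "$JVM_PID" 5000 6 | gc_summary
fi
```

An old generation that stays close to 100% while the full-GC count keeps climbing between samples would be consistent with the memory pressure described in point 1.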
1/ How can I check or verify this on HDP?
2/ I saw this log from the Tez application master container. Is this what you mentioned?