SYMPTOM: The ResourceManager (RM) is down with the error below. We initially suspected that the ulimit was the culprit, but increasing it to 128K did not help.

ERROR:

2016-07-25 12:19:47,125 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(873)) - Unable to add the application to the delegation token renewer. java.lang.OutOfMemoryError: unable to create new native thread. 

The following steps were taken -

1. Checked the error and saw that the same issue had occurred previously, and that increasing the ulimit had resolved it at the time.
2. Checked the ulimit and lsof output -
$ ulimit -n
131072
$ lsof | grep yarn | wc
1726 15553 242741
3. Checked the heap size for the yarn process, which was set to 8 GB and looks good.
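Since "unable to create new native thread" is raised when the JVM cannot start another OS thread, it may also be worth looking at the thread count of the process and the thread-related limits, not only the open-file limit reported by ulimit -n. The following is a minimal sketch for Linux, not part of the original investigation. The function names thread_count and show_thread_limits are made up for illustration, and the pgrep pattern in the usage comment is an assumption about how the RM process appears in the process list.

```shell
#!/usr/bin/env bash
# Print the native thread count of a process, taken from the "Threads:" line
# of its status file. The optional second argument (default /proc) only
# exists so the function can be pointed at a saved copy of the file.
thread_count() {
    awk '/^Threads:/ { print $2 }' "${2:-/proc}/$1/status"
}

# Print limits that commonly cap the number of threads on Linux.
# ulimit -u reports the limit of the current shell, so run this as the
# same user that owns the process being investigated.
show_thread_limits() {
    echo "max user processes (ulimit -u): $(ulimit -u)"
    echo "kernel.threads-max: $(cat /proc/sys/kernel/threads-max)"
    echo "kernel.pid_max: $(cat /proc/sys/kernel/pid_max)"
}

# Hypothetical usage (the pgrep pattern is an assumption):
#   rm_pid=$(pgrep -f resourcemanager.ResourceManager | head -n 1)
#   thread_count "$rm_pid"
#   show_thread_limits
```

If the thread count is close to one of these limits, or keeps growing between runs, a thread leak is a more likely cause than the open-file limit.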

The error below was displayed in the RM out.log file:

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f89641cf000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 12288 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid56149.log
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f89642d0000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)

The following was logged in "/tmp/hs_err_pid56149.log":

This looks like a problem with memory allocation for threads at the OS level.

===
Stack: [0x00007f89641cf000,0x00007f89642d0000], sp=0x00007f89642ce900, free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x99eb8a] VMError::report_and_die()+0x2ea
V [libjvm.so+0x49721b] report_vm_out_of_memory(char const*, int, unsigned long, char const*)+0x9b
V [libjvm.so+0x81d9ae] os::Linux::commit_memory_impl(char*, unsigned long, bool)+0xfe
V [libjvm.so+0x81da6c] os::pd_commit_memory(char*, unsigned long, bool)+0xc
V [libjvm.so+0x8157fa] os::commit_memory(char*, unsigned long, bool)+0x2a
V [libjvm.so+0x81bf5d] os::pd_create_stack_guard_pages(char*, unsigned long)+0x6d
V [libjvm.so+0x95249e] JavaThread::create_stack_guard_pages()+0x5e
V [libjvm.so+0x958de4] JavaThread::run()+0x34
V [libjvm.so+0x81f988] java_start(Thread*)+0x108
===

The stack suggests that memory allocation (malloc) failed at the OS level while creating the stack guard pages for a new thread. Check that enough physical memory is available on the host.

ROOT CAUSE: Collected jstack thread dumps for the process and found that the number of 'Truststore reloader thread' threads keeps increasing, which is the same issue reported in https://issues.apache.org/jira/browse/YARN-5309.

$ grep 'Truststore reloader thread' threadDump | wc -l
   14873
$ grep 'Truststore reloader thread' threadDump1 | wc -l
   14999
$ grep 'Truststore reloader thread' threadDump2 | wc -l
   15063
$ grep 'Truststore reloader thread' threadDump3 | wc -l
   15149
$ grep 'Truststore reloader thread' threadDump4 | wc -l
   15230
$ grep 'Truststore reloader thread' threadDump5 | wc -l
   15347
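The per-dump counts above can also be produced in one pass, with the growth between consecutive dumps shown next to each count. This is only a sketch: the helper name count_threads is made up for illustration, and it assumes the thread dumps were saved to plain files such as threadDump, threadDump1 and so on.

```shell
#!/usr/bin/env bash
# Count the lines in each thread dump that contain a given thread name and
# print the change relative to the previous dump.
count_threads() {
    local pattern="$1"
    shift
    local prev=""
    local file count
    for file in "$@"; do
        # -c prints the number of matching lines, -F treats the pattern
        # as a fixed string rather than a regular expression.
        count=$(grep -cF -- "$pattern" "$file")
        if [ -z "$prev" ]; then
            printf '%s %d\n' "$file" "$count"
        else
            printf '%s %d (%+d)\n' "$file" "$count" "$((count - prev))"
        fi
        prev="$count"
    done
}

# Hypothetical usage with the dump files named as in this article:
#   count_threads 'Truststore reloader thread' threadDump threadDump1 threadDump2
```

A count that rises with every dump, as it does here, points to threads being created and never stopped.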

RESOLUTION: This is confirmed as a bug, and a patch has been provided to resolve the issue:

https://issues.apache.org/jira/browse/YARN-5309

https://hortonworks.jira.com/browse/BUG-63499
