SYMPTOM: The ResourceManager (RM) is down due to the error below. We initially suspected that the ulimit was the culprit, but increasing it to 128K did not help.
ERROR:
2016-07-25 12:19:47,125 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(873)) - Unable to add the application to the delegation token renewer. java.lang.OutOfMemoryError: unable to create new native thread.
The following steps were taken:
1. Checked the error and found that the same issue had occurred previously and had been resolved by increasing the ulimit.
2. Checked the ulimit and lsof output:
   $ ulimit -n
   131072
   $ lsof | grep yarn | wc
   1726 15553 242741
3. Checked the heap size of the YARN process, which was set to 8 GB and looks fine.
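Note that `ulimit -n` caps open file descriptors, while `unable to create new native thread` is more often tied to thread-related limits or native memory. The following is a minimal sketch of additional checks, assuming a Linux host with /proc mounted. The helper name `thread_count` and the `<RM_PID>` placeholder are illustrative and not part of the original troubleshooting.

```shell
# Print the number of threads recorded in a /proc/<pid>/status-style file.
# thread_count is a hypothetical helper name used only for this sketch.
thread_count() {
    awk '/^Threads:/ { print $2 }' "$1"
}

# Example usage (<RM_PID> is a placeholder for the ResourceManager PID):
#   thread_count /proc/<RM_PID>/status          # live threads in the process
#   grep 'Max processes' /proc/<RM_PID>/limits  # per-user process/thread limit seen by the process
#   cat /proc/sys/kernel/threads-max            # system-wide thread limit
#   cat /proc/sys/kernel/pid_max                # upper bound on PIDs/TIDs
```

A thread count that keeps climbing toward one of these limits would suggest a thread leak rather than a file descriptor problem.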
The following error was displayed in the RM out.log file:
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f89641cf000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 12288 bytes for commtting reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid56149.log
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f89642d0000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
The log in "/tmp/hs_err_pid56149.log" is shown below. It points to a problem with memory allocation for threads at the OS level.
===
Stack: [0x00007f89641cf000,0x00007f89642d0000], sp=0x00007f89642ce900, free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x99eb8a] VMError::report_and_die()+0x2ea
V [libjvm.so+0x49721b] report_vm_out_of_memory(char const*, int, unsigned long, char const*)+0x9b
V [libjvm.so+0x81d9ae] os::Linux::commit_memory_impl(char*, unsigned long, bool)+0xfe
V [libjvm.so+0x81da6c] os::pd_commit_memory(char*, unsigned long, bool)+0xc
V [libjvm.so+0x8157fa] os::commit_memory(char*, unsigned long, bool)+0x2a
V [libjvm.so+0x81bf5d] os::pd_create_stack_guard_pages(char*, unsigned long)+0x6d
V [libjvm.so+0x95249e] JavaThread::create_stack_guard_pages()+0x5e
V [libjvm.so+0x958de4] JavaThread::run()+0x34
V [libjvm.so+0x81f988] java_start(Thread*)+0x108
===
The stack suggests that memory allocation (malloc) failed at the OS level. Check that enough physical memory is available on the host.
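One way to check host memory is to read a few fields from /proc/meminfo. This is a minimal sketch, assuming a Linux host. The helper name `meminfo_kb` is illustrative, and the fields in the usage comments are only a suggested starting point.

```shell
# Print the value (in kB) of one field from a /proc/meminfo-style file.
# meminfo_kb is a hypothetical helper name used only for this sketch.
meminfo_kb() {
    field=$1
    file=${2:-/proc/meminfo}
    awk -v f="$field:" '$1 == f { print $2 }' "$file"
}

# Example usage on the ResourceManager host:
#   meminfo_kb MemFree        # unused physical memory
#   meminfo_kb CommitLimit    # commit ceiling (enforced only under strict overcommit)
#   meminfo_kb Committed_AS   # memory currently committed by all processes
```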
ROOT CAUSE: We collected jstack output for the process and found that the number of 'Truststore reloader thread' threads keeps increasing. This is the same issue mentioned earlier: https://issues.apache.org/jira/browse/YARN-5309.
$ grep 'Truststore reloader thread' threadDump | wc -l
14873
$ grep 'Truststore reloader thread' threadDump1 | wc -l
14999
$ grep 'Truststore reloader thread' threadDump2 | wc -l
15063
$ grep 'Truststore reloader thread' threadDump3 | wc -l
15149
$ grep 'Truststore reloader thread' threadDump4 | wc -l
15230
$ grep 'Truststore reloader thread' threadDump5 | wc -l
15347
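The per-dump counts above can also be gathered in one pass. This is a minimal sketch, assuming the thread dumps are plain-text jstack output in the current directory. The helper name `count_named_threads` is illustrative.

```shell
# Print "<file> <count>" for each thread dump, counting lines that contain the thread name.
# count_named_threads is a hypothetical helper name used only for this sketch.
count_named_threads() {
    name=$1
    shift
    for dump in "$@"; do
        # grep -c prints the number of matching lines; -F treats the name as a fixed string.
        # "|| true" keeps the loop going when a dump has zero matches (grep exits 1 in that case).
        count=$(grep -c -F -- "$name" "$dump" || true)
        printf '%s %s\n' "$dump" "$count"
    done
}

# Example usage with the dump files from this article:
#   count_named_threads 'Truststore reloader thread' threadDump threadDump1 threadDump2
```

A count that rises steadily across dumps taken at intervals, as it does here, is consistent with a thread leak.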
RESOLUTION: This is confirmed as a bug, and a patch has been provided to resolve the issue.