Created 07-05-2017 12:22 PM
Every time I execute a large Hive query, my Hadoop NodeManager is killed by SIGSEGV. I'm using Hadoop 2.8.0 and Oracle JDK 1.8.0_131.
A snippet of the YARN JVM error file follows:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f4b766818b1, pid=10904, tid=0x00007f4b33fff700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libc.so.6+0x1628b1] __strlen_sse2_pminub+0x11
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

--------------- T H R E A D ---------------

Current thread (0x00007f4b727d2800): JavaThread "ContainersLauncher #10" [_thread_in_native, id=16663, stack(0x00007f4b33eff000,0x00007f4b34000000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000000f

Registers:
RAX=0x0000000000000000, RBX=0x00000000000000b3, RCX=0x000000000000000f, RDX=0x0000000000000006
RSP=0x00007f4b33ffbdf8, RBP=0x00007f4b381b7e70, RSI=0x00007f4b33ffc2d0, RDI=0x000000000000000f
R8 =0x00007f4b33ffbe60, R9 =0x0000000000000008, R10=0x00000000fffff000, R11=0x00007f4b766963f4
R12=0x0000000000000022, R13=0x00007f4b381b7f10, R14=0x00007f4b77123840, R15=0x0000000000000004
RIP=0x00007f4b766818b1, EFLAGS=0x0000000000010283, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f4b33ffbdf8)
0x00007f4b33ffbdf8: 00007f4b76f1cbed 0000000000000000
0x00007f4b33ffbe08: 00007f4b33ffc6d0 00007f4b33ffc6d0
0x00007f4b33ffbe18: 00007f4b76f200e4 00007f4b76f200ed
0x00007f4b33ffbe28: 00007f4b33ffbe60 00000000016f0520
0x00007f4b33ffbe38: 00007f4b33ffbed0 0000000000000004
0x00007f4b33ffbe48: 00007f4b76f1bbe8 00007f4b33ffc6d0
0x00007f4b33ffbe58: 00007f4b33ffc2d0 00656e696c646d63
0x00007f4b33ffbe68: 0000000000000000 333030305f363938
0x00007f4b33ffbe78: 303030304bb40058 0000000000000000
0x00007f4b33ffbe88: 0000000000000000 0000000000000000
0x00007f4b33ffbe98: 0000000000000000 0000000000000000
0x00007f4b33ffbea8: 0000000000000000 0000000000000000
0x00007f4b33ffbeb8: 0000000000000000 0000000000000000
0x00007f4b33ffbec8: 0000000000000000 6d616e6500203a5d
0x00007f4b33ffbed8: 0000000000003a65 0000000000000000
0x00007f4b33ffbee8: 0000000000000000 0000000000000000
0x00007f4b33ffbef8: 0000000000000000 0000000000000000
0x00007f4b33ffbf08: 0000000000000000 0000000000000000
0x00007f4b33ffbf18: 0000000000000000 0000000000000000
0x00007f4b33ffbf28: 0000000000000000 0000000000000000
0x00007f4b33ffbf38: 0000000000000000 0000000000000000
0x00007f4b33ffbf48: 0000000000000000 0000000000000000
0x00007f4b33ffbf58: 0000000000000000 0000000000000000
0x00007f4b33ffbf68: 0000000000000000 0000000000000000
0x00007f4b33ffbf78: 0000000000000000 0000000000000000
0x00007f4b33ffbf88: 0000000000000000 0000000000000000
0x00007f4b33ffbf98: 0000000000000000 0000000000000000
0x00007f4b33ffbfa8: 0000000000000000 0000000000000000
0x00007f4b33ffbfb8: 0000000000000000 0000000000000000
0x00007f4b33ffbfc8: 0000000000000000 0000000000000000
0x00007f4b33ffbfd8: 0000000000000000 0000000000000000
0x00007f4b33ffbfe8: 0000000000000000 0000000000000000

Instructions: (pc=0x00007f4b766818b1)
0x00007f4b76681891: c0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48
0x00007f4b766818a1: 31 c0 89 f9 83 e1 3f 66 0f ef c0 83 f9 30 77 1d
0x00007f4b766818b1: f3 0f 6f 0f 66 0f 74 c1 66 0f d7 d0 85 d2 0f 85
0x00007f4b766818c1: 4e 02 00 00 48 89 f8 48 83 e0 f0 eb 24 48 89 f8

Register to memory mapping:
RAX=0x0000000000000000 is an unknown value
RBX=0x00000000000000b3 is an unknown value
RCX=0x000000000000000f is an unknown value
RDX=0x0000000000000006 is an unknown value
RSP=0x00007f4b33ffbdf8 is pointing into the stack for thread: 0x00007f4b727d2800
RBP=0x00007f4b381b7e70 is pointing into the stack for thread: 0x00007f4b73bd4800
RSI=0x00007f4b33ffc2d0 is pointing into the stack for thread: 0x00007f4b727d2800
RDI=0x000000000000000f is an unknown value
R8 =0x00007f4b33ffbe60 is pointing into the stack for thread: 0x00007f4b727d2800
R9 =0x0000000000000008 is an unknown value
R10=0x00000000fffff000 is an unknown value
R11=0x00007f4b766963f4: <offset 0x1773f4> in /lib64/libc.so.6 at 0x00007f4b7651f000
R12=0x0000000000000022 is an unknown value
R13=0x00007f4b381b7f10 is pointing into the stack for thread: 0x00007f4b73bd4800
R14=0x00007f4b77123840: snoopy_inputdatastorage_data+0 in /usr/local/snoopy/lib/libsnoopy.so at 0x00007f4b76f16000
R15=0x0000000000000004 is an unknown value

Stack: [0x00007f4b33eff000,0x00007f4b34000000], sp=0x00007f4b33ffbdf8, free space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libc.so.6+0x1628b1] __strlen_sse2_pminub+0x11

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j java.lang.UNIXProcess.forkAndExec(I[B[B[BI[BI[B[IZ)I+0
j java.lang.UNIXProcess.<init>([B[BI[BI[B[IZ)V+30
j java.lang.ProcessImpl.start([Ljava/lang/String;Ljava/util/Map;Ljava/lang/String;[Ljava/lang/ProcessBuilder$Redirect;Z)Ljava/lang/Process;+433
j java.lang.ProcessBuilder.start()Ljava/lang/Process;+161
j org.apache.hadoop.util.Shell.runCommand()V+136
j org.apache.hadoop.util.Shell.run()V+23
j org.apache.hadoop.util.Shell$ShellCommandExecutor.execute()V+67
j org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(Lorg/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext;)I+492
j org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call()Ljava/lang/Integer;+1166
j org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call()Ljava/lang/Object;+1
J 4490 C1 java.util.concurrent.FutureTask.run()V (126 bytes) @ 0x00007f4b5e50acdc [0x00007f4b5e50aa80+0x25c]
j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
The complete log can be downloaded here: yarn_hs_error.log
Is this a Hadoop or JDK bug?
Created 07-05-2017 05:13 PM
Based on the JVM crash report, it looks like the crash is caused by the "snoopy" native library: the register mapping shows R14 pointing at snoopy_inputdatastorage_data in /usr/local/snoopy/lib/libsnoopy.so. Can you please check the snoopy version or temporarily uninstall it?
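To spot this kind of culprit quickly in a long crash report, something like the following may help. It is only a rough sketch: it assumes the report uses the "in /path/to/lib.so at 0x..." wording seen in the register mapping above, and the function names and the list of "system" directory prefixes are made up for illustration.

```python
import re

# Matches fragments such as "in /lib64/libc.so.6 at 0x00007f4b7651f000".
LIB_RE = re.compile(r"in (/\S+\.so(?:\.\d+)*) at 0x[0-9a-fA-F]+")

# Assumed set of directories holding distribution-provided libraries.
SYSTEM_PREFIXES = ("/lib/", "/lib64/", "/usr/lib/", "/usr/lib64/")


def mapped_libraries(report_text):
    """Return native library paths named in the report, in first-seen order."""
    seen = []
    for path in LIB_RE.findall(report_text):
        if path not in seen:
            seen.append(path)
    return seen


def non_system_libraries(report_text):
    """Return only the libraries that live outside the assumed system directories."""
    return [p for p in mapped_libraries(report_text)
            if not p.startswith(SYSTEM_PREFIXES)]


# Two lines copied from the register mapping in the question.
sample = (
    "R11=0x00007f4b766963f4: <offset 0x1773f4> in /lib64/libc.so.6 at 0x00007f4b7651f000\n"
    "R14=0x00007f4b77123840: snoopy_inputdatastorage_data+0 in "
    "/usr/local/snoopy/lib/libsnoopy.so at 0x00007f4b76f16000\n"
)

print(non_system_libraries(sample))  # ['/usr/local/snoopy/lib/libsnoopy.so']
```

A third-party library showing up here is only a hint, not proof, so temporarily disabling it is still the way to confirm.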
Created 07-06-2017 02:56 AM
Yes, it was because of snoopy. I disabled it and it works fine now. Thanks~
Created 07-05-2017 11:49 PM
I agree with @Jay SenSharma. Can you uninstall snoopy and try again?
More references -
https://issues.apache.org/jira/browse/YARN-5546 [ Not a BUG though ]
https://github.com/a2o/snoopy/issues/39
https://stackoverflow.com/questions/44922588/hadoop-nodemanager-killed-by-sigsegv
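If you want to confirm whether snoopy is still being injected into new processes, a check along these lines might be useful. It is a sketch under one assumption that this thread does not confirm: that snoopy was enabled through /etc/ld.so.preload on your nodes. The function names are hypothetical, and the parsing (entries separated by whitespace or colons, with "#" starting a comment) is simplified.

```python
def preload_entries(preload_text):
    """Parse ld.so.preload-style text into a list of library paths."""
    entries = []
    for line in preload_text.splitlines():
        line = line.split("#", 1)[0]  # drop anything after a comment marker
        entries.extend(line.replace(":", " ").split())
    return entries


def snoopy_entries(preload_text):
    """Return the preload entries whose path mentions snoopy."""
    return [e for e in preload_entries(preload_text) if "snoopy" in e.lower()]


# Illustrative content only; on a real node, read /etc/ld.so.preload instead.
sample = """# preload list (example)
/usr/local/snoopy/lib/libsnoopy.so
"""

print(snoopy_entries(sample))  # ['/usr/local/snoopy/lib/libsnoopy.so']
```

An empty result would suggest snoopy is no longer preloaded this way. The NodeManager still needs a restart afterwards, since an already running process keeps the library loaded.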
Created 07-06-2017 02:56 AM
Yes it's snoopy~ thank you