Created on 06-27-2017 08:21 AM - edited 09-16-2022 04:50 AM
Hi, after updating my data nodes and the kernel and restarting the cluster, the Impala daemons failed to start. I tried restarting the Impala daemon, but that did not help. I also tested on CDH 5.10 and CDH 5.11.1.
I tried installing a different version of Java as well, including a downgrade; that didn't help either.
Running CentOS 7 and CDH 5.11.1.
Any suggestions on how to avoid this error?
Reinstalling the OS is my last resort, but I do not want to clean up the whole cluster.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007fa6b9f80c18, pid=3819, tid=0x00007fa6cfdb4900
#
# JRE version: (8.0_131-b11) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# j java.lang.Object.<clinit>()V+0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#

--------------- T H R E A D ---------------

Current thread (0x0000000004c94000): JavaThread "Unknown thread" [_thread_in_Java, id=3819, stack(0x00007ffc40b88000,0x00007ffc40c88000)]

siginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007ffc40c77420

Registers:
RAX=0x00007fa6b50f5a68, RBX=0x00007fa6b5047ca8, RCX=0x0000000000000008, RDX=0x00007fa6cf0fff30
RSP=0x00007ffc40c7f420, RBP=0x00007ffc40c7f460, RSI=0x0000000000000004, RDI=0x0000000004c94000
R8 =0x0000000000000000, R9 =0x0000000000000003, R10=0x0000000000000000, R11=0x0000000000000002
R12=0x0000000000000000, R13=0x00007fa6b5047c98, R14=0x00007ffc40c7f468, R15=0x0000000004c94000
RIP=0x00007fa6b9f80c18, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000006
TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007ffc40c7f420)
0x00007ffc40c7f420: 00007ffc40c7f420 00007fa6b5047c98
0x00007ffc40c7f430: 00007ffc40c7f468 00007fa6b50f1040
0x00007ffc40c7f440: 0000000000000000 00007fa6b5047ca8
0x00007ffc40c7f450: 0000000000000000 00007ffc40c7f470
0x00007ffc40c7f460: 00007ffc40c7f4d0 00007fa6b9f6e4e7
0x00007ffc40c7f470: 00007ffc00001fa0 0000000000000000
0x00007ffc40c7f480: 0000000004c94000 00007ffc40c7f550
0x00007ffc40c7f490: 00007fa6b5047ca8 00007ffc40c7f510
0x00007ffc40c7f4a0: 00007ffc40c7f510 00007ffc40c7f6e8
0x00007ffc40c7f4b0: 00007fa60000000a 00007fa6b5047ca8
0x00007ffc40c7f4c0: 00007fa6b9f809c0 00007ffc40c7f658
0x00007ffc40c7f4d0: 00007ffc40c7f640 00007fa6ce7cfd16
0x00007ffc40c7f4e0: 0000000000000000 0000000004c94000
0x00007ffc40c7f4f0: 00007ffc40c7f650 00007ffc40c7f6e0
0x00007ffc40c7f500: 00007fa6b9f809c0 00007fa60000000a
0x00007ffc40c7f510: 0000000004c94000 0000000004b78140
0x00007ffc40c7f520: 00007fa6b5047ca8 0000000000000000
0x00007ffc40c7f530: 0000000000000000 0000000000000000
0x00007ffc40c7f540: 0000000000000000 00007ffc40c7f6e0
0x00007ffc40c7f550: 0000000004c94000 0000000004b65b40
0x00007ffc40c7f560: 0000000004b5c5a0 0000000004b5c5c0
0x00007ffc40c7f570: 0000000004b5c688 00000000000000d8
0x00007ffc40c7f580: 00007ffc40c7f830 0000000004c94000
0x00007ffc40c7f590: 00007fa6b5047ca8 0000000004c94000
0x00007ffc40c7f5a0: 0000000004b618d0 00007fa6b5049648
0x00007ffc40c7f5b0: 00007fa6b5047ca8 0000000004c94000
0x00007ffc40c7f5c0: 00007ffc40c7f720 00007fa6ce910043
0x00007ffc40c7f5d0: 0000000004c94000 00007fa6ce9f1e67
0x00007ffc40c7f5e0: 00007fa6b5047ca8 0000000004c94000
0x00007ffc40c7f5f0: 00007ffc40c7f6d0 0000000000000000
0x00007ffc40c7f600: 00007fa6b5047ca8 0000000004c94000
0x00007ffc40c7f610: 0000000004b5c5a0 00007ffc40c7f650

Instructions: (pc=0x00007fa6b9f80c18)
0x00007fa6b9f80bf8: 00 d0 ff ff 89 84 24 00 c0 ff ff 89 84 24 00 b0
0x00007fa6b9f80c08: ff ff 89 84 24 00 a0 ff ff 89 84 24 00 90 ff ff
0x00007fa6b9f80c18: 89 84 24 00 80 ff ff 89 84 24 00 70 ff ff 89 84
0x00007fa6b9f80c28: 24 00 60 ff ff 89 84 24 00 50 ff ff 89 84 24 00

Register to memory mapping:
RAX=0x00007fa6b50f5a68 is pointing into metadata
RBX={method} {0x00007fa6b5047ca8} '<clinit>' '()V' in 'java/lang/Object'
RCX=0x0000000000000008 is an unknown value
RDX=0x00007fa6cf0fff30: <offset 0xfc1f30> in /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so at 0x00007fa6ce13e000
RSP=0x00007ffc40c7f420 is pointing into the stack for thread: 0x0000000004c94000
RBP=0x00007ffc40c7f460 is pointing into the stack for thread: 0x0000000004c94000
RSI=0x0000000000000004 is an unknown value
RDI=0x0000000004c94000 is a thread
R8 =0x0000000000000000 is an unknown value
R9 =0x0000000000000003 is an unknown value

VM Arguments:
jvm_args: -Djava.library.path=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin/../lib/impala/lib
java_command: <unknown>
java_class_path (initial): /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/var/lib/impala/*.jar:/usr/share/java/mysql-connector-java.jar:/run/cloudera-scm-agent/process/319-impala-IMPALAD/impala-conf:/run/cloudera-scm-agent/process/319-impala-IMPALAD/hadoop-conf:/run/cloudera-scm-agent/process/319-impala-IMPALAD/hive-conf:/run/cloudera-scm-agent/process/319-impala-IMPALAD/hbase-conf:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/libthrift-0.9.0.jar::/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/ST4-4.0.4.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/activation-1.1.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/ant-1.5.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/ant-1.9.1.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/ant-contrib-1.0b3.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/ant-launcher-1.9.1.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/antlr-2.7.7.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/antlr-runtime-3.3.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/aopalliance-1.0.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/apache-log4j-extras-1.2.17.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/apacheds-i18n-2.0.0-M15.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/api-asn1-api-1.0.0-M20.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/api-util-1.0.0-M20.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/asm-3.1.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/asm-commons-3.1.jar:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/
Launcher Type: generic

Environment Variables:
JAVA_HOME=/usr/java/jdk1.8.0_131
JAVA_TOOL_OPTIONS=
PATH=/sbin:/usr/sbin:/bin:/usr/bin
LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/sbin-retail:/usr/java/jdk1.8.0_131/jre/lib/amd64:/usr/java/jdk1.8.0_131/jre/lib/amd64:/usr/java/jdk1.8.0_131/jre/lib/amd64/server:
SHELL=/bin/bash

Signal Handlers:
SIGSEGV: [libjvm.so+0xac8af0], sa_mask[0]=11111111111111111111111111111110, sa_flags=SA_ONSTACK|SA_SIGINFO
SIGBUS: [libjvm.so+0xac8af0], sa_mask[0]=11111111111111111111111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGFPE: [impalad+0x178a0e0], sa_mask[0]=00010111001000000000000000000000, sa_flags=SA_ONSTACK|SA_SIGINFO
SIGPIPE: SIG_IGN, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGXFSZ: SIG_IGN, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGILL: [impalad+0x178a0e0], sa_mask[0]=00010111001000000000000000000000, sa_flags=SA_ONSTACK|SA_SIGINFO
SIGUSR1: [impalad+0x79a640], sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGUSR2: [libjvm.so+0x923610], sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO
SIGHUP: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGINT: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGTERM: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGQUIT: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none

--------------- S Y S T E M ---------------

OS:CentOS Linux release 7.3.1611 (Core)
uname:Linux 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE 0k, NPROC 65536, NOFILE 32768, AS infinity
load average:0.56 0.21 0.08

/proc/meminfo:
MemTotal: 7231176 kB
MemFree: 323736 kB
MemAvailable: 3480696 kB
Buffers: 4060 kB
Cached: 3365420 kB
SwapCached: 0 kB
Active: 3480808 kB
Inactive: 3267080 kB
Active(anon): 3379768 kB
Inactive(anon): 16648 kB
Active(file): 101040 kB
Inactive(file): 3250432 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 224 kB
Writeback: 0 kB
AnonPages: 3378180 kB
Mapped: 48992 kB
Created 06-27-2017 08:35 AM
Found out that it is related to this issue:
https://issues.apache.org/jira/browse/DAEMON-363
So in CM I edited the Impala Daemon property "Impala Daemon Environment Advanced Configuration Snippet (Safety Valve)" and added:
JAVA_TOOL_OPTIONS=-Xss2m
That fixed the problem.
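A quick way to confirm that a JVM on the node actually receives the setting: HotSpot echoes JAVA_TOOL_OPTIONS on stderr at startup as a "Picked up JAVA_TOOL_OPTIONS: ..." line. The Python sketch below relies on that message format and on a `java` binary being on the PATH; `picked_up_options` and `probe_java` are hypothetical helper names. It only checks a standalone `java -version`, so it does not prove that the Impala daemon's embedded JVM got the variable from CM.

```python
import os
import shutil
import subprocess

# HotSpot reports JAVA_TOOL_OPTIONS on stderr with this prefix (assumed format).
PICKED_UP_PREFIX = "Picked up JAVA_TOOL_OPTIONS:"


def picked_up_options(stderr_text):
    """Return the options the JVM says it picked up, or None if no such line exists."""
    for line in stderr_text.splitlines():
        if line.startswith(PICKED_UP_PREFIX):
            return line[len(PICKED_UP_PREFIX):].strip()
    return None


def probe_java(options="-Xss2m", java="java"):
    """Run `java -version` with JAVA_TOOL_OPTIONS set and return what the JVM picked up.

    Returns None when no `java` executable can be found.
    """
    if shutil.which(java) is None:
        return None
    env = dict(os.environ, JAVA_TOOL_OPTIONS=options)
    result = subprocess.run([java, "-version"], env=env,
                            capture_output=True, text=True)
    return picked_up_options(result.stderr)


if __name__ == "__main__":
    print(probe_java())
```

If the JVM sees the variable, `probe_java()` returns the option string (here `-Xss2m`); None means either no `java` was found or the JVM did not report picking anything up.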
Created 06-27-2017 10:49 AM
This is https://issues.apache.org/jira/browse/IMPALA-5578
I think you will also need to update "Impala Catalog Server Environment Advanced Configuration Snippet (Safety Valve)" before you restart the catalog daemon.
Created 06-27-2017 11:34 PM
As suggested, I set JAVA_TOOL_OPTIONS=-Xss2m.
Do you have any rationale behind -Xss2m? Can we increase it further, and does it depend on any parameter, such as the number of queries or hits to the Impala daemon?
Created 06-28-2017 09:11 AM
"-Xss1280k" seems to be sufficient. The default is 1024k, I believe, and before this that was always sufficient in our testing.
The crash was caused by a change to the Linux kernel that modified the memory layout around thread stacks. As a result, with the default Java stack size, the JVM somehow ends up accessing invalid memory. Increasing the stack size mitigates this.
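To make the sizes in this discussion concrete, here is a small Python sketch that converts -Xss values to bytes and compares them with the 1024k default mentioned above. The default value and the accepted suffixes (none for bytes, or k, m, g) are assumptions based on HotSpot 8 on 64-bit Linux; `xss_to_bytes` and `headroom_over_default` are hypothetical names.

```python
import re

# Assumed default thread stack size (HotSpot 8, 64-bit Linux): 1024k.
DEFAULT_XSS_BYTES = 1024 * 1024

# -Xss takes a bare byte count or a k/m/g suffix (case-insensitive).
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xss_to_bytes(flag):
    """Convert a flag such as '-Xss1280k' to a size in bytes."""
    match = re.fullmatch(r"-Xss(\d+)([kKmMgG]?)", flag.strip())
    if match is None:
        raise ValueError(f"not an -Xss flag: {flag!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix.lower()]


def headroom_over_default(flag):
    """Extra stack per thread, in KiB, that the flag gives beyond the default."""
    return (xss_to_bytes(flag) - DEFAULT_XSS_BYTES) // 1024


if __name__ == "__main__":
    for candidate in ("-Xss1280k", "-Xss2m"):
        print(candidate, headroom_over_default(candidate), "KiB above the default")
```

Run as a script, it prints "-Xss1280k 256 KiB above the default" and "-Xss2m 1024 KiB above the default", so the 2m setting used earlier in the thread gives four times the extra headroom of 1280k.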
Created 06-28-2017 06:09 PM
Thanks for the detailed information, much appreciated.
Created 06-29-2017 11:23 PM
@Tim Armstrong, one more question:
Could you let me know the kernel versions on which it fails (CentOS / RHEL)?
Since it has only been tested in a test environment so far, what should be done on the production boxes?
Created 06-30-2017 09:32 AM
I don't have a list of affected kernels, particularly since so many different kernel versions were patched. I know the problem was the initial fix for CVE-2017-1000364, so you could check to see if the kernel version has that in it.
I believe many Linux vendors are working on a fix for the fix, e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865549 so this advice may become incorrect once the problem is resolved.
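On CentOS/RHEL, one way to check whether the running kernel's package changelog mentions CVE-2017-1000364 is to query RPM. This is a sketch under assumptions: an RPM-based system, a kernel package named kernel-<uname -r>, and helper names (`changelog_mentions`, `running_kernel_changelog`) that are hypothetical. A mention only means the vendor's changelog references the CVE; it does not tell you whether a later fix for the fix is also included.

```python
import platform
import shutil
import subprocess

CVE_ID = "CVE-2017-1000364"


def changelog_mentions(changelog_text, cve=CVE_ID):
    """True if the package changelog text references the given CVE identifier."""
    return cve in changelog_text


def running_kernel_changelog():
    """Return the RPM changelog of the running kernel, or None if unavailable.

    Assumes the kernel package is named kernel-<uname -r>,
    e.g. kernel-3.10.0-514.21.2.el7.x86_64.
    """
    if shutil.which("rpm") is None:
        return None
    package = "kernel-" + platform.release()  # platform.release() matches `uname -r`
    result = subprocess.run(["rpm", "-q", "--changelog", package],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None  # package not found, e.g. a kernel not installed via RPM
    return result.stdout


if __name__ == "__main__":
    text = running_kernel_changelog()
    if text is None:
        print("could not read the kernel changelog")
    elif changelog_mentions(text):
        print(f"running kernel changelog mentions {CVE_ID}")
    else:
        print(f"no mention of {CVE_ID} in the running kernel changelog")
```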
Created 07-27-2018 10:55 AM
I am seeing a similar issue with the Service Monitor and Host Monitor when using Red Hat 6.8 (Santiago).
CM/CDH is 5.11.1.
After adding JAVA_TOOL_OPTIONS=-Xss2m to the Host Monitor and Service Monitor configuration, it works fine.
Is this a known issue with Red Hat 6.7 as well? (The link you mentioned is for CentOS, and it's 6.9.)
Created 06-27-2017 11:18 AM