Created on 06-27-2017 02:02 AM - edited 09-16-2022 04:50 AM
Suddenly, for no apparent reason, both the Impalad and Catalogd services started failing with a Segmentation fault error. Nothing useful is written to the INFO or ERROR logs, or to the role logs.
Log file created at: 2017/06/26 16:13:30
Running on machine: bi-dev-a-05
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0626 16:13:30.846743 27235 logging.cc:121] stderr will be logged to this file.
This is the only thing I see in the core dump:
/var/log/catalogd$ sudo gdb --core core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
[New LWP 23770]
[New LWP 23830]
[New LWP 23833]
[New LWP 23829]
[New LWP 23832]
[New LWP 23828]
[New LWP 23831]
[New LWP 23827]
[New LWP 23834]
Core was generated by `/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/sbin-retail/catalo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f80345da307 in ?? ()
(gdb) bt full
#0 0x00007f80345da307 in ?? ()
No symbol table info available.
#1 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) info sharedLibrary
From                To                  Syms Read   Shared Object Library
0x00007f80354ff960  0x00007f8035500208  No          /usr/lib/jvm/java-8-oracle/jre/lib/amd64/libjsig.so
0x00007f80352e6b60  0x00007f80352f7563  No          /usr/lib/x86_64-linux-gnu/libsasl2.so.2
0x00007f8035097ea0  0x00007f80350cbb2d  No          /lib/x86_64-linux-gnu/libssl.so.1.0.0
0x00007f8034d07e40  0x00007f8034df8caf  No          /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
+++0x00007f8033ed4bb0  0x00007f80347adf28  No       /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
0x00007f8033616d00  0x00007f8033992004  No          /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/libkudu_client.so.0
0x00007f80333c5350  0x00007f80333c833c  No          /lib/x86_64-linux-gnu/librt.so.1
0x00007f80331bfed0  0x00007f80331c09ce  No          /lib/x86_64-linux-gnu/libdl.so.2
0x00007f8032f0fa40  0x00007f8032f77c5f  No          /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/libstdc++.so.6
0x00007f8032bb4610  0x00007f8032c23056  No          /lib/x86_64-linux-gnu/libm.so.6
0x00007f803299ba30  0x00007f80329ab915  No          /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/libgcc_s.so.1
0x00007f80327809f0  0x00007f803278d471  No          /lib/x86_64-linux-gnu/libpthread.so.0
0x00007f80323d1520  0x00007f8032519dc3  No          /lib/x86_64-linux-gnu/libc.so.6
0x00007f8035702b00  0x00007f803571d660  No          /lib64/ld-linux-x86-64.so.2
0x00007f8032107470  0x00007f803215a635  No          /usr/lib/x86_64-linux-gnu/libkrb5.so.3
0x00007f8031ebc020  0x00007f8031ed6ab5  No          /usr/lib/x86_64-linux-gnu/libk5crypto.so.3
0x00007f8031cb5480  0x00007f8031cb6003  No          /lib/x86_64-linux-gnu/libcom_err.so.2
0x00007f8031aab840  0x00007f8031ab04f6  No          /usr/lib/x86_64-linux-gnu/libkrb5support.so.0
0x00007f80318a6020  0x00007f80318a6969  No          /lib/x86_64-linux-gnu/libkeyutils.so.1
0x00007f803168dad0  0x00007f803169ceb9  No          /lib/x86_64-linux-gnu/libresolv.so.2
0x00007f802fc7d2a0  0x00007f802fc840e8  No          /usr/lib/jvm/java-8-oracle/jre/lib/amd64/libverify.so
0x00007f802fa5a630  0x00007f802fa70cd8  No          /usr/lib/jvm/java-8-oracle/jre/lib/amd64/libjava.so
0x00007f802f8443b0  0x00007f802f84a34e  No          /lib/x86_64-linux-gnu/libnss_compat.so.2
0x00007f802f62d160  0x00007f802f639ea3  No          /lib/x86_64-linux-gnu/libnsl.so.1
0x00007f802f41f1a0  0x00007f802f4256da  No          /lib/x86_64-linux-gnu/libnss_nis.so.2
0x00007f802f2142a0  0x00007f802f219eb3  No          /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007f802eff97f0  0x00007f802f00a7a8  No          /usr/lib/jvm/java-8-oracle/jre/lib/amd64/libzip.so
(gdb)
So it seems the problem occurs in `libjvm.so`, but I can't see where exactly.
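The `libjvm.so` conclusion can be checked by testing which address range from `info sharedLibrary` contains the faulting address of frame #0. The following is only a minimal sketch: the function name `find_library` and the table `SHARED_LIBRARIES` are made up for illustration, and only a few rows of the gdb output are copied in.

```python
# A few (From, To, path) rows copied from the `info sharedLibrary` output.
# SHARED_LIBRARIES and find_library are illustrative names, not part of gdb.
SHARED_LIBRARIES = [
    (0x00007f8034d07e40, 0x00007f8034df8caf,
     "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0"),
    (0x00007f8033ed4bb0, 0x00007f80347adf28,
     "/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so"),
    (0x00007f8033616d00, 0x00007f8033992004,
     "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/impala/lib/libkudu_client.so.0"),
    (0x00007f80323d1520, 0x00007f8032519dc3,
     "/lib/x86_64-linux-gnu/libc.so.6"),
]


def find_library(address, libraries):
    """Return the path of the library whose [From, To] range contains address."""
    for start, end, path in libraries:
        if start <= address <= end:
            return path
    return None  # address is not inside any listed library


crash_address = 0x00007f80345da307  # frame #0 from `bt full`
print(find_library(crash_address, SHARED_LIBRARIES))
# prints /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
```

This only says which library the faulting instruction belongs to; without symbols for that library, gdb still cannot name the function.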
I use Impala to work with Kudu and run on the following Cloudera parcels distribution:
CDH 5.11.1-1.cdh5.11.1.p0.4
KUDU 1.3.0-1.cdh5.11.1.p0.27
Ubuntu 16.04 LTS (tried on 14.04 and with previous CDH and KUDU minor versions - the same error)
All other CDH components work well.
No changes were made to either the OS/cluster or the Impala/Kudu configuration; it just started to fail one day.
I tried to add a new clean node to the cluster and launch the Impalad and Catalogd services there; they also fail with the same error.
UPDATE:
There is a chance that some system libraries were changed during an unattended Ubuntu upgrade, causing the inconsistency:
Start-Date: 2017-06-20 06:39:18
Commandline: /usr/bin/unattended-upgrade
Install: linux-headers-4.8.0-56:amd64 (4.8.0-56.61~16.04.1, automatic), linux-image-4.8.0-56-generic:amd64 (4.8.0-56.61~16.04.1, automatic), linux-headers-4.8.0-56-generic:amd64 (4.8.0-56.61~16.04.1, automatic)
Upgrade: linux-headers-virtual-hwe-16.04:amd64 (4.8.0.54.25, 4.8.0.56.27), linux-libc-dev:amd64 (4.4.0-79.100, 4.4.0-81.104), libc6-dev:amd64 (2.23-0ubuntu7, 2.23-0ubuntu9), libc6:amd64 (2.23-0ubuntu7, 2.23-0ubuntu9), locales:amd64 (2.23-0ubuntu7, 2.23-0ubuntu9), linux-image-virtual-hwe-16.04:amd64 (4.8.0.54.25, 4.8.0.56.27), linux-virtual-hwe-16.04:amd64 (4.8.0.54.25, 4.8.0.56.27), libc-bin:amd64 (2.23-0ubuntu7, 2.23-0ubuntu9), libc-dev-bin:amd64 (2.23-0ubuntu7, 2.23-0ubuntu9), multiarch-support:amd64 (2.23-0ubuntu7, 2.23-0ubuntu9), linux-headers-generic-hwe-16.04:amd64 (4.8.0.54.25, 4.8.0.56.27)
End-Date: 2017-06-20 06:40:24
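To see at a glance which packages changed version, the `Upgrade:` line of such an apt history entry can be split into (package, old version, new version) tuples. This is a rough sketch, not an apt tool: `parse_upgrades` and `ENTRY` are hypothetical names, the regular expression assumes the `name:arch (old, new)` layout shown in the log, and the sample line is shortened to three of the entries.

```python
import re

# Shortened copy of the "Upgrade:" line from the log (three entries only).
UPGRADE_LINE = (
    "Upgrade: linux-libc-dev:amd64 (4.4.0-79.100, 4.4.0-81.104), "
    "libc6:amd64 (2.23-0ubuntu7, 2.23-0ubuntu9), "
    "linux-image-virtual-hwe-16.04:amd64 (4.8.0.54.25, 4.8.0.56.27)"
)

# One entry looks like: name:arch (old_version, new_version)
ENTRY = re.compile(
    r"(?P<name>[^\s,()]+):(?P<arch>\w+) \((?P<old>[^,()]+), (?P<new>[^,()]+)\)"
)


def parse_upgrades(line):
    """Return (package, old_version, new_version) tuples from an 'Upgrade:' line."""
    if not line.startswith("Upgrade: "):
        return []  # 'Install:' lines use a different (version, automatic) layout
    return [(m["name"], m["old"], m["new"]) for m in ENTRY.finditer(line)]


for name, old, new in parse_upgrades(UPGRADE_LINE):
    marker = "  <- kernel image" if name.startswith("linux-image") else ""
    print(f"{name}: {old} -> {new}{marker}")
```

In the log above, both the kernel image metapackage and the libc6 family of packages were upgraded in the same run, so either could in principle be the trigger.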
Created 06-27-2017 07:34 AM
Seems the same issue as https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-daemon-fail-to-start-CDH-5-11-1....
SOLVED by one of the following approaches advised by Tim Armstrong:
You might be running into https://issues.apache.org/jira/browse/IMPALA-5578, which is an issue with the Java Virtual machine (there's one embedded in the Impala daemon) and a Linux kernel update. See that JIRA for details.
You could try downgrading your kernel and restarting to confirm that this is indeed the issue. The suggested workaround to the problem, if confirmed, is to increase the -Xss parameter passed to the JVM.
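The reply does not say how to pass `-Xss` to the embedded JVM. One common route, assumed here rather than taken from the thread, is the standard HotSpot environment variable `JAVA_TOOL_OPTIONS`, which a JVM reads at startup. The sketch below only builds the option string; the helper name `with_thread_stack_size` and the value `2m` are illustrative, so check IMPALA-5578 for the recommended setting and where to apply it in your deployment.

```python
def with_thread_stack_size(java_tool_options, stack_size):
    """Return the option string with any existing -Xss flag replaced by -Xss<stack_size>."""
    # Keep every option except a previous -Xss setting.
    kept = [opt for opt in java_tool_options.split() if not opt.startswith("-Xss")]
    kept.append(f"-Xss{stack_size}")
    return " ".join(kept)


# Hypothetical existing value; "2m" is an assumed stack size, not from the thread.
current = "-Xmx1g -Xss512k"
print("JAVA_TOOL_OPTIONS=" + with_thread_stack_size(current, "2m"))
# prints JAVA_TOOL_OPTIONS=-Xmx1g -Xss2m
```

The resulting value would have to be set in the environment of the Impalad and Catalogd processes before they are restarted.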
Created 06-27-2017 10:50 AM
Yes, a lot of people have been hitting this after upgrading their kernels! Thank you for following up and confirming that you were able to fix the problem.