Expert Contributor
Posts: 105
Registered: ‎07-17-2017
Accepted Solution

After upgrading to cdh 5.14.2 Impala daemon stopped suddenly! -

Hi,

After I successfully upgraded the cluster to the latest releases (CM 5.14.0 / CDH 5.14.2), I ran into this problem on 6 of my nodes: during the first queries the Impala daemon suddenly stops, the query is cancelled, and the error messages below are returned:

Impala-shell:

Cancelled due to unreachable impalad(s): node1.example.com:22000

ODBC:

Status: RPC Error: Client for node5.example.com:22000 hit an unexpected exception: Unknown: Interrupted system call, type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala19TTransmitDataResultE, send: done

Impala Daemon log file:

 

CancelQueryFInstances query_id= 3423055f3fda78a:a2446bea00000000 failed to connect to node2.example.com:22000 :Couldn't open transport for node2.example.com:22000 (connect() failed: Connection refused)

Statestore log file:

I0413 20:07:01.767758 64122 statestore.cc:729] Unable to send heartbeat message to subscriber impalad@node5.example.com:22000, received error: Couldn't open transport for node5.example.com:23000 (connect() failed: Connection refused)


When I looked for the source of the issue, I found this crash message in the Impala Daemon logs:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x0000000000d863e5, pid=13065, tid=0x00007efc499cf700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [impalad+0x9863e5]  impala::HdfsScanNodeBase::StopAndFinalizeCounters()+0x965
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /var/run/cloudera-scm-agent/process/13339-impala-IMPALAD/hs_err_pid13065.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#


The 6 servers run CentOS 6.9. I tried upgrading and downgrading to several CentOS 6.9 kernel releases and JDK versions, with no result. Here are the releases I tried:

CentOS 6.9 kernel:
2.6.32-696.23.1.el6.x86_64
2.6.32-696.16.1.el6.x86_64
2.6.32-696.13.2.el6.x86_64
2.6.32-642.15.1.el6.x86_64
2.6.32-642.11.1.el6.x86_64

JDK:
jdk.1.8.0_144
jdk.1.8.0_121


Remark: These 6 nodes are the only nodes in the cluster that do not support SSE4_2.

Thanks in advance.

Cloudera Employee
Posts: 329
Registered: ‎07-29-2015

Re: After upgrading to cdh 5.14.2 Impala daemon stopped suddenly! -

What version of CDH were you running before the upgrade? Were you running on the same hardware?

 

Can you include the CPU info from your impalad.INFO log? It looks something like this:

I0417 17:05:31.064653  8873 init.cc:237] Cpu Info:
  Model: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  Cores: 8
  Max Possible Cores: 8
  L1 Cache: 32.00 KB (Line: 64.00 B)
  L2 Cache: 256.00 KB (Line: 64.00 B)
  L3 Cache: 8.00 MB (Line: 64.00 B)
  Hardware Supports:
    ssse3
    sse4_1
    sse4_2
    popcnt
    avx
    avx2
    pclmulqdq
  Numa Nodes: 1
  Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->0 | 7->0 |
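
When comparing several nodes, it can help to pull the "Hardware Supports" list out of each impalad.INFO log programmatically. The following is a minimal Python sketch, assuming the Cpu Info block has the layout shown above (one feature name per line, with the list ending at the next "Key: value" line). The function name `parse_hardware_supports` and the abbreviated sample text are illustrative only and are not part of Impala.

```python
def parse_hardware_supports(log_text):
    """Return the feature names listed under 'Hardware Supports:'.

    Assumes the layout shown in the sample Cpu Info block: one feature
    per line, and the list ends at the next 'Key: value' line.
    """
    features = []
    in_block = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "Hardware Supports:":
            in_block = True
            continue
        if in_block:
            # A blank line or a 'Key: value' line (e.g. 'Numa Nodes: 1')
            # marks the end of the feature list.
            if not stripped or ":" in stripped:
                break
            features.append(stripped)
    return features


# Abbreviated copy of the sample block above, used only as test input.
sample = """\
I0417 17:05:31.064653  8873 init.cc:237] Cpu Info:
  Model: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  Cores: 8
  Hardware Supports:
    ssse3
    sse4_1
    sse4_2
    popcnt
    avx
    avx2
    pclmulqdq
  Numa Nodes: 1
"""

print(parse_hardware_supports(sample))
```

Running the same function over the log of a healthy node and a crashing node, then comparing the two lists, shows which instruction-set features differ between them.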
Expert Contributor
Posts: 105
Registered: ‎07-17-2017

Re: After upgrading to cdh 5.14.2 Impala daemon stopped suddenly! -

Thanks for the reply, Tim.
It was CDH 5.12.0, and it was working great on the same servers.
I'll share the CPU info of those nodes ASAP.
Expert Contributor
Posts: 105
Registered: ‎07-17-2017

Re: After upgrading to cdh 5.14.2 Impala daemon stopped suddenly! -

Hi @Tim Armstrong

Here is the CPU info from impalad.INFO :

I0417 20:54:12.845438 13375 init.cc:230] Cpu Info:
  Model: Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
  Cores: 8
  Max Possible Cores: 8
  L1 Cache: 32.00 KB (Line: 64.00 B)
  L2 Cache: 6.00 MB (Line: 64.00 B)
  L3 Cache: 0 (Line: 0)
  Hardware Supports:
    ssse3
    sse4_1
  Numa Nodes: 1
  Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->0 | 7->0 |




Cloudera Employee
Posts: 329
Registered: ‎07-29-2015

Re: After upgrading to cdh 5.14.2 Impala daemon stopped suddenly! -

Do you have the JVM error dump file?

/var/run/cloudera-scm-agent/process/13339-impala-IMPALAD/hs_err_pid13065.log

 

I filed https://issues.apache.org/jira/browse/IMPALA-6882 to investigate the issue. I took a look at the code and it doesn't look like anything has changed, so it probably requires deeper investigation.

Expert Contributor
Posts: 105
Registered: ‎07-17-2017

Re: After upgrading to cdh 5.14.2 Impala daemon stopped suddenly! -

Hi @Tim Armstrong

Thank you for your response.

Here is the JVM error dump file: https://ufile.io/j0zat
I have formatted 2 servers and reset them to CentOS 6.9 (kernel 2.6.32-696.23.1.el6.x86_64), but the problem is still the same!


I hope we can resolve this bug ASAP. Good luck.

New Contributor
Posts: 1
Registered: ‎06-13-2018

Re: After upgrading to cdh 5.14.2 Impala daemon stopped suddenly! -

Hello,

 

I am running into the same problem on a fresh install of CDH 5.14.3.  According to the ticket that Tim pasted above, the issue is fixed.  Is there a timeline for when this fix will be available for general release?  Is there a workaround for this that one can utilize now? 

Cloudera Employee
Posts: 329
Registered: ‎07-29-2015

Re: After upgrading to cdh 5.14.2 Impala daemon stopped suddenly! -

I expect it will be included in the 5.14.4 maintenance release. I'm not aware of a workaround aside from avoiding running on affected hardware without popcnt support.
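
To find which nodes fall into the affected group, one option is to check the CPU flags that the Linux kernel reports in /proc/cpuinfo. The following is a minimal Python sketch, assuming an x86 Linux host where /proc/cpuinfo contains a `flags` line. The helper names and the `REQUIRED_FLAGS` tuple are illustrative; it lists only the two features mentioned in this thread (popcnt and SSE4_2) and is not an official list of Impala requirements.

```python
# Assumption: only the two CPU features discussed in this thread are checked.
REQUIRED_FLAGS = ("sse4_2", "popcnt")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that the CPU does not report, in order."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not a Linux host, or /proc is unavailable.
        text = ""
    if not text:
        print("Could not read /proc/cpuinfo")
    else:
        missing = missing_flags(text)
        if missing:
            print("Missing CPU features:", ", ".join(missing))
        else:
            print("All checked CPU features are present")
```

Running this on each node gives a quick way to separate hosts that report popcnt from those that do not, until a release containing the fix is installed.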
