We already shared the exception from the Impala logs of the affected node. However, the "ThriftServer" log entry appeared about 90 minutes later, when the connection to the node was lost. There are no other errors or exceptions in the Impala log.
Please let me know if there are other logs we should check. This has happened multiple times. On one node the service is down, but we still see the process running (see attachment).
Even kill -9 <pid> has no effect on it.
Also, when one node is affected, my application running on WebLogic 11g loses all its connections, and we have to restart the application servers to get them back. This is becoming annoying.
Please let me know what we can do to get to the root cause of the problem.
E0914 10:58:10.457620 94112 logging.cc:121] stderr will be logged to this file.
W0914 10:58:10.467237 94112 authentication.cc:1003] LDAP authentication is being used with TLS, but without an --ldap_ca_certificate file, the identity of the LDAP server cannot be verified. Network communication (and hence passwords) could be intercepted by a man-in-the-middle attack
E0914 10:58:13.220167 94268 thrift-server.cc:182] ThriftServer 'backend' (on port: 22000) exited due to TException: Could not bind: Transport endpoint is not connected
E0914 10:58:13.220221 94112 thrift-server.cc:171] ThriftServer 'backend' (on port: 22000) did not start correctly
F0914 10:58:13.221709 94112 impalad-main.cc:89] ThriftServer 'backend' (on port: 22000) did not start correctly
. Impalad exiting.
Can you help me?
There is a possibility that your daemon process is hung.
Find it and kill it:
$> ps -eaf|grep impala
impala 4399 1 0 Aug17 ? 00:00:00 python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-redactor /usr/lib64/cmf/service/impala/impala.sh impalad impalad_flags false
clouder+ 8426 5709 0 16:34 pts/0 00:00:00 grep --color=auto impala
impala 12322 1 0 Aug17 ? 00:00:00 /opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/impala/sbin-retail/impalad --flagfile=/run/cloudera-scm-agent/process/3639-impala-IMPALAD/impala-conf/impalad_flags
$> sudo kill -9 12322
$> sudo kill -9 4399
Then try to restart your daemon
and check whether anything is listening on the backend port:
$> sudo netstat -lntp|grep 22000
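If kill -9 still appears to do nothing, it may help to look at the process state before restarting. A process in uninterruptible sleep (state D, usually blocked on I/O) does not act on SIGKILL until it wakes, and a zombie (state Z) is already dead and only waits for its parent to reap it. The following is a rough sketch, not a definitive diagnostic: the function names describe_state and check_impalad are made up for illustration, the port defaults to the 22000 seen in the log above, and it assumes a Linux host with procps ps and, optionally, ss.

```shell
# Map a ps STAT value (e.g. "Ds", "Z", "Sl") to a short label.
# describe_state is a hypothetical helper name, not part of any tool.
describe_state() {
  case "$1" in
    D*) echo "uninterruptible" ;;  # blocked in the kernel; SIGKILL is deferred
    Z*) echo "zombie" ;;           # already dead, waiting to be reaped
    T*) echo "stopped" ;;          # stopped by a signal or a debugger
    R*|S*) echo "normal" ;;        # running or ordinary sleep
    *)  echo "other" ;;
  esac
}

# Print the state of every impalad process, then any listener on the port.
# check_impalad is a hypothetical helper name; the port defaults to 22000.
check_impalad() {
  port="${1:-22000}"
  ps -eo pid=,stat=,comm= | awk '$3 == "impalad"' |
  while read -r pid stat comm; do
    echo "pid=$pid stat=$stat ($(describe_state "$stat"))"
  done
  # ss only shows owning processes of other users when run as root.
  if command -v ss >/dev/null 2>&1; then
    ss -lntp | grep ":${port} " || echo "no listener on port ${port}"
  fi
}
```

Run it as root with check_impalad 22000. If an impalad shows up as "uninterruptible", repeated kill -9 will not help, and the thing to look at is whatever it is blocked on (disk, NFS, and so on) rather than Impala itself.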