Support Questions


Impala daemons constantly connecting/disconnecting

Contributor

Hi everyone, 

 

Hoping someone can help. 

 

I'm getting this error when impalad connects: 

 

RPC error: Client for <hostname>:23000 hits an unexpected exception authorize: cannot authorize peer, type: N6apache6thrift9transport13TSSLException

 

I get a similar error when SSL is disabled. 

 

The connect/disconnect cycle repeats about every half second, indefinitely.

 

The log does include a message that the cert was successfully loaded so I don't think it's a problem with that, but I'm open to any suggestions.

 

Is the "type:" noted above literally referring to a connection to an HBase Thrift server? 

 

Anyone know more about this specific error message? 

 

Thanks!

 

 

1 ACCEPTED SOLUTION

Contributor

Thanks to everyone who replied. It turns out that references to truststores, server keys, etc., and their associated passwords may be cached, so when we changed these after moving the cluster, creating new certs and replacing the passwords in CDH was insufficient. 

 

So, after DELETING all fields containing passwords, cert locations, key locations, etc., unchecking SSL, restarting the cluster, and adding the references back in, everything works. Ugh - who knew! 🙂
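For anyone who lands here later: one quick sanity check after regenerating certificates is to confirm that the cert and the private key actually pair up before wiring them back into the configuration. This is a minimal sketch assuming a standard OpenSSL install; the file paths and CN are throwaway stand-ins, not the actual CDH locations:

```shell
# Generate a throwaway key/cert pair as stand-ins for the regenerated files
# (real CDH paths and subject will differ)
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/server.key -out /tmp/server.pem \
  -days 1 -subj "/CN=demo" 2>/dev/null

# A matching pair prints identical modulus digests; differing digests mean
# the daemon is being handed a key that does not belong to the cert
openssl x509 -noout -modulus -in /tmp/server.pem | openssl md5
openssl rsa  -noout -modulus -in /tmp/server.key | openssl md5
```

If the two digests differ, the TLS setup will fail regardless of what the config fields say.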

 

B

View solution in original post

9 REPLIES

Master Collaborator

Hi @bridgor 

 

It looks like you have a firewall issue between your nodes; try checking iptables.

Please share with us the impalad log files.

 

Good luck.

Contributor
Thanks. I shut down firewalld to test, with the same result. Also, these servers are on the same switch.

I'll scrub logs and send what I am able to.

Thanks!

Contributor

PS - I have to jump through some hoops to get the logs out. Hand-typing what I think is relevant takes some time; I'm hoping to have something soon.

 

As a note, we have SSL enabled.

Expert Contributor

Hi @bridgor, if we were just to take the N6apache6thrift9transport13TSSLException originally posted, this typically comes down to a hostname mismatch in the .pem files used, or DNS problems. Check the statestore logs / web UI for issues as well. I agree with @AcharkiMed that this seems to be network / SSL related.
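To illustrate the hostname-mismatch case: the CN (or SAN) inside the .pem has to match the name the peer connects with. The sketch below generates a throwaway cert with a hypothetical FQDN just to show where to look; on a real cluster you would inspect the deployed impalad certificate instead:

```shell
# Create a demo cert with a hypothetical FQDN (node1.example.com is a
# stand-in, not a real cluster host)
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo.key -out /tmp/demo.pem \
  -days 1 -subj "/CN=node1.example.com" 2>/dev/null

# Print the subject; the CN here must match what `hostname -f` returns on
# the host presenting this certificate
openssl x509 -in /tmp/demo.pem -noout -subject
```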



Robert Justice, Technical Resolution Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Contributor
Thanks.

Cloudera Employee

Hi @bridgor ,

 

You may want to check out this thread: https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-Catalogue-server-down-after-upg...

 

I don't know what Impala version you are on (I may have missed it), so I can't tell whether the thread is directly applicable to your problem, but it looks remarkably similar.

Contributor

Thanks. It is close. We are now using FQDNs, and the errors are a little different now. So, here is an additional question:

DNS is not running here, so when we use the FQDN, is any part of CDH attempting to resolve the FQDN using DNS? Or is the FQDN for some other purpose?

 

Our install was generally working better when we were using only short hostnames. FQDNs seem to have complicated the issue - we made up a generic but consistent FQDN. Is Cloudera looking for the domain and having problems when it can't find it?

 

Hoping to understand how the FQDN is actually being used, as this represents a shift in the original problem statement.
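One way to see how a name actually resolves without DNS is to query the same lookup path (nsswitch) that the daemons use; a minimal sketch, using localhost as a stand-in for one of the cluster FQDNs:

```shell
# getent consults /etc/nsswitch.conf, so it shows exactly what a daemon's
# resolver will see for a given name ("localhost" used as a stand-in here;
# substitute a cluster FQDN in practice)
getent hosts localhost

# The OS's idea of this machine's fully qualified name, for comparison
hostname -f
```

If `getent` returns nothing for a cluster FQDN, nothing else (Kerberos, TLS validation, etc.) will resolve it either.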

 

Thanks!

Expert Contributor

I would say this depends on whether your hostname was set to the FQDN at the time of CM/cluster install, whether the hostname is set as fully qualified in the OS by the hostname command, and, if you are using TLS/SSL, whether your certificates contain the FQDN hostname. Kerberos and several other things are very sensitive to DNS. Run the host inspector under CM to check for DNS / host resolution problems.

 

Since you state you are not using DNS, I would suggest making sure the /etc/hosts file on all hosts contains every host of the cluster plus CM, with the fully qualified hostname listed first and the short hostname as an alias. You can use rsync to keep this file consistent across the cluster. Also make sure /etc/nsswitch.conf lists files first on the hosts: line, so /etc/hosts gets used first. Finally, if you suspect the internal hostname was changed away from the FQDN to the short name, either change it back or follow this article to get the CM configuration back in sync with what it was at install time (check the Name column under the CM Hosts tab to see what CM has in its database):

 

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ag_change_hostnames.html

 

[root@cm1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.3 cm1.rgjustice.com cm1
192.168.2.4 node1.rgjustice.com node1
192.168.2.5 node2.rgjustice.com node2

 

[root@cm1 ~]# cat /etc/nsswitch.conf |grep hosts
#hosts: db files nisplus nis dns
hosts: files dns myhostname
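To mechanically check the "FQDN first, short alias second" layout described above, a small awk pass over the file works. The sketch below runs against a sample copy with hypothetical addresses; on a real cluster you would scan /etc/hosts itself:

```shell
# Sample hosts file in the recommended FQDN-first layout (addresses and
# names are hypothetical stand-ins for the cluster hosts)
cat > /tmp/hosts.sample <<'EOF'
192.168.2.3 cm1.rgjustice.com cm1
192.168.2.4 node1.rgjustice.com node1
192.168.2.5 node2.rgjustice.com node2
EOF

# Warn for any entry whose first name after the IP is not fully qualified
# (no dot). Skips comment lines; localhost lines would need excluding too.
awk '!/^#/ && NF >= 2 && index($2, ".") == 0 { print "WARN: " $2 " is not an FQDN" }' /tmp/hosts.sample
```

A clean run prints nothing; any WARN line points at an entry where the short name was listed before the FQDN.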



Robert Justice, Technical Resolution Manager


