Support Questions

HeathG · 09-06-2021

We are still running an old version of Cloudera, v5.15.0 (the upgrade work is almost finished), and a few hours ago users reported that queries in Hue were not running. CM reported a number of issues, so we decided to restart the entire cluster.

ZooKeeper will not start; the error is shown below. There is a timeout issue that we have not been able to solve. No changes were made at the time.

7:49:47.620 AM INFO QuorumPeerConfig
Reading configuration from: /run/cloudera-scm-agent/process/20394-zookeeper-server/zoo.cfg
7:49:47.631 AM INFO QuorumPeerConfig
Defaulting to majority quorums
7:49:47.634 AM INFO DatadirCleanupManager
autopurge.snapRetainCount set to 3
7:49:47.634 AM INFO DatadirCleanupManager
autopurge.purgeInterval set to 24
7:49:47.635 AM INFO DatadirCleanupManager
Purge task started.
7:49:47.643 AM INFO DatadirCleanupManager
Purge task completed.
7:49:47.646 AM INFO QuorumPeerMain
Starting quorum peer
7:49:50.747 AM ERROR QuorumPeerMain
Unexpected exception, exiting abnormally
java.io.IOException: Could not configure server because SASL configuration did not allow the ZooKeeper server to authenticate itself properly: javax.security.auth.login.LoginException: connect timed out
at org.apache.zookeeper.server.ServerCnxnFactory.configureSaslLogin(ServerCnxnFactory.java:207)
at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:87)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:135)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)

Hoping someone will have an idea we can try.

Scharan · 09-07-2021

@HeathG Can you confirm whether any changes were made on the cluster?

If your cluster is Kerberized, can you try increasing the kdc_timeout value in /etc/krb5.conf and then try restarting ZooKeeper?

HeathG · 09-07-2021

Hi @Scharan, there were no changes in the cluster at all. It turned out to be a networking issue: some changes were made to a heap of non-prod networks, but they also affected our prod network. Once they were rolled back, everything started up OK.
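
For anyone who hits the same "connect timed out" during the SASL login, a quick reachability check from the ZooKeeper host to the KDC may help tell a network problem apart from a Kerberos configuration problem. The snippet below is only a minimal sketch, not something taken from this cluster: the function name can_connect is made up for illustration, and it assumes the KDC accepts TCP connections on port 88 (the standard Kerberos port), which may not match every setup.

```python
import socket


def can_connect(host: str, port: int = 88, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Port 88 is assumed here because it is the standard Kerberos port;
    adjust it if your KDC listens elsewhere.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connect timeouts, refused connections, and DNS failures.
        return False
```

Calling can_connect("kdc.example.com") (a placeholder host name; use the kdc entry from your own /etc/krb5.conf) from the node that runs the ZooKeeper server returns False if the connection cannot be opened within the timeout. In that case the network path is worth checking before changing any Kerberos settings.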

cjervis · 09-07-2021

@HeathG I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.

Cy Jervis, Manager, Community Program

Unable to start Zookeeper - LoginException: connect timed out