Zookeeper FailOver Controller failed to start

New Contributor

The ZKFailoverController (ZKFC) stops on its own shortly after I restart it, whether from Ambari or from the command prompt.

All the required settings, such as dfs.ha.automatic-failover.enabled and ha.zookeeper.quorum, are set. The log is below:

2016-07-27 14:15:08,334 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=:/usr/hdp/2.3.4.0-3485/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.4.0-3485/hadoop/lib/native:/usr/hdp/2.3.4.0-3485/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.4.0-3485/hadoop/lib/native
2016-07-27 14:15:08,335 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2016-07-27 14:15:08,335 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2016-07-27 14:15:08,335 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2016-07-27 14:15:08,335 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2016-07-27 14:15:08,335 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-573.8.1.el6.x86_64
2016-07-27 14:15:08,335 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=hdfs
2016-07-27 14:15:08,337 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/home/hdfs
2016-07-27 14:15:08,337 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.3.4.0-3485/hadoop
2016-07-27 14:15:08,338 INFO  zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=node3.exp.dinesh.snx.io:2181,node5.exp.dinesh.snx.io:2181,node6.exp.dinesh.snx.io:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@2fbb21e
2016-07-27 14:15:08,359 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node3.exp.dinesh.snx.io/127.0.0.249:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:08,369 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node3.exp.dinesh.snx.io/127.0.0.249:2181, initiating session
2016-07-27 14:15:08,370 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1142)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-07-27 14:15:09,417 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node5.exp.dinesh.snx.io/127.0.0.7:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:09,418 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node5.exp.dinesh.snx.io/127.0.0.7:2181, initiating session
2016-07-27 14:15:11,087 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from server in 1669ms for sessionid 0x0, closing socket connection and attempting reconnect
2016-07-27 14:15:11,442 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node6.exp.dinesh.snx.io/127.0.0.8:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:11,443 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node6.exp.dinesh.snx.io/127.0.0.8:2181, initiating session
2016-07-27 14:15:11,443 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1142)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-07-27 14:15:12,643 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node3.exp.dinesh.snx.io/127.0.0.249:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:12,644 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node3.exp.dinesh.snx.io/127.0.0.249:2181, initiating session
2016-07-27 14:15:12,644 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1142)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-07-27 14:15:13,355 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node5.exp.dinesh.snx.io/127.0.0.7:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:13,356 ERROR ha.ActiveStandbyElector (ActiveStandbyElector.java:waitForZKConnectionEvent(1104)) - Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
2016-07-27 14:15:13,356 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node5.exp.dinesh.snx.io/127.0.0.7:2181, initiating session
2016-07-27 14:15:15,126 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0 closed
2016-07-27 14:15:15,126 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2016-07-27 14:15:15,132 FATAL ha.ZKFailoverController (ZKFailoverController.java:doRun(193)) - Unable to start failover controller. Unable to connect to ZooKeeper quorum at zk3.example.com:2181,zk5.example.com:2181,zk6.example.com:2181. Please check the configured value for ha.zookeeper.quorum and ensure that ZooKeeper is running.
2016-07-27 14:15:15,135 INFO  tools.DFSZKFailoverController (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at node4.exp.dinesh.snx.io/127.0.0.6
***********************************************************

Thanks

5 REPLIES

Re: Zookeeper FailOver Controller failed to start

Mentor

@Dinesh E

Can you copy and paste the contents of your zookeeper configuration zoo.cfg?

Re: Zookeeper FailOver Controller failed to start

New Contributor

Here is the configuration file

clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=/disk1/hadoop/zookeeper
autopurge.snapRetainCount=30
server.1=node3.exp.dinesh.snx.io:2888:3888
server.2=node5.exp.dinesh.snx.io:2888:3888
server.3=node6.exp.dinesh.snx.io:2888:3888

Re: Zookeeper FailOver Controller failed to start

Mentor

@Dinesh E

Sorry for the delay, I was away dealing with some issues. I think there is a problem with your ZooKeeper configuration. In the logs I see:

Unable to start failover controller. Unable to connect to ZooKeeper quorum at zk3.example.com:2181,zk5.example.com:2181,zk6.example.com:2181.

The quorum above is an example value, as used here. I tested this with Ambari a couple of hours ago and the setup was successful.

My question is: how did you enable HA, or how are you enabling it?

Re: Zookeeper FailOver Controller failed to start

Mentor

@Dinesh E

Just to clarify, your log above shows that you are using the wrong ZooKeeper quorum:

  1. zk3.example.com:2181,
  2. zk5.example.com:2181,
  3. zk6.example.com:2181.

Please rectify that
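
One way to confirm which quorum the ZKFC is actually picking up is to read ha.zookeeper.quorum from the Hadoop configuration file and compare its hosts with the server.N entries in zoo.cfg. The following is a minimal Python sketch, assuming a standard Hadoop-style XML file of property/name/value entries. The helper names (read_property, quorum_hosts, quorum_matches) are made up for illustration and are not part of Hadoop or Ambari.

```python
import xml.etree.ElementTree as ET

# Hosts taken from the server.N lines of the zoo.cfg posted earlier in this thread.
ZOO_CFG_HOSTS = {
    "node3.exp.dinesh.snx.io",
    "node5.exp.dinesh.snx.io",
    "node6.exp.dinesh.snx.io",
}


def read_property(xml_text, name):
    """Return the value of a Hadoop-style <property> entry, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def quorum_hosts(quorum):
    """Split 'host1:2181,host2:2181' into a set of bare host names."""
    return {
        entry.strip().rsplit(":", 1)[0]
        for entry in quorum.split(",")
        if entry.strip()
    }


def quorum_matches(xml_text, expected_hosts):
    """True if ha.zookeeper.quorum is set and names exactly the expected hosts."""
    value = read_property(xml_text, "ha.zookeeper.quorum")
    return value is not None and quorum_hosts(value) == set(expected_hosts)
```

For example, calling quorum_matches on the text of core-site.xml (often found under /etc/hadoop/conf, though that location is an assumption about your install) with ZOO_CFG_HOSTS would return False if the file still contains the zk3/zk5/zk6.example.com value shown in the FATAL log line.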

[Attached screenshot: hcc-dinesh.png]

Re: Zookeeper FailOver Controller failed to start

Super Guru
@Dinesh E

Please check whether the ZooKeeper servers are running or down. If they are running, make sure you can connect to them from the ZKFC host. Try a basic networking command first (such as telnet or nc on port 2181).

Also, if the ZooKeeper servers are running fine, check for anything suspicious in zookeeper.out on the ZooKeeper hosts (/var/log/zookeeper/zookeeper.out).
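
If telnet or nc is not installed on the ZKFC host, the same check can be scripted. The sketch below is a rough Python equivalent of `echo ruok | nc <host> 2181`: it opens a TCP connection to each quorum member and sends ZooKeeper's "ruok" four-letter command, to which a healthy server normally answers "imok". The function names are made up for illustration, and every quorum entry is assumed to have the form host:port. Some ZooKeeper releases restrict four-letter commands through a whitelist, so an empty reply does not by itself prove the server is down.

```python
import socket


def parse_quorum(quorum):
    """Turn 'host1:2181,host2:2181' into [(host, port), ...].

    Assumes every entry carries an explicit port.
    """
    servers = []
    for entry in quorum.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


def ruok(host, port, timeout=5.0):
    """Send 'ruok' to a ZooKeeper server; return its reply, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return None
    # An empty read means the server closed the socket without answering.
    return reply.decode("ascii", "replace") or None


def report(quorum):
    """Print one line per quorum member with the reply it gave (or None)."""
    for host, port in parse_quorum(quorum):
        print(f"{host}:{port} -> {ruok(host, port)}")
```

Running report() from the ZKFC host with the connectString from the log (node3, node5 and node6 on port 2181) shows, server by server, whether the ZooKeeper process answers at all.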
