Created 07-27-2016 03:19 PM
The ZKFailoverController stops automatically after I restart it from Ambari or the command prompt. All the required properties, such as dfs.ha.automatic-failover.enabled and ha.zookeeper.quorum, are set.
Below is the log:
2016-07-27 14:15:08,334 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=:/usr/hdp/2.3.4.0-3485/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.4.0-3485/hadoop/lib/native:/usr/hdp/2.3.4.0-3485/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.4.0-3485/hadoop/lib/native
2016-07-27 14:15:08,335 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2016-07-27 14:15:08,335 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2016-07-27 14:15:08,335 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2016-07-27 14:15:08,335 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2016-07-27 14:15:08,335 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-573.8.1.el6.x86_64
2016-07-27 14:15:08,335 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=hdfs
2016-07-27 14:15:08,337 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/home/hdfs
2016-07-27 14:15:08,337 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.3.4.0-3485/hadoop
2016-07-27 14:15:08,338 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=node3.exp.dinesh.snx.io:2181,node5.exp.dinesh.snx.io:2181,node6.exp.dinesh.snx.io:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@2fbb21e
2016-07-27 14:15:08,359 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node3.exp.dinesh.snx.io/127.0.0.249:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:08,369 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node3.exp.dinesh.snx.io/127.0.0.249:2181, initiating session
2016-07-27 14:15:08,370 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1142)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-07-27 14:15:09,417 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node5.exp.dinesh.snx.io/127.0.0.7:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:09,418 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node5.exp.dinesh.snx.io/127.0.0.7:2181, initiating session
2016-07-27 14:15:11,087 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from server in 1669ms for sessionid 0x0, closing socket connection and attempting reconnect
2016-07-27 14:15:11,442 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node6.exp.dinesh.snx.io/127.0.0.8:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:11,443 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node6.exp.dinesh.snx.io/127.0.0.8:2181, initiating session
2016-07-27 14:15:11,443 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1142)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-07-27 14:15:12,643 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node3.exp.dinesh.snx.io/127.0.0.249:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:12,644 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node3.exp.dinesh.snx.io/127.0.0.249:2181, initiating session
2016-07-27 14:15:12,644 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1142)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-07-27 14:15:13,355 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server node5.exp.dinesh.snx.io/127.0.0.7:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-27 14:15:13,356 ERROR ha.ActiveStandbyElector (ActiveStandbyElector.java:waitForZKConnectionEvent(1104)) - Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
2016-07-27 14:15:13,356 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to node5.exp.dinesh.snx.io/127.0.0.7:2181, initiating session
2016-07-27 14:15:15,126 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0 closed
2016-07-27 14:15:15,126 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2016-07-27 14:15:15,132 FATAL ha.ZKFailoverController (ZKFailoverController.java:doRun(193)) - Unable to start failover controller. Unable to connect to ZooKeeper quorum at zk3.example.com:2181,zk5.example.com:2181,zk6.example.com:2181. Please check the configured value for ha.zookeeper.quorum and ensure that ZooKeeper is running.
2016-07-27 14:15:15,135 INFO tools.DFSZKFailoverController (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at node4.exp.dinesh.snx.io/127.0.0.6
***********************************************************
Thanks
Created 07-27-2016 09:40 PM
Can you copy and paste the contents of your ZooKeeper configuration file, zoo.cfg?
Created 07-28-2016 09:37 AM
Here is the configuration file:
clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=/disk1/hadoop/zookeeper
autopurge.snapRetainCount=30
server.1=node3.exp.dinesh.snx.io:2888:3888
server.2=node5.exp.dinesh.snx.io:2888:3888
server.3=node6.exp.dinesh.snx.io:2888:3888
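The ZKFC's ha.zookeeper.quorum is expected to list the same hosts as the server.N entries in zoo.cfg, each paired with clientPort. A minimal sketch in Python that derives that value from the zoo.cfg posted above, assuming the standard server.N=host:peerPort:electionPort form; the helper names are hypothetical, not part of any Hadoop or ZooKeeper tool.

```python
"""Derive the expected ha.zookeeper.quorum value from zoo.cfg contents.

Assumes server.N=host:peerPort:electionPort entries and one clientPort.
"""


def parse_zoo_cfg(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def expected_quorum(props: dict[str, str]) -> str:
    """Build 'host:clientPort,...' from the server.N entries, ordered by N."""
    port = props.get("clientPort", "2181")
    servers = []
    for key, value in props.items():
        if key.startswith("server."):
            server_id = int(key.split(".", 1)[1])
            host = value.split(":", 1)[0]  # drop the peer/election ports
            servers.append((server_id, host))
    return ",".join(f"{host}:{port}" for _, host in sorted(servers))


if __name__ == "__main__":
    # zoo.cfg contents posted in this thread
    cfg = """clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=/disk1/hadoop/zookeeper
autopurge.snapRetainCount=30
server.1=node3.exp.dinesh.snx.io:2888:3888
server.2=node5.exp.dinesh.snx.io:2888:3888
server.3=node6.exp.dinesh.snx.io:2888:3888
"""
    print(expected_quorum(parse_zoo_cfg(cfg)))
    # node3.exp.dinesh.snx.io:2181,node5.exp.dinesh.snx.io:2181,node6.exp.dinesh.snx.io:2181
```

If the value the ZKFC reports differs from this string, the quorum setting is the first thing to fix.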
Created 08-09-2016 09:07 PM
Sorry, I was away dealing with some issues. I think there is a problem with your ZooKeeper configuration. I see this in the logs:
Unable to start failover controller. Unable to connect to ZooKeeper quorum at zk3.example.com:2181,zk5.example.com:2181,zk6.example.com:2181.
Those hostnames (zk3/zk5/zk6.example.com) are example values, not your ZooKeeper servers. I tested this a couple of hours ago using Ambari and the setup was successful.
My question is: how did you enable HA, or how are you enabling it now?
Created on 08-10-2016 09:50 AM - edited 08-18-2019 05:45 AM
Just to clarify: your log above shows you are using the wrong ZooKeeper quorum configuration.
Please rectify that.
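To confirm which quorum the ZKFC is configured with, you could compare ha.zookeeper.quorum against the server.N hosts from zoo.cfg. A minimal sketch, assuming the property is defined in core-site.xml (its usual location for HDFS HA, e.g. under /etc/hadoop/conf on an HDP node) using the standard <property><name/><value/></property> layout; the function names are hypothetical. On the ZKFC host, `hdfs getconf -confKey ha.zookeeper.quorum` should print the value the daemon actually picks up.

```python
"""Flag hosts in ha.zookeeper.quorum that are not ZooKeeper servers."""
import xml.etree.ElementTree as ET


def read_site_xml(xml_text: str) -> dict[str, str]:
    """Return {name: value} for every <property> in a Hadoop *-site.xml file."""
    root = ET.fromstring(xml_text)
    props: dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def quorum_hosts(quorum: str) -> set[str]:
    """Turn 'h1:2181,h2:2181' into {'h1', 'h2'} (ports dropped)."""
    return {e.strip().split(":", 1)[0] for e in quorum.split(",") if e.strip()}


def unknown_quorum_hosts(configured: str, zk_servers: set[str]) -> set[str]:
    """Hosts named in the configured quorum but absent from zoo.cfg."""
    return quorum_hosts(configured) - zk_servers


if __name__ == "__main__":
    # Quorum value as reported in the FATAL log line of this thread.
    core_site = """<configuration>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk3.example.com:2181,zk5.example.com:2181,zk6.example.com:2181</value>
  </property>
</configuration>"""
    # server.N hosts from the zoo.cfg posted in this thread.
    zk_servers = {"node3.exp.dinesh.snx.io", "node5.exp.dinesh.snx.io",
                  "node6.exp.dinesh.snx.io"}
    quorum = read_site_xml(core_site)["ha.zookeeper.quorum"]
    for host in sorted(unknown_quorum_hosts(quorum, zk_servers)):
        print("not a ZooKeeper server:", host)
```

On an Ambari-managed cluster, make the correction in Ambari's HDFS configuration rather than editing core-site.xml by hand, since Ambari rewrites these files when it pushes configs.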
Created 08-10-2016 04:44 AM
Please check whether the ZooKeeper servers are running or down. If they are running, make sure you can connect to them from the ZKFC host. Try some basic networking commands first (like telnet or nc on port 2181).
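The telnet/nc check can also be scripted. A minimal sketch in Python, assuming the ZooKeeper four-letter-word commands are enabled on the servers (they are by default in ZooKeeper 3.4.x; newer releases restrict them with `4lw.commands.whitelist`). `ruok` only confirms the process is alive (a healthy server replies `imok`); `stat` also prints a `Mode:` line (leader or follower) when the server is part of a working quorum. The function names are illustrative, not part of any ZooKeeper client library, and the script should be run from the ZKFC host (node4 here), since reachability from that host is what matters.

```python
"""TCP reachability and four-letter-word checks for ZooKeeper servers."""
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (roughly `nc -z`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def four_letter_word(host: str, port: int, cmd: str, timeout: float = 3.0) -> str:
    """Send a four-letter-word command (e.g. 'ruok', 'stat') and return the reply.

    The server writes its answer and then closes the connection, so we
    read until EOF.
    """
    if len(cmd) != 4:
        raise ValueError("four-letter-word commands are exactly 4 characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, call `port_open(host, 2181)` and then `four_letter_word(host, 2181, "stat")` for each of node3, node5 and node6. A port that is open but whose `stat` output lacks a `Mode:` line points at the ZooKeeper ensemble itself rather than at the network.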
If the ZooKeeper servers are running fine, also check zookeeper.out on the ZooKeeper hosts (/var/log/zookeeper/zookeeper.out) for anything suspicious.
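A rough way to scan zookeeper.out, assuming it is a plain-text log4j file: the sketch keeps only lines containing a WARN, ERROR or FATAL level keyword, so it does not depend on the exact log4j pattern. `suspicious_lines` is a hypothetical helper; the default path is the one mentioned above and may differ on other installs.

```python
"""Print WARN/ERROR/FATAL lines from a ZooKeeper server log."""
import os
import re
import sys
from typing import Iterable, Iterator

# Whole-word match, so e.g. "ERRORS" in message text is not counted.
LEVEL_RE = re.compile(r"\b(WARN|ERROR|FATAL)\b")


def suspicious_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines that carry a WARN, ERROR or FATAL level keyword."""
    for line in lines:
        if LEVEL_RE.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Path from the post above; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/zookeeper/zookeeper.out"
    if not os.path.exists(path):
        print(f"{path} not found")
    else:
        with open(path, errors="replace") as log:
            for line in suspicious_lines(log):
                print(line)
```

Running it on each of node3, node5 and node6 around the time of the ZKFC restart should surface the server-side reason for the closed sessions seen in the ZKFC log.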