ZKFC fails to start because the command "hdfs zkfc -formatZK -nonInteractive" fails

I am trying to start ZKFC from Ambari, but it fails while executing the command: hdfs zkfc -formatZK -nonInteractive. All 3 ZooKeeper servers are shown as running on the Ambari dashboard, and when I checked port 2181, it was listening. However, when I tried to connect with telnet, the connection was closed immediately:

telnet abctestlab0512.bdaas.com 2181
Trying x.x.x.x...
Connected to abctestlab0512.bdaas.com
Escape character is '^]'.
Connection closed by foreign host.

It seems the server is closing the connection as soon as it is established.

The ZKFC log shows the following:

17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-514.16.1.el7.x86_64
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:user.name=root
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Client environment:user.dir=/var/log/hadoop/hdp44-hdfs
17/09/18 07:36:34 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=abctestlab0512.bdaas.com:2181,abctestlab0513.bdaas.com:2181,abctestlab0515.bdaas.com:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@610f7aa
17/09/18 07:36:34 INFO zookeeper.ClientCnxn: Opening socket connection to server abctestlab0515.bdaas.com/192.120.10.24:2181. Will not attempt to authenticate using SASL (unknown error)
17/09/18 07:36:34 INFO zookeeper.ClientCnxn: Socket connection established to abctestlab0515.bdaas.com/192.120.10.24:2181, initiating session
17/09/18 07:36:34 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
17/09/18 07:36:34 INFO zookeeper.ClientCnxn: Opening socket connection to server abctestlab0513.bdaas.com/192.120.10.22:2181. Will not attempt to authenticate using SASL (unknown error)
17/09/18 07:36:34 INFO zookeeper.ClientCnxn: Socket connection established to abctestlab0513.bdaas.com/192.120.10.22:2181, initiating session
17/09/18 07:36:34 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
17/09/18 07:36:35 INFO zookeeper.ClientCnxn: Opening socket connection to server abctestlab0512.bdaas.com/192.120.10.21:2181. Will not attempt to authenticate using SASL (unknown error)
17/09/18 07:36:35 INFO zookeeper.ClientCnxn: Socket connection established to abctestlab0512.bdaas.com/192.120.10.21:2181, initiating session
17/09/18 07:36:35 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
17/09/18 07:36:37 INFO zookeeper.ClientCnxn: Opening socket connection to server abctestlab0515.bdaas.com/192.120.10.24:2181. Will not attempt to authenticate using SASL (unknown error)
17/09/18 07:36:37 INFO zookeeper.ClientCnxn: Socket connection established to abctestlab0515.bdaas.com/192.120.10.24:2181, initiating session
17/09/18 07:36:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
17/09/18 07:36:38 INFO zookeeper.ClientCnxn: Opening socket connection to server abctestlab0513.bdaas.com/192.120.10.22:2181. Will not attempt to authenticate using SASL (unknown error)
17/09/18 07:36:38 INFO zookeeper.ClientCnxn: Socket connection established to abctestlab0513.bdaas.com/192.120.10.22:2181, initiating session
17/09/18 07:36:38 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
17/09/18 07:36:38 INFO zookeeper.ClientCnxn: Opening socket connection to server abctestlab0512.bdaas.com/192.120.10.21:2181. Will not attempt to authenticate using SASL (unknown error)
17/09/18 07:36:38 INFO zookeeper.ClientCnxn: Socket connection established to abctestlab0512.bdaas.com/192.120.10.21:2181, initiating session
17/09/18 07:36:38 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
17/09/18 07:36:39 ERROR ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
17/09/18 07:36:39 INFO zookeeper.ZooKeeper: Session: 0x0 closed
17/09/18 07:36:39 FATAL ha.ZKFailoverController: Unable to start failover controller. Unable to connect to ZooKeeper quorum at abctestlab0512.bdaas.com:2181,abctestlab0513.bdaas.com:2181,abctestlab0515.bdaas.com:2181. Please check the configured value for ha.zookeeper.quorum and ensure that ZooKeeper is running.
17/09/18 07:36:39 INFO zookeeper.ClientCnxn: EventThread shut down
17/09/18 07:36:39 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at abctestlab0512.bdaas.com/192.120.10.21
************************************************************/
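
As a rough aid for reading a log like the one above, the short Python sketch below counts, per ZooKeeper server, how many sessions were established and how many were then dropped. It assumes the log format shown here, and in particular that each "Unable to read additional data" line refers to the server named in the most recent "Socket connection established" line. The function name summarize_drops is only illustrative.

```python
import re
from collections import Counter

# Matches the host name in lines such as
# "Socket connection established to host.example.com/10.0.0.1:2181, initiating session"
ESTABLISHED = re.compile(r"Socket connection established to ([^/\s]+)/")
DROPPED = "Unable to read additional data from server"


def summarize_drops(log_lines):
    """Return {host: (established, dropped)} for a ZooKeeper client log.

    Assumption: a "dropped" line belongs to the most recently
    established connection, as in the ZKFC log in this thread.
    """
    established = Counter()
    dropped = Counter()
    last_server = None
    for line in log_lines:
        match = ESTABLISHED.search(line)
        if match:
            last_server = match.group(1)
            established[last_server] += 1
        elif DROPPED in line and last_server is not None:
            dropped[last_server] += 1
            last_server = None  # do not count the same connection twice
    return {host: (established[host], dropped[host]) for host in established}
```

If every server shows the same number of established and dropped connections, as appears to be the case in this log, the client is reaching all three servers over the network and each of them is closing the session itself, so the ZooKeeper server logs are the next place to look.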


1 REPLY


@Ajit Sonawane, can you check your ZooKeeper servers (abctestlab0512, abctestlab0513, abctestlab0515) and their logs to see whether they are running properly and are not overloaded?
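
One way to run that check from the ZKFC host is to send each server ZooKeeper's "ruok" four-letter-word command, to which a healthy server replies "imok". The Python sketch below is a minimal illustration, not a definitive tool. The helper names four_letter_word and probe_quorum are made up for this example, and it assumes four-letter-word commands are enabled on the servers (they are by default on the ZooKeeper 3.4 line, while newer releases require them to be whitelisted).

```python
import socket


def four_letter_word(host, port, command="ruok", timeout=5.0):
    """Send a ZooKeeper four-letter-word command and return the raw reply.

    An empty string means the server accepted the TCP connection but
    closed it without answering. A refused, reset or timed-out
    connection raises OSError instead.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def probe_quorum(hosts, port=2181):
    """Return {host: reply or error text} for every server in the list."""
    results = {}
    for host in hosts:
        try:
            results[host] = four_letter_word(host, port)
        except OSError as exc:  # refused, timed out, DNS failure, reset
            results[host] = "ERROR: {}".format(exc)
    return results
```

Calling probe_quorum(["abctestlab0512.bdaas.com", "abctestlab0513.bdaas.com", "abctestlab0515.bdaas.com"]) should give "imok" for each healthy server. An empty reply or an error for a server that Ambari reports as running would point towards the server side, and the ZooKeeper log on that host should say why the connection was closed.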
