Support Questions

Find answers, ask questions, and share your expertise

FATAL ha.ZKFailoverController: Unable to start failover controller

avatar
Expert Contributor

When I issue the command

 

sudo -u hdfs hdfs zkfc -formatZK

 

i get the error

 

14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Opening socket connection to server nn2/192.168.1.31:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Socket connection established to nn2/192.168.1.31:2181, initiating session
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Opening socket connection to server jt1/192.168.1.32:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Socket connection established to jt1/192.168.1.32:2181, initiating session
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server nn2/192.168.1.31:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to nn2/192.168.1.31:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server jt1/192.168.1.32:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to jt1/192.168.1.32:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:39 ERROR ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
14/07/24 00:24:40 INFO zookeeper.ZooKeeper: Session: 0x0 closed
14/07/24 00:24:40 INFO zookeeper.ClientCnxn: EventThread shut down
14/07/24 00:24:40 FATAL ha.ZKFailoverController: Unable to start failover controller. Unable to connect to ZooKeeper quorum at nn1:2181,nn2:2181,jt1:2181. Please check the configured value for ha.zookeeper.quorum and ensure that ZooKeeper is running.

 

 

I have confirmed that the zookeeper service is running on every machine by

 

[root@nn1 ~]# service zookeeper-server start
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... already running as process 1065.

 

I can also do an nc from every machine to every machine

 

[root@nn1 ~]# nc nn1 2181
^C
[root@nn1 ~]# nc nn2 2181
^C
[root@nn1 ~]# nc jt1 2181
^C
[root@nn1 ~]#

 

I can see this in the zookeeper event log

 

2014-07-24 00:24:18,706 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:24:34,956 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35151
2014-07-24 00:24:34,956 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:34,956 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35151 (no session established for client)
2014-07-24 00:24:37,075 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35154
2014-07-24 00:24:37,076 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:37,076 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35154 (no session established for client)
2014-07-24 00:24:39,432 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35157
2014-07-24 00:24:39,433 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:39,433 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35157 (no session established for client)
2014-07-24 00:25:18,709 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:25:18,710 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:25:18,711 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:26:18,713 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:26:18,715 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:26:18,716 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:26:40,619 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35170
2014-07-24 00:26:43,508 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
  at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
  at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
  at java.lang.Thread.run(Thread.java:662)
2014-07-24 00:26:43,511 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35170 (no session established for client)
2014-07-24 00:27:18,717 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:27:18,719 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:27:18,719 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000

 

2 ACCEPTED SOLUTIONS

avatar
Contributor

Looks like your zookeeper quorum was not able to elect a master. Maybe you have misconfigured your zookeeper?

 

Make sure that you have entered all 3 servers in your zoo.cfg with a unique ID. Make sure you have the same config on all 3 of your machines and and make sure that every server is using the correct myId as specified in the cfg.

 

BR

Marc

View solution in original post

avatar
Expert Contributor

Thank you so much. Your answer is absolutely correct.

 

I went to each server and did

 

nn1:  service zookeeper-server init --myid=1 --force

nn2:  service zookeeper-server init --myid=2 --force

jt1:  service zookeeper-server init --myid=3 --force

 

earlier I had chosen an ID of 1 on every machine.

 

I also corrected my zoo.cfg. to ensure right entries.

 

Now it works and I am able to do 

 

sudo -u hdfs hdfs zkfc -formatZK

 

Thank you so much!

 

View solution in original post

2 REPLIES 2

avatar
Contributor

Looks like your zookeeper quorum was not able to elect a master. Maybe you have misconfigured your zookeeper?

 

Make sure that you have entered all 3 servers in your zoo.cfg with a unique ID. Make sure you have the same config on all 3 of your machines and and make sure that every server is using the correct myId as specified in the cfg.

 

BR

Marc

avatar
Expert Contributor

Thank you so much. Your answer is absolutely correct.

 

I went to each server and did

 

nn1:  service zookeeper-server init --myid=1 --force

nn2:  service zookeeper-server init --myid=2 --force

jt1:  service zookeeper-server init --myid=3 --force

 

earlier I had chosen an ID of 1 on every machine.

 

I also corrected my zoo.cfg. to ensure right entries.

 

Now it works and I am able to do 

 

sudo -u hdfs hdfs zkfc -formatZK

 

Thank you so much!