Contributor
Posts: 56
Registered: 07-15-2014
Accepted Solution

FATAL ha.ZKFailoverController: Unable to start failover controller

When I issue the command

sudo -u hdfs hdfs zkfc -formatZK

I get the error

14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Opening socket connection to server nn2/192.168.1.31:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Socket connection established to nn2/192.168.1.31:2181, initiating session
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Opening socket connection to server jt1/192.168.1.32:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Socket connection established to jt1/192.168.1.32:2181, initiating session
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server nn2/192.168.1.31:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to nn2/192.168.1.31:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server jt1/192.168.1.32:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to jt1/192.168.1.32:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:39 ERROR ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
14/07/24 00:24:40 INFO zookeeper.ZooKeeper: Session: 0x0 closed
14/07/24 00:24:40 INFO zookeeper.ClientCnxn: EventThread shut down
14/07/24 00:24:40 FATAL ha.ZKFailoverController: Unable to start failover controller. Unable to connect to ZooKeeper quorum at nn1:2181,nn2:2181,jt1:2181. Please check the configured value for ha.zookeeper.quorum and ensure that ZooKeeper is running.

 

 

I have confirmed that the ZooKeeper service is running on every machine by trying to start it:

[root@nn1 ~]# service zookeeper-server start
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... already running as process 1065.

 

I can also connect with nc from every machine to every other machine on port 2181:

[root@nn1 ~]# nc nn1 2181
^C
[root@nn1 ~]# nc nn2 2181
^C
[root@nn1 ~]# nc jt1 2181
^C
[root@nn1 ~]#

 

This is what I see in the ZooKeeper log:

2014-07-24 00:24:18,706 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:24:34,956 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35151
2014-07-24 00:24:34,956 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:34,956 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35151 (no session established for client)
2014-07-24 00:24:37,075 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35154
2014-07-24 00:24:37,076 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:37,076 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35154 (no session established for client)
2014-07-24 00:24:39,432 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35157
2014-07-24 00:24:39,433 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:39,433 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35157 (no session established for client)
2014-07-24 00:25:18,709 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:25:18,710 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:25:18,711 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:26:18,713 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:26:18,715 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:26:18,716 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:26:40,619 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35170
2014-07-24 00:26:43,508 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
  at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
  at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
  at java.lang.Thread.run(Thread.java:662)
2014-07-24 00:26:43,511 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35170 (no session established for client)
2014-07-24 00:27:18,717 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:27:18,719 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:27:18,719 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000

 

Explorer
Posts: 12
Registered: 01-18-2014

Re: FATAL ha.ZKFailoverController: Unable to start failover controller

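It looks like your ZooKeeper quorum was not able to elect a leader (the "ZooKeeperServer not running" and repeated "Notification time out" messages in your log point to that). Maybe ZooKeeper is misconfigured?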
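A plain nc connection only proves that the port is open. One way to see whether each server has actually joined a quorum is ZooKeeper's four-letter "srvr" command. The sketch below is only an illustration: it assumes the hostnames and port from the log above, that nc is installed, and that four-letter commands are enabled on your servers (newer ZooKeeper releases may require whitelisting them). The function names zk_mode and zk_probe are made up for this example. A server that has not joined a quorum typically answers "This ZooKeeper instance is not currently serving requests".

```shell
#!/bin/sh
# Classify the reply of ZooKeeper's "srvr" four-letter command.
# Prints one of: leader, follower, standalone, not-serving, unknown.
zk_mode() {
  reply=$1
  case $reply in
    *"not currently serving requests"*) echo not-serving ;;
    *"Mode: leader"*)                   echo leader ;;
    *"Mode: follower"*)                 echo follower ;;
    *"Mode: standalone"*)               echo standalone ;;
    *)                                  echo unknown ;;
  esac
}

# Ask one server (host, port) for its state; gives up after 5 seconds.
zk_probe() {
  zk_mode "$(echo srvr | nc -w 5 "$1" "$2" 2>/dev/null)"
}

# Usage (hostnames taken from the log above):
#   for h in nn1 nn2 jt1; do echo "$h: $(zk_probe "$h" 2181)"; done
```

In a healthy three-node ensemble you would expect one leader and two followers. Several "not-serving" answers suggest that leader election is not completing.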

 

Make sure that all 3 servers are listed in your zoo.cfg, each with a unique ID. Make sure the config is the same on all 3 machines, and that every server uses the myid that the cfg assigns to it.
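These checks can be sketched as a small shell function to run on each server. It assumes the conventional layout, where zoo.cfg contains lines such as server.1=nn1:2888:3888 and the ID is stored in a file named myid under dataDir. The function name check_myid is made up for illustration, and the myid path in the usage comment is an assumption to adjust for your install.

```shell
#!/bin/sh
# check_myid CFG MYID_FILE HOST
# Succeeds when the ID in MYID_FILE is the one that CFG assigns to HOST
# through a line of the form: server.<id>=<host>:<peer-port>:<election-port>
check_myid() {
  cfg=$1
  myid_file=$2
  host=$3
  [ -r "$myid_file" ] || { echo "missing $myid_file"; return 1; }
  # The myid file holds just the number; strip any whitespace around it.
  actual=$(tr -d '[:space:]' < "$myid_file")
  # Pull <id> out of the server.<id>=<host>:... line for this host.
  expected=$(sed -n "s/^server\.\([0-9][0-9]*\)=$host:.*/\1/p" "$cfg")
  if [ -z "$expected" ]; then
    echo "no server.N entry for $host in $cfg"
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "myid is $actual but $cfg expects $expected for $host"
    return 1
  fi
  echo "ok: $host is server.$expected"
}

# Usage (the myid path is an assumption):
#   check_myid /etc/zookeeper/conf/zoo.cfg /var/lib/zookeeper/myid "$(hostname -s)"
```

If every machine was initialized with the same ID, this check passes on only one of them and reports a mismatch on the others.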

 

BR

Marc

Contributor
Posts: 56
Registered: 07-15-2014

Re: FATAL ha.ZKFailoverController: Unable to start failover controller

Thank you so much. Your answer is absolutely correct.

 

I went to each server and ran

nn1:  service zookeeper-server init --myid=1 --force

nn2:  service zookeeper-server init --myid=2 --force

jt1:  service zookeeper-server init --myid=3 --force

 

Earlier I had chosen an ID of 1 on every machine.

I also corrected my zoo.cfg so that it has the right entries.

Now it works and I am able to run

sudo -u hdfs hdfs zkfc -formatZK

 

Thank you so much!

 
