Hi, I am unable to install HDP via Ambari. The ZooKeeper server is not running and throws the following error in /var/log/zookeeper/zookeeper-root-server-zookeeper-1.out. I am running it in an AWS environment.

/var/log/zookeeper/zookeeper-root-server-zookeeper-1.out:

2016-06-14 06:38:38,357 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /54.206.194.70:49731 (no session established for client)
2016-06-14 06:38:38,664 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 2 at election address zookeeper-2/54.253.26.67:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
2016-06-14 06:38:38,665 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address zookeeper-3/54.66.23.197:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
2016-06-14 06:38:38,666 - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2016-06-14 06:38:40,476 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /54.206.194.70:49734
2016-06-14 06:38:40,477 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running

Config files:

/usr/hdp/current/zookeeper-server/conf/zoo.cfg

ZOOKEEPER_PID_DIR=/home/zookeeper/zookeeper_server.pid
clientPort=2181
syncLimit=5
autopurge.purgeInterval=24
maxClientCnxns=1000
dataDir=/hadoop/zookeeper
initLimit=10
zookeeper.znode.parent=/hbase-unsecure
tickTime=2000
autopurge.snapRetainCount=30
server.1=zookeeper-1:2888:3888
server.2=zookeeper-2:2888:3888
server.3=zookeeper-3:2888:3888

But when I check with 'netstat -tulpn', nothing is listening on port 2888 or 3888. Please help!

1 ACCEPTED SOLUTION

avatar

@Payel Datta, these exceptions indicate that this member of the ZooKeeper ensemble cannot connect to port 3888 on the other 2 members of the ensemble.

2016-06-14 06:38:38,664 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 2 at election address zookeeper-2/54.253.26.67:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
2016-06-14 06:38:38,665 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address zookeeper-3/54.66.23.197:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)

Port 3888 is used for ZooKeeper's leader election protocol. Without a successful connection, the ensemble cannot successfully elect a leader. Note that the message occurs for the connection to both of the other hosts: zookeeper-2/54.253.26.67 and zookeeper-3/54.66.23.197.
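To see the same "Connection refused" condition outside of ZooKeeper, a plain TCP probe of the election port can help. This is only a minimal sketch, assuming Python 3 is available on the node; the helper names `can_connect` and `probe_all` are made up for illustration, and the hostnames and port come from the zoo.cfg in the question.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts, and DNS failures are all OSError subclasses.
        return False


def probe_all(hosts, port):
    """Map each host to True (port reachable) or False (refused or unreachable)."""
    return {host: can_connect(host, port) for host in hosts}
```

For the ensemble in the question, calling `probe_all(["zookeeper-1", "zookeeper-2", "zookeeper-3"], 3888)` on each node would show which election ports accept connections. A False result does not distinguish a closed port from a firewall drop or a hostname that fails to resolve, so it is a starting point rather than a diagnosis.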

This warning indicates that client connections were rejected, because the ZooKeeper ensemble is not fully initialized. This is expected behavior if an ensemble cannot elect a leader and complete its initialization.

2016-06-14 06:38:40,477 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 
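One way to check from a client's point of view whether a server process is up is ZooKeeper's four-letter commands (for example `ruok` or `stat`) sent to the client port. The sketch below is an assumption-laden illustration, not something from the logs above: whether these commands are enabled depends on the ZooKeeper version and configuration, and the helper name `four_letter_word` is hypothetical.

```python
import socket


def four_letter_word(host, port, command, timeout=3.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `four_letter_word("zookeeper-1", 2181, "stat")` would return whatever the server reports about its current state, which may be easier to read than the client-side exception.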

I recommend reviewing ZooKeeper logs from all 3 nodes in the ensemble to try to find root cause. If netstat reports that there is nothing listening on port 3888 on all 3 nodes, then try looking earlier in the logs to see if there was possibly a bind error when ZooKeeper tried to use port 3888. If nothing is easily found in the logs, try restarting all 3 ZooKeeper processes to get a fresh run. That might make it easier to see what is happening when it tries to bind to port 3888.
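When the logs are long, a small filter can make the relevant lines easier to find. The following is a rough sketch, assuming the log has one entry per line; the function name `scan_log` and the pattern labels are invented here, and the two patterns are simply the message texts that appear in this thread.

```python
import re

# Message texts as they appear in the ZooKeeper logs quoted in this thread.
PATTERNS = {
    "bind_error": re.compile(r"java\.net\.BindException: (.+)"),
    "peer_unreachable": re.compile(
        r"Cannot open channel to (\d+) at election address (\S+)"
    ),
}


def scan_log(lines):
    """Return (line_number, kind, detail) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((number, kind, " ".join(match.groups())))
    return hits
```

Running it over the .out file from each node, for example `scan_log(open(path))` with `path` pointing at that node's log, lists bind failures alongside the peers that could not be reached, so the earliest bind error stands out.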

12 REPLIES

avatar
Explorer

@Chris Nauroth, thanks a lot. I didn't explicitly declare a 'leader' in the ensemble when I installed via Ambari. Is there any way to do that?

And yes, I did get that 'Bind exception' in the ZooKeeper logs earlier.

Should I kill all the ZooKeeper processes on all 3 nodes and restart them?

avatar

@Payel Datta, you won't need to declare the leader explicitly. The ZooKeeper ensemble negotiates a leader node automatically. Do you have more details on that bind exception? Is it possible that something else on the host is already using that port?

avatar
Explorer

@Chris Nauroth

'netstat -tulpn' does not show anything listening on port 3888.

The following is the BindException I am getting.

2016-06-15 06:42:11,260 - INFO [Thread-2:QuorumCnxManager$Listener@506] - My election bind port: zookeeper-2/54.253.26.67:3888
2016-06-15 06:42:11,261 - ERROR [zookeeper-2/54.253.26.67:3888:QuorumCnxManager$Listener@519] - Exception while listening
java.net.BindException: Cannot assign requested address
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:382)
    at java.net.ServerSocket.bind(ServerSocket.java:375)
    at java.net.ServerSocket.bind(ServerSocket.java:329)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:509)
2016-06-15 06:42:11,307 - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@747] - LOOKING
2016-06-15 06:42:11,311 - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id = 2, proposed zxid=0x0
2016-06-15 06:42:11,328 - WARN [WorkerSender[myid=2]:QuorumCnxManager@383] - Cannot open channel to 1 at election address zookeeper-1/54.253.62.9:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:745)

avatar

@Payel Datta, thank you for sharing the full stack trace. I expect this will turn out to be some kind of misconfiguration, either of the host network settings or of ZooKeeper's connection configuration.

On the ZooKeeper side, I recommend reviewing the zoo.cfg files and the myid files. On each host, the myid file must match up correctly with the address settings in zoo.cfg. For example, on the node with myid=1, look in zoo.cfg for the server.1 settings. Make sure those settings have the correct matching host or IP address. Perhaps the addresses in zoo.cfg do not match correctly with the network interface on the host. If the settings in zoo.cfg refer to a hostname/IP address for which the host does not have a listening network interface, then the bind won't be able to succeed.

On the networking side, you might try using basic tools like NetCat to see if it's possible to set up a listening server bound to port 3888. If that succeeds, then it's likely not a host OS networking problem. If it fails though, then that's worth further investigation on the networking side, independent of ZooKeeper.
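Both checks can be roughed out in a few lines of Python as an alternative to NetCat. This is a sketch under assumptions: `parse_servers` and `try_bind` are hypothetical helper names, the parser only handles the plain `server.N=host:port:port` form shown in the question, and the bind test uses IPv4 only.

```python
import errno
import re
import socket

# Matches entries such as: server.1=zookeeper-1:2888:3888
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def parse_servers(cfg_text):
    """Map each server id to (host, quorum_port, election_port)."""
    servers = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            sid, host, quorum, election = match.groups()
            servers[int(sid)] = (host, int(quorum), int(election))
    return servers


def try_bind(host, port):
    """Try to listen on host:port; return None on success, else an error name."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return None
    except OSError as exc:
        # EADDRNOTAVAIL is the errno behind "Cannot assign requested address".
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

On a given node, look up the entry for the number stored in that node's myid file, then call `try_bind(host, election_port)` with the values from that entry while ZooKeeper is stopped. A result of `EADDRNOTAVAIL` would suggest the address in zoo.cfg is not assigned to any local interface, while `EADDRINUSE` would point to another process already holding the port.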

avatar
Explorer

Hi @Chris Nauroth, I checked the mapping between zoo.cfg and the myid file, and it matches. I also checked /etc/hosts.

In the AWS configuration, the port range 2888:3888 has been kept open.

Could a permission issue be the culprit? The owner of the myid file is root, while zoo.cfg is owned by zookeeper.

-rw-r--r--. 1 root root 1 May 25 08:15 myid

-rw-r--r-- 1 zookeeper hadoop 1142 Jun 14 06:54 zoo.cfg

avatar
@Payel Datta, I would not expect those permissions on the myid file to be a problem. Even though root is the owner, the permissions still allow read access to everyone. The ZooKeeper process only needs read access to that file.
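The permission string from the listing can be decoded with Python's standard `stat` module to confirm this. A minimal sketch; `others_can_read` is a made-up helper name, and the mode value is written out by hand from the `-rw-r--r--` shown above rather than read from the real file.

```python
import stat


def others_can_read(mode):
    """True if the read bit for 'other' users is set in a file mode."""
    return bool(mode & stat.S_IROTH)


# -rw-r--r-- on a regular file corresponds to octal 644.
MYID_MODE = stat.S_IFREG | 0o644

print(stat.filemode(MYID_MODE))      # -rw-r--r--
print(others_can_read(MYID_MODE))    # True
```

On the host itself, the same check could be run against the real file with `others_can_read(os.stat(path).st_mode)`, where `path` is the location of the myid file.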

Have you tried any network troubleshooting with tools like NetCat, like I suggested in the last comment?

avatar
Explorer

I killed all the ZooKeeper processes and started them again with 'sudo ./zkServer.sh start'.

But still no luck!

avatar
Explorer

Hi @Chris Nauroth, the following ports are open for my ZooKeeper server.

5184-bdb4d.png

And following is the result for 'netstat -tulpn':

5185-zcqdi.png