Zookeeper Failover Controller failed to start

Contributor

I am trying to bring up a Hortonworks cluster.

Below are the services in the cluster that I am trying to install:

Zookeeper

Ambari metrics

HDFS

YARN

MR2

Of the above services, I was able to bring up Zookeeper and Ambari metrics, but the other services (HDFS, YARN and MR2) are not coming up. The Namenode is not coming up either. I am trying to install the cluster on 3 nodes, with HA enabled. When I checked the HDFS alerts, one of the critical alerts was that the Zookeeper Failover Controller hasn't been started. After googling, I tried to format it using the command hdfs zkfc -formatZK -nonInteractive, but I get the same error as in the Ambari UI. My feeling is that the ZKFC startup failure is preventing the other Hadoop services from starting.

Below is the error message from the Zookeeper logs

2018-03-06 13:34:20,580 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address Host2/ip3-host4:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)

Attaching the below items

  • Zookeeper log from the namenode
  • Ambari UI log

I have been stuck on this for the past 2 days. I tried uninstalling and reinstalling the cluster twice but still get the same error. Any inputs would be appreciated.

1 ACCEPTED SOLUTION

Contributor

Thanks @Geoffrey Shelton Okot. The issue has been resolved. Once again I learned the importance of the /etc/hosts file. The firewall was not blocking the connection; rather, the process was bound to an address internal to the instance, so none of the other instances could reach it. The Zookeeper process looks up its IP address in /etc/hosts before it starts listening, and instead of the host's real IP address it picked up the loopback address (127.0.0.1), which meant nothing outside the host could reach the process. I followed this thread to resolve the issue: MeaningOfIPaddressinProcess
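
For readers hitting the same symptom, the /etc/hosts check described above can be sketched in Python. This is only an illustration, not the method used in the thread: the function name loopback_entries and the sample host names and addresses are hypothetical, and the sketch only looks for a host name that is mapped to a loopback address (anything in 127.0.0.0/8, or ::1).

```python
import ipaddress

def loopback_entries(hosts_text, hostname):
    """Return the loopback addresses that hosts_text maps hostname to."""
    hits = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip lines that do not start with an IP address
        if ip.is_loopback and hostname in names:
            hits.append(addr)
    return hits

# Hypothetical hosts file content, for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
127.0.1.1   host1.example.com host1   # problematic: real host name on loopback
10.0.0.12   host2.example.com host2
"""

print(loopback_entries(SAMPLE_HOSTS, "host1"))  # ['127.0.1.1']
print(loopback_entries(SAMPLE_HOSTS, "host2"))  # []
```

On a cluster node, one could pass the contents of /etc/hosts and the node's own host name; a non-empty result suggests the host name should instead be mapped to the node's routable IP address.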


5 REPLIES

Master Mentor

@ajay vembu

Zookeeper is not running on these 2 hosts:

Cannot open channel to 2 at election address Host2/10.23.152.247:3888 java.net.ConnectException: Connection refused
Cannot open channel to 3 at election address Host2/10.23.152.159:3888 java.net.ConnectException: Connection refused

Can you start Zookeeper manually by running the command below on all the Zookeeper hosts?

su - zookeeper -c "/usr/hdp/current/zookeeper-server/bin/zookeeper-server start"

Once the Zookeepers are up, then start the other components.

Contributor

@Geoffrey Shelton Okot Thanks for the response. Zookeeper is running on these ports; I am attaching the process screenshots (zookeeper-server1.png, zookeeper-server2.png). I am also unable to telnet to that port from the node where we see the error, e.g. telnet host1/host2 3888. Could this be because a firewall has been set up? I am able to telnet to port 2181, though, which I thought was the default Zookeeper port. Can you please confirm?
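
The telnet test described above can also be scripted. The following is a minimal sketch, assuming the ports in question are 2181 (client) and 3888 (election), both of which appear in this thread, plus 2888, the quorum port conventionally used in zoo.cfg examples; the actual values depend on each cluster's zoo.cfg. The helper names port_open and check_ports are hypothetical.

```python
import socket

# Ports assumed for illustration: 2181 (client) and 3888 (election) appear in
# this thread; 2888 is the conventional quorum port in zoo.cfg examples.
ZK_PORTS = (2181, 2888, 3888)

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts, and DNS failures
        return False

def check_ports(hosts, ports=ZK_PORTS):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(h, p): port_open(h, p) for h in hosts for p in ports}
```

Calling check_ports(["host1", "host2"]) from each node gives a quick reachability matrix. A port that answers only when checked from its own host points to a listener bound to a local address rather than to a firewall rule.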

Master Mentor

@ajay vembu

One of the prerequisites for an HDP cluster setup is to disable the firewall. See the official Hortonworks documentation.

You can temporarily clear all iptables rules so that you can troubleshoot the problem. If you are using Red Hat or Fedora Linux, type the following commands:

# /etc/init.d/iptables save
# /etc/init.d/iptables stop

If you are using another Linux distribution, type the following commands:

# iptables -F 
# iptables -X 
# iptables -t nat -F 
# iptables -t nat -X 
# iptables -t mangle -F  

Please revert

Contributor

@Geoffrey Shelton Okot We have already disabled the firewall on all the hosts in the cluster. Also, the port for which we are getting connection refused belongs to a process that is running internal to the instance, meaning only localhost can access that process. I am not sure why we are getting connection refused for a process that is running internal to the instance. I have attached a screenshot showing the process bound to 127.0.1.1. Any inputs would be appreciated.
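
The observation above, a listener bound to 127.0.1.1, can be classified programmatically from the local-address column of a tool such as netstat or ss. This is a rough sketch under stated assumptions: the function name bind_scope is hypothetical, and it expects a single "address:port" string in one of the common display forms (for example 127.0.1.1:3888, 0.0.0.0:2181, :::2181, [::1]:3888 or ::ffff:127.0.1.1:3888).

```python
import ipaddress

def bind_scope(local_addr):
    """Classify a listener's 'address:port' as 'loopback', 'all' or 'specific'."""
    host, _, _port = local_addr.rpartition(":")
    host = host.strip("[]")
    # Wildcard forms mean the process listens on every interface.
    if host in ("*", "0.0.0.0", "::", ""):
        return "all"
    # Java processes often show IPv4-mapped IPv6 addresses (::ffff:a.b.c.d).
    if host.lower().startswith("::ffff:"):
        host = host[len("::ffff:"):]
    return "loopback" if ipaddress.ip_address(host).is_loopback else "specific"

print(bind_scope("127.0.1.1:3888"))      # loopback: other nodes get "refused"
print(bind_scope("10.23.152.247:3888"))  # specific: reachable on that address
print(bind_scope(":::2181"))             # all: reachable on every interface
```

A "loopback" result for the election port is consistent with the symptom in this thread: other nodes are refused even with the firewall disabled, while a port bound to all interfaces, such as 2181 here, stays reachable.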
