Created 03-06-2018 07:46 PM
I am trying to bring up a Hortonworks cluster.
Below are the services that I am trying to install in the cluster:
Zookeeper
Ambari metrics
HDFS
YARN
MR2
Of these services, I was able to bring up Zookeeper and Ambari Metrics, but the others (HDFS, YARN and MR2) are not coming up, and neither is the NameNode. I am installing the cluster on 3 nodes with HA enabled. When I checked the HDFS alerts, one of the critical alerts was that the Zookeeper Failover Controller (ZKFC) had not been started. After googling, I tried to format it using the command hdfs zkfc -formatZK -nonInteractive, but I get the same error as in the Ambari UI. My feeling is that the ZKFC startup failure is preventing the other Hadoop services from starting.
Below is the error message from the Zookeeper logs:
2018-03-06 13:34:20,580 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address Host2/ip3-host4:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
I have been stuck on this for the past 2 days. I tried uninstalling and reinstalling the cluster twice but still get the same error. Any input would be appreciated.
Created 03-07-2018 01:46 AM
Thanks @Geoffrey Shelton Okot. The issue has been resolved. Once again I learned the importance of the /etc/hosts file. It was not the firewall that was blocking the connection; rather, the process was bound to an address internal to the instance, so none of the other instances could reach it. The Zookeeper process takes its IP address from the /etc/hosts file when it starts, and instead of the host's real IP address it picked up the loopback address (127.0.0.1), so the process could not be reached from outside the host. I followed this thread to resolve the issue: MeaningOfIPaddressinProcess
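For anyone hitting the same symptom, here is a minimal sketch of how one might scan a hosts file for real hostnames mapped to a loopback address (127.0.0.1, 127.0.1.1 and so on). The function name loopback_mappings, the ignore list and the sample hostnames are my own assumptions for illustration and are not taken from this cluster.

```python
import ipaddress

# Names that are expected to point at a loopback address (assumed list).
LOCAL_NAMES = {
    "localhost", "localhost.localdomain",
    "localhost4", "localhost4.localdomain4",
    "localhost6", "localhost6.localdomain6",
    "ip6-localhost", "ip6-loopback",
}


def loopback_mappings(hosts_text):
    """Return (address, name) pairs where a non-local name maps to loopback."""
    found = []
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a plain IP address, skip the line
        if not addr.is_loopback:
            continue
        for name in fields[1:]:
            if name not in LOCAL_NAMES:
                found.append((str(addr), name))
    return found


# Hypothetical hosts-file content; on a real node you would pass
# open("/etc/hosts").read() instead.
sample = """
127.0.0.1   localhost
127.0.1.1   host1.example.com host1
10.0.0.11   host2.example.com host2
"""
print(loopback_mappings(sample))
```

Any pair this reports is a candidate for the problem described above: a service that resolves its own hostname through such an entry may end up listening only on the loopback interface.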
Created 03-06-2018 11:02 PM
Zookeeper is not running on these 2 hosts:
Cannot open channel to 2 at election address Host2/10.23.152.247:3888 java.net.ConnectException: Connection refused
Cannot open channel to 3 at election address Host2/10.23.152.159:3888 java.net.ConnectException: Connection refused
Can you start it manually by running the command below on all the Zookeeper hosts:
su - zookeeper -c "/usr/hdp/current/zookeeper-server/bin/zookeeper-server start"
Once the Zookeeper servers are up, start the other components.
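If the log is long, a small script can list which peers are refusing election connections. This is only a sketch: the function name unreachable_peers is hypothetical, and the regular expression assumes the warning wording quoted in this thread ("Cannot open channel to N at election address host:port").

```python
import re

# Matches the QuorumCnxManager warning quoted in this thread.
PEER_RE = re.compile(
    r"Cannot open channel to (\d+) at election address\s*(\S+?):(\d+)"
)


def unreachable_peers(log_text):
    """Return unique (server_id, address, port) tuples in order of appearance."""
    seen = set()
    peers = []
    for match in PEER_RE.finditer(log_text):
        peer = (int(match.group(1)), match.group(2), int(match.group(3)))
        if peer not in seen:
            seen.add(peer)
            peers.append(peer)
    return peers


log = (
    "Cannot open channel to 2 at election address Host2/10.23.152.247:3888 "
    "java.net.ConnectException: Connection refused\n"
    "Cannot open channel to 3 at election address Host2/10.23.152.159:3888 "
    "java.net.ConnectException: Connection refused\n"
)
for server_id, address, port in unreachable_peers(log):
    print(server_id, address, port)
```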
Created 03-06-2018 11:10 PM
@Geoffrey Shelton Okot Thanks for the response. Zookeeper is running on these ports; I am attaching screenshots of the processes (zookeeper-server1.png, zookeeper-server2.png). I am also unable to telnet to port 3888 from the node where we see the error, e.g. telnet host1 3888 or telnet host2 3888. Could this be because a firewall is in place? I am able to telnet to port 2181, though, which I thought was the default Zookeeper port. Please confirm.
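The telnet test can also be scripted so that every port is checked against every node in one go. The sketch below assumes the conventional Zookeeper ports (2181 for clients, 2888 for follower-to-leader traffic, 3888 for leader election); check zoo.cfg for the values actually in use. The helper name port_open and the host names host1 and host2 are placeholders.

```python
import socket

# Conventional Zookeeper ports (assumed; confirm against zoo.cfg).
ZK_PORTS = {
    2181: "client",
    2888: "quorum (follower to leader)",
    3888: "leader election",
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution errors.
        return False


def check_hosts(hosts):
    """Print the reachability of each Zookeeper port on each host."""
    for host in hosts:
        for port, role in ZK_PORTS.items():
            state = "open" if port_open(host, port) else "unreachable"
            print(f"{host}:{port} ({role}): {state}")


# Example call with placeholder host names:
# check_hosts(["host1", "host2"])
```

A port that is open when checked from the node itself but unreachable from its peers points to either a firewall or a listener bound to a loopback address.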
Created 03-06-2018 11:28 PM
One of the prerequisites for an HDP cluster setup is to disable the firewall. See the official Hortonworks documentation.
You can temporarily clear all iptables rules so that you can troubleshoot the problem. If you are using Red Hat or Fedora Linux, type the following commands:
# /etc/init.d/iptables save
# /etc/init.d/iptables stop
If you are using another Linux distribution, type the following commands:
# iptables -F
# iptables -X
# iptables -t nat -F
# iptables -t nat -X
# iptables -t mangle -F
Please report back.
Created 03-07-2018 12:07 AM
@Geoffrey Shelton Okot We have already disabled the firewall on all the hosts in the cluster. Also, the port for which we get connection refused belongs to a process that is running internal to the instance, meaning only localhost can access it. I am not sure why we get connection refused for a process that is only reachable from inside the instance. I have attached a screenshot showing the process bound to 127.0.1.1. Any input would be appreciated.