Why is my Ambari confirm host step failing?

I am trying to set up a Hadoop cluster and one of the hosts is failing on the confirm host step. I followed all the passwordless SSH setup steps and I am able to successfully connect to each server from the name node / ambari-server node. I have also verified that iptables is off.

Here is what I see behind the failed link:

Command start time 2016-08-24 02:42:54
('ERROR 2016-08-24 02:43:04,384 main.py:315 - Fatal exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 312, in <module>
    main(heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 248, in main
    stop_agent()
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 195, in stop_agent
    sys.exit(1)
SystemExit: 1
INFO 2016-08-24 02:43:04,385 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2016-08-24 02:43:04,675 main.py:71 - loglevel=logging.INFO
INFO 2016-08-24 02:43:04,675 main.py:71 - loglevel=logging.INFO
INFO 2016-08-24 02:43:04,676 DataCleaner.py:39 - Data cleanup thread started
INFO 2016-08-24 02:43:04,678 DataCleaner.py:120 - Data cleanup started
INFO 2016-08-24 02:43:04,679 DataCleaner.py:122 - Data cleanup finished
INFO 2016-08-24 02:43:04,703 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-08-24 02:43:04,704 main.py:289 - Connecting to Ambari server at https://datanode2.localdomain:8440 (192.168.224.135)
INFO 2016-08-24 02:43:04,705 NetUtil.py:60 - Connecting to https://datanode2.localdomain:8440/ca
WARNING 2016-08-24 02:43:04,705 NetUtil.py:89 - Failed to connect to https://datanode2.localdomain:8440/ca due to [Errno 111] Connection refused
WARNING 2016-08-24 02:43:04,705 NetUtil.py:112 - Server at https://datanode2.localdomain:8440 is not reachable, sleeping for 10 seconds...
', None)
('ERROR 2016-08-24 02:43:04,384 main.py:315 - Fatal exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 312, in <module>
    main(heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 248, in main
    stop_agent()
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 195, in stop_agent
    sys.exit(1)
SystemExit: 1
INFO 2016-08-24 02:43:04,385 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2016-08-24 02:43:04,675 main.py:71 - loglevel=logging.INFO
INFO 2016-08-24 02:43:04,675 main.py:71 - loglevel=logging.INFO
INFO 2016-08-24 02:43:04,676 DataCleaner.py:39 - Data cleanup thread started
INFO 2016-08-24 02:43:04,678 DataCleaner.py:120 - Data cleanup started
INFO 2016-08-24 02:43:04,679 DataCleaner.py:122 - Data cleanup finished
INFO 2016-08-24 02:43:04,703 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-08-24 02:43:04,704 main.py:289 - Connecting to Ambari server at https://datanode2.localdomain:8440 (192.168.224.135)
INFO 2016-08-24 02:43:04,705 NetUtil.py:60 - Connecting to https://datanode2.localdomain:8440/ca
WARNING 2016-08-24 02:43:04,705 NetUtil.py:89 - Failed to connect to https://datanode2.localdomain:8440/ca due to [Errno 111] Connection refused
WARNING 2016-08-24 02:43:04,705 NetUtil.py:112 - Server at https://datanode2.localdomain:8440 is not reachable, sleeping for 10 seconds...
', None)
Connection to datanode2.localdomain closed.
SSH command execution finished
host=datanode2.localdomain, exitcode=0
Command end time 2016-08-24 02:43:07
Registering with the server...
Registration with the server failed.

Re: Why is my Ambari confirm host step failing?

Your error indicates the following:

NetUtil.py:89 - Failed to connect to https://datanode2.localdomain:8440/ca due

This can happen because a firewall is running on "datanode2.localdomain",

(OR) the FQDN of the host "datanode2.localdomain" is not correct,

(OR) the "/etc/hosts" file does not have the correct entry in it.

Try using

telnet datanode2.localdomain 8440

to see whether there is a network issue, because the ambari-agent is not able to connect to the Ambari server.
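If telnet is not installed on the agent host, the same reachability test can be scripted. The following is a minimal Python sketch, not part of Ambari; the function name can_connect and the 3-second default timeout are assumptions chosen for illustration. It only attempts a plain TCP connection, which is what the telnet test above does.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (Errno 111), timeouts, and name
        # resolution failures, all of which are OSError subclasses.
        return False
```

Run on the agent host, can_connect("datanode2.localdomain", 8440) returning False would be consistent with the "Connection refused" warning in the log. It does not say which of the three causes above is responsible.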

Re: Why is my Ambari confirm host step failing?

Also, please check on the Ambari server host "datanode2.localdomain" whether port 8440 is open:

netstat -tnlpa | grep 8440

Running the above command on the Ambari server will show whether the port is open.
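When reading the netstat output, the line that matters is the one in the LISTEN state; TIME_WAIT lines are only leftovers of closed connections. The following Python sketch filters for that. The helper name listening_sockets is hypothetical, the sample lines are copied from a later reply in this thread, and the column layout is assumed to be the usual Linux netstat one.

```python
def listening_sockets(netstat_output, port):
    """Return local addresses in LISTEN state on the given port.

    Expects one socket per line, as printed by `netstat -tnlpa`.
    """
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local address,
        # foreign address, state, pid/program.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        # The port is whatever follows the last colon (works for ":::8440" too).
        if state == "LISTEN" and local_address.rsplit(":", 1)[-1] == str(port):
            matches.append(local_address)
    return matches


sample = """\
tcp   0 0 10.3.146.5:47608 51.15.134.161:8440 TIME_WAIT -
tcp6  0 0 :::8440          :::*               LISTEN    12315/java
"""
print(listening_sockets(sample, 8440))  # prints [':::8440']
```

An empty list for port 8440 on the server host would mean nothing is listening there, so the agent's connection would be refused.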

Every ambari-agent host should have an /etc/hosts entry for the Ambari server, for example:

10.10.10.10       datanode2.localdomain

where 10.10.10.10 is an example IP address of the Ambari server.
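To verify the entry on many agents without opening each file by hand, something like the following Python sketch could be used. The function names hosts_entries and check_host are hypothetical, and the sample text reuses the example address 10.10.10.10 from above. On a real host the text would come from reading /etc/hosts.

```python
def hosts_entries(hosts_text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first mapping seen for a name.
            mapping.setdefault(name.lower(), ip)
    return mapping


def check_host(hosts_text, hostname, expected_ip):
    """Return True if hostname maps to expected_ip in the given hosts text."""
    return hosts_entries(hosts_text).get(hostname.lower()) == expected_ip


sample = "127.0.0.1 localhost\n10.10.10.10       datanode2.localdomain\n"
print(check_host(sample, "datanode2.localdomain", "10.10.10.10"))  # prints True
```

This only checks the file contents. It does not tell you what the system resolver actually returns, which also depends on the lookup order configured on the host.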

Re: Why is my Ambari confirm host step failing?

@Joel Dodd

Even if DNS is enabled, it's always a good idea to ensure that every host is in your /etc/hosts. This ensures that if there are DNS issues, the hosts can still resolve each other.

While you said you can connect from the Ambari node to each data node, can you connect to the Ambari node from each data node? Is https://datanode2.localdomain where your Ambari server is actually located?

Re: Why is my Ambari confirm host step failing?

Check the ambari-agent ini file and make sure you specified the Ambari server correctly:

vi /etc/ambari-agent/conf/ambari-agent.ini

[server]
hostname={your.ambari.server.hostname}
url_port=8440
secured_url_port=8441
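To see at a glance which server an agent is configured to talk to, the [server] section can be read with Python's standard configparser module. This is a minimal sketch: the function name server_endpoints is hypothetical, the hostname ambari.example.com is a placeholder, and the ports in the sample are the ones that appear in the log and netstat output elsewhere in this thread.

```python
import configparser


def server_endpoints(ini_text):
    """Return (hostname, url_port, secured_url_port) from the [server] section."""
    # interpolation=None keeps any '%' characters elsewhere in the file literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    server = parser["server"]
    return (
        server["hostname"],
        server.getint("url_port"),
        server.getint("secured_url_port"),
    )


sample = """\
[server]
hostname=ambari.example.com
url_port=8440
secured_url_port=8441
"""
host, url_port, secured_port = server_endpoints(sample)
print(f"agent will register with https://{host}:{url_port}")
```

On an agent host the text would come from reading /etc/ambari-agent/conf/ambari-agent.ini. If the printed hostname is not the machine where ambari-server runs, registration fails as in the question.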

Re: Why is my Ambari confirm host step failing?

43686-ambariagentslavenode.png

43685-masterambariserver.png

Hello, I am facing exactly the same issue. Did you manage to resolve it?

In my case I set up a 3-node cluster with the Ambari server installed on one (master) node and Ambari agents on all 3 nodes.

The Ambari install wizard completed successfully, but the services did not start and the heartbeats were lost for all the components.

On the master node (the Ambari server node) I have:

[root@master ~]# netstat -tnlpa | grep 8440

tcp  0 0 10.3.146.5:47608 51.15.134.161:8440  TIME_WAIT -
tcp  0 0 10.3.146.5:47692 51.15.134.161:8440  TIME_WAIT -
tcp  0 0 10.3.146.5:47636 51.15.134.161:8440  TIME_WAIT -
tcp6 0 0 :::8440          :::*                LISTEN    12315/java
tcp6 0 0 10.3.146.5:8440  51.15.134.161:47584 TIME_WAIT -
tcp6 0 0 10.3.146.5:8440  51.15.134.161:47718 TIME_WAIT -
tcp6 0 0 10.3.146.5:8440  51.15.134.161:47670 TIME_WAIT -
tcp6 0 0 10.3.146.5:8440  51.15.216.73:47410  TIME_WAIT -

[root@master ~]# netstat -tnlpa | grep 8441

tcp6 0 0 :::8441 :::* LISTEN 12315/java

On the other nodes (Ambari agent only) I have:

[root@node2 ~]# netstat -tnlpa | grep 8440

tcp 0 0 10.3.27.195:47386 51.15.134.161:8440 TIME_WAIT -
tcp 0 0 10.3.27.195:47384 51.15.134.161:8440 TIME_WAIT -
tcp 0 0 10.3.27.195:47390 51.15.134.161:8440 TIME_WAIT -
tcp 0 0 10.3.27.195:47388 51.15.134.161:8440 TIME_WAIT -
tcp 0 0 10.3.27.195:47392 51.15.134.161:8440 TIME_WAIT -

and

[root@node2 ~]# netstat -tnlpa | grep 8441

[root@node2 ~]#

which doesn't return any result.

So, as you may notice, the Ambari agents on the 2 slave nodes have no open 8441 port.

How do I open this port?

I have also attached screenshots of the logs for the Ambari server and for the Ambari agents on the 3 nodes.

Do you have any idea how to fix this?

Also, why are all the services stopped, with the start option disabled?

Thanks a lot in advance for your answers.
