I'm doing a manual registration of hosts. I started the ambari-agent on all the hosts (including the server host). When I registered the hosts, only the ambari-server host registered successfully; the others failed, even though the ambari-agent started successfully on those nodes too.
Please check the following things:
1. From ambari-server host machine are you able to do Passwordless SSH to every host?
# ssh email@example.com
# ssh firstname.lastname@example.org
2. Do you have the "/etc/hosts" entries set up properly on all hosts so that they can resolve each other? And does each hostname entry match the FQDN of the individual host?
# cat /etc/hosts
# hostname -f
3. Is "iptables" (the firewall) disabled on the ambari-server host and on the other hosts?
# service iptables stop
4. You also mentioned that the host registration failed. Registration usually happens on port "8441" of the ambari server. This is the "Registration and Heartbeat Port for Ambari Agents to Ambari Server".
So please check from the ambari agent machines whether the ambari-server's ports 8441 and 8440 (Handshake Port for Ambari Agents to Ambari Server) are accessible.
Please run the following commands from the agent machines where the registration failed.
# nc -v $AMBARI_HOSTNAME 8440
# nc -v $AMBARI_HOSTNAME 8441
OR
# telnet $AMBARI_HOSTNAME 8440
# telnet $AMBARI_HOSTNAME 8441
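If neither nc nor telnet is installed on an agent machine, the same reachability check can be sketched in Python. This is only a minimal sketch, assuming Python 3 is available on the agent host. The function names port_is_open and check_ports are made up for illustration and are not part of Ambari.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host, ports=(8440, 8441), timeout=5.0):
    """Check each port on host and return a {port: reachable} dictionary."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, calling check_ports("ambari-server.example.com") (a placeholder hostname, replace it with your ambari-server FQDN) from an agent host returns one True/False value per port. A False value points to a port/network/firewall problem between that agent and the server, or to a hostname that does not resolve.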
Data Request: It would be really helpful if you could share the output of the agent registration from the ambari UI. You will see a "failed" link in the ambari UI where the agent registration failed. Clicking on it shows the reason for the failure. Please share that whole output.
There is also the option of installing the ambari-agents manually, as mentioned in the following doc (just in case you want to try that approach).
@Jay SenSharma I set up my agents using the same document mentioned above. Ambari-agent started successfully on all hosts, but host registration is failing on every host except the one on which ambari-server is running.
I started the agent on the server host too. Is this fine? (This is the only host that registered successfully.)
1. Should I be able to ssh as root, or as the user account used to set up ssh? I chose the same user to run ambari instead of root (and yes, I'm able to ssh using that user).
2. Should I update /etc/hosts on all the hosts or just the ambari-server host?
3. Yes I did stop iptables on all hosts.
4. Yes I ran the port check from all the agent nodes.
For the Failed link, it just says "Host registration failed" and nothing else is displayed. Is there any other place to look for the specific error?
1. Please check and share /var/log/ambari-agent/ambari-agent.log, which should log some error for the failed registration.
2. It is also worth checking access to the ambari server ports from the agent machines, to isolate any port/network/firewall issue:
# nc -v $AMBARI_HOSTNAME 8440
# nc -v $AMBARI_HOSTNAME 8441
3. Regarding your query "Should I update /etc/hosts on all the hosts or just the ambari-server host?"
>>> Yes, ideally the "/etc/hosts" entries throughout the cluster should look alike, including the ambari-server's "/etc/hosts", so that the hosts can identify each other by their FQDN. https://docs.hortonworks.com/HDPDocuments/Ambari-18.104.22.168/bk_ambari-installation-ppc/content/edit_the...
4. As you seem to be running the ambari server as a "Non Root" user, please check the following link to be doubly sure that it is set up correctly: https://docs.hortonworks.com/HDPDocuments/Ambari-22.214.171.124/bk_ambari-security/content/how_to_configure...
Also, if you are planning to run the ambari agents as a Non Root user as well, then it is worth checking: https://docs.hortonworks.com/HDPDocuments/Ambari-126.96.36.199/bk_ambari-security/content/how_to_configure...
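For point 3, one way to confirm that every host's "/etc/hosts" contains all cluster FQDNs is a small script like the one below. It is a rough sketch, assuming Python 3 and the usual "IP name alias ..." line layout of /etc/hosts. The helper names parse_hosts and missing_hosts are hypothetical and are not part of Ambari.

```python
def parse_hosts(text):
    """Map every hostname and alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Drop trailing comments, then skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first address seen for a name.
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_hosts(text, expected_fqdns):
    """Return the expected FQDNs that have no entry in the hosts text."""
    mapping = parse_hosts(text)
    return [fqdn for fqdn in expected_fqdns if fqdn.lower() not in mapping]
```

On each node you could read the file with open("/etc/hosts").read() and pass it to missing_hosts together with the list of FQDNs you expect in the cluster. An empty list means every expected name has an entry on that node.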
@Jay SenSharma One of my hosts is failing to register. Even in DEBUG mode I don't see an error in the log file. The ambari-agent just stops at socket.getfqdn(). I stopped and started the ambari-server and the agents, and placed all the hosts in the /etc/hosts of every host. iptables is stopped on all the hosts as well. I checked the ports to the ambari-server from this host, and that works fine too.
Please try the following approach:
1. On the problematic agent host, create a file named "/var/lib/ambari-agent/public_hostname.sh" and add the following lines to it:
#!/bin/sh
echo `hostname -f`
2. Make sure that the file "/var/lib/ambari-agent/public_hostname.sh" has proper execute permission. Example:
chmod 755 "/var/lib/ambari-agent/public_hostname.sh"
3. On the ambari-agent host, edit the file "/etc/ambari-agent/conf/ambari-agent.ini" and add the following lines to the [agent] section:
## Added following to customize the public hostname
public_hostname_script=/var/lib/ambari-agent/public_hostname.sh
hostname_script=/var/lib/ambari-agent/public_hostname.sh
NOTE: Users can also use the property "hostname_script" to customize the internal hostname.
4. Now restart the agents.
For more details please refer to: https://community.hortonworks.com/articles/42872/why-ambari-host-might-have-different-public-host-n....
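After making the changes in the steps above, a quick sanity check of the configuration could look like the sketch below. It assumes Python 3 and assumes that ambari-agent.ini can be parsed by the standard configparser module. The function names hostname_scripts and script_problems are invented for this example.

```python
import configparser
import os


def hostname_scripts(ini_text):
    """Return the two hostname script settings from the [agent] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    found = {}
    for key in ("hostname_script", "public_hostname_script"):
        # fallback=None also covers a missing [agent] section.
        found[key] = parser.get("agent", key, fallback=None)
    return found


def script_problems(scripts):
    """List human-readable problems with the configured script paths."""
    problems = []
    for key, path in scripts.items():
        if path is None:
            problems.append("%s is not set in [agent]" % key)
        elif not os.path.isfile(path):
            problems.append("%s points to a missing file: %s" % (key, path))
        elif not os.access(path, os.X_OK):
            problems.append("%s is not executable: %s" % (key, path))
    return problems
```

On the agent host you could pass the contents of "/etc/ambari-agent/conf/ambari-agent.ini" to hostname_scripts and the result to script_problems. An empty list means both properties are set and point to executable files.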
It looks like the Ambari Agent, which is basically a Python script, is not able to determine the hostname (FQDN). You can also try running the following command on the problematic host to see whether it returns the correct hostname:
[root@sandbox ~]# python -c "import socket; print(socket.getfqdn())"
sandbox.hortonworks.com
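Since the agent appears to stall at socket.getfqdn(), it may also help to time that call, because a slow or unreachable DNS server can make the lookup take a long time. The following is a small sketch, assuming Python 3. The names timed_fqdn and forward_lookup are hypothetical helpers, not Ambari functions.

```python
import socket
import time


def timed_fqdn():
    """Return (fqdn, seconds) for a single socket.getfqdn() call."""
    start = time.monotonic()
    name = socket.getfqdn()
    return name, time.monotonic() - start


def forward_lookup(name):
    """Resolve name to an IPv4 address, or return None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None
```

If timed_fqdn() takes many seconds, or forward_lookup() returns None for the name it reports, the problem is more likely in name resolution on that host (/etc/hosts or DNS) than in the agent itself.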
Also, please check whether you have set up the hostname as mentioned in the following 3 links:
On CentOS or RHEL you can also use the following command to set the hostname (this changes the running system only and does not persist across a reboot):
# sysctl kernel.hostname=agent1.example.com