Created 02-19-2017 06:57 AM
Hey guys,
After repeating a full build several times, I'm failing to get the agents to register consistently with the Ambari server. Looking at the REST API response, I've discovered that the Ambari server reports the failing nodes as registered, but *not* with a FQHN:
{
  "href" : "http://host0147.domain.com:8080/api/v1/hosts",
  "items" : [
    {
      "href" : "http://host0147.domain.com:8080/api/v1/hosts/host0141",
      "Hosts" : { "host_name" : "host0141" }
    },
    {
      "href" : "http://host0147.domain.com:8080/api/v1/hosts/host0145.domain.com",
      "Hosts" : { "host_name" : "host0145.domain.com" }
What's of note is that, of the two hosts shown, the first fails to register in a way that allows a successful installation; the symptom is that it registered without its FQHN. Of the ten hosts I have, several fail, and never consistently the same ones. The agent clearly connects, but the registration it sends does not carry the domain. The randomness is making this hard to diagnose.
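To spot the affected hosts quickly after each build, the response can be filtered for host names that have no domain part. This is only a rough sketch: the helper name short_named_hosts is made up for illustration, and the sample payload is trimmed to the two entries from the response above rather than fetched from the server.

```python
import json


def short_named_hosts(payload):
    """Return host_name values that contain no dot, i.e. no domain part."""
    names = [item["Hosts"]["host_name"] for item in payload.get("items", [])]
    return [name for name in names if "." not in name]


# Sample shaped like the /api/v1/hosts response, reduced to the relevant fields.
sample = json.loads("""
{
  "items": [
    {"Hosts": {"host_name": "host0141"}},
    {"Hosts": {"host_name": "host0145.domain.com"}}
  ]
}
""")

print(short_named_hosts(sample))  # prints ['host0141']
```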
Thanks.
Created 02-19-2017 07:29 AM
Ambari agent will generally use the "socket.getfqdn()" approach to find the FQDN. You can also validate the output of the same python command on your problematic hosts.
Example:
# python
Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket;
>>> print socket.getfqdn();
sandbox.hortonworks.com
So please check whether all your hosts return a proper FQDN. Every time ambari-agent starts, it gathers information about the host it is running on (such as CPU, RAM, public_host_name and host_name) and then sends a registration request to the ambari-server.
Also, are these agents located in a cloud environment? If so, you might be encountering the issue described in this article: https://community.hortonworks.com/content/kbentry/42872/why-ambari-host-might-have-different-public-...
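For checking many hosts, the comparison can be scripted. The following is a minimal sketch, not part of Ambari: the function names is_fully_qualified, hostname_f and report are invented for this example, and it assumes the `hostname` command is on the PATH. It avoids newer library calls so that it should also run on an older Python such as 2.6.

```python
import socket
import subprocess


def is_fully_qualified(name):
    """True if the name has a non-empty host part and a non-empty domain part."""
    host, _, domain = name.partition(".")
    return bool(host) and bool(domain)


def hostname_f():
    """Return the output of `hostname -f`, or None if the command fails."""
    try:
        proc = subprocess.Popen(["hostname", "-f"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _ = proc.communicate()
    except OSError:
        return None
    if proc.returncode != 0:
        return None
    return out.decode("utf-8", "replace").strip()


def report():
    """Print both views of the host name and flag short or mismatched names."""
    # socket.getfqdn() checks the name returned by gethostbyaddr() and then
    # its aliases, so it can differ from what `hostname -f` prints.
    python_fqdn = socket.getfqdn()
    shell_fqdn = hostname_f()
    print("socket.gethostname(): %s" % socket.gethostname())
    print("socket.getfqdn():     %s" % python_fqdn)
    print("hostname -f:          %s" % shell_fqdn)
    if not is_fully_qualified(python_fqdn):
        print("WARNING: socket.getfqdn() returned a short name")
    if shell_fqdn is not None and shell_fqdn != python_fqdn:
        print("WARNING: hostname -f and socket.getfqdn() disagree")


if __name__ == "__main__":
    report()
```

Running this on each agent host before starting ambari-agent should show which hosts would register with a short name.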
Created 02-19-2017 07:48 AM
Okay that's a start:
>>> import socket
>>> print socket.getfqdn();
host0141
So somehow Python isn't finding a FQHN...
[centos@host0141 ~]$ hostname -f
host0141.domain.com
[centos@host0141 ~]$ python<<<"import socket;print socket.getfqdn();"
host0141
So it seems that socket.getfqdn() is the culprit. I'm using OpenStack, and I'm wondering if there's a delay in registering hosts shortly after their deletion and recreation... Thanks for helping, JanSenSharma.
I restarted and boom! I now have a FQHN coming from the socket function. Seems I need to restart the host to freshen the sockets after being spawned.
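If the cause really is a delay after the instance is spawned (an assumption, not something confirmed here), one alternative to rebooting would be to hold off starting the agent until Python itself reports a full name. A rough sketch, where wait_for_fqdn and its parameters are hypothetical names chosen for this example:

```python
import socket
import time


def wait_for_fqdn(resolve=socket.getfqdn, attempts=30, delay=10, sleep=time.sleep):
    """Poll resolve() until it returns a name with a domain part.

    Returns the name, or None if every attempt returned a short name.
    """
    for attempt in range(attempts):
        name = resolve()
        # A trailing dot alone does not count as a domain part.
        if "." in name.strip("."):
            return name
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

A provisioning script could call wait_for_fqdn() and only start ambari-agent once it returns a name; the resolve and sleep parameters exist so the loop can be exercised without waiting on real name resolution.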
Created 02-19-2017 09:20 AM
In OpenStack we can use the post-installation "cloud-init" file to set up the desired FQDN/hostname: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/4/html/End...
For example:
#cloud-config
hostname: host0141
fqdn: host0141.domain.com
ssh_pwauth: False
password: test
Are all your agent hosts returning different output for `hostname -f` and "socket.getfqdn()" (i.e. the two do not match)?
Created 02-19-2017 11:58 AM
I'm marking this as Solved, thanks Jay. Technically this is not the right answer, but it certainly helped me get closer to an outcome I can use. It seems that restarting the OpenStack instance jiggles the sockets and allows Python to find the FQDN.