Ambari Agent randomly fails to register correctly

Explorer

Hey guys,

After repeating a full build several times, I'm failing to get the agents to register consistently with the Ambari server. I've discovered from looking at the REST API response that the Ambari server says the failing nodes are registering, but *not* with an FQDN:

{
  "href" : "http://host0147.domain.com:8080/api/v1/hosts",
  "items" : [
    {
      "href" : "http://host0147.domain.com:8080/api/v1/hosts/host0141",
      "Hosts" : { "host_name" : "host0141" }
    },
    {
      "href" : "http://host0147.domain.com:8080/api/v1/hosts/host0145.domain.com",
      "Hosts" : { "host_name" : "host0145.domain.com" }

What's of note is that the first of these two hosts fails to register in a way that allows a successful installation; the symptom is that it registers without its FQDN. Of the ten hosts I have, several fail, and never consistently the same ones. The agent clearly connects, but fires off a registration that lacks the domain. The randomness is making this hard to diagnose.
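To pick out the affected hosts from a response like the one above, one option is to look for host_name values that contain no dot. This is a minimal sketch in Python, assuming the response has the shape shown in the excerpt; the function name `short_named_hosts` and the sample payload are illustrative and not part of Ambari.

```python
import json

def short_named_hosts(hosts_response):
    """Return host_name values that have no domain part (no dot)."""
    items = hosts_response.get("items", [])
    names = [item["Hosts"]["host_name"] for item in items]
    return [name for name in names if "." not in name]

# Sample payload mirroring the shape of the /api/v1/hosts excerpt above.
sample = json.loads("""
{
  "items": [
    {"Hosts": {"host_name": "host0141"}},
    {"Hosts": {"host_name": "host0145.domain.com"}}
  ]
}
""")

print(short_named_hosts(sample))
```

With the sample payload, only host0141 is reported, which matches the host that failed to register with a domain.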

  1. Does anybody know how to guarantee successful registration?
  2. Failing this, does anybody know how to clear and re-execute the registration of failed hosts?

Thanks.

1 ACCEPTED SOLUTION

Master Mentor

@rbailey

Ambari agent will generally use the "socket.getfqdn()" approach to find the FQDN. You can also validate the output of the same python command on your problematic hosts.

Example:

# python
Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import socket;
>>> print socket.getfqdn();
sandbox.hortonworks.com

So please check whether all of your hosts return a proper FQDN. Every time ambari-agent starts, it gathers information about the host it is running on (such as CPU, RAM, public_host_name, and host_name) and then sends a registration request to the ambari-server.
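The check can be scripted so it is easy to run on every host. This is only a sketch: the helper `has_domain` is a hypothetical name, and it treats any dotted name as fully qualified, so a placeholder such as localhost.localdomain would also pass.

```python
import socket

def has_domain(name):
    """True if the name has at least one dot separating non-empty labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) > 1 and all(labels)

fqdn = socket.getfqdn()
if has_domain(fqdn):
    print("OK: %s" % fqdn)
else:
    print("WARNING: socket.getfqdn() returned a short name: %s" % fqdn)
```

A host that prints the warning is a candidate for the short-name registration described in the question.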

Also, are these agents located in a cloud environment? If so, you might be encountering the issue described in this article: https://community.hortonworks.com/content/kbentry/42872/why-ambari-host-might-have-different-public-...

4 REPLIES

Explorer

Okay that's a start:

>>> import socket
>>> print socket.getfqdn();
host0141

So somehow Python isn't finding an FQDN...

[centos@host0141 ~]$ hostname -f
host0141.domain.com
[centos@host0141 ~]$ python<<<"import socket;print socket.getfqdn();"
host0141

So it seems that socket.getfqdn() is the culprit. I'm using OpenStack, and I'm wondering if it's a delay in registering hosts shortly after their deletion and recreation... Thanks for helping, JanSenSharma.

I restarted and boom! I now have an FQDN coming from the socket function. It seems I need to restart the host to refresh the sockets after it has been spawned.
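The `hostname -f` versus socket.getfqdn() comparison above can be wrapped in a small script and run on each freshly spawned host before starting the agent. This is a rough sketch, assuming Python 2.7 or later (subprocess.check_output is not available in 2.6); the helper names `hostname_f` and `names_agree` are made up for illustration.

```python
import socket
import subprocess

def hostname_f():
    """Return the output of `hostname -f`, or None if the command fails."""
    try:
        out = subprocess.check_output(["hostname", "-f"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.decode().strip()

def names_agree(os_name, python_name):
    """True when both lookups produced the same non-empty name."""
    return bool(os_name) and os_name == python_name

os_name = hostname_f()
python_name = socket.getfqdn()
print("hostname -f      : %s" % os_name)
print("socket.getfqdn() : %s" % python_name)
print("match" if names_agree(os_name, python_name) else "MISMATCH")
```

A MISMATCH line corresponds to the state host0141 was in before the restart.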

Master Mentor

@rbailey

In OpenStack we can use the post-installation "cloud-init" file to set up the desired FQDN/hostname: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/4/html/End... For example:

#cloud-config
hostname: host0141
fqdn: host0141.domain.com
ssh_pwauth: False
password: test

Do all of your agent hosts return mismatched output for `hostname -f` and "socket.getfqdn()"?

Explorer

I'm marking this as Solved, thanks Jay. Technically this is not the right answer, but it certainly helped me get closer to an outcome I can use. It seems restarting the OpenStack instance jiggles the sockets and allows Python to find the FQDN.