Ambari Agent randomly fails to register correctly

Explorer

Hey guys,

After repeating a full build several times, I can't get the agents to register consistently with the Ambari server. Looking at the REST API response, I've discovered that the Ambari server reports the failing nodes as registered, but *not* with an FQHN:

{
  "href" : "http://host0147.domain.com:8080/api/v1/hosts",
  "items" : [
    {
      "href" : "http://host0147.domain.com:8080/api/v1/hosts/host0141",
      "Hosts" : { "host_name" : "host0141" }
    },
    {
      "href" : "http://host0147.domain.com:8080/api/v1/hosts/host0145.domain.com",
      "Hosts" : { "host_name" : "host0145.domain.com" }

What's notable is that, of the two hosts shown, the first fails to register in a way that allows a successful installation; the symptom is that it registered without its FQHN. Of the ten hosts I have, several fail, and never consistently the same ones. The agent clearly connects, but the registration it sends doesn't carry the domain. The randomness is making this hard to diagnose.
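To spot the affected hosts quickly, the response can be scanned for host names that have no domain part. This is only a minimal sketch, assuming the `items` / `Hosts` / `host_name` layout shown in the response above; the helper name `short_name_hosts` is made up for illustration, and the sample payload is the output above cut down to two hosts and closed off so that it parses.

```python
import json

def short_name_hosts(payload):
    """Return host_name values that contain no dot, i.e. no domain part."""
    names = [item["Hosts"]["host_name"] for item in payload.get("items", [])]
    return [name for name in names if "." not in name]

# Sample trimmed from the REST response in this post (brackets closed by hand).
sample = json.loads("""
{
  "href": "http://host0147.domain.com:8080/api/v1/hosts",
  "items": [
    {"href": "http://host0147.domain.com:8080/api/v1/hosts/host0141",
     "Hosts": {"host_name": "host0141"}},
    {"href": "http://host0147.domain.com:8080/api/v1/hosts/host0145.domain.com",
     "Hosts": {"host_name": "host0145.domain.com"}}
  ]
}
""")

print(short_name_hosts(sample))  # hosts registered under a short name
```

In practice the payload would come from a GET on the `/api/v1/hosts` URL instead of the inline sample.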

  1. Does anybody know how to guarantee successful registration?
  2. Failing this, does anybody know how to clear and re-execute the registration of failed hosts?

Thanks.

Explorer

Okay that's a start:

>>> import socket
>>> print socket.getfqdn();
host0141

So somehow Python isn't finding an FQHN...

[centos@host0141 ~]$ hostname -f
host0141.domain.com
[centos@host0141 ~]$ python<<<"import socket;print socket.getfqdn();"
host0141

So it seems that socket.getfqdn() is the culprit. I'm using OpenStack, and I'm wondering if it's a delay in registering hosts shortly after their deletion and recreation... Thanks for helping, JanSenSharma.
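For checking this on every host in one go, the two lookups can be put side by side in a small script. This is a rough sketch, not anything from Ambari itself: `is_fully_qualified` and `collect_names` are names invented here, and "fully qualified" is simply assumed to mean "has a non-empty domain part after the first dot".

```python
import socket
import subprocess

def is_fully_qualified(name):
    """Treat a name as fully qualified if it has a non-empty domain part."""
    host, _, domain = name.partition(".")
    return bool(host) and bool(domain)

def collect_names():
    """Gather the host name as seen by Python and by the `hostname -f` command."""
    names = {
        "socket.gethostname()": socket.gethostname(),
        "socket.getfqdn()": socket.getfqdn(),
    }
    try:
        out = subprocess.check_output(["hostname", "-f"])
        names["hostname -f"] = out.decode().strip()
    except (OSError, subprocess.CalledProcessError):
        names["hostname -f"] = ""  # command missing or failed
    return names

if __name__ == "__main__":
    for source, name in collect_names().items():
        status = "ok" if is_fully_qualified(name) else "NOT fully qualified"
        print("%-22s %-30s %s" % (source, name, status))
```

On a host in the bad state described above, the `socket.getfqdn()` row would be flagged while the `hostname -f` row would not.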

I restarted and boom! I now have an FQHN coming from the socket function. Seems I need to restart the host after it has been spawned to freshen the sockets.

Master Mentor

@rbailey

In OpenStack we can use the post-installation "cloud-init" file to set up the desired FQDN/hostname. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/4/html/End... For example:

#cloud-config
hostname: host0141
fqdn: host0141.domain.com
ssh_pwauth: False
password: test

Do all of your agent hosts return mismatched output for `hostname -f` and "socket.getfqdn()"?


Explorer

I'm marking this as Solved, thanks Jay. Technically this is not the right answer, but it certainly helped me get closer to an outcome I can use. It seems restarting the OpenStack instance jiggles the sockets and allows Python to find the FQDN.
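For anyone scripting their builds, one way to avoid registering under a short name is to hold off starting the agent until Python can see a dotted name. The sketch below is untested against the OpenStack behaviour in this thread, and since a restart was what fixed it here, waiting alone may not be enough. The function name `wait_for_fqdn` and its parameters are made up for illustration; only `socket.getfqdn()` and `time.sleep()` are real library calls.

```python
import socket
import time

def wait_for_fqdn(timeout=300.0, interval=5.0,
                  resolve=socket.getfqdn, sleep=time.sleep):
    """Poll resolve() until it returns a dotted name or the timeout elapses.

    Returns the name on success, or None if no dotted name appeared in time.
    """
    waited = 0.0
    while True:
        name = resolve()
        if "." in name.strip("."):
            return name
        if waited >= timeout:
            return None
        sleep(interval)
        waited += interval

if __name__ == "__main__":
    fqdn = wait_for_fqdn(timeout=60.0)
    if fqdn is None:
        print("No FQDN yet; the host may need a restart before the agent starts")
    else:
        print("FQDN resolved: %s" % fqdn)
```

A provisioning script could run this first and start the agent only when a name comes back.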