Ambari Agent randomly fails to register correctly
Labels: Apache Ambari
Created ‎02-19-2017 06:57 AM
Hey guys,
After repeating a full build several times, I can't get the agents to register consistently with the Ambari server. Looking at the REST API response, I've found that the Ambari server reports the failing nodes as registered, but *not* with a FQHN:
    {
      "href" : "http://host0147.domain.com:8080/api/v1/hosts",
      "items" : [
        {
          "href" : "http://host0147.domain.com:8080/api/v1/hosts/host0141",
          "Hosts" : {
            "host_name" : "host0141"
          }
        },
        {
          "href" : "http://host0147.domain.com:8080/api/v1/hosts/host0145.domain.com",
          "Hosts" : {
            "host_name" : "host0145.domain.com"
          }
What's notable is that, of the two hosts shown, the first fails to register in a way that allows a successful installation; a symptom is that it has not registered with its FQHN. Of the ten hosts I have, several fail, and never consistently the same ones. The agent clearly connects, but the registration it sends does not carry the domain. The randomness is making this hard to diagnose.
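To spot the affected hosts quickly, the `/api/v1/hosts` response can be scanned for host names that have no domain part. The following is a minimal sketch in Python 3, not an Ambari tool: the sample payload is shaped like the excerpt above, with closing brackets added only so that it parses, and `short_name_hosts` is a hypothetical helper name.

```python
import json

# Illustrative payload shaped like the excerpt above. The closing brackets are
# an assumption added so the sample parses; the real response is longer.
SAMPLE_RESPONSE = """
{
  "href": "http://host0147.domain.com:8080/api/v1/hosts",
  "items": [
    {"href": "http://host0147.domain.com:8080/api/v1/hosts/host0141",
     "Hosts": {"host_name": "host0141"}},
    {"href": "http://host0147.domain.com:8080/api/v1/hosts/host0145.domain.com",
     "Hosts": {"host_name": "host0145.domain.com"}}
  ]
}
"""


def short_name_hosts(response_text):
    """Return registered host names that contain no dot, i.e. no domain part."""
    payload = json.loads(response_text)
    names = [item["Hosts"]["host_name"] for item in payload.get("items", [])]
    return [name for name in names if "." not in name]


if __name__ == "__main__":
    print(short_name_hosts(SAMPLE_RESPONSE))
```

In practice the response text would come from the server rather than from a constant; how it is fetched and authenticated depends on the installation.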
- Does anybody know how to guarantee successful registration?
- Failing this, does anybody know how to clear and re-execute the registration of failed hosts?
Thanks.
Created ‎02-19-2017 07:29 AM
The Ambari agent generally uses Python's "socket.getfqdn()" to determine the FQDN. You can check the output of the same Python call on your problematic hosts.
Example:
    # python
    Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket;
    >>> print socket.getfqdn();
    sandbox.hortonworks.com
So please check whether all your hosts return a proper FQDN. Every time ambari-agent starts, it gathers information about the host it is running on (such as CPU, RAM, public_host_name and host_name) and then sends a registration request to the ambari-server.
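If you want to run that check on every host before starting the agent, something like the following might do. It is only a sketch, written for Python 3 (the session above is Python 2), and `looks_fully_qualified` is a hypothetical helper: it only tests that the name has a domain part, not that the domain is the right one.

```python
import socket


def looks_fully_qualified(name):
    """True if the name has at least two non-empty dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


if __name__ == "__main__":
    # socket.getfqdn() is the same call used in the interactive check above.
    fqdn = socket.getfqdn()
    status = "OK" if looks_fully_qualified(fqdn) else "NOT fully qualified"
    print(f"{fqdn}: {status}")
```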
Also, are these agents running in a cloud environment? If so, you may be hitting the issue described in this article: https://community.hortonworks.com/content/kbentry/42872/why-ambari-host-might-have-different-public-...
Created ‎02-19-2017 07:48 AM
Okay that's a start:
    >>> import socket
    >>> print socket.getfqdn();
    host0141
So somehow we don't have python finding a FQHN...
    [centos@host0141 ~]$ hostname -f
    host0141.domain.com
    [centos@host0141 ~]$ python<<<"import socket;print socket.getfqdn();"
    host0141
So it seems that socket.getfqdn() is the culprit. I'm using OpenStack, and I'm wondering if there is a delay in registering hosts shortly after their deletion and recreation... Thanks for helping, JanSenSharma.
I restarted and boom! I now have a FQHN coming from the socket function. Seems I need to restart the host to freshen the sockets after it is spawned.
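For anyone else debugging this: per the Python standard library documentation, socket.getfqdn() looks at the name returned by gethostbyaddr(), then at its aliases, and picks the first one containing a period; if none has one, it falls back to the plain host name. That lookup path may explain why it can disagree with `hostname -f`, though that is not confirmed for this case. A rough Python 3 sketch that prints what the resolver returns on a host (`pick_fqdn` and `describe_local_resolution` are hypothetical names, and `pick_fqdn` only imitates the documented selection rule):

```python
import socket


def pick_fqdn(hostname, aliases, fallback):
    """Return the first candidate name containing a dot, else the fallback."""
    for candidate in [hostname] + list(aliases):
        if "." in candidate:
            return candidate
    return fallback


def describe_local_resolution():
    """Show what the resolver returns for this machine's own host name."""
    name = socket.gethostname()
    try:
        hostname, aliases, addresses = socket.gethostbyaddr(name)
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        return f"{name}: lookup failed ({exc})"
    chosen = pick_fqdn(hostname, aliases, name)
    return (f"gethostname={name} resolver={hostname} "
            f"aliases={aliases} addrs={addresses} -> {chosen}")


if __name__ == "__main__":
    print(describe_local_resolution())
```

Running it on a failing host and again after a restart would show whether the resolver's answer, rather than the agent, is what changed.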
Created ‎02-19-2017 09:20 AM
In OpenStack we can use the post-installation "cloud-init" file to set up the desired FQDN/hostname: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/4/html/End... For example:
    #cloud-config
    hostname: host0141
    fqdn: host0141.domain.com
    ssh_pwauth: False
    password: test
Do all of your agent hosts return different output for `hostname -f` and "socket.getfqdn()"?
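With ten hosts, the cloud-config text could be generated per host instead of edited by hand. This is just a sketch in Python 3: `render_cloud_config` is a hypothetical helper, it emits only the `hostname` and `fqdn` keys from the example above, and the host names and domain are the placeholders used in this thread.

```python
def render_cloud_config(short_name, domain):
    """Build a minimal #cloud-config document that sets hostname and fqdn."""
    return "\n".join([
        "#cloud-config",
        f"hostname: {short_name}",
        f"fqdn: {short_name}.{domain}",
        "",  # trailing newline
    ])


if __name__ == "__main__":
    # Placeholder host names and domain taken from this thread.
    for short_name in ["host0141", "host0145"]:
        print(render_cloud_config(short_name, "domain.com"))
```

Each rendered document would then be passed as user data when the instance is created; any other settings you need would have to be added to the template.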
Created ‎02-19-2017 11:58 AM
I'm marking this as Solved, thanks Jay. Technically this is not the right answer, but it certainly helped me get closer to an outcome I can use. It seems restarting the OpenStack instance jiggles the sockets and allows Python to find the FQDN.
