Ambari Agents think that the Ambari REST API is located at 'localhost:8080' and as a result fail to register new hosts and install components

Explorer

I have a long-running, Ambari-configured cluster that has run into some problems. Several weeks ago we had a network upgrade and lost the DNS routing that had previously been present. This caused the Ambari server VM to return "localhost" when running 'hostname -f'. I suspect this led Ambari to believe that its REST API was running at localhost rather than at the fully qualified hostname. Since then, I have updated the Ambari server VM (via /etc/hosts) to return the correct fully qualified hostname when running 'hostname -f', restarted the Ambari server, and restarted the agents, but the agents are still looking for the REST API at 'localhost:8080'.

The problem can be seen when trying to register a new host in the Ambari Server log:

STDOUT: Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server

Just above this in the log, during the host bootstrapping process, you can see that the server is being set to 'localhost':

using tmp dir /var/run/ambari-server/bootstrap/2 ambari: localhost; server_port: 8080; ambari version: 2.2.0.0; 

Is this hostname value cached somewhere? The agent can be registered manually by changing the ini file and specifying the Ambari server hostname, but this only helps with registration. All REST API calls from the agent host still go to (for example) "localhost:8080/resources/...", so this doesn't solve the problem, and some component installations fail as a result.

I've also looked for rogue Ambari processes, but that does not appear to be the problem.

Thanks for any help.

1 ACCEPTED SOLUTION

Explorer

I finally noticed that the Python commands:

>>> import socket
>>> print(socket.getfqdn())

were returning the wrong hostname, which led me to fix the /etc/hosts file so that it had the correct fully qualified hostname entry. In this case, 'hostname -f' was returning the right value, which hid the real problem in the hosts file.
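
For anyone who wants to repeat the check, here is a minimal sketch that prints the different lookups side by side. It is my own illustration, not something shipped with Ambari, and the helper name hostname_report is made up. It rests on the assumption that Ambari derives its name through Python's socket module rather than through the hostname command, which would explain why only socket.getfqdn() exposed the problem.

```python
import socket


def hostname_report():
    """Collect the names that different lookups report for this machine."""
    short_name = socket.gethostname()
    report = {
        "gethostname": short_name,
        "getfqdn": socket.getfqdn(),
    }
    try:
        # Address the short name resolves to (often decided by /etc/hosts).
        report["address"] = socket.gethostbyname(short_name)
    except socket.error as exc:
        report["address"] = "unresolved: %s" % exc
    return report


if __name__ == "__main__":
    report = hostname_report()
    for key in sorted(report):
        print("%-12s %s" % (key, report[key]))
    if report["getfqdn"].startswith("localhost"):
        print("getfqdn() reports a loopback name; check /etc/hosts")
```

If getfqdn disagrees with the output of 'hostname -f', the hosts file is the first place I would look.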

7 REPLIES

Master Mentor

I see your Ambari is version 2.2.0.0. Try upgrading to the latest release, 2.2.2.0; I'm hoping that will force the configs to be updated. Other than that, I would scan all tables in the Ambari DB for "localhost".

Explorer

I had already updated ambari-server and ambari-agent on all hosts to 2.2.2.0 and restarted all of them in the hope that this would help, but no luck. I have also tried dumping the Ambari DB and looking for localhost references, but there were hundreds if not thousands of them, so it was hard to tell which might be problematic. Is the possible solution to force an update in the DB, if I can figure out what to update? Any hints on which table this value might be in?
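
One way to narrow down that many matches is to group them by table. Below is a rough sketch, assuming the dump is a plain-text pg_dump file (COPY blocks or INSERT statements). The function name count_by_table is hypothetical and nothing in it is specific to Ambari's schema.

```python
import re
import sys
from collections import Counter

COPY_START = re.compile(r"^COPY\s+(\S+)")
INSERT_ROW = re.compile(r"^INSERT INTO\s+(\S+)")


def count_by_table(lines, needle="localhost"):
    """Count dump rows containing `needle`, grouped by table name."""
    counts = Counter()
    current = None  # table of the COPY block we are inside, if any
    for line in lines:
        line = line.rstrip("\n")
        if current is not None:
            if line == "\\.":  # pg_dump ends each COPY data block with \.
                current = None
            elif needle in line:
                counts[current] += 1
            continue
        match = COPY_START.match(line)
        if match:
            current = match.group(1)
            continue
        match = INSERT_ROW.match(line)
        if match and needle in line:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of the dump file as the first argument.
    with open(sys.argv[1]) as dump:
        for table, hits in count_by_table(dump).most_common():
            print("%6d  %s" % (hits, table))
```

Tables with only a handful of hits are probably the ones worth reading by hand.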

Master Mentor

What does the server hostname property in ambari-agent.ini refer to on any of the agent hosts? Look in /etc/ambari-agent/conf.

Does it point to localhost or to an old Ambari server hostname?
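
If there are many agent hosts, something like the following sketch could read the value instead of opening each file by hand. It assumes the ini file has a [server] section with a hostname key; those names, and the helper server_hostname, are assumptions for illustration, so adjust them to whatever your file actually contains.

```python
import configparser

AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"


def server_hostname(ini_text, section="server", key="hostname"):
    """Return the configured server hostname, or None if it is absent."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # fallback covers both a missing section and a missing key
    return parser.get(section, key, fallback=None)


if __name__ == "__main__":
    try:
        with open(AGENT_INI) as handle:
            print(server_hostname(handle.read()))
    except IOError as exc:
        print("could not read %s: %s" % (AGENT_INI, exc))
```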

Explorer

It is set to the current Ambari server's fully qualified hostname, the same value that 'hostname -f' returns on the Ambari server VM. With this set, the host can be registered manually, but after registration the agent still tries to access localhost:8080, producing error messages like this one in ambari-agent.log:

WARNING 2016-07-18 23:26:51,379 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://localhost:8080/resources//host_scripts/.hash : <urlopen error [Errno 97] Address family not supported by protocol>

Thanks for your help!

Explorer

Anyone have any other ideas before I wipe this cluster and start over?

Explorer

I finally noticed that the Python commands:

>>> import socket
>>> print(socket.getfqdn())

were returning the wrong hostname, which led me to fix the /etc/hosts file so that it had the correct fully qualified hostname entry. In this case, 'hostname -f' was returning the right value, which hid the real problem in the hosts file.

New Contributor

For me it was the FQDN on the Ambari host: it resolved to "localhost" because the /etc/hosts file listed the host's name against the IP 127.0.0.1.
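
A quick way to spot that situation is to list every name that the hosts file attaches to a loopback address. This is only a sketch: the function names and the set of "expected" loopback aliases are my own assumptions, and distributions differ in which aliases they ship by default.

```python
EXPECTED_LOOPBACK = {
    "localhost", "localhost.localdomain",
    "localhost4", "localhost4.localdomain4",
    "localhost6", "localhost6.localdomain6",
    "ip6-localhost", "ip6-loopback",
}


def loopback_names(hosts_text):
    """Return every name that hosts-file text maps to a loopback address."""
    names = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        address, aliases = fields[0], fields[1:]
        if address.startswith("127.") or address == "::1":
            names.extend(aliases)
    return names


def suspicious_names(hosts_text):
    """Loopback names that are not one of the usual localhost aliases."""
    return [n for n in loopback_names(hosts_text) if n not in EXPECTED_LOOPBACK]


if __name__ == "__main__":
    try:
        with open("/etc/hosts") as handle:
            for name in suspicious_names(handle.read()):
                print("mapped to loopback: %s" % name)
    except IOError as exc:
        print("could not read /etc/hosts: %s" % exc)
```

Any real hostname printed here is a candidate for moving to the line with the machine's routable IP address.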