Created 07-18-2016 06:29 PM
I have a long-running, Ambari-configured cluster that has run into some problems. Several weeks ago we had a network upgrade and lost the DNS routing that had previously been present. This caused the Ambari server VM to return "localhost" when running 'hostname -f'. I suspect this led Ambari to believe that the Ambari REST API was running at localhost instead of the fully qualified hostname. Since then, I have updated the Ambari server VM (via /etc/hosts) to return the correct fully qualified hostname when running 'hostname -f', restarted the Ambari server, and restarted the agents, but the agents are still looking for the REST API at 'localhost:8080'.
The problem can be seen when trying to register a new host in the Ambari Server log:
STDOUT: Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server
Just above that message, the output of the host bootstrapping process shows that the 'server' value is being set to 'localhost':
using tmp dir /var/run/ambari-server/bootstrap/2 ambari: localhost; server_port: 8080; ambari version: 2.2.0.0;
Is this hostname value cached somewhere? The agent can be registered manually by editing its ini file and specifying the Ambari server hostname, but this only helps with registration. All REST API calls from the agent host still go to (for example) "localhost:8080/resources/...", so it doesn't solve the problem, and some component installations fail as a result.
I've also looked for rogue Ambari processes and this does not appear to be the problem.
Thanks for any help.
Created 07-20-2016 04:03 AM
I finally noticed that these Python commands, run on the Ambari server:
>>> import socket
>>> print(socket.getfqdn())
were returning the wrong hostname, which led me to fix the /etc/hosts file so that it contained the correct fully qualified hostname. In this case, 'hostname -f' was giving me the right value, which was hiding the real issue in the hosts file.
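For anyone hitting the same symptom, the comparison described above can be scripted. This is only a minimal sketch: the function names `diagnose` and `shell_hostname` and the set of loopback names are made up for illustration, and it assumes the 'hostname' command is available on the server.

```python
import socket
import subprocess

# Names treated as "loopback" for this check (an assumption, extend as needed).
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def diagnose(python_fqdn, shell_fqdn):
    """Return a list of warnings about the two hostname values."""
    warnings = []
    if python_fqdn.lower() in LOOPBACK_NAMES:
        warnings.append(
            "socket.getfqdn() returned a loopback name: %s" % python_fqdn)
    if shell_fqdn is not None and python_fqdn.lower() != shell_fqdn.lower():
        warnings.append(
            "socket.getfqdn() (%s) differs from 'hostname -f' (%s)"
            % (python_fqdn, shell_fqdn))
    return warnings


def shell_hostname():
    """Return the output of 'hostname -f', or None if it cannot be run."""
    try:
        out = subprocess.check_output(["hostname", "-f"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.decode().strip()


if __name__ == "__main__":
    found = diagnose(socket.getfqdn(), shell_hostname())
    for line in found or ["No mismatch found."]:
        print(line)
```

If this prints a warning on the Ambari server, /etc/hosts is the first place to look, since 'hostname -f' alone can look correct while Python still resolves the name differently.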
Created 07-19-2016 12:23 AM
I see your Ambari is version 2.2.0.0; try upgrading to the latest release, 2.2.2.0. I'm hoping that will force the configs to be updated. Other than that, I would scan all tables in the Ambari DB for "localhost".
Created 07-19-2016 01:46 AM
I had already updated ambari-server and ambari-agent on all hosts to 2.2.2.0 and restarted all of them in the hope that this would help, but no luck. I have also tried dumping the Ambari DB and looking for localhost references, but unfortunately there were hundreds if not thousands of them, so it was hard to tell which might be problematic. Is the possible solution to force an update in the DB, if I can figure out what to update? Any hints on which table this value might be in?
Created 07-19-2016 02:00 AM
What does the server hostname property in ambari-agent.ini on any of the agent hosts say? Look in /etc/ambari-agent/conf.
Does it point to localhost or to an old Ambari server hostname?
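A quick way to check this on each agent host is to read the value programmatically. The sketch below assumes the ini file has a [server] section with a hostname key, which is how I understand the agent config to be laid out; the function name `server_hostname_from_ini` is just illustrative.

```python
import configparser
import os

# Path taken from the directory mentioned above plus the ini file name.
AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"


def server_hostname_from_ini(text):
    """Return the hostname value from the [server] section, or None."""
    # interpolation=None so '%' characters in other values are left alone.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return parser.get("server", "hostname", fallback=None)


if __name__ == "__main__":
    if os.path.exists(AGENT_INI):
        with open(AGENT_INI) as handle:
            print("Agent points at:", server_hostname_from_ini(handle.read()))
    else:
        print("No agent ini found at", AGENT_INI)
```

If this prints localhost or an old server name, the agent config itself is the problem; if it prints the right name, the wrong value is coming from the server side.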
Created 07-19-2016 03:29 AM
It is specified as the current Ambari server's fully qualified hostname, the same as what is returned by 'hostname -f' on the Ambari server VM. With this set, the host can be registered manually, but after registration there continue to be problems trying to access localhost:8080, such as error messages like these in the ambari-agent.log.
WARNING 2016-07-18 23:26:51,379 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://localhost:8080/resources//host_scripts/.hash : <urlopen error [Errno 97] Address family not supported by protocol>
Thanks for your help!
Created 07-20-2016 01:08 AM
Anyone have any other ideas before I wipe this cluster and start over?
Created 06-15-2018 08:10 AM
For me it was the FQDN on the Ambari host: it resolved to "localhost" because the /etc/hosts file listed the host's name against the IP 127.0.0.1.
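That situation can be spotted by scanning the hosts file for any non-localhost name attached to a loopback address. A rough sketch, where the `ALLOWED` list and the function names are my own assumptions rather than anything Ambari defines:

```python
import ipaddress

# Names that are normally expected on loopback lines (an assumption).
ALLOWED = {
    "localhost", "localhost.localdomain",
    "localhost4", "localhost4.localdomain4",
    "localhost6", "localhost6.localdomain6",
    "ip6-localhost", "ip6-loopback",
}


def loopback_names(hosts_text):
    """Return every name mapped to a loopback address in hosts-file text."""
    names = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a plain IP address, skip the line
        if address.is_loopback:
            names.extend(fields[1:])
    return names


def unexpected_loopback_names(hosts_text):
    """Return loopback-mapped names that are not the usual localhost aliases."""
    return [n for n in loopback_names(hosts_text) if n.lower() not in ALLOWED]


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        for name in unexpected_loopback_names(handle.read()):
            print("Mapped to loopback:", name)
```

Any real hostname this reports should normally be moved to a line with the machine's routable IP address.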