Created on 03-04-2018 11:18 PM - edited 08-17-2019 05:14 PM
Suddenly, Ambari displayed heartbeat lost on all hosts as the following picture shows:
I have tried to generate a new certificate for each host as shown at this link, but it does not help. FYI, on all my hosts the folder `/var/lib/ambari-agent/keys` is empty.
If I execute the command `sudo ambari-agent start` I get the following error:
    Verifying Python version compatibility...
    Using python /usr/bin/python
    Checking for previously running Ambari Agent...
    /run/ambari-agent/ambari-agent.pid found with no process. Removing 3439...
    Starting ambari-agent
    Verifying ambari-agent process status...
    ERROR: ambari-agent start failed. For more details, see /var/log/ambari-agent/ambari-agent.out:
    ====================
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 387, in <module>
        main(heartbeat_stop_callback)
      File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 355, in main
        (retries, connected, stopped) = netutil.try_to_connect(server_url, MAX_RETRIES, logger)
    UnboundLocalError: local variable 'server_url' referenced before assignment
    ====================
Created 03-04-2018 11:33 PM
We see an error in determining the "server_url":
    UnboundLocalError: local variable 'server_url' referenced before assignment
In general, this error can occur when the Ambari agent process cannot determine the Ambari server host and port. Please check whether the Ambari server hostname is resolvable from the agent machine, for example by looking at the "/etc/hosts" file on the agent to see whether the Ambari server IP and hostname are mapped properly and reachable.
We can also try connecting to the Ambari server host from the agent machine, using telnet or nc, to check hostname resolution and port reachability. Try both the Ambari server hostname (FQDN) and its IP address.
    # telnet $AMBARI_SERVER_HOSTNAME 8080
    (OR)
    # nc -v $AMBARI_SERVER_HOSTNAME 8080
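The same two checks (name resolution first, then a TCP connection to the port) can also be scripted. The following is a minimal Python 3 sketch and is not part of Ambari; the function name `check_server` and the 5-second timeout are assumptions made for illustration.

```python
import socket


def check_server(hostname, port, timeout=5.0):
    """Return (ip, reachable) for the given hostname and TCP port.

    ip is None when the hostname cannot be resolved at all.
    """
    try:
        # Same lookup call that the agent code quoted in this thread uses.
        ip = socket.gethostbyname(hostname)
    except socket.error:
        return None, False
    try:
        conn = socket.create_connection((ip, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Name resolves, but nothing accepted the connection on that port.
        return ip, False
    conn.close()
    return ip, True
```

Calling `check_server(<your Ambari server FQDN>, 8080)` from an agent machine should return `(None, False)` if the name does not resolve, and an IP with `False` if the name resolves but the port is not reachable.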
We see that the problem is because of the above error. Ambari Agent is not able to determine the ambari-server hostname (URL).
This can happen if the "/etc/ambari-agent/conf/ambari-agent.ini" file has an invalid Ambari server hostname. By any chance, has the timestamp of this file changed? Or has some user (or an automated script) mistakenly removed the "hostname" and ports from this file?
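A quick way to inspect those settings is to parse the file and look at the server entries. This is only a sketch in standalone Python 3 (the agent itself runs on Python 2.6). It assumes that "hostname" sits in the same `[server]` section as the `url_port` key read by the agent code quoted in this thread; the function name `read_server_settings` is made up for illustration.

```python
import configparser


def read_server_settings(ini_text):
    """Return (hostname, url_port, problems) from ambari-agent.ini contents."""
    # interpolation=None keeps '%' characters literal; strict=False tolerates duplicates.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    if not parser.has_section("server"):
        return None, None, ["missing [server] section"]
    hostname = parser.get("server", "hostname", fallback="").strip()
    url_port = parser.get("server", "url_port", fallback="").strip()
    problems = []
    if not hostname:
        problems.append("hostname is missing or empty")
    if not url_port.isdigit():
        problems.append("url_port is missing or not a number")
    return hostname, url_port, problems
```

On an agent machine, the input would be the text of "/etc/ambari-agent/conf/ambari-agent.ini", for example `read_server_settings(open(path).read())`. An empty `problems` list only means the keys are present; it does not prove that the hostname resolves.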
Looks like you might be using Ambari 2.4.2 (or 2.4.x) as we see the code here: https://github.com/apache/ambari/blob/release-2.4.2/ambari-agent/src/main/python/ambari_agent/main.p...
    for server_hostname in server_hostnames:
      try:
        server_ip = socket.gethostbyname(server_hostname)
        server_url = config.get_api_url(server_hostname)
        logger.info('Connecting to Ambari server at %s (%s)', server_url, server_ip)
      except socket.error:
        logger.warn("Unable to determine the IP address of the Ambari server '%s'", server_hostname)

      # Wait until MAX_RETRIES to see if server is reachable
      netutil = NetUtil(config, heartbeat_stop_callback)
      (retries, connected, stopped) = netutil.try_to_connect(server_url, MAX_RETRIES, logger)
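To see why a failed hostname lookup ends in `UnboundLocalError` rather than a clearer message, here is a small standalone reproduction of the same pattern. It is a sketch, not Ambari code: `build_url`, `connect_all`, and `failing_resolver` are hypothetical names, and the protocol and port in the URL are placeholders.

```python
import socket


def build_url(server_hostname):
    # Hypothetical stand-in for config.get_api_url(); protocol and port are placeholders.
    return "https://%s:8440" % server_hostname


def connect_all(server_hostnames, resolve):
    """Mimic the quoted loop; `resolve` stands in for socket.gethostbyname."""
    attempted = []
    for server_hostname in server_hostnames:
        try:
            server_ip = resolve(server_hostname)
            server_url = build_url(server_hostname)  # skipped when resolve() raises
        except socket.error:
            print("Unable to determine the IP address of '%s'" % server_hostname)
        # If the very first lookup failed, server_url was never assigned,
        # so reading it here raises UnboundLocalError.
        attempted.append(server_url)
    return attempted


def failing_resolver(server_hostname):
    # Simulates a hostname that cannot be resolved.
    raise socket.gaierror("Name or service not known")
```

Calling `connect_all(["ambari.example.com"], failing_resolver)` prints the warning and then raises `UnboundLocalError`, because the `except` branch swallows the lookup failure and execution continues to a line that reads a variable assigned only inside the `try` block.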
Ambari Agent determines the "server_url" as follows: https://github.com/apache/ambari/blob/release-2.4.2/ambari-agent/src/main/python/ambari_agent/Ambari...
    def get_api_url(self, server_hostname):
      return "%s://%s:%s" % (self.CONNECTION_PROTOCOL, server_hostname, self.get('server', 'url_port'))
So please also check the "/etc/ambari-server/conf/ambari.properties" file to see whether, by any chance, "client.port", "api.ssl", etc. were changed recently.
Created 03-05-2018 12:11 AM
A similar issue is reported here: https://issues.apache.org/jira/browse/AMBARI-19246
So if you keep seeing this issue, consider upgrading your Ambari.
Created 03-05-2018 12:52 AM
If you cannot upgrade your Ambari binaries to at least 2.5, you can try the workaround mentioned in: https://issues.apache.org/jira/browse/AMBARI-19246
It says to move the [ server_url = config.get_api_url(server_hostname) ] line outside the try block. See: https://reviews.apache.org/r/54890/diff/1#index_header
You can make this change on all the Ambari agent machines by editing the file "/usr/lib/python2.6/site-packages/ambari_agent/main.py":
    for server_hostname in server_hostnames:
      server_url = config.get_api_url(server_hostname)
      try:
        server_ip = socket.gethostbyname(server_hostname)
        logger.info('Connecting to Ambari server at %s (%s)', server_url, server_ip)
Created 03-05-2018 05:37 PM
Thanks for the reply. Setting the hostname of the machine running the ambari-server service in the "/etc/hosts" file of each host machine solved the problem.
Quick question: in the "/etc/hosts" file of each host machine, shall I only set the hostname of the machine running the ambari-server service, or shall I also write the hostnames of all the hosts? My cluster is composed of several machines, and different services, such as YARN, Spark, MapReduce, Hive, etc., are running on different machines (for example, some machines are NodeManagers, other machines run the Hive server, etc.).
Created 03-05-2018 09:02 PM
Good to know that correcting the /etc/hosts entry resolved the issue.
Regarding your later query "in the "/etc/hosts" file of each host machine, shall I only set the hostname of the machine running the ambari-server service, or shall I also write the hostname of the all hosts?"
Every node (host) that is part of the cluster needs to resolve the hostname of the Ambari server, which is required for communication between Ambari Server and Ambari Agent. Other components also need to know the hostnames of the other cluster nodes. For example, the NameNode host should be able to resolve the hostnames of the DataNodes, and similarly the ResourceManager should be able to resolve the hostnames of all the NodeManager hosts, etc. Likewise, the Ambari service checks run on different hosts.
So basically, every node of your cluster should have the same host entries, so that all nodes are able to resolve each other.
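One way to spot a node whose "/etc/hosts" is missing entries is to compare the file against the list of cluster hostnames. The sketch below is standalone Python 3 with made-up function names (`parse_hosts`, `missing_hosts`). It only inspects hosts-file text, so a name that is resolvable through DNS would still be reported as missing here.

```python
def parse_hosts(hosts_text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Hostnames are case-insensitive; keep the first mapping seen.
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_hosts(hosts_text, cluster_hostnames):
    """Return the cluster hostnames that have no entry in the hosts-file text."""
    mapping = parse_hosts(hosts_text)
    return [h for h in cluster_hostnames if h.lower() not in mapping]
```

Running `missing_hosts(open("/etc/hosts").read(), [...])` on each node with the full list of cluster FQDNs (Ambari server, NameNode, DataNodes, ResourceManager, NodeManagers, and so on) should return an empty list everywhere if the entries are consistent.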