Ambari displays heartbeat lost on all hosts

New Contributor

Suddenly, Ambari displayed "heartbeat lost" on all hosts, as the following picture shows:

I have tried to generate a new certificate for each host as shown at this link, but it does not help. FYI, on all my hosts the folder `/var/lib/ambari-agent/keys` is empty.

If I execute the command `sudo ambari-agent start` I get the following error:

    Verifying Python version compatibility...
    Using python /usr/bin/python
    Checking for previously running Ambari Agent...
    /run/ambari-agent/ambari-agent.pid found with no process. Removing 3439...
    Starting ambari-agent
    Verifying ambari-agent process status...
    ERROR: ambari-agent start failed. For more details, see /var/log/ambari-agent/ambari-agent.out:
    ====================
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 387, in <module>
        main(heartbeat_stop_callback)
      File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 355, in main
        (retries, connected, stopped) = netutil.try_to_connect(server_url, MAX_RETRIES, logger)
    UnboundLocalError: local variable 'server_url' referenced before assignment
    ====================

[Screenshot: 64473-screen-shot-2018-03-04-at-234414.png]

1 ACCEPTED SOLUTION

Master Mentor

@Hpc

The error occurs while determining the "server_url":

    UnboundLocalError: local variable 'server_url' referenced before assignment

In general, this error can occur if the Ambari agent process is not able to determine the Ambari server host and port information. Please check whether the Ambari server hostname is resolvable from the agent machine, for example by looking at the "/etc/hosts" file on the agent to see whether the Ambari server IP and hostname are mapped properly and reachable.

We can also try connecting to the Ambari server host from the agent machine using telnet or nc to check hostname resolution and port reachability. Try both the Ambari server hostname (FQDN) and its IP address.

# telnet $AMBARI_SERVER_HOSTNAME  8080
(OR)
# nc -v $AMBARI_SERVER_HOSTNAME  8080
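
If telnet or nc is not installed on the agent machine, roughly the same check can be sketched in Python. This is only an illustration: the function name `check_server` is hypothetical, and it is written for a standalone Python 3 interpreter rather than the agent's own Python. It separates the two failure modes (hostname does not resolve vs. port not reachable).

```python
import socket


def check_server(hostname, port, timeout=5.0):
    """Return (ip, reachable). ip is None when the hostname cannot be resolved."""
    try:
        # Same resolution call the agent code uses
        ip = socket.gethostbyname(hostname)
    except socket.error:
        return None, False
    try:
        # Plain TCP connect, comparable to `nc -v host port`
        conn = socket.create_connection((ip, port), timeout=timeout)
        conn.close()
        return ip, True
    except socket.error:
        return ip, False
```

For example, `check_server("ambari.example.com", 8080)` (hostname is a placeholder) returning `(None, False)` points at name resolution, while an IP with `False` points at the port or a firewall.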

The problem is caused by the above error: the Ambari Agent is not able to determine the ambari-server hostname (URL).
This can happen if the "/etc/ambari-agent/conf/ambari-agent.ini" file has an invalid Ambari server hostname. By any chance, has the timestamp of "/etc/ambari-agent/conf/ambari-agent.ini" changed? Or has some user (or an automated script) mistakenly removed the "hostname" and ports from this file?
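
A quick way to check the file is to parse it and list the settings that are missing or empty. The sketch below is an assumption-laden illustration, not part of Ambari: the helper name `missing_server_settings` is made up, it targets standalone Python 3, and it assumes the "hostname" and "url_port" keys live in the `[server]` section (the agent code reads `url_port` from that section).

```python
import configparser


def missing_server_settings(ini_text, required=("hostname", "url_port")):
    """Return the required [server] keys that are absent or empty in ini_text."""
    # interpolation=None so that a literal '%' in a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("server"):
        return list(required)
    return [key for key in required
            if not parser.get("server", key, fallback="").strip()]
```

To run it against the real file, read the text first, for example `missing_server_settings(open("/etc/ambari-agent/conf/ambari-agent.ini").read())`. An empty list means both settings are present.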
It looks like you might be using Ambari 2.4.2 (or 2.4.x), where the relevant code is: https://github.com/apache/ambari/blob/release-2.4.2/ambari-agent/src/main/python/ambari_agent/main.p...

    for server_hostname in server_hostnames:
      try:
        server_ip = socket.gethostbyname(server_hostname)
        server_url = config.get_api_url(server_hostname)
        logger.info('Connecting to Ambari server at %s (%s)', server_url, server_ip)
      except socket.error:
        logger.warn("Unable to determine the IP address of the Ambari server '%s'", server_hostname)

      # Wait until MAX_RETRIES to see if server is reachable
      netutil = NetUtil(config, heartbeat_stop_callback)
      (retries, connected, stopped) = netutil.try_to_connect(server_url, MAX_RETRIES, logger)
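
To see why a failed hostname lookup ends in UnboundLocalError rather than a clearer message, here is a minimal standalone reproduction of the same pattern. It is a sketch for Python 3, not the agent's code: the function name, the injectable `resolve` parameter, and the URL scheme and port are illustrative assumptions.

```python
import socket


def connect_like_agent(server_hostnames, resolve=socket.gethostbyname):
    """Mimic the loop above: server_url is bound only if resolution succeeds."""
    for server_hostname in server_hostnames:
        try:
            server_ip = resolve(server_hostname)
            # Illustrative URL; the real agent builds it from its config
            server_url = "https://%s:%s" % (server_hostname, 8440)
        except socket.error:
            pass  # the agent only logs a warning here and carries on
        # If resolve() raised, server_url was never assigned, so using it
        # here raises UnboundLocalError.
        return server_url
```

Calling it with a resolver that raises `socket.gaierror` (which is what an unresolvable hostname produces) triggers the UnboundLocalError, while a resolver that succeeds returns the URL normally.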


Ambari Agent determines the "server_url" as follows: https://github.com/apache/ambari/blob/release-2.4.2/ambari-agent/src/main/python/ambari_agent/Ambari...

  def get_api_url(self, server_hostname):
    return "%s://%s:%s" % (self.CONNECTION_PROTOCOL,
                           server_hostname,
                           self.get('server', 'url_port'))

Please also check the "/etc/ambari-server/conf/ambari.properties" file to see whether settings such as "client.port" or "api.ssl" were changed recently.

Master Mentor

@Hpc

A similar issue is reported here: https://issues.apache.org/jira/browse/AMBARI-19246

If you keep seeing this issue, consider upgrading Ambari.

Master Mentor

@Hpc

If you cannot upgrade your Ambari binaries to at least 2.5, you can try the workaround mentioned in: https://issues.apache.org/jira/browse/AMBARI-19246

It says to move the [ server_url = config.get_api_url(server_hostname) ] line outside the try block. See: https://reviews.apache.org/r/54890/diff/1#index_header

You can make this change on all the Ambari agent machines by editing the file "/usr/lib/python2.6/site-packages/ambari_agent/main.py":

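    for server_hostname in server_hostnames:
      # Build the URL first so it is always defined, even if the lookup below fails
      server_url = config.get_api_url(server_hostname)
      try:
        server_ip = socket.gethostbyname(server_hostname)
        logger.info('Connecting to Ambari server at %s (%s)', server_url, server_ip)
      except socket.error:
        logger.warn("Unable to determine the IP address of the Ambari server '%s'", server_hostname)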


New Contributor

Hi @Jay Kumar SenSharma,

Thanks for the reply. Setting the hostname of the machine running the ambari-server service in the "/etc/hosts" file of each host machine solved the problem.

Quick question: in the "/etc/hosts" file of each host machine, shall I only set the hostname of the machine running the ambari-server service, or shall I also add the hostnames of all the hosts? My cluster is composed of several machines, and different services, such as YARN, Spark, MapReduce, Hive etc., are running on different machines (for example, some machines are NodeManagers, other machines run the Hive server etc.).

Master Mentor

@Hpc

Good to know that correcting the /etc/hosts entry resolved the issue.

Regarding your follow-up question: "in the "/etc/hosts" file of each host machine, shall I only set the hostname of the machine running the ambari-server service, or shall I also write the hostname of the all hosts?"

Every node (host) that is part of the cluster needs to resolve the hostname of the Ambari server, which is required for Ambari Server and Ambari Agent communication. Other components also need to know the hostnames of the other cluster nodes. For example, the NameNode host should be able to resolve the hostnames of the DataNodes, and the ResourceManager should be able to resolve the hostnames of all the NodeManager hosts. Similarly, the Ambari service checks run on different hosts.

So basically, you should have the same entries on every node of your cluster so that all nodes are able to resolve each other.
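
One way to verify this on a node is to compare its hosts file against the list of cluster hostnames. The following is a rough Python 3 sketch under stated assumptions: the function name `unmapped_hosts` and the example hostnames are hypothetical, and it only inspects hosts-file text, so names that resolve through DNS instead would still be reported as missing.

```python
def unmapped_hosts(hosts_text, cluster_hostnames):
    """Return the cluster hostnames that have no entry in the given hosts-file text."""
    known = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        # fields[0] is the IP address; the rest are the names and aliases
        known.update(name.lower() for name in fields[1:])
    return [h for h in cluster_hostnames if h.lower() not in known]
```

Running it on each node with `open("/etc/hosts").read()` and the full list of cluster hostnames should return an empty list everywhere if the entries are consistent.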