Ambari displays heartbeat lost on all hosts

New Contributor

Suddenly, Ambari displayed heartbeat lost on all hosts as the following picture shows:

I have tried generating a new certificate for each host, as shown at this link, but it does not help. FYI, on all my hosts the folder `/var/lib/ambari-agent/keys` is empty.

If I execute the command `sudo ambari-agent start` I get the following error:

    Verifying Python version compatibility...
    Using python /usr/bin/python
    Checking for previously running Ambari Agent...
    /run/ambari-agent/ambari-agent.pid found with no process. Removing 3439...
    Starting ambari-agent
    Verifying ambari-agent process status...
    ERROR: ambari-agent start failed. For more details, see /var/log/ambari-agent/ambari-agent.out:
    ====================
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 387, in <module>
        main(heartbeat_stop_callback)
      File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 355, in main
        (retries, connected, stopped) = netutil.try_to_connect(server_url, MAX_RETRIES, logger)
    UnboundLocalError: local variable 'server_url' referenced before assignment
    ====================

64473-screen-shot-2018-03-04-at-234414.png

Master Mentor

@Hpc

A similar issue is reported here: https://issues.apache.org/jira/browse/AMBARI-19246

If you keep seeing this issue, consider upgrading Ambari.

Master Mentor

@Hpc

If you cannot upgrade your Ambari binaries to at least 2.5, you can try the workaround mentioned in https://issues.apache.org/jira/browse/AMBARI-19246

It says to move the line `server_url = config.get_api_url(server_hostname)` outside the try block. See: https://reviews.apache.org/r/54890/diff/1#index_header

You can make this change on all the Ambari agent machines by editing the file "/usr/lib/python2.6/site-packages/ambari_agent/main.py":

    for server_hostname in server_hostnames:
      # Moved above the try block so server_url is assigned even if the lookup below fails
      server_url = config.get_api_url(server_hostname)
      try:
        server_ip = socket.gethostbyname(server_hostname)
        logger.info('Connecting to Ambari server at %s (%s)', server_url, server_ip)
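To see why moving that one line matters, here is a minimal, self-contained sketch of the pattern. It is not the Ambari source: `pick_url_buggy`, `pick_url_fixed`, `never_resolves`, `fake_api_url` and the hostname are made-up names for illustration, and the `except socket.error` handler is an assumption about how the lookup failure gets swallowed. If the hostname lookup raises before `server_url` is assigned, the variable is never bound and the later use fails with `UnboundLocalError`, which would be consistent with a hostname resolution problem on the agent hosts.

```python
import socket


def pick_url_buggy(hostnames, get_api_url, resolve=socket.gethostbyname):
    """Assignment inside the try block, as in the failing pattern."""
    for hostname in hostnames:
        try:
            resolve(hostname)                   # raises if the name is unknown
            server_url = get_api_url(hostname)  # skipped when resolve() raises
        except socket.error:
            pass  # lookup failure is swallowed; server_url is still unbound
    return server_url  # UnboundLocalError if no lookup ever succeeded


def pick_url_fixed(hostnames, get_api_url, resolve=socket.gethostbyname):
    """Assignment moved above the try block, as the workaround suggests."""
    for hostname in hostnames:
        server_url = get_api_url(hostname)      # always assigned
        try:
            resolve(hostname)
        except socket.error:
            pass  # lookup can still fail, but server_url is defined
    return server_url


if __name__ == "__main__":
    def never_resolves(hostname):
        # Stand-in for a host whose name is missing from DNS and /etc/hosts.
        raise socket.gaierror("name or service not known")

    def fake_api_url(hostname):
        # Hypothetical URL format, for illustration only.
        return "https://%s/" % hostname

    try:
        pick_url_buggy(["ambari-server.example"], fake_api_url, never_resolves)
    except UnboundLocalError as exc:
        print("buggy:", exc)
    print("fixed:", pick_url_fixed(["ambari-server.example"], fake_api_url, never_resolves))
```

Note that the workaround only removes the crash. If the agent still cannot resolve the server hostname, the connection attempt will fail later, so the name resolution itself still needs fixing.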


New Contributor

Hi @Jay Kumar SenSharma,

Thanks for the reply. Adding the hostname of the machine running the ambari-server service to the "/etc/hosts" file of each host machine solved the problem.

Quick question: in the "/etc/hosts" file of each host machine, should I only set the hostname of the machine running the ambari-server service, or should I also add the hostnames of all the other hosts? My cluster is composed of several machines, and different services such as YARN, Spark, MapReduce, Hive, etc. are running on different machines (for example, some machines are NodeManagers, other machines run the Hive server, etc.).

Master Mentor

@Hpc

Good to know that correcting the /etc/hosts entry resolved the issue.

Regarding your follow-up question: "in the "/etc/hosts" file of each host machine, shall I only set the hostname of the machine running the ambari-server service, or shall I also write the hostname of the all hosts?"

Every node (host) that is part of the cluster needs to resolve the hostname of the Ambari server, which is required for communication between Ambari Server and the Ambari Agents. Other components also need to know the hostnames of the other cluster nodes. For example, the NameNode host should be able to resolve the hostnames of the DataNodes, and the ResourceManager should be able to resolve the hostnames of all the NodeManager hosts. Similarly, the Ambari service checks run on different hosts.

So basically, every node of your cluster should have the same host entries, so that all nodes are able to resolve each other.
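One way to check this on a given node is a small script along the following lines. It is only a sketch: `unresolved_hosts` is a hypothetical helper, and the hostnames in the usage example are placeholders that you would replace with your own cluster's names. It relies on `socket.gethostbyname`, the same standard-library call the agent uses for its lookup, which goes through the system resolver.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that cannot be resolved from this machine."""
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except socket.error:
            # Lookup failed: the name is not known to the resolver on this node.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder names; list the Ambari server and every cluster node here.
    cluster = ["ambari-server.example", "namenode.example", "datanode1.example"]
    for name in unresolved_hosts(cluster):
        print("cannot resolve:", name)
```

Running it on every node and getting no output would suggest that each node can resolve all the others.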
