What I noticed is that reverse lookup is causing the problem.

For instance, say that when you installed the cluster you gave the host name as datanode1, and later the reverse lookup (from /etc/resolv.conf) or the /etc/hosts entry was updated so that the host resolves to a different name. The ambari-server then rejects the agent's messages because it treats that host as not part of the cluster. This can be seen in the ambari-server logs (not in the agent logs):

"WARN [alert-event-bus-1] AlertReceivedListener:530 - Unable to process alert datanode_process for an invalid service HDFS and component DATANODE on host"
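To see which alerts the server is rejecting this way, something like the following could be used. It is only a sketch: `rejected_alerts` is a helper name made up for this example, and the log path in the comment is the usual default location, which may differ on your install.

```shell
# List the distinct alert names that AlertReceivedListener refused to process.
# $1 = path to the ambari-server log file.
rejected_alerts() {
    grep 'AlertReceivedListener' "$1" \
        | sed -n 's/.*Unable to process alert \([^ ]*\) for an invalid service.*/\1/p' \
        | sort -u
}

# Example call (path is an assumption, adjust as needed):
# rejected_alerts /var/log/ambari-server/ambari-server.log
```

If this prints alert names for a host you know is in the cluster, a host name mismatch is a likely cause.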

A quick fix is to update the hosts entries on the agent node (datanode1 in the example above); this resolved the problem for changed host names.
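Before editing the hosts entries, it can help to confirm what the agent node currently maps its own name to. The sketch below assumes a standard hosts-format file; `hosts_entry_for` is a hypothetical helper, not part of Ambari.

```shell
# Print the IP address that a hosts-format file maps a given name to.
# $1 = hosts file (e.g. /etc/hosts), $2 = host name or alias to look up.
# Prints nothing if the name is not listed.
hosts_entry_for() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop trailing comments
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Example call: check the name this node reports for itself.
# hosts_entry_for /etc/hosts "$(hostname -f)"
```

An empty result for the name the cluster was installed with (datanode1 in the example) suggests the hosts entry is the one to fix.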


I faced a similar issue today and lost the heartbeat for all of the services.

I just restarted the Ambari agent, then relaunched Ambari and logged in again, and it worked well.

ambari-agent restart

Make sure of the points below on the host that lost its heartbeat.

  • Check the firewall status. It should be stopped.

service iptables stop

  • Check the /etc/ambari-agent/conf/ambari-agent.ini file.

In ambari-agent.ini, the hostname entry should be

hostname = ambariservernodehost

ambariservernodehost should also be present in the /etc/hosts file.
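A quick way to read that entry back is sketched here. It assumes the hostname line sits in the `[server]` section of the ini file, as in a default agent config; `server_hostname_from_ini` is a name invented for this example.

```shell
# Print the value of "hostname" from the [server] section of an ini file.
# $1 = path to ambari-agent.ini.
server_hostname_from_ini() {
    awk -F= '
        /^\[/ { in_server = ($0 ~ /^\[server\]/) }    # track the current section
        in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)            # trim spaces around the value
            print $2
            exit
        }
    ' "$1"
}

# Example call:
# server_hostname_from_ini /etc/ambari-agent/conf/ambari-agent.ini
```

The printed name is the one that needs an entry in /etc/hosts on the agent node.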

  • The openssl version should be upgraded.
  • Stop ambari-server
  • Stop the ambari-agent service on all nodes
  • Start the ambari-agent service on all nodes
  • Start ambari-server

Check the ambari-agent logs. If there is still a problem, please reply.
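For the log check, a small filter like the one below may save some scrolling. It is a sketch that assumes the log marks severity with the words ERROR or WARNING; `agent_log_problems` is a hypothetical helper, and the path in the comment is the usual default, not a guarantee.

```shell
# Show the most recent ERROR/WARNING lines from an agent log.
# $1 = log file, $2 = how many matching lines to show (default 20).
agent_log_problems() {
    grep -E '(^|[[:space:]])(ERROR|WARNING)([[:space:]]|$)' "$1" | tail -n "${2:-20}"
}

# Example call (path is an assumption, adjust as needed):
# agent_log_problems /var/log/ambari-agent/ambari-agent.log 50
```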

Heartbeats work fine from ambari-agent host with this:

rpm -qa openssl

But not with this:

rpm -qa openssl

With this newer version of openssl, the Ambari agent attempts to connect to the Ambari server using https instead of http. In our setup, Ambari is restricted to internal cluster users (admins) and therefore is not set up for https. This results in lost heartbeats. You can work around this by changing the default verification rule for Python on each agent host like this:

sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg
ambari-agent restart

I know this is not the best solution, because it changes the security default for Python host-wide. But as an interim fix, it works.
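Since the sed change above is host-wide, it may be worth checking the current value before and after applying it. The following is a sketch only; `python_https_verify` is a made-up helper that reads the first `verify=` line from the file.

```shell
# Print the current "verify" setting from a Python cert-verification config.
# $1 = config file path.
python_https_verify() {
    sed -n 's/^[[:space:]]*verify[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" \
        | head -n 1
}

# Example call:
# python_https_verify /etc/python/cert-verification.cfg
```

Seeing `platform_default` before the change and `disable` after it confirms the workaround took effect, and makes it easy to spot hosts to revert later.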