What I noticed is that reverse lookup is causing the problem.
For instance, when you installed the cluster you gave the host name as datanode1, and later the reverse lookup (from /etc/resolv.conf) or the /etc/hosts entry was updated and now resolves to datanode1.example.com. The ambari-server rejects the messages because it treats that host as not part of the cluster. This can be seen in the ambari-server logs (not in the agent logs):
"WARN [alert-event-bus-1] AlertReceivedListener:530 - Unable to process alert datanode_process for an invalid service HDFS and component DATANODE on host datanode1.example.com"
The quick fix is to update the hosts entries on the agent node (datanode1 in the example above); that resolved the problem for the changed host names.
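Both sides of this can be checked from the shell. The sketch below is a minimal, hypothetical helper set (rejected_hosts, reported_name and check_registered_name are made-up names, not part of Ambari). On the server it pulls the rejected host names out of warnings like the one above, assuming the log sits at its usual default, /var/log/ambari-server/ambari-server.log. On the agent it assumes, based on my understanding rather than a check against every Ambari version, that the agent derives its name the way Python's socket.getfqdn() does (the first name containing a dot among the canonical name and its aliases wins), and that lookups go through /etc/hosts rather than DNS:

```sh
# --- Server side ---------------------------------------------------------
# List each distinct host name that ambari-server refused alerts for.
# $1: server log (usually /var/log/ambari-server/ambari-server.log)
rejected_hosts() {
    sed -n 's/.*Unable to process alert.* on host \([^ ]*\).*/\1/p' "$1" | sort -u
}

# --- Agent side ----------------------------------------------------------
# Mimic Python's socket.getfqdn() rule on the hosts-file line that contains
# NAME: the first of (canonical name, aliases...) containing a dot wins; if
# none contains a dot, the canonical name is used.
# Simplification (assumption): only /etc/hosts is consulted, no DNS.
# $1: hosts file, $2: name the host was registered with
reported_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                      # drop trailing comments
        NF >= 2 {
            hit = 0
            for (i = 2; i <= NF; i++) if ($i == name) hit = 1
            if (!hit) next
            for (i = 2; i <= NF; i++) if (index($i, ".")) { print $i; exit }
            print $2; exit
        }' "$1"
}

# Report whether the registered name matches what the hosts file yields.
check_registered_name() {
    hosts_file=$1 registered=$2
    resolved=$(reported_name "$hosts_file" "$registered")
    if [ -z "$resolved" ]; then
        echo "no entry for $registered in $hosts_file"
        return 1
    fi
    if [ "$resolved" != "$registered" ]; then
        echo "MISMATCH: registered as $registered, hosts file gives $resolved"
        return 1
    fi
    echo "OK: $registered"
}
```

On the server, `rejected_hosts /var/log/ambari-server/ambari-server.log` should list datanode1.example.com for the warning above. On the agent, `check_registered_name /etc/hosts datanode1` prints a MISMATCH line while the host's entry (for example a hypothetical `10.0.0.21 datanode1.example.com datanode1`) still carries the dotted name. Under the getfqdn rule, just moving datanode1 to the front of that line is not enough; it only prints OK once the entry no longer lists a different dotted name.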
I faced a similar issue today and lost the heartbeat for all of the services.
I just restarted the Ambari agent, then relaunched and logged in to Ambari, and it worked well.
Please let me know if you have any questions.
Make sure of the following points on the host that lost its heartbeat:
service iptables stop
In the ambari-agent configuration file (/etc/ambari-agent/conf/ambari-agent.ini), the hostname entry under [server] should be:
hostname = ambariservernodehost
ambariservernodehost should be present in the /etc/hosts file.
Check the Ambari agent logs (/var/log/ambari-agent/ambari-agent.log). If there is still a problem, please reply.
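The config and /etc/hosts checks can be scripted. This is a minimal sketch, assuming the agent config uses the usual INI layout with the server name under [server]; ini_get and check_server_entry are hypothetical helper names, and the sketch only checks that the name is listed in the hosts file, not that the server is reachable:

```sh
# Print the value of KEY in [SECTION] of an INI file (first match only).
# $1: file, $2: section, $3: key
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        /^[ \t]*\[/ {                           # section header, e.g. [server]
            s = $0
            gsub(/[ \t]/, "", s)
            in_sec = (s == "[" section "]")
            next
        }
        in_sec {
            eq = index($0, "=")
            if (eq == 0) next                   # not a key=value line
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            sub(/^[ \t]+/, "", k); sub(/[ \t]+$/, "", k)
            sub(/^[ \t]+/, "", v); sub(/[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }' "$1"
}

# Check that the server host named in the agent config is in a hosts file.
# $1: agent config, $2: hosts file
check_server_entry() {
    ini=$1 hosts=$2
    server=$(ini_get "$ini" server hostname)
    if [ -z "$server" ]; then
        echo "no hostname under [server] in $ini"
        return 1
    fi
    if awk -v name="$server" '
            { sub(/#.*/, "") }                  # drop trailing comments
            { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit (found ? 0 : 1) }' "$hosts"; then
        echo "OK: $server is listed in $hosts"
    else
        echo "MISSING: $server is not listed in $hosts"
        return 1
    fi
}
```

On an agent host this would be run as `check_server_entry /etc/ambari-agent/conf/ambari-agent.ini /etc/hosts`.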
Heartbeats work fine from the ambari-agent host with this openssl version:
rpm -qa openssl
openssl-1.0.1e-51.el7_2.5.x86_64
But not with this one:
rpm -qa openssl
openssl-1.0.2k-8.el7.x86_64
With this newer version of openssl, the Ambari agent attempts to connect to the Ambari server using HTTPS instead of HTTP. In our setup, Ambari is restricted to internal cluster users (admins) only and is therefore not set up for HTTPS, which results in lost heartbeats. You can work around this by changing Python's default certificate verification setting on each agent host:
sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg
ambari-agent restart
I know it is not the best solution, because it changes Python's security default for the whole host. But as an interim fix, it works.
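Since this is meant as an interim fix, it helps to confirm the change actually landed and to be able to undo it later. The sed above only matches the exact text verify=platform_default. The sketch below uses a hypothetical helper, set_cert_verify, that also accepts spaces around "=", keeps a .bak copy of the file and prints the resulting line. It assumes the file holds a single verify= line under [https], as on the hosts described here, and a sed that supports `-i.bak` (GNU sed does):

```sh
# Set the "verify" option in a cert-verification.cfg-style file and print
# the line now in effect. The previous contents are kept as FILE.bak.
# $1: config file, e.g. /etc/python/cert-verification.cfg
# $2: new value: disable, or platform_default to undo the workaround
set_cert_verify() {
    cfg=$1 value=$2
    # Rewrite any "verify = ..." line, tolerating spaces around "=".
    sed -i.bak "s/^[[:space:]]*verify[[:space:]]*=.*/verify=$value/" "$cfg" || return 1
    grep '^verify=' "$cfg"
}
```

For example, `set_cert_verify /etc/python/cert-verification.cfg disable && ambari-agent restart` applies the workaround, and `set_cert_verify /etc/python/cert-verification.cfg platform_default` restores the default once HTTPS is in place.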