Does the ambari-agent log show that heartbeats are being generated and acknowledged? How many nodes are in the cluster? Or are all the components losing the heartbeat on the same node?
So I did those operations, but nothing changed. My ambari-agent log shows:

WARNING 2016-03-16 10:18:00,938 NetUtil.py:105 - Server at https://dl-master:8440 is not reachable, sleeping for 10 seconds...
The cluster has 4 nodes, and the heartbeat is lost on only one of them.
I still have the same problem. Could the cause be this error?

Connecting to https://dl-master:8440/ca ERROR 2016-03-16 10:18:31,414 NetUtil.py:77 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

I don't have any certificate configured for ambari-agent, and I didn't have this problem before.
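CERTIFICATE_VERIFY_FAILED usually means the agent's Python has started verifying Ambari's self-signed certificate. On systems whose Python packages ship the certificate-verification backport (e.g. recent RHEL/CentOS builds), a commonly cited workaround is to relax verification system-wide. This is only a sketch, assuming /etc/python/cert-verification.cfg exists on your system, and it trades away certificate checking, so use it deliberately:

```
# /etc/python/cert-verification.cfg
# Disables HTTPS certificate verification for Python clients system-wide.
# Restart the agent afterwards:  ambari-agent restart
[https]
verify=disable
```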
You can also check /var/log/ambari-server/ambari-server.log to see whether any errors are being logged for that host's heartbeat. Sometimes, if heartbeat processing encounters an error, it manifests as a lost heartbeat from the agent.
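To make that concrete, here is a small sketch. The log path is Ambari's default location, and `heartbeat_errors` is a hypothetical helper name, not an Ambari command:

```shell
# Hypothetical helper: scan an Ambari server log for heartbeat-related errors.
# /var/log/ambari-server/ambari-server.log is Ambari's default log location.
heartbeat_errors() {
    local log=${1:-/var/log/ambari-server/ambari-server.log}
    # Lines mentioning a heartbeat together with an error or loss are the
    # ones that tend to explain a "heartbeat lost" status in the UI.
    grep -i "heartbeat" "$log" 2>/dev/null | grep -iE "error|lost" | tail -n 20
}

heartbeat_errors || true   # prints nothing when run off the Ambari server host
```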
It might be a firewall change. Is this host behind a firewall? If so, please check by disabling it temporarily; the server may be unreachable because of it.
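As a quick check from the affected agent host, you can probe the server port directly. `probe` is a hypothetical helper, and dl-master:8440 is taken from the log above; a firewall that drops packets typically shows up as a timeout rather than an immediate refusal:

```shell
# Hypothetical probe using bash's /dev/tcp: can we open a TCP connection at all?
probe() {
    local host=$1 port=$2
    if timeout 3 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed or filtered"
    fi
}

probe localhost 1        # nothing normally listens on port 1: closed or filtered
# On the agent host you would run:  probe dl-master 8440
```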
I have the same problem. Was this ever resolved?
I have installed HDP 2.4 on Ubuntu 14.04 as a single-node cluster, and I configured the cluster using the hostname rather than the IP address. Please find the screenshot below:
Heartbeat lost is the error for all the components.
Connection failed: [Errno 111] Connection refused to (hostname) :50095
Sorry, I didn't see your answer sooner. I wasn't able to solve the issue and finally decided to reinstall Ambari, which removed the problem. I haven't faced this kind of issue since.
Please provide the logs, and check whether anything is running on port 50095:

netstat -tlpn | grep 50095

The output includes the process ID; use it to see which process is holding the port:

ps -ef | grep <process_id>

and kill the process if it is not important.
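Putting those steps together, here is a sketch you can adapt. Port 50095 comes from the error above; the throwaway python3 listener only exists so the commands have something to find — on the real host you would skip it and inspect whatever is already bound to the port:

```shell
PORT=50095

# Stand up a throwaway listener so the steps below have something to find.
python3 -m http.server "$PORT" >/dev/null 2>&1 &
LISTENER=$!
sleep 1

# Step 1: which process is listening on the port?
# (netstat needs net-tools; "ss -tlpn" is the modern equivalent if it is missing)
netstat -tlpn 2>/dev/null | grep ":$PORT " || true

# Step 2: take the PID from the PID/Program-name column and inspect it.
PID=$(netstat -tlpn 2>/dev/null | awk -v p=":$PORT\$" \
      '$4 ~ p {split($7, a, "/"); print a[1]}')
PID=${PID:-$LISTENER}        # fall back to our own listener if netstat is absent
ps -ef | grep "$PID" | grep -v grep || true

# Step 3: kill it only after confirming it is nothing important.
kill "$PID"
```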