Support Questions

c_ · 04-20-2016

After restarting the machines, I lost all heartbeats from some DataNodes in my cluster.

The problem occurs right after the host's ambari-agent connects to ambari-server.

These are the last lines written to /var/log/ambari-agent/ambari-agent.log on the defective DataNode:

INFO 2016-04-20 17:59:48,925 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-04-20 17:59:48,927 main.py:283 - Connecting to Ambari server at https://hmaster1.xxx.local:8440 (10.10.238.111)
NetUtil.py:60 - Connecting to https://hmaster1.xxx.local:8440/ca

On the working DataNodes, the process continues with this log line:

INFO 2016-04-20 17:51:22,147 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
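A quick way to compare the two cases is to check whether anything follows the last /ca request in the agent log. The sketch below is a heuristic based only on the log lines quoted above: it reports a stall when the last "NetUtil.py ... Connecting to .../ca" line is not followed by a "Started thread pool" line. The function name agent_stalled_at_ca is hypothetical, not part of Ambari.

```python
import re

# Markers taken from the ambari-agent.log lines quoted in this thread.
CA_REQUEST = re.compile(r"NetUtil\.py:\d+ - Connecting to (\S+/ca)")
POOL_STARTED = "Started thread pool"


def agent_stalled_at_ca(log_text: str) -> bool:
    """Return True if the most recent /ca request has no thread-pool start after it.

    Only the text after the LAST "Connecting to .../ca" line is inspected,
    so an earlier successful start does not hide a later stall.
    """
    matches = list(CA_REQUEST.finditer(log_text))
    if not matches:
        return False  # the agent never reached the /ca request at all
    tail = log_text[matches[-1].end():]
    return POOL_STARTED not in tail
```

For example, passing the contents of /var/log/ambari-agent/ambari-agent.log to agent_stalled_at_ca should return True on a DataNode whose log ends as shown above.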

In the ambari-server log (/var/log/ambari-server/ambari-server.log), I see nothing about communication between the defective DataNode and the Ambari master.

Note that I am using the latest Ambari version (2.2.1.1) and CentOS 7 with the latest updates.

I disabled all firewall rules, and the working DataNodes and the defective one have the same configuration.

Any ideas about this strange problem?

manish1 · 04-20-2016

First, check whether these DataNodes are reachable from the Ambari server over SSH, using their hostnames. Then check the reverse direction: telnet from each DataNode to the Ambari server, using the Ambari server's hostname, on port 8440. If everything looks good, kill the current ambari-agent daemon and restart the service. Make sure no hung or stale instance of ambari-agent is still running.
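If telnet is not installed on the DataNode, the same TCP check can be done from Python. This is a minimal sketch; port_reachable is a hypothetical helper, and a successful TCP connect only shows that the port is open, not that the agent's HTTPS exchange with the server will succeed.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like a successful telnet)."""
    try:
        # create_connection also resolves the hostname, so a missing
        # /etc/hosts or DNS entry shows up here as a failure too.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or name resolution failed
        return False
```

Run it on the DataNode as port_reachable("hmaster1.xxx.local", 8440). The agent configuration also defines a secured_url_port (8441 by default), which can be checked the same way.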

If that does not work, stop the Ambari server, then stop the PostgreSQL database server.

Now start the Ambari server; assuming it uses its default embedded PostgreSQL database, it will start the PostgreSQL server itself.

Let me know if it does not fix the issue.

c_ · 04-21-2016

It was an SSH problem between the machines. Thank you.

ajay_kumar · 04-20-2016

@K. Karray

Kill any stale ambari-agent processes on the affected nodes (find them with: ps -ef | grep ambari-agent).
Restart ambari-agent manually (sudo systemctl start ambari-agent).
If the issue persists, share the ambari-agent logs.
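Finding stale agents can also be scripted. The sketch below parses ps -ef output instead of piping through grep, and skips the grep process itself. It matches both "ambari-agent" and "ambari_agent", on the assumption that the agent's Python processes run from a package directory spelled with an underscore. Both function names are hypothetical.

```python
import subprocess
from typing import List


def ambari_agent_pids(ps_output: str) -> List[int]:
    """Return PIDs from `ps -ef` output whose command line belongs to ambari-agent."""
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or malformed line
        pid, cmd = int(fields[1]), fields[7]
        if cmd.startswith("grep"):
            continue  # the `grep ambari-agent` process itself
        # Service name uses a hyphen; the Python package directory an underscore.
        if "ambari-agent" in cmd or "ambari_agent" in cmd:
            pids.append(pid)
    return pids


def current_agent_pids() -> List[int]:
    """Run `ps -ef` on this host and return the ambari-agent PIDs it lists."""
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    return ambari_agent_pids(out)
```

Any PIDs returned by current_agent_pids() can be stopped with kill <pid> before restarting the service.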

balkrushnapatil · 04-30-2017

On a host that has lost heartbeats, check the following points:

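Check the firewall status. The firewall should be stopped:

service iptables stop

On CentOS 7, where firewalld is the default firewall, the equivalent is: systemctl stop firewalld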

Check the /etc/ambari-agent/conf/ambari-agent.ini file.

In that file, the hostname entry should point to the Ambari server host:

hostname = ambariservernodehost

ambariservernodehost must be present in the /etc/hosts file.
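Both checks can be sketched in Python, assuming the hostname entry lives in the [server] section of ambari-agent.ini. The helper names are hypothetical, and the functions take file contents as strings so they can be tried without touching the real files.

```python
import configparser


def server_hostname(ini_text: str) -> str:
    """Return the `hostname` value from the [server] section of ambari-agent.ini text."""
    # interpolation=None: treat '%' literally; strict=False: tolerate duplicate keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return parser.get("server", "hostname").strip()


def hosts_file_has(hostname: str, hosts_text: str) -> bool:
    """True if a non-comment /etc/hosts line lists `hostname` (case-insensitively)."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split on whitespace
        # fields[0] is the IP address; the rest are the canonical name and aliases
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return True
    return False
```

Reading /etc/ambari-agent/conf/ambari-agent.ini into server_hostname gives the name to look for, and hosts_file_has(name, <contents of /etc/hosts>) confirms that it is mapped.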

The OpenSSL version should be up to date; upgrade it if necessary.
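To decide whether an upgrade is needed, the output of openssl version can be compared with a minimum. The minimum below (1.0.2) is a hypothetical placeholder, since no specific version is given here; adjust it to whatever your Ambari and OS documentation require.

```python
import re
from typing import Optional, Tuple

# HYPOTHETICAL threshold: no minimum OpenSSL version is named in this thread.
MIN_OPENSSL = (1, 0, 2)


def parse_openssl_version(banner: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from text like `openssl version` prints."""
    match = re.search(r"OpenSSL (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def needs_upgrade(banner: str, minimum: Tuple[int, int, int] = MIN_OPENSSL) -> bool:
    """True if the version is older than `minimum` or could not be parsed."""
    version = parse_openssl_version(banner)
    # Letter suffixes such as the "e" in 1.0.1e are ignored by this comparison.
    return version is None or version < minimum
```

Since ambari-agent runs on Python (the .py modules in its log), the OpenSSL that matters is the one linked into the agent's Python interpreter. Running python -c 'import ssl; print(ssl.OPENSSL_VERSION)' with that interpreter shows it.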

Then restart in this order:
Stop ambari-server.
Stop the ambari-agent service on all nodes.
Start the ambari-agent service on all nodes.
Start ambari-server.
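The restart order can be scripted, for example from the Ambari server host. This is a sketch under stated assumptions: it uses the ambari-server and ambari-agent stop/start commands, reaches each agent host over ssh with sufficient privileges, and the host names passed in are your own. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List, Sequence


def restart_plan(agent_hosts: Sequence[str]) -> List[List[str]]:
    """Commands in the order listed above: server stop, agents stop, agents start, server start."""
    plan = [["ambari-server", "stop"]]
    # ASSUMPTION: agents are reached over ssh from the Ambari server host.
    plan += [["ssh", host, "ambari-agent", "stop"] for host in agent_hosts]
    plan += [["ssh", host, "ambari-agent", "start"] for host in agent_hosts]
    plan.append(["ambari-server", "start"])
    return plan


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: a "stop" on a service that is already down should not abort the sequence
            subprocess.run(cmd, check=False)
```

With two hypothetical hosts, run_plan(restart_plan(["dn1.xxx.local", "dn2.xxx.local"])) prints six commands; pass dry_run=False to execute them.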

Check the ambari-agent logs. If there is still a problem, please reply.

Cloudera Community

Strange loss of Ambari heartbeats on some hosts