Strange loss of Ambari heartbeats on some hosts

New Contributor

I lost all heartbeats from some DataNodes in my cluster after restarting the machines.

The problem occurs just after the ambari-agent on the host connects to the ambari-server.

The last lines logged in /var/log/ambari-agent/ambari-agent.log on an affected DataNode:

INFO 2016-04-20 17:59:48,925 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-04-20 17:59:48,927 main.py:283 - Connecting to Ambari server at https://hmaster1.xxx.local:8440 (10.10.238.111)
NetUtil.py:60 - Connecting to https://hmaster1.xxx.local:8440/ca

On the working DataNodes, the process continues with this log line:

INFO 2016-04-20 17:51:22,147 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads

In the ambari-server log, /var/log/ambari-server/ambari-server.log, I see nothing about communication between the affected DataNode and the Ambari master.

Note that I am using the latest version of Ambari, 2.2.1.1, on CentOS 7 with the latest updates.

I disabled all firewall rules, and the configuration is the same on the working DataNodes and the affected one.

Any idea about this strange problem?

1 ACCEPTED SOLUTION

Re: Strange loss of Ambari heartbeats on some hosts

Expert Contributor

First, check whether these DataNodes are reachable from the Ambari server over SSH using their hostnames, and then check the reverse direction as well. Next, telnet from the DataNode to the Ambari server on port 8440, using the ambari-server hostname. If everything looks good, kill the current ambari-agent daemon and restart the service. Please make sure no hung or stale instance of ambari-agent is still running.
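
The connectivity checks above could be scripted roughly as follows. This is only a sketch: check_port and check_ssh are hypothetical helper names, not Ambari commands. It assumes bash (for its /dev/tcp redirection), the coreutils timeout command, and key-based SSH between the hosts. It uses bash instead of telnet because telnet is often not installed.

```shell
# check_port HOST PORT [SECONDS]
# Succeeds (exit 0) if a TCP connection to HOST:PORT can be opened in time.
check_port() {
    local host="$1" port="$2" wait="${3:-5}"
    # bash treats /dev/tcp/HOST/PORT as "open a TCP socket" in a redirection.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# check_ssh HOST
# Succeeds if a non-interactive SSH login to HOST works within 5 seconds.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, run `check_ssh <datanode-hostname> && echo ok` on the Ambari server, and `check_port hmaster1.xxx.local 8440 && echo ok` on the DataNode. The default agent configuration also uses port 8441 for the secured connection, so it may be worth checking that port the same way.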

If that does not work, stop the Ambari server, then stop the PostgreSQL DB server.

Now start ambari-server; it will start the PostgreSQL server itself.
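
As a rough sketch, that restart sequence might look like this on the Ambari server host, run as root. It assumes the embedded PostgreSQL instance runs under a systemd unit named postgresql (the usual name on CentOS 7, but check yours). run and restart_ambari_server are hypothetical helper names; with DRY_RUN=1 the commands are only printed, not executed.

```shell
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop Ambari, stop the database, then let ambari-server bring both back up.
restart_ambari_server() {
    run ambari-server stop
    run systemctl stop postgresql   # assumption: unit name on this host
    run ambari-server start
}
```

Calling `DRY_RUN=1 restart_ambari_server` first shows what would be executed before you run it for real.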

Let me know if it does not fix the issue.

Re: Strange loss of Ambari heartbeats on some hosts

New Contributor

It was an SSH problem between the machines. Thank you.

Re: Strange loss of Ambari heartbeats on some hosts

Rising Star
@K. Karray
  1. Kill any stale ambari-agent on the affected nodes. (ps -ef | grep ambari-agent)
  2. Restart the ambari-agent manually. (sudo systemctl start ambari-agent)
  3. If the issue persists, share the ambari-agent logs.
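
A small sketch of steps 1 and 2, to be run as root on an affected node. agent_pids and show_stale_agents are hypothetical helper names. The script deliberately only lists leftover processes instead of killing them, so you can inspect them first. Note that it would also match a script whose own file name contains "ambari-agent".

```shell
# agent_pids: read `ps -ef` output on stdin and print the PID (second column)
# of every line that mentions ambari-agent or ambari_agent.
agent_pids() {
    awk '/ambari[-_]agent/ { print $2 }'
}

# show_stale_agents: stop the agent, then report anything still running.
# Returns 1 if leftover processes remain; otherwise starts the agent again.
show_stale_agents() {
    local pids
    ambari-agent stop
    sleep 2
    pids="$(ps -ef | agent_pids)"
    if [ -n "$pids" ]; then
        echo "Still running after stop (kill these, then start the agent):"
        echo "$pids"
        return 1
    fi
    echo "No stale agent found; starting the agent."
    systemctl start ambari-agent
}
```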
Re: Strange loss of Ambari heartbeats on some hosts

Please verify the following points on the host that lost its heartbeat.

  • Check the firewall status. It should be stopped.

service iptables stop

  • Check the /etc/ambari-agent/conf/ambari-agent.ini file.

In ambari-agent.ini, the hostname entry should be

hostname = ambariservernodehost

ambariservernodehost should also be present in the /etc/hosts file.

  • The openssl version should be upgraded.
  • Stop ambari-server
  • Stop the ambari-agent service on all nodes
  • Start the ambari-agent service on all nodes
  • Start ambari-server

Check the ambari-agent logs. If the problem persists, please reply to me.
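
The configuration checks in this list could be gathered into one script, sketched below. agent_server_hostname and check_agent_host are hypothetical helper names. The sketch assumes the hostname key sits in the [server] section of ambari-agent.ini and that getent is available. Since the original poster is on CentOS 7, it queries firewalld, which is the default firewall there, rather than the iptables service.

```shell
# agent_server_hostname [INI_FILE]: print the "hostname" value from the
# [server] section of ambari-agent.ini.
agent_server_hostname() {
    awk -F= '
        /^\[/ { in_server = ($0 ~ /^\[server\]/) }
        in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
            gsub(/[ \t\r]/, "", $2); print $2; exit
        }
    ' "${1:-/etc/ambari-agent/conf/ambari-agent.ini}"
}

# check_agent_host [INI_FILE]: report the points from the checklist above.
check_agent_host() {
    local server
    server="$(agent_server_hostname "$1")"
    echo "Agent is configured to contact: ${server:-<not set>}"
    if [ -n "$server" ] && getent hosts "$server"; then
        echo "OK: $server resolves"
    else
        echo "WARNING: server hostname does not resolve; check /etc/hosts or DNS"
    fi
    systemctl is-active firewalld   # prints "active" if the firewall is running
    openssl version                 # shows the installed OpenSSL version
}
```

Running `check_agent_host` on both a working node and an affected one makes it easy to compare the results side by side.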
