Heartbeat Lost: Hostname is not sending heartbeat

Explorer

Hi Team,

I have come across a strange issue in our cluster: suddenly all of the services turn yellow and report that the heartbeat is lost for those services. It seems to be intermittent; the services sometimes turn green, and after some time they turn yellow again and report the heartbeat as lost.

Ambari version: 2.4.2.0

Ambari Agent version: ambari-agent-2.4.2.0-136

Please find the screenshot attached: heartbeat-lost.png

We have tried restarting ambari-server, ambari-agent, and postgresql, but it did not help. We have checked the logs but did not find anything.

Can anyone please help me with a solution to get this fixed? I would also like to know what caused this issue to appear suddenly.

Thanks in advance!

7 REPLIES

Master Mentor

@Shrikant BM

1. What is the size of your cluster? If the cluster is large, we sometimes need to tune "agent.threadpool.size.max".

"agent.threadpool.size.max": this property sets the maximum number of threads used to process heartbeats from Ambari agents. The default value is "25". It basically indicates the size of the Jetty connection pool used for handling incoming Ambari Agent requests.

# grep 'agent.threadpool.size.max' /etc/ambari-server/conf/ambari.properties
agent.threadpool.size.max=50
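As a rough sketch of the check above, the helper below reads a key from a properties file and falls back to a default when the key is not set. The function name `get_prop` is hypothetical, and the parsing assumes plain `key=value` lines with no spaces around `=`; the path and the default of 25 are taken from this reply.

```shell
#!/bin/sh
# get_prop FILE KEY DEFAULT
# Prints the value of KEY from a key=value properties file,
# or DEFAULT when the key is missing or the file is unreadable.
# Hypothetical helper; assumes no spaces around "=".
get_prop() {
    file=$1
    key=$2
    fallback=$3
    # Exact match on the key; the last occurrence wins.
    value=$(awk -F= -v k="$key" '
        $1 == k { v = $0; sub(/^[^=]*=/, "", v) }
        END { if (v != "") print v }
    ' "$file" 2>/dev/null)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$fallback"
    fi
}

# Example (path and default as quoted in the reply above):
# get_prop /etc/ambari-server/conf/ambari.properties agent.threadpool.size.max 25
```

If the value is changed in ambari.properties, ambari-server needs a restart before the new setting is picked up.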

For more detail on this please refer to: https://community.hortonworks.com/articles/131670/ambari-server-performance-tuning-troubleshooting-c...

2. If the heartbeat comes back shortly (within a few seconds), another approach is to increase the "Ambari Agent Heartbeat" alert interval from 2 minutes to a bit more: Ambari UI --> Alerts --> search for "Ambari Agent Heartbeat".

3. Please share the ambari-server.log and ambari-agent logs from the same timestamp at which you notice the heartbeat loss, so that we can review them for any strange behaviour.

4. If the heartbeat loss happens at specific times (a time pattern), then we should check whether any heavy job is running on the agent host that might be preventing the agent from sending the heartbeat for a few seconds.
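One way to look for such a time pattern is to count heartbeat-loss log lines per hour. The sketch below is only an illustration: the function name `heartbeat_lost_by_hour` is hypothetical, the default search string "Heartbeat lost" is an assumption about the server log wording, and the parsing assumes each log line starts with a timestamp of the form `dd Mon yyyy HH:mm:ss,SSS`. Adjust both to match your actual ambari-server.log.

```shell
#!/bin/sh
# heartbeat_lost_by_hour LOGFILE [PATTERN]
# Prints "<count> <dd> <Mon> <yyyy> <HH>:00" for each hour that contains
# lines matching PATTERN (case-insensitive).
# Assumptions: PATTERN default and timestamp layout are guesses, not
# confirmed by the thread.
heartbeat_lost_by_hour() {
    log=$1
    pattern=${2:-Heartbeat lost}
    grep -i -- "$pattern" "$log" |
        # Fields 1-3 are the date; field 4 is HH:mm:ss,SSS. Keep only the hour.
        awk '{ split($4, t, ":"); print $1, $2, $3, t[1] ":00" }' |
        sort | uniq -c
}

# Example (log path is an assumption):
# heartbeat_lost_by_hour /var/log/ambari-server/ambari-server.log
```

Hours with unusually high counts can then be compared against cron jobs or batch workloads on the affected agent hosts.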

Explorer

Thanks for sharing the details.

Cluster size: it is a 21-node cluster. We will try the options that you have mentioned.

Sharing the logs would be a little difficult for me, but let me try my best.

Could you please share links to similar issues that could help me figure this out?

Explorer

43844-capture84.jpg

43845-capture83.jpg

@Jay Kumar SenSharma

My logs are attached. My Ambari components' heartbeats fail after some time.

Hi @Gaurav Bapat ,

This error seems to be because of the Python version.

Can you please refer to the following thread:

https://community.hortonworks.com/questions/120861/ambari-agent-ssl-certificate-verify-failed-certif...

I hope your issue is the same.

Explorer

I have Python 2.7.5 installed. Do I need to downgrade or upgrade it?

Is the SSL error related to the heartbeat loss, and why does my Metron component fail?

@akhilsnaik

Hi @Gaurav Bapat,

As you are using Python 2.7.5, you might be hitting the same bug mentioned in the above link.

You can refer to this link: https://access.redhat.com/articles/2039753#controlling-certificate-verification-7 and try disabling certificate verification.
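Before changing anything, it can help to see what the host is currently set to. The sketch below prints the `verify` value from the `[https]` section of a Python certificate-verification config file. The function name `cert_verify_mode` is hypothetical, and the default path `/etc/python/cert-verification.cfg` and the expected values (such as `platform_default`, `enable`, `disable`) are assumptions based on the linked Red Hat article, so confirm them there for your OS release.

```shell
#!/bin/sh
# cert_verify_mode [CFG]
# Prints the "verify" value found under the [https] section of CFG.
# The default path is an assumption taken from the Red Hat article.
cert_verify_mode() {
    cfg=${1:-/etc/python/cert-verification.cfg}
    awk -F= '
        # Track whether the current section header is exactly [https].
        /^\[/ { in_https = ($0 == "[https]") }
        # Inside [https], print the value of the "verify" key, spaces removed.
        in_https && $1 ~ /^[ \t]*verify[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            print $2
        }
    ' "$cfg"
}

# Example:
# cert_verify_mode
```

Note that turning verification off applies to Python HTTPS clients on that host in general, not only to ambari-agent.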

Hope this helps.

Rising Star

Did you upgrade the OS from the base version with the "yum upgrade" command?