Ambari 2.5.1 server and agent disagree on heartbeat

I had never seen this before upgrading to 2.5.1.

On a 6-node cluster that had been running for 2 weeks, one of the hosts is listed as having lost its heartbeat. The agent is still reporting metrics and all the components are running fine without alerts. The only visible effect is that the node actions are disabled.

However, looking at the agent's log, the heartbeat seems to be running normally and continuously, e.g.:

INFO 2017-07-25 01:55:31,110 Controller.py:304 - Heartbeat (response id = 621881) with server is running...
INFO 2017-07-25 01:55:31,110 Controller.py:311 - Building heartbeat message
INFO 2017-07-25 01:55:31,112 Heartbeat.py:90 - Adding host info/state to heartbeat message.
INFO 2017-07-25 01:55:31,163 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2017-07-25 01:55:31,163 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2017-07-25 01:55:31,289 Hardware.py:176 - Some mount points were ignored: /, /dev, /dev/shm, /sys/fs/cgroup, /run, /boot, /var/log, /hadoop, /hadoop/druid, /hadoop/yarn/local, /run/user/1017, /run/user/1006, /run/user/1002, /run/user/1003
INFO 2017-07-25 01:55:31,291 Controller.py:320 - Sending Heartbeat (id = 621881)
INFO 2017-07-25 01:55:31,335 Controller.py:332 - Heartbeat response received (id = 621882)
INFO 2017-07-25 01:55:31,336 Controller.py:341 - Heartbeat interval is 1 seconds
INFO 2017-07-25 01:55:31,336 Controller.py:377 - Updating configurations from heartbeat
INFO 2017-07-25 01:55:31,336 Controller.py:386 - Adding cancel/execution commands
INFO 2017-07-25 01:55:31,336 Controller.py:471 - Waiting 0.9 for next heartbeat
INFO 2017-07-25 01:55:32,236 Controller.py:478 - Wait for next heartbeat over
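
To check whether the agent really kept heartbeating over a longer period, rather than eyeballing an excerpt, the log can be scanned for gaps between consecutive "Sending Heartbeat" lines. The following is only a minimal sketch: it assumes the line format shown above, and the function names and the 10-second threshold are my own illustrative choices, not anything Ambari provides.

```python
import re
from datetime import datetime

# Matches agent log lines such as:
# INFO 2017-07-25 01:55:31,291 Controller.py:320 - Sending Heartbeat (id = 621881)
SENT_RE = re.compile(
    r"^\w+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*Sending Heartbeat \(id = (\d+)\)"
)


def parse_sent_heartbeats(lines):
    """Return a list of (timestamp, heartbeat id) for each 'Sending Heartbeat' line."""
    sent = []
    for line in lines:
        match = SENT_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            sent.append((stamp, int(match.group(2))))
    return sent


def find_gaps(sent, max_gap_seconds=10.0):
    """Return (previous id, current id, seconds) for heartbeats further apart than the threshold."""
    gaps = []
    for (prev_time, prev_id), (cur_time, cur_id) in zip(sent, sent[1:]):
        delta = (cur_time - prev_time).total_seconds()
        if delta > max_gap_seconds:
            gaps.append((prev_id, cur_id, delta))
    return gaps
```

Reading the agent log file into a list of lines and passing it through `parse_sent_heartbeats` and then `find_gaps` should return an empty list if the agent never paused for longer than the threshold, which would point at the server side rather than the agent.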

Both server and agent are clean installations, and all the Ambari packages are on the same version: 2.5.1.0-159.

Restarting the server solved it, without having to restart the agent.

Has anybody seen this behavior? So far it has only happened once, but I have only recently started using 2.5.1.

Re: Ambari 2.5.1 server and agent disagree on heartbeat

@Gonzalo Herreros

Did the Ambari server logs report any issues at the time of the lost heartbeats?

Re: Ambari 2.5.1 server and agent disagree on heartbeat

After the server finished starting up, the log contained only one message, printed every 5 minutes:

25 Jul 2017 14:12:22,785  INFO [pool-18-thread-1] MetricsServiceImpl:64 - Checking for metrics sink initialization

However, since I restarted, it is gone and so far the heartbeat is fine. Maybe it is a coincidence. Thinking about it, I don't know when the heartbeat was lost, since it's only noticeable if you go to the hosts list or the specific host screen. It doesn't show on the main screen because the services are fine.
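
One way to find out when the server stops seeing heartbeats, without having to open the hosts screen, would be to poll the server's own view of each host. This is a rough sketch only: it assumes the Ambari REST API exposes `Hosts/host_state` and `Hosts/last_heartbeat_time` (in epoch milliseconds) under `/api/v1/hosts`, which should be verified against your version. The function names, the 180-second threshold, and the base URL and credentials you pass in are all hypothetical.

```python
import base64
import json
import urllib.request


def fetch_hosts(base_url, user, password):
    """Fetch host heartbeat fields from the Ambari server (field names are assumptions)."""
    url = base_url + "/api/v1/hosts?fields=Hosts/host_state,Hosts/last_heartbeat_time"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def stale_hosts(payload, now_ms, max_age_seconds=180):
    """Return (host name, state, heartbeat age in seconds) for hosts the server considers stale."""
    stale = []
    for item in payload.get("items", []):
        host = item.get("Hosts", {})
        last = host.get("last_heartbeat_time")
        state = host.get("host_state")
        # Age is unknown when the server reports no heartbeat time at all.
        age = None if last is None else (now_ms - last) / 1000.0
        if state == "HEARTBEAT_LOST" or age is None or age > max_age_seconds:
            stale.append((host.get("host_name"), state, age))
    return stale
```

Run from cron every minute or so, with `int(time.time() * 1000)` as `now_ms`, and logging any non-empty result, this would at least give a timestamp for when the server and the agent start to disagree.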