
Ambari Agent heartbeat missing one or two intervals intermittently, once or twice a week


Hi,

One of the edge nodes in the cluster generates alerts once or twice a week because of a missed heartbeat from the agent.


INFO 2018-01-29 07:04:38,554 logger.py:71 - call returned (0, '')
INFO 2018-01-29 07:06:04,226 logger.py:71 - call[['test', '-w', '/']] {'sudo': True, 'timeout': 5}
INFO 2018-01-29 07:06:04,233 logger.py:71 - call returned (0, '')

As you can see, there is no log output for about 1.5 minutes (07:04:38 to 07:06:04), and this triggers an Ambari alert for this edge node. How can I track whether there was a connectivity issue between the server and the agent? Sometimes more than one heartbeat interval is missed.
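To find every such silent period rather than spotting them by eye, a small script can scan the agent log and report gaps between consecutive timestamps. This is a minimal sketch, assuming the log line prefix shown above ("LEVEL YYYY-MM-DD HH:MM:SS,mmm ...") and the usual log location /var/log/ambari-agent/ambari-agent.log; `find_gaps` is a hypothetical helper, not part of Ambari. Note that a gap in log output is only a proxy: it shows when the agent was quiet, not by itself that a heartbeat was lost.

```python
import re
from datetime import datetime

# Matches the leading "LEVEL YYYY-MM-DD HH:MM:SS,mmm " prefix of an agent log line
# (format assumed from the excerpt in the question).
LINE_RE = re.compile(r"^[A-Z]+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def find_gaps(lines, threshold_seconds=60):
    """Return (previous_ts, next_ts, gap_seconds) for each gap above the threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip continuation lines, tracebacks, blank lines
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            delta = (ts - previous).total_seconds()
            if delta > threshold_seconds:
                gaps.append((previous, ts, delta))
        previous = ts
    return gaps
```

For example, `find_gaps(open("/var/log/ambari-agent/ambari-agent.log", errors="replace"))` lists each silent period; lining those up with the alert times in the Ambari UI shows whether the alerts coincide with the agent going quiet.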



@Dhirendra Khanka

If the heartbeat loss happens only once or twice a week and the agent's heartbeat comes back within a few seconds, it can be ignored. If you do not want to see the alert, you can increase the check interval from the default of 2 minutes to 3 minutes.

Ambari UI --> Alerts --> "Ambari Agent Heartbeat" --> "Edit", then increase the "Check Interval" from 2 minutes to 3 minutes.
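The same change can presumably be scripted through the Ambari REST API instead of the UI. The sketch below only builds the request; it assumes the v1 alert definition endpoint (`/api/v1/clusters/<cluster>/alert_definitions/<id>`) accepts a PUT body of the form `{"AlertDefinition": {"interval": N}}` with the interval in minutes, and that the numeric id of the "Ambari Agent Heartbeat" definition has first been looked up with a GET on `.../alert_definitions`. The server URL, cluster name, id, and credentials in the usage line are hypothetical; verify the payload against your Ambari version before sending it.

```python
import base64
import json
import urllib.request


def build_interval_update(base_url, cluster, definition_id, minutes, user, password):
    """Build (but do not send) a PUT that changes an alert definition's check interval."""
    url = f"{base_url}/api/v1/clusters/{cluster}/alert_definitions/{definition_id}"
    # Assumed payload shape: the interval field of the AlertDefinition resource.
    body = json.dumps({"AlertDefinition": {"interval": minutes}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=body, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    # Ambari rejects modifying requests that lack an X-Requested-By header.
    request.add_header("X-Requested-By", "ambari")
    return request
```

Usage (hypothetical values): `urllib.request.urlopen(build_interval_update("http://ambari.example.com:8080", "mycluster", 42, 3, "admin", "admin"))`. Keeping request construction separate from sending makes it easy to inspect the call before it touches a production server.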


Sometimes heavy load, network traffic, or slow network requests/responses prevent the heartbeat from being sent or received properly. If it happens at a specific time of day or week, check whether the edge node was running a heavy job or network operation at that time, since that could be causing the issue.
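To see whether the alerts really cluster at a particular time, one option is to tally the alert timestamps (for example, copied from the alert history in the Ambari UI or taken from gaps in the agent log) by weekday and hour. This is a minimal sketch; `recurring_slots` is a hypothetical helper and the input timestamp format is an assumption you can change via `fmt`.

```python
from collections import Counter
from datetime import datetime

# Fixed English names so the output does not depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def recurring_slots(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count events per (weekday, hour), most frequent slot first."""
    slots = Counter()
    for raw in timestamps:
        ts = datetime.strptime(raw, fmt)
        slots[(DAY_NAMES[ts.weekday()], ts.hour)] += 1
    return slots.most_common()
```

A slot that repeats week after week (say, every Monday around 07:00) points to a scheduled job or transfer on the edge node worth checking with the cron or workflow scheduler; evenly scattered alerts suggest a more general load or network problem.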
