Ambari Alert on Yarn: NodeManager Health Summary

Contributor

Hi, I am getting the following alert in Ambari for the YARN service:

There are 2 stale alerts from 2 host(s):
node-m03.host.com
[NodeManager Health Summary (6h 34m)],
node-wn01.host.com
[NodeManager Health (7h)]

 

However, I don't see any NodeManagers with an unhealthy status in Ambari, and a service check on YARN completed successfully.

I am not sure what exactly I should do to clear this alert. Could you please help me troubleshoot this issue?

Thank you

HDP 3.0.1
YARN 3.1.0
Ambari 2.7.1


Cloudera Employee

A stale alert does not necessarily mean there is a problem in your cluster. It is raised when the Ambari agent takes too long to run an alert check, so the check does not run at its scheduled time. This can have several causes, including network latency and resource contention on the host.

There are a couple of ways to correct this.

Increase the alert grace period
There is a grace period before an Ambari agent reports that a configured alert missed its schedule. If the alert missed its scheduled time but ran within the grace period, the stale alert isn't generated.

The default alert_grace_period value is 5 seconds. You can configure this setting in /etc/ambari-agent/conf/ambari-agent.ini. For hosts on which stale alerts occur at regular intervals, try increasing the value to 10 seconds, and then restart the Ambari agent on those hosts so the change takes effect.
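A minimal sketch of that edit in Python. It assumes the key sits under the [agent] section of ambari-agent.ini, which is where stock agent configurations keep it; check your own file before relying on this. The function name set_alert_grace_period is hypothetical, and it works on the file's text line by line so that comments and other settings are left untouched.

```python
import re


def set_alert_grace_period(ini_text: str, seconds: int) -> str:
    """Return ini_text with alert_grace_period set to `seconds` in [agent]."""
    new_line = f"alert_grace_period={seconds}"
    out = []
    in_agent = False  # True while we are inside the [agent] section
    done = False      # True once the key has been replaced or added
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [agent] without having seen the key: add it first.
            if in_agent and not done:
                out.append(new_line)
                done = True
            in_agent = stripped == "[agent]"
        elif in_agent and re.match(r"alert_grace_period\s*=", stripped):
            line = new_line
            done = True
        out.append(line)
    # [agent] was the last section and did not contain the key.
    if in_agent and not done:
        out.append(new_line)
        done = True
    if not done:
        raise ValueError("no [agent] section found")
    return "\n".join(out) + "\n"
```

To apply it, read /etc/ambari-agent/conf/ambari-agent.ini, keep a backup, write the returned text back, and then run `ambari-agent restart` on that host.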

 

Increase the alert interval time
You can increase the check interval of an individual alert, based on your cluster's response time and load:

1. In the Apache Ambari UI, select the Alerts tab.
2. Select the alert definition name that you want.
3. From the definition, select Edit.
4. Increase the Check Interval value, and then select Save.
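If you prefer scripting to the UI, the same change can probably be made through Ambari's REST API. The sketch below only builds the request and does not send it. It assumes the alert_definitions resource under /api/v1/clusters/<cluster>, basic authentication, and an interval expressed in minutes. The host, cluster name, definition id, and credentials are placeholders, and build_interval_update is a hypothetical helper, not part of any Ambari client library.

```python
import base64
import json
import urllib.request


def build_interval_update(base_url, cluster, definition_id,
                          interval_minutes, user, password):
    """Build (but do not send) a PUT that changes an alert's check interval."""
    url = (f"{base_url}/api/v1/clusters/{cluster}"
           f"/alert_definitions/{definition_id}")
    body = json.dumps(
        {"AlertDefinition": {"interval": interval_minutes}}
    ).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari expects this header on requests that modify state.
            "X-Requested-By": "ambari",
        },
    )


# Placeholder values; replace them with your own before sending.
request = build_interval_update(
    "http://ambari.example.com:8080", "mycluster", 42, 5, "admin", "secret"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. A GET on the alert_definitions collection of the same cluster should list the definitions with their ids, so you can find the one for the NodeManager alert.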

Increase the alert interval time for Ambari Server Alerts
1. In the Apache Ambari UI, select the Alerts tab.
2. From the Groups drop-down list, select AMBARI Default.
3. Select the Ambari Server Alerts alert.
4. From the definition, select Edit.
5. Increase the Check Interval value.
6. Increase the Interval Multiplier value, and then select Save.
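
To see why raising these values helps, here is a toy model of the staleness check. It is not Ambari's actual code. It assumes the server treats an alert as stale once its last report is older than the alert's check interval multiplied by the Interval Multiplier; the function name is_stale and the numbers are made up for illustration.

```python
from datetime import datetime, timedelta


def is_stale(last_report, now, interval_minutes, multiplier):
    """Toy rule: stale once the last report is older than interval x multiplier."""
    allowed = timedelta(minutes=interval_minutes * multiplier)
    return now - last_report > allowed


now = datetime(2019, 1, 1, 12, 0)
last_report = now - timedelta(minutes=7)  # the agent last reported 7 minutes ago

# 3-minute interval, multiplier 2: allowed age is 6 minutes, so 7 minutes is stale.
print(is_stale(last_report, now, interval_minutes=3, multiplier=2))
# 5-minute interval, multiplier 2: allowed age is 10 minutes, so it is not stale.
print(is_stale(last_report, now, interval_minutes=5, multiplier=2))
```

Under that assumption, a larger interval or multiplier gives a slow agent more time to report before the alert is flagged as stale.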
