How to get rid of stale alerts in Ambari

avatar

I have restarted the Ambari Server, all agents, and the complete HDP stack multiple times in the past 5 days for different activities, but these alerts don't go away.

[Screenshot: 1308-screen-shot-2016-01-13-at-102524-am.png]

1 ACCEPTED SOLUTION

avatar
Master Guru

Hi @Pardeep, with Support's help we got rid of those alerts by adding 'misfire_grace_time':10 to APS_CONFIG in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py on every node. After the update, that section should read:

APS_CONFIG = { 
'threadpool.core_threads': 3, 
'coalesce': True, 
'standalone': False, 
'misfire_grace_time':10 
}

This gives each scheduled alert check a grace period of up to 10 seconds. After that, restart all Ambari agents. We tried it on one cluster and it worked. This is most likely fixed in Ambari 2.2, but it happens in 2.1.2.
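To illustrate what a misfire grace period does, here is a minimal sketch of the general idea: a scheduled run that starts later than its grace period allows is skipped instead of executed. This is a simplified model, not Ambari's or APScheduler's actual code; the function name `is_misfire` and the timing values are hypothetical.

```python
from datetime import datetime, timedelta


def is_misfire(scheduled_at, started_at, grace_seconds):
    """Return True if a run starting at started_at is too late to execute."""
    lateness = started_at - scheduled_at
    return lateness > timedelta(seconds=grace_seconds)


# Hypothetical example: a check that gets a worker thread 4 seconds late.
scheduled = datetime(2016, 1, 13, 10, 25, 0)
late_start = scheduled + timedelta(seconds=4)

print(is_misfire(scheduled, late_start, grace_seconds=1))   # True: run is skipped
print(is_misfire(scheduled, late_start, grace_seconds=10))  # False: run still executes
```

Under this model, a check that is repeatedly skipped never reports a fresh result, which would be consistent with the alert showing as stale.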

avatar
Master Collaborator

Hi:

Today I had the same problem, and I fixed it by adding this on all nodes in this file:

/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py

APS_CONFIG = { 'threadpool.core_threads': 3, 'coalesce': True, 'standalone': False, 'misfire_grace_time':10 }

But I want to know what this is: is it a bug, or do I have a problem with my cluster?

thanks

avatar
New Contributor

I did this as the root user: I found the file and changed it there. But how do I change it on each node?

avatar
Master Collaborator

Hi:

Today the alert appeared again, so I think there is another problem. Any suggestions?

Many Thanks.

avatar
Guru

FWIW, you can disable this alert. Click "Enabled" next to State in the upper right, and this alert will no longer be checked.

avatar

I have the same issue with a fresh install of HDP 2.4 on SLES 11.4.

Many of the alerts are about timeouts when checking the UIs:

Data Node UI, Node Manager UI, Atlas UI, Oozie UI, etc.

All these URLs are reachable both from the Windows laptops with a browser and from the HDP nodes using wget in the console.

How does Ambari check these URLs? Is it possible that the check scripts ignore OS-wide proxy and firewall configurations?

I would also like to know what the "24-Hour" column means. Does anyone have an idea? This field is not mentioned at all in the docs. Its content is always "0" in my cluster (as in the posted screenshot).

Many thanks

avatar
Master Collaborator

Hi:

Have you done this:

/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py

APS_CONFIG = {'threadpool.core_threads': 3, 'coalesce': True, 'standalone': False, 'misfire_grace_time': 10}

Also, check the Ambari database alert.

regards

avatar

@Predrag Minovic

In my Ambari the code looks like:

self.APS_CONFIG = {
  'apscheduler.threadpool.core_threads': 3,
  'apscheduler.coalesce': True,
  'apscheduler.standalone': False,
  'apscheduler.misfire_grace_time': alert_grace_period
}

avatar
Contributor

In Ambari 2.2, misfire_grace_time is configurable through the alert_grace_period variable.

This variable is set in /etc/ambari-agent/conf/ambari-agent.ini and the default is:

alert_grace_period=5

You can increase it to 10 seconds to match the answer above.

This needs to be done on all hosts running ambari-agent, and afterwards ambari-agent needs to be restarted.
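To verify the setting on a host before restarting the agent, a small check along these lines may help. It is only a sketch: it is written for Python 3 (not the agent's own Python 2.6), the helper name `find_alert_grace_period` is hypothetical, and the `[agent]` section in the sample text is an assumption, so the code searches every section rather than relying on that name.

```python
import configparser


def find_alert_grace_period(ini_text):
    """Return (section, value) for alert_grace_period, or None if it is not set."""
    # RawConfigParser avoids '%' interpolation; strict=False tolerates duplicates.
    parser = configparser.RawConfigParser(strict=False)
    parser.read_string(ini_text)
    for section in parser.sections():
        if parser.has_option(section, "alert_grace_period"):
            return section, parser.getint(section, "alert_grace_period")
    return None


# Sample content; the section name here is an assumption for illustration.
sample = """
[agent]
alert_grace_period=10
"""

print(find_alert_grace_period(sample))  # ('agent', 10)
```

On a real host, the text would come from reading /etc/ambari-agent/conf/ambari-agent.ini instead of the sample string.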

avatar
Explorer

I tried to add this parameter in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py:

APS_CONFIG ={'threadpool.core_threads':3,'coalesce':True,'standalone':False,'misfire_grace_time':10}

I changed the parameter and restarted all ambari-agents, but they failed with the following error:

====================================================================================

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py", line 24, in <module>
    from Controller import AGENT_AUTO_RESTART_EXIT_CODE
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 44, in <module>
    from ambari_agent.AlertSchedulerHandler import AlertSchedulerHandler
  File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 50
    'misfire_grace_time':10
    ^
SyntaxError: invalid syntax

=========================================================================================

Please help me fix this issue. I appreciate the help.
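One possible cause, offered as an assumption because the edited file itself is not shown: the new key was added on its own line without a comma after the preceding entry. The traceback points at the new line, which would be consistent with that. The sketch below reproduces this kind of error with Python's built-in `compile()`; the helper name `compiles` and the snippet contents are illustrative, not the actual file.

```python
def compiles(source):
    """Return True if source is syntactically valid Python, else False."""
    try:
        compile(source, "<APS_CONFIG snippet>", "exec")
    except SyntaxError:
        return False
    return True


# Missing comma after 'standalone': False (assumed cause, for illustration).
broken = """APS_CONFIG = {
  'threadpool.core_threads': 3,
  'coalesce': True,
  'standalone': False
  'misfire_grace_time': 10
}
"""

# Same snippet with the comma restored.
fixed = broken.replace("'standalone': False\n", "'standalone': False,\n")

print(compiles(broken))  # False
print(compiles(fixed))   # True
```

If the edited file matches the broken pattern, adding the comma after the previous entry and restarting the agent would be the thing to try.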