How to get rid of stale alerts in Ambari

I have restarted Ambari Server and all agents, along with the complete HDP stack, multiple times in the past 5 days for different activities, but these alerts don't go away.

[Attached screenshot: 1308-screen-shot-2016-01-13-at-102524-am.png]

1 ACCEPTED SOLUTION

Master Guru

Hi @Pardeep, with Support's help we got rid of those alerts by adding 'misfire_grace_time': 10 to APS_CONFIG in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py on every node. After the update, that section should read:

APS_CONFIG = {
    'threadpool.core_threads': 3,
    'coalesce': True,
    'standalone': False,
    'misfire_grace_time': 10
}

With this setting, a scheduled alert check may still run if it starts up to 10 seconds late, instead of being skipped as a misfire. After that, restart all Ambari agents. We tried this on one cluster and it worked. This is most likely fixed in Ambari 2.2, but it happens in 2.1.2.
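For anyone unsure what the setting changes: as far as I understand APScheduler (which APS_CONFIG appears to configure), misfire_grace_time is the number of seconds after a job's scheduled time during which the job is still allowed to run. The sketch below is a simplified pure-Python model of that rule, not APScheduler's or Ambari's actual code, and the function name should_run is hypothetical.

```python
from datetime import datetime, timedelta


def should_run(scheduled_at, now, misfire_grace_time):
    """Return True if a job due at `scheduled_at` may still run at `now`.

    Simplified model (an assumption, not the library's code): the job is
    treated as missed once it is more than `misfire_grace_time` seconds late.
    """
    lateness = (now - scheduled_at).total_seconds()
    return lateness <= misfire_grace_time


scheduled = datetime(2016, 1, 13, 10, 25, 0)

# 4 seconds late with a 10-second grace time: the check still runs.
print(should_run(scheduled, scheduled + timedelta(seconds=4), 10))   # True

# 15 seconds late with a 10-second grace time: the check is skipped.
print(should_run(scheduled, scheduled + timedelta(seconds=15), 10))  # False
```

If the agent's scheduler threads are busy and checks start late, a larger grace time means fewer skipped checks, which would be consistent with the alerts no longer going stale.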


20 REPLIES

Master Mentor

@Pardeep You may have to open a support ticket for this. I had the same experience, and Support had to troubleshoot it.

Expert Contributor
@Neeraj Sabharwal

Hey Neeraj! I'm having this same issue now, too. Do you have any input on how Support troubleshot it for you, or do I need to open a ticket with them as well to get an answer?

Master Mentor

@Mark Petronic Support will set up a WebEx session and check the configs and other settings related to AMS.

Contributor

Check the Ambari server heap size; it may be running out of memory. The setting is in:

/var/lib/ambari-server/ambari-env.sh

Change -Xmx2048m to 8 GB (for example, -Xmx8192m) if you have enough memory available, then restart ambari-server.
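A minimal sketch of that edit, assuming the heap flag appears in ambari-env.sh as a literal -Xmx token as described above. The helper set_max_heap and the sample line are hypothetical illustrations; the function only transforms text, so reading the file, backing it up, and writing it back are left to you.

```python
import re


def set_max_heap(env_text, new_size="8192m"):
    """Replace every -Xmx<size> JVM flag in env_text with -Xmx<new_size>."""
    # Matches flags such as -Xmx2048m or -Xmx2g; -Xms flags are not touched.
    return re.sub(r"-Xmx\d+[kKmMgG]?", "-Xmx" + new_size, env_text)


# Illustrative line only; check what your own ambari-env.sh actually contains.
sample = 'export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Xms512m -Xmx2048m"'
print(set_max_heap(sample))
# export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Xms512m -Xmx8192m"
```

Whether 8 GB is appropriate depends on the cluster size and on how much memory the Ambari host can spare, so treat the default value here as a placeholder.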

Expert Contributor

Well, I solved it another way: I just upgraded to Ambari 2.2. I will watch to see if this comes back as I run with this version, and open a ticket if it does. I was in the midst of upgrading anyway when this started to happen, since I need to move to HDP 2.3.4 and Spark 1.5. But 8 GB, really? Wow! That seems ridiculously expensive for a monitoring framework.

Explorer

After increasing the timeout, the error still persists. Do you think this might be because of the system configuration?

Expert Contributor

HDP 2.3, Ambari 2.1

Adding 'misfire_grace_time': 10 to APS_CONFIG in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py on every node and restarting the Ambari server and agents on all nodes didn't work for me.