Support Questions

pardeep_kumar · ‎01-13-2016

I have restarted Ambari Server and all agents, along with the complete HDP stack, multiple times in the past 5 days for different activities, but these alerts don't go away.

pminovic · ‎01-20-2016

Hi @Pardeep, with Support's help we got rid of those alerts by adding 'misfire_grace_time':10 to APS_CONFIG in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py on every node. After the update, that section should read:

APS_CONFIG = {
  'threadpool.core_threads': 3,
  'coalesce': True,
  'standalone': False,
  'misfire_grace_time': 10
}

With this setting, a scheduled alert check that starts up to 10 seconds late is still run instead of being skipped as a misfire. After that, restart all Ambari agents. We tried it on one cluster and it worked. This is most likely fixed in Ambari 2.2 but happens in 2.1.2.
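To make the effect of the grace time concrete, here is a minimal sketch of the decision a scheduler makes when a job fires late. It is a simplified model, not the Ambari agent's or APScheduler's actual code; the function name `should_run` and the 1-second comparison value are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def should_run(scheduled_at, now, misfire_grace_time):
    """Return True if a job due at `scheduled_at` may still run at `now`.

    Simplified model (an assumption, not the library source): a job counts
    as a misfire once it is more than `misfire_grace_time` seconds late.
    """
    lateness = now - scheduled_at
    return lateness <= timedelta(seconds=misfire_grace_time)


# Hypothetical scenario: the scheduler was busy and picked the job up 4 s late.
scheduled = datetime(2016, 1, 20, 12, 0, 0)
picked_up = scheduled + timedelta(seconds=4)

# With a 1-second grace time (assumed comparison value) the check is skipped.
print(should_run(scheduled, picked_up, misfire_grace_time=1))

# With the 10-second grace time from the post the check still runs.
print(should_run(scheduled, picked_up, misfire_grace_time=10))
```

If skipped checks are what leave the alerts stale, a larger grace time would explain why they clear after the change, though that is an inference from the post rather than something confirmed here.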

nsabharwal · ‎01-13-2016

@Pardeep You may have to open a support ticket for this. I had the same experience and support had to troubleshoot it.

Mark_Petronic · ‎01-18-2016

@Neeraj Sabharwal

Hey Neeraj! I'm having this same issue now, too. Do you have any input on how support troubleshot it for you, or do I need to hit them up with a ticket as well for an answer?

nsabharwal · ‎01-18-2016

@Mark Petronic Support will set up a WebEx session and check the configs and other settings related to AMS.

smishra1 · ‎01-18-2016

Check the Ambari Server heap size; it may be running out of memory.

/var/lib/ambari-server/ambari-env.sh

Change -Xmx2048m to 8 GB (for example, -Xmx8192m) if you have enough memory available, then restart ambari-server.
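As a rough sketch of that edit, the helpers below find and rewrite the -Xmx value in the text of an environment file. The helper names and the sample line are hypothetical; the real contents of ambari-env.sh may differ, so inspect the file before changing it.

```python
import re


def find_heap_settings(env_text):
    """Return every -Xmx value found in the text, e.g. ['2048m']."""
    return re.findall(r"-Xmx(\d+[kKmMgG]?)", env_text)


def set_heap(env_text, new_value):
    """Return the text with every -Xmx value replaced by `new_value`."""
    return re.sub(r"-Xmx\d+[kKmMgG]?", "-Xmx" + new_value, env_text)


# Hypothetical sample line; not copied from a real ambari-env.sh.
sample = "export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m'"

print(find_heap_settings(sample))
print(set_heap(sample, "8192m"))
```

To use it on a node, read /var/lib/ambari-server/ambari-env.sh into a string, pass it through `set_heap`, write the result back after taking a backup, and restart ambari-server.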

Mark_Petronic · ‎01-19-2016

Well, I solved it another way - I just upgraded to Ambari 2.2. I will watch to see if this comes back as I run with this version and open a ticket if that happens. I was in the midst of upgrading anyway when this started to happen. I need to move to HDP 2.3.4 and Spark 1.5. But, 8 GB, really? Wow! That seems ridiculously expensive for a monitoring framework.

KuldeepK · ‎04-12-2016

@Sagar Shimpi

gauravb117 · ‎12-07-2017

After increasing the timeout, the error still persists. Do you think this might be because of the system configuration?

sushil61 · ‎02-05-2016

HDP 2.3, Ambari 2.1

Adding 'misfire_grace_time':10 to APS_CONFIG in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py on every node and restarting Ambari server and agent on all nodes didn't work for me.
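When the change appears to have no effect, one thing that may be worth ruling out is a node where the edit did not land. The sketch below checks whether the APS_CONFIG block in a file's text mentions the key; the function name is hypothetical, and the regex assumes APS_CONFIG is a flat dict literal as shown in this thread.

```python
import re


def has_misfire_grace(source_text):
    """Return True if the APS_CONFIG dict literal mentions misfire_grace_time."""
    # Assumes a flat dict literal with no nested braces, as in this thread.
    match = re.search(r"APS_CONFIG\s*=\s*\{(.*?)\}", source_text, re.DOTALL)
    if match is None:
        return False
    return "misfire_grace_time" in match.group(1)


unpatched = """APS_CONFIG = {
  'threadpool.core_threads': 3,
  'coalesce': True,
  'standalone': False
}"""

patched = """APS_CONFIG = {
  'threadpool.core_threads': 3,
  'coalesce': True,
  'standalone': False,
  'misfire_grace_time': 10
}"""

print(has_misfire_grace(unpatched))
print(has_misfire_grace(patched))
```

On a real node, the text would come from reading AlertSchedulerHandler.py at the path given above. A passing check only shows the file was edited; the agent still has to be restarted to pick up the change.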

Cloudera Community

How to get rid of stale alerts in Ambari