Created on 01-13-2016 04:27 PM - edited 08-19-2019 05:15 AM
I have restarted Ambari Server and all agents, along with the complete HDP stack, multiple times in the past 5 days for different activities, but these alerts don't go away.
Created 01-20-2016 07:20 AM
Hi @Pardeep, with Support's help we got rid of those alerts by adding 'misfire_grace_time':10 to APS_CONFIG in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py on every node. After the update, that section should read:
APS_CONFIG = { 'threadpool.core_threads': 3, 'coalesce': True, 'standalone': False, 'misfire_grace_time':10 }
This allows a scheduled alert check to still run if it starts up to 10 seconds late, instead of being skipped as a misfire. After that, restart all Ambari agents. We tried it on one cluster and it worked. This is most likely fixed in Ambari 2.2 but happens in 2.1.2.
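Since the edit has to be made on every node, a quick check can help confirm it took. Below is a minimal sketch, not an official Ambari tool: the helper names `read_aps_config` and `has_grace_time` are made up for illustration, and it assumes APS_CONFIG is a flat dict literal like the one shown above. It works on the file's text, so you would read AlertSchedulerHandler.py on each node and pass the contents in.

```python
import ast
import re

# Hypothetical helpers; only APS_CONFIG and its keys come from the thread.
# Assumes APS_CONFIG is a flat dict literal (no nested braces).
APS_CONFIG_RE = re.compile(r"^\s*APS_CONFIG\s*=\s*(\{.*?\})",
                           re.DOTALL | re.MULTILINE)


def read_aps_config(source_text):
    """Return the APS_CONFIG dict found in the given Python source, or None."""
    match = APS_CONFIG_RE.search(source_text)
    if match is None:
        return None
    # literal_eval only parses literals, so nothing from the file is executed.
    return ast.literal_eval(match.group(1))


def has_grace_time(source_text, expected=10):
    """True if APS_CONFIG sets misfire_grace_time to the expected value."""
    config = read_aps_config(source_text)
    return config is not None and config.get("misfire_grace_time") == expected


# Example with the line from this thread; on a node, pass the file contents.
sample = ("APS_CONFIG = { 'threadpool.core_threads': 3, 'coalesce': True, "
          "'standalone': False, 'misfire_grace_time':10 }")
print(has_grace_time(sample))
```

If it reports False on a node, the agent there is still running with the original config (or the file layout differs from what this sketch assumes).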
Created 01-13-2016 04:28 PM
@Pardeep You may have to open a support ticket for this. I had the same experience and support had to troubleshoot it.
Created 01-18-2016 06:15 PM
Hey Neeraj! I'm having this same issue now, too. Do you have any input on how support troubleshot it for you, or do I need to hit them up with a ticket for an answer as well?
Created 01-18-2016 07:53 PM
@Mark Petronic Support will set up a WebEx session and check the configs and other settings related to AMS.
Created 01-18-2016 09:30 PM
Check the Ambari Server heap size; it may be running out of memory. The setting is in:
/var/lib/ambari-server/ambari-env.sh
Change -Xmx2048m to 8 GB (for example, -Xmx8g) if you have enough memory available, then restart ambari-server.
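To see what heap limit is currently configured before changing it, something like the following could be used. This is only a sketch: `max_heap_bytes` is a hypothetical helper, the sample line is an assumed shape for the JVM arguments in ambari-env.sh (check your own file), and it takes the last -Xmx it finds on the assumption that a later flag overrides an earlier one.

```python
import re

# Multipliers for the JVM size suffixes k, m and g.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(env_text):
    """Return the last -Xmx value found in the text, in bytes, or None."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", env_text)
    if not matches:
        return None
    number, unit = matches[-1]
    # No suffix means the value is already in bytes.
    return int(number) * _UNITS.get(unit.lower(), 1)


# Assumed example line; the real file may look different.
sample_line = "export AMBARI_JVM_ARGS=' -Xms512m -Xmx2048m'"
print(max_heap_bytes(sample_line) // (1024 ** 2), "MB")
```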
Created 01-19-2016 02:20 AM
Well, I solved it another way: I just upgraded to Ambari 2.2. I will watch to see if this comes back as I run with this version, and open a ticket if that happens. I was in the midst of upgrading anyway when this started to happen. I need to move to HDP 2.3.4 and Spark 1.5. But, 8 GB, really? Wow! That seems ridiculously expensive for a monitoring framework.
Created 04-12-2016 05:48 AM
Created 12-07-2017 08:25 AM
After increasing the timeout, the error still persists. Do you think this might be because of the system configuration?
Created 02-05-2016 01:40 AM
HDP 2.3, Ambari 2.1
Adding 'misfire_grace_time':10 to APS_CONFIG in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py on every node and restarting the Ambari server and agents on all nodes didn't work for me.