Created on 01-13-2016 04:27 PM - edited 08-19-2019 05:15 AM
I have restarted the Ambari Server and all agents, along with the complete HDP stack, multiple times in the past 5 days for different activities, but these alerts don't go away.
Created 01-20-2016 07:20 AM
Hi @Pardeep, with Support's help we got rid of those alerts by adding 'misfire_grace_time': 10 to APS_CONFIG in /usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py on every node. After the update, that section should read:
APS_CONFIG = { 'threadpool.core_threads': 3, 'coalesce': True, 'standalone': False, 'misfire_grace_time': 10 }
With this setting, a scheduled alert check that starts up to 10 seconds late is still run instead of being skipped as a misfire. After that, restart all Ambari agents. We tried this on one cluster and it worked. This is most likely fixed in Ambari 2.2, but it happens in 2.1.2.
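To illustrate what the grace time does, here is a minimal sketch, assuming APS_CONFIG is handed to APScheduler (as the name suggests), where misfire_grace_time is the number of seconds past its scheduled time that a job may still be started. The helper should_still_run is hypothetical and is not part of Ambari or APScheduler; it only mimics the comparison.

```python
from datetime import datetime, timedelta


def should_still_run(scheduled, now, misfire_grace_time):
    """Hypothetical helper: True if a job that is late may still be started.

    Mirrors the idea behind a misfire grace time: a job is only skipped
    when it is later than the allowed number of seconds.
    """
    late_by = now - scheduled
    return late_by <= timedelta(seconds=misfire_grace_time)


# Illustrative timestamp only; not taken from any real agent log.
scheduled = datetime(2016, 1, 20, 7, 0, 0)

# 4 seconds late with a 10 second grace time: the check still runs.
print(should_still_run(scheduled, scheduled + timedelta(seconds=4), 10))

# 12 seconds late with a 10 second grace time: the check is skipped.
print(should_still_run(scheduled, scheduled + timedelta(seconds=12), 10))
```

On a busy agent, checks that start a few seconds late would fall inside the 10 second window rather than being dropped, which would be consistent with the alerts clearing after the change.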
Created on 01-30-2020 08:19 AM - edited 01-30-2020 08:20 AM
This was what fixed it for me.
Using Ambari Version 2.7.3.0
SSH into host that's reporting stale alerts:
sudo vi /etc/ambari-agent/conf/ambari-agent.ini
Change alert_grace_period from its default of 5 to 15.
Optional: restart the service instance that is reporting stale alerts (JournalNodes, NameNodes, etc.).
sudo systemctl restart ambari-agent.service
The example above uses CentOS / Red Hat.
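If you need to make the same change on many hosts, the edit can be scripted instead of done in vi. This is only a sketch: set_ini_value is a hypothetical helper, and the sample text below is made-up content standing in for ambari-agent.ini, so check the real file on your hosts before relying on it. It rewrites only the matching line and leaves comments and other settings untouched.

```python
import re


def set_ini_value(text, key, value):
    """Hypothetical helper: set the first `key=...` line in INI text.

    Only the value part of the matching line changes; everything else
    (comments, sections, other keys) is returned as-is.
    Raises KeyError if the key is not present.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*).*$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(value), text, count=1
    )
    if count == 0:
        raise KeyError(key)
    return new_text


# Made-up sample content; the real file has many more settings.
sample = "[agent]\nalert_grace_period=5\nloglevel=INFO\n"

updated = set_ini_value(sample, "alert_grace_period", 15)
print(updated)
```

To apply it for real, you would read /etc/ambari-agent/conf/ambari-agent.ini, pass its contents through the helper, write the result back as root, and then restart the agent as in the steps above.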