How to purge/reset Ambari alerts

Explorer

Hello, I am using Ambari version 2.4.2. Can anyone provide the steps for purging alerts, especially stale ones, without having to delete the alert or restart individual components or services?

1 ACCEPTED SOLUTION

Super Collaborator

You can clear an individual alert's state by disabling it and then re-enabling it. This causes all active instances of that alert to disappear, and the alert will run clean. On some versions of Ambari, this was required after actions such as deleting a host, which could leave orphaned alerts that never run again (and thus become stale).

If you are seeing the actual "Stale Alert" trigger, you'll want to identify which alerts are causing it to fire; in other words, which alerts are not running. Disabling and re-enabling those could help, but if they continue to be stale, then something else is preventing them from running.
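
For anyone who prefers to script the disable/re-enable step rather than click through the UI, here is a minimal sketch using the Ambari REST API. It assumes the alert definition endpoint accepts a PUT with an `AlertDefinition/enabled` property, which is worth confirming against the API docs for your Ambari version. The host, credentials, cluster name, and definition id are placeholders, and the function names are made up for illustration.

```shell
# Sketch only: toggle one Ambari alert definition off and back on via REST.
# Assumptions (not from this thread): the URL, user, and password defaults
# below are placeholders, and the helper names are hypothetical.
AMBARI_URL="${AMBARI_URL:-http://ambari.example.com:8080}"
AMBARI_USER="${AMBARI_USER:-admin}"
AMBARI_PASS="${AMBARI_PASS:-admin}"

# set_alert_enabled <cluster> <definition-id> <true|false>
set_alert_enabled() {
  # Ambari rejects modifying requests that lack the X-Requested-By header.
  curl -sS -u "${AMBARI_USER}:${AMBARI_PASS}" \
       -H 'X-Requested-By: ambari' \
       -X PUT \
       -d "{\"AlertDefinition/enabled\": $3}" \
       "${AMBARI_URL}/api/v1/clusters/$1/alert_definitions/$2"
}

# reset_alert <cluster> <definition-id>: disable, then re-enable.
reset_alert() {
  set_alert_enabled "$1" "$2" false && set_alert_enabled "$1" "$2" true
}
```

Usage would look something like `reset_alert DemoCluster 42`, where 42 stands in for the id of the alert definition you want to reset. A GET on `${AMBARI_URL}/api/v1/clusters/DemoCluster/alert_definitions` should list the definitions and their ids.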


8 REPLIES

Master Mentor

@Freemon Johnson

You can try the "db-cleanup" option (available in Ambari 2.4 and 2.2.2). It batch-deletes rows from the ambari.alert_notice, ambari.alert_current, and ambari.alert_history tables that are older than the given date.

# ambari-server db-cleanup -d 2017-09-15 --cluster-name=DemoCluster

From Ambari 2.5 onward, this command has been enhanced and renamed "db-purge-history". The db-purge-history command covers more tables (in addition to the alert tables).

https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.2.0/bk_ambari-administration/content/purging-am...
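
As a rough sketch, the cleanup command above could be wrapped in a small helper that checks the date format before touching the database. The function name and the `AMBARI_SERVER_CMD` override are hypothetical, added here only so the command can be dry-run. It is probably wise to back up the Ambari database before running the real thing.

```shell
# Sketch only: wrap the history cleanup command with a basic date check.
# AMBARI_SERVER_CMD is a hypothetical override; set it to "echo" for a dry run.
AMBARI_SERVER_CMD="${AMBARI_SERVER_CMD:-ambari-server}"

# cleanup_alert_history <cluster-name> <yyyy-mm-dd>
cleanup_alert_history() {
  case "$2" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected a date such as 2017-09-15, got: $2" >&2; return 1 ;;
  esac
  # Ambari 2.4 / 2.2.2 form, as given in this thread. On 2.5 and later the
  # subcommand is db-purge-history; check its options in the linked docs.
  $AMBARI_SERVER_CMD db-cleanup -d "$2" --cluster-name="$1"
}
```

For a dry run, `AMBARI_SERVER_CMD=echo cleanup_alert_history DemoCluster 2017-09-15` prints the arguments that would be passed to ambari-server instead of executing them.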

Explorer

@Jay SenSharma

Thank you very much for the reference. I will keep this approach in my notes in case the accepted answer fails to work.

Explorer
@Jonathan Hurley

Thank you Jonathan! Disabling and re-enabling worked just fine.

avatar

I have purged some alerts by disabling and re-enabling them, but one alert appeared again afterwards.
How can I purge it? Please see the image:

42971-data-node-healthy.jpg

Thanks

Super Collaborator

Hi Mudassar,

Generally, it's better to open a new issue instead of tacking onto an existing one since the problem/resolution could be very different.

To answer your question, no, you can't clear it in this case. This is a metric alert coming from HDFS. The HDFS service is broadcasting that 1 DataNode is considered dead. Ambari is simply detecting this and alerting on it. You'll need to figure out why the NameNode is sending that metric.

Normally, I think the NN considers a DataNode "dead" after more than a few minutes of lost contact (without a decommission). However, if the DataNode makes contact again, the NN should clear that status.
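
One way to see which DataNode the NameNode is counting as dead might be the `dfsadmin` report, sketched below. This assumes your Hadoop version supports the `-dead` filter on `-report` and that you run it as a user with HDFS admin rights (often the `hdfs` service user). The function name is only for illustration.

```shell
# Sketch only: ask the NameNode which DataNodes it currently treats as dead.
# list_dead_datanodes is a hypothetical helper name.
list_dead_datanodes() {
  hdfs dfsadmin -report -dead
}
```

If the report names a host, checking whether the DataNode process is running there and whether it can reach the NameNode is a reasonable next step.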

Explorer

@Mudassar Hussain

When I posted my initial question, it pertained to alerts that are triggered because HDFS storage capacity exceeds a configurable threshold. There are also some alerts that trigger once a week or once a month about capacity, or when you restart a service. In short, these are trivial alerts that are annoying but not detrimental to the health of the cluster.

Correct me if I am wrong, but from the image you provided it appears you have an issue with your NameNode. If that is accurate, it is serious, and it is an alert you do not want to ignore. Have you checked whether your NameNode volumes are corrupted, or whether a DataNode is corrupted or has exceeded capacity?

I found this article as well. Not sure if this is precisely related:

https://community.hortonworks.com/questions/81840/node-in-maintenance-mode-throws-stale-alert-from-m...

avatar

Thanks @Jonathan Hurley, I got your point about raising a new question. My NameNode was in safe mode at the time I was seeing this alert. I put the NameNode back into normal mode, and now the alert has cleared. Is this exactly the reason? What are your views?
Thanks
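
For reference, a minimal sketch of checking the NameNode's safe mode state from the command line. It relies on `hdfs dfsadmin -safemode get` printing a line that contains "Safe mode is ON" or "Safe mode is OFF". The function name is hypothetical.

```shell
# Sketch only: succeed (exit 0) when the NameNode reports safe mode is on.
# namenode_in_safemode is a hypothetical helper name.
namenode_in_safemode() {
  hdfs dfsadmin -safemode get | grep -q 'Safe mode is ON'
}

# To leave safe mode manually, once the underlying cause is understood:
#   hdfs dfsadmin -safemode leave
```

A call such as `namenode_in_safemode && echo "NameNode is in safe mode"` could be run before deciding whether an HDFS alert needs further investigation.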