We have an org-wide monitoring and alerting system in use. We now want to integrate HDP-related metrics into this system, which works by scanning log files with regular expressions and alerting on any threshold breach.
I have the following questions regarding this scenario:
Thanks in advance.
1. I don't think it's a good idea to scan the log files for HDP-related metrics. ambari-alerts.log does not contain any metrics. AMS provides APIs to read the metrics, and you can try using those. The documentation is at https://cwiki-test.apache.org/confluence/display/AMBARI/Metrics+Collector+API+Specification
2. ambari-alerts.log is only for debugging purposes. If you would like to find out which alerts were generated, there is a notification feature that you can consider using.
3. Currently, neither Grafana nor Ambari provides an option to configure alerts for metrics.
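For point 1, here is a minimal Python sketch of reading metrics from AMS over HTTP and comparing them to thresholds. It assumes the `/ws/v1/timeline/metrics` endpoint and the `metricNames`, `appId`, `hostname`, `startTime` and `endTime` parameters from the linked specification, and a response shaped like `{"metrics": [{"metricname": ..., "metrics": {timestamp: value}}]}`. The collector address, metric names and threshold values are placeholders, and the helper names are made up for this example, so please check everything against your own cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_metrics_url(collector, metric_names, app_id, hostname=None,
                      start_ms=None, end_ms=None):
    """Build a Metrics Collector query URL (parameter names assumed from the spec)."""
    params = {"metricNames": ",".join(metric_names), "appId": app_id}
    if hostname:
        params["hostname"] = hostname
    if start_ms is not None and end_ms is not None:
        params["startTime"] = start_ms
        params["endTime"] = end_ms
    return f"{collector}/ws/v1/timeline/metrics?{urlencode(params)}"


def latest_values(payload):
    """Return {metric name: most recent value} from a parsed AMS response."""
    latest = {}
    for series in payload.get("metrics", []):
        points = series.get("metrics", {})
        if points:
            newest = max(points, key=int)  # keys are epoch-millisecond strings
            latest[series["metricname"]] = points[newest]
    return latest


def breaches(latest, thresholds):
    """Return the metrics whose latest value is above its threshold."""
    return {name: value for name, value in latest.items()
            if name in thresholds and value > thresholds[name]}


def fetch(url, timeout=10):
    """Perform the HTTP GET and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A typical call would be `breaches(latest_values(fetch(url)), {"cpu_user": 90})`, with the result handed to whatever your monitoring system accepts as input.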
Note: Please mark this as the correct answer if you are satisfied.
Ambari can take a custom script as input, so you can write your own script and, within it, handle each alert in whatever way you want. You can follow the document at https://risdenk.github.io/2018/03/25/apache-ambari-custom-alert-dispatch-script.html
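As a rough illustration, a dispatch script could write each alert as one regex-friendly line to a file that your existing log scanner already watches. The sketch below assumes Ambari passes the definition name, definition label, service name, alert state and alert text as positional arguments in that order (please confirm against the linked article for your Ambari version). The output path and the line format are hypothetical choices made for this example.

```python
#!/usr/bin/env python3
import sys
from datetime import datetime, timezone

# Assumed argument order; verify against your Ambari version.
FIELDS = ("definition_name", "definition_label", "service_name",
          "alert_state", "alert_text")

# Hypothetical output file that the external log scanner would watch.
OUTPUT_PATH = "/var/log/ambari-server/custom-alerts.log"


def format_alert(args):
    """Turn the positional arguments into a single log line."""
    values = list(args[:len(FIELDS)])
    values += [""] * (len(FIELDS) - len(values))  # pad missing arguments
    alert = dict(zip(FIELDS, values))
    # Collapse newlines and repeated spaces so each alert stays on one line.
    text = " ".join(alert["alert_text"].split())
    return ('AMBARI_ALERT state={alert_state} service={service_name} '
            'definition={definition_name} label="{definition_label}" '
            'text="{text}"').format(text=text, **alert)


def main(argv, path=OUTPUT_PATH):
    """Append one timestamped line per alert to the output file."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as out:
        out.write(f"{stamp} {format_alert(argv)}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

With this layout, a pattern such as `AMBARI_ALERT state=CRITICAL` would be enough for the scanner to raise an alert, and the Ops team can keep using their existing setup.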
Thanks @amarnath reddy pappu for your response. It gives us some food for thought to explore further and improve our understanding. We intend to use the external system mainly for alerting, so that the support team can act quickly whenever action is required. I was thinking of using ambari-alerts.log and other log files only for alerting purposes.
As the present alerting mechanism exists only within Ambari, and our Ops team wants to use their existing setup, what would be the best alternative if scanning the log files is not the way to go?