Created 08-21-2018 03:41 PM
We already use an org-wide monitoring and alerting system, which works by scanning log files with regular expressions and alerting on any threshold breaches. We now want to integrate HDP-related metrics into this system as well.
I have the following questions regarding this scenario:
Thanks in advance.
Created 08-21-2018 03:58 PM
1. I don't think it's a good idea to scan the log files for HDP-related metrics. ambari-alerts.log does not contain any metrics. AMS provides APIs to read the metrics, and you can try using those. The documentation is at https://cwiki-test.apache.org/confluence/display/AMBARI/Metrics+Collector+API+Specification
2. ambari-alerts.log is only for debugging purposes. If you would like to find out which alerts were generated, there is a notification feature you can consider using.
3. Currently neither Grafana nor Ambari provides an option to configure alerts for metrics.
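As a rough illustration of point 1, here is a minimal Python sketch that reads metrics from the AMS Metrics Collector instead of scanning logs. It assumes the collector is reachable on its default port 6188 and that the `/ws/v1/timeline/metrics` endpoint and response layout match the linked specification. The host names, metric names, and threshold are made up for the example, and `build_metrics_url`, `breaches`, and `fetch` are hypothetical helper names, not part of AMS.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_metrics_url(collector, metric_names, app_id, hostname=None,
                      start_ms=None, end_ms=None):
    """Build a GET URL for the AMS timeline metrics endpoint (assumed layout)."""
    params = {"metricNames": ",".join(metric_names), "appId": app_id}
    if hostname:
        params["hostname"] = hostname
    if start_ms is not None and end_ms is not None:
        # Time range in epoch milliseconds; omit both to get the latest values.
        params["startTime"] = start_ms
        params["endTime"] = end_ms
    return "http://%s/ws/v1/timeline/metrics?%s" % (collector, urlencode(params))


def breaches(payload, threshold):
    """Return (metric name, timestamp, value) for each datapoint above threshold."""
    found = []
    for series in payload.get("metrics", []):
        for ts, value in sorted(series.get("metrics", {}).items()):
            if value > threshold:
                found.append((series.get("metricname"), int(ts), value))
    return found


def fetch(url, timeout=10):
    """Call the collector and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A possible use would be `breaches(fetch(build_metrics_url("ams.example.com:6188", ["cpu_user"], "HOST")), 90)`, with the result forwarded to the external alerting tool in whatever form it accepts.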
Note: Please mark this as the correct answer if you are satisfied.
Created 08-21-2018 04:19 PM
@Greenhorn Techie As I mentioned, Ambari is capable of sending alert notifications. You can consider using that if your system is capable of processing SNMP or email notifications.
Created 08-21-2018 08:17 PM
@amarnath reddy pappu Our organisation's requirement is to use the enterprise-wide tool instead of Ambari, so that option is ruled out altogether.
Created 08-21-2018 08:39 PM
Ambari can also take a custom script as the alert dispatch target, so you can write your own script and handle each alert in whatever way you want. You can follow the document at https://risdenk.github.io/2018/03/25/apache-ambari-custom-alert-dispatch-script.html
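Since the enterprise tool works by scanning log files with regular expressions, one option is a dispatch script that writes each alert as a single, regex-friendly line. The Python sketch below assumes Ambari invokes the script with five positional arguments (definition name, definition label, service name, alert state, alert text); please verify that order against your Ambari version and the linked article. The log path, the line format, and the `format_alert` helper are hypothetical choices for this example.

```python
import sys
from datetime import datetime, timezone

# Hypothetical file that the org-wide tool would be configured to scan.
LOG_PATH = "/var/log/ambari-server/custom-alerts.log"


def format_alert(argv, now=None):
    """Turn the dispatcher arguments into one single-line log record."""
    # Assumed order: definition name, label, service, state, text.
    # Pad missing arguments with empty strings and ignore any extras.
    name, label, service, state, text = (list(argv) + [""] * 5)[:5]
    now = now or datetime.now(timezone.utc)
    # Collapse newlines and repeated whitespace so a regex sees one line.
    text = " ".join(text.split())
    return '%s AMBARI_ALERT state=%s service=%s definition=%s label="%s" text="%s"' % (
        now.strftime("%Y-%m-%dT%H:%M:%SZ"), state, service, name, label, text)


def main(argv):
    """Append the formatted alert to the scanned log file."""
    with open(LOG_PATH, "a") as fh:
        fh.write(format_alert(argv) + "\n")


# Do nothing when run without arguments; the dispatcher is assumed to pass them.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

With this layout, a pattern such as `AMBARI_ALERT state=CRITICAL` would be enough for the external tool to pick up critical alerts.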
Created 08-23-2018 04:43 PM
@Greenhorn Techie If you are happy with the explanation/solution then please accept the correct answer so that it will help others.
Created 08-21-2018 04:09 PM
Thanks @amarnath reddy pappu for your response. It provides some food for thought to further explore and better our understanding. We intend to use the external system mainly for alerting purposes so that the support team can act quickly in case any action is required. I was thinking of using the ambari-alerts.log and other log files for only alerting purposes.
As the present alerting mechanism exists only within Ambari, and our Ops team wants to use their existing setup, I am wondering what the best alternative is if scanning the log files is not the way to go.
Created 08-23-2018 04:40 PM