Created on 06-07-2016 02:16 PM - edited 08-17-2019 12:03 PM
This article talks about two things:
1. How to develop a simple Python script that returns a result code such as 'CRITICAL', 'WARNING', 'OK' or 'UNKNOWN', based on conditions that match our requirements.
2. How to write an alert definition in JSON format and register it with the Ambari Server so that the alerts are triggered.
The files attached to this article, "test_alert_disk_space.py" and "alerts.json", are used in the following steps:
Step-1). Create a Python script like the attached "test_alert_disk_space.py" that finds the list of mount points and reports the disk usage of each of them. Based on the configured percentage thresholds, it then returns one of the result codes 'CRITICAL', 'WARNING', 'OK' or 'UNKNOWN'.
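The attached script is not reproduced here. As a rough sketch of its general shape, assuming the convention used by Ambari's bundled host scripts (a module-level get_tokens() function plus an execute(configurations, parameters, host_name) entry point that returns a (result_code, [label]) tuple), a disk-usage alert could look like the following. The helper names list_mount_points, percent_used and classify, the use of /proc/mounts (Linux only), the severity ordering and the default thresholds are all assumptions for illustration; the two parameter key strings are the ones listed in the script-parameter snippet in this article. Note that in this sketch the KEY constants stay parameter names and the thresholds live in the DEFAULT constants, which may differ from the attached script.

```python
import os

# Parameter keys as listed in this article's script-parameter snippet.
PERCENT_USED_WARNING_KEY = "percent.used.space.warning.threshold"
PERCENT_USED_CRITICAL_KEY = "percent.free.space.critical.threshold"

# Fallback thresholds (percent of space used); assumed values for this sketch.
PERCENT_USED_WARNING_DEFAULT = 60
PERCENT_USED_CRITICAL_DEFAULT = 80

# Assumed severity order, used to report the worst state over all mount points.
SEVERITY = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}


def get_tokens():
    """Configuration tokens this alert needs from Ambari; none here."""
    return ()


def list_mount_points(mounts_file="/proc/mounts"):
    """Return mount points backed by a /dev device (Linux-only assumption)."""
    points = []
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[0].startswith("/dev/"):
                points.append(fields[1])
    return points


def percent_used(mount_point):
    """Used space on mount_point as a percentage, or None if its size is 0."""
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize
    if total == 0:
        return None
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return used * 100.0 / total


def classify(percent, warning, critical):
    """Map one usage percentage to a result code."""
    if percent is None:
        return "UNKNOWN"
    if percent >= critical:
        return "CRITICAL"
    if percent >= warning:
        return "WARNING"
    return "OK"


def execute(configurations=None, parameters=None, host_name=None):
    """Entry point called by the agent; returns (result_code, [label])."""
    parameters = parameters or {}
    warning = float(parameters.get(PERCENT_USED_WARNING_KEY,
                                   PERCENT_USED_WARNING_DEFAULT))
    critical = float(parameters.get(PERCENT_USED_CRITICAL_KEY,
                                    PERCENT_USED_CRITICAL_DEFAULT))
    try:
        mount_points = list_mount_points()
    except OSError:
        mount_points = []
    if not mount_points:
        return "UNKNOWN", ["No mount points found"]

    worst, lines = "OK", []
    for mount_point in mount_points:
        try:
            pct = percent_used(mount_point)
        except OSError:
            pct = None  # e.g. permission denied or an escaped path
        code = classify(pct, warning, critical)
        if SEVERITY[code] > SEVERITY[worst]:
            worst = code
        shown = "n/a" if pct is None else "%.1f%%" % pct
        lines.append("%s: %s used (%s)" % (mount_point, shown, code))
    return worst, ["\n".join(lines)]
```

Reporting the worst state across all mount points keeps a single alert instance per host, while the label still lists every mount point with its own status.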
Step-2). Place the file "test_alert_disk_space.py" in the following path on the Ambari Server host: "/var/lib/ambari-server/resources/host_scripts/"
Example:
cp -f test_alert_disk_space.py /var/lib/ambari-server/resources/host_scripts
Step-3). Now restart the Ambari-Server.
We also need to restart the ambari-agent on each host so that the agents pull the script from the Ambari Server. When the agents restart, each one fetches "test_alert_disk_space.py" and stores it in the ambari-agent cache directory "/var/lib/ambari-agent/cache/host_scripts" on its host.
Step-4). Run the following command to list all the existing alerts:
curl -u admin:admin -i -H 'X-Requested-By:ambari' -X GET http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions
Here the Ambari Server host and port are "node1.example.com:8080" and the cluster name is "ClusterDemo".
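If you prefer scripting these REST calls, the following Python 3 sketch sends the same GET request using only the standard library and turns the response into a name-to-id map, which is handy for finding the id needed by the run_now and DELETE calls. It assumes the response has the shape {"items": [{"AlertDefinition": {"id": ..., "name": ...}}, ...]}; check that against the actual output of the curl command above. The connection constants are copied from the example and are placeholders for your own cluster.

```python
import base64
import json
import urllib.request

# Assumed connection details, copied from the curl example above.
AMBARI_URL = "http://node1.example.com:8080"
CLUSTER = "ClusterDemo"
USER, PASSWORD = "admin", "admin"


def ambari_request(method, path, body=None):
    """Send one request to the Ambari REST API; return (status, JSON or None).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses (e.g. 404, 500).
    """
    token = base64.b64encode(("%s:%s" % (USER, PASSWORD)).encode()).decode()
    req = urllib.request.Request(
        AMBARI_URL + path,
        data=body,
        method=method,
        headers={"Authorization": "Basic " + token, "X-Requested-By": "ambari"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return resp.status, (json.loads(raw) if raw else None)


def definition_ids(listing):
    """Map alert definition name -> id from a GET .../alert_definitions reply."""
    return {
        item["AlertDefinition"]["name"]: item["AlertDefinition"]["id"]
        for item in (listing or {}).get("items", [])
    }


def list_definitions():
    """Python equivalent of the Step-4 curl command; returns {name: id}."""
    _, listing = ambari_request(
        "GET", "/api/v1/clusters/%s/alert_definitions" % CLUSTER)
    return definition_ids(listing)
```

For example, list_definitions().get(name) returns the id of the definition whose "name" you set in alerts.json, or None if it is not registered. The same ambari_request helper can send the "PUT ...?run_now=true" and "DELETE" requests shown later in this article by passing the corresponding method and path.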
Step-5). Register the custom alert definition using the following curl command:
curl -u admin:admin -i -H 'X-Requested-By:ambari' -X POST -d @alerts.json http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions
The response code of the above command should be 201 (Created). If the response code is 500 (Internal Server Error) or 404 (Not Found), double-check the URL in the command and the contents of the JSON file.
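The attached alerts.json is not reproduced here. To show the general shape of a definition for a script-based alert, here is a Python sketch that builds one and prints it as JSON. The field values are assumptions: attaching the alert to the AMBARI service's AMBARI_AGENT component with HOST scope (so that every agent host runs it) is modeled on Ambari's built-in host disk usage alert as far as we know, and the names passed in are hypothetical. Compare the output against the JSON returned by the GET command in Step-4 before relying on it.

```python
import json


def build_definition(name, label, script_file, description,
                     interval_minutes=1):
    """Return an alert definition body for a host-level SCRIPT alert.

    The service/component/scope values are assumptions; adjust as needed.
    """
    return {
        "AlertDefinition": {
            "name": name,                      # unique internal name
            "label": label,                    # name shown in the Ambari UI
            "description": description,
            "service_name": "AMBARI",          # assumed: owned by Ambari itself
            "component_name": "AMBARI_AGENT",  # assumed: runs on agent hosts
            "scope": "HOST",                   # assumed: one instance per host
            "interval": interval_minutes,      # check interval in minutes
            "enabled": True,
            "source": {
                "type": "SCRIPT",              # script-based alert
                "path": script_file,           # file name in host_scripts
                "parameters": [],
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical values; redirect the printed JSON into alerts.json.
    print(json.dumps(build_definition(
        "test_alert_disk_space",
        "Test Host Disk Usage",
        "test_alert_disk_space.py",
        "Disk usage of every mount point on the host."), indent=2))
```

Redirecting the printed output into alerts.json produces a file that can be posted with the curl command above.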
Check the Ambari console to confirm that the alerts are being triggered correctly. Alternatively, run the command from Step-4 to verify that the custom alert definition has been registered. If needed, run "ambari-server restart".
In the "test_alert_disk_space.py" script, users can change the values of "PERCENT_USED_WARNING_KEY" and "PERCENT_USED_CRITICAL_KEY" to suit their requirements.
# script parameter keys
MIN_FREE_SPACE_KEY = "minimum.free.space"
PERCENT_USED_WARNING_KEY = "percent.used.space.warning.threshold"
PERCENT_USED_CRITICAL_KEY = "percent.free.space.critical.threshold"
PERCENT_USED_WARNING_KEY = 60
PERCENT_USED_CRITICAL_KEY = 80
After any change to this file, restart the Ambari Server (and the ambari-agents, as described in Step-3) so that the changes are pushed to the agent hosts.
We should then see alerts like the following, based on the thresholds we have set:
A list of every mount point and its disk usage percentage.
[OPTIONAL]
Manually Running Alert
To run the alert manually, send a PUT request with "?run_now=true" appended to the alert definition URL:
curl -u admin:admin -i -H 'X-Requested-By:ambari' -X PUT http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions/151?run_now=true
Here "151" is the alert definition id, which we can get from the alert definitions page of the Ambari console or with the GET command from Step-4.
Deleting the Alerts
curl -u admin:admin -i -H 'X-Requested-By:ambari' -X DELETE http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions/151
Reference: https://cwiki.apache.org/confluence/display/AMBARI/Creating+a+Script-based+Alert+Dispatcher
Attachments: custom-alerts.zip
Created on 06-07-2016 02:16 PM
Note that users might mistakenly change the alert threshold in "PERCENT_USED_WARNING_DEFAULT" (the default value) instead of the correct variable "PERCENT_USED_WARNING_KEY".
Created on 07-19-2016 04:48 PM
Hi Jay SenSharma.
This is a great, complete tutorial. I followed the steps and alerts seem to be generated for disk usage.
I have a question about the JSON file format. If I want to create my own alerts, I need to know the required fields and their expected values. Is there any documentation?
Regards,
Alex Feldman
Created on 07-21-2016 01:30 AM
You can see examples of the various alert definitions by retrieving them through the API. The JSON returned can be used as a template to create your own definition (see https://docs.daplab.ch/ambari_cheat_sheet/), e.g.:
curl -v -X GET -u ${ambari_credentials} -H 'X-Requested-By:ambari' https://admin.daplab.ch/api/v1/clusters/DAPLAB02/alert_definitions?AlertDefinition/service_name=HIVE
The 5 alert types are here.
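Following this suggestion, a fetched definition can be turned into a POST body for a new alert. A minimal Python sketch, assuming the full definition was retrieved (for example by requesting a single definition by id, or by adding fields=* to the query) and that "id" and "cluster_name" are server-assigned fields that should be dropped before re-posting; to_template is a hypothetical helper name.

```python
import copy

# Fields assumed to be assigned by Ambari and therefore not re-posted.
READ_ONLY_FIELDS = ("id", "cluster_name")


def to_template(item, new_name, new_label):
    """Turn one item of a GET .../alert_definitions response into a body
    for POST .../alert_definitions under a new name and label."""
    definition = copy.deepcopy(item["AlertDefinition"])  # keep input intact
    for field in READ_ONLY_FIELDS:
        definition.pop(field, None)
    definition["name"] = new_name    # presumably must be unique per cluster
    definition["label"] = new_label
    return {"AlertDefinition": definition}
```

The resulting dictionary can be serialized with json.dumps and posted in the same way as alerts.json.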
Created on 08-05-2016 11:54 PM
Hi Jay SenSharma.
After some research these are my latest comments. Since this is the community site my comments are intended to be helpful to other users of this portal.
While the Short Description promised to cover two things, neither was actually covered.
1. How to develop a simple python script which returns the result code like 'CRITICAL', 'WARN', 'OK', 'UNKNOWN' based on certain conditions matching our requirement.
This point was never covered, since no script was developed; the script was provided to complete this exercise. It was not mentioned that the script must be written in Python. The statement “a simple python script which returns the … code” is misleading, since the script itself does not return any of the strings listed. The execute method does, when it is called. It seems that the script must contain the execute method, which is called by the Ambari framework.
In addition, the execute method does not return just a token string… It returns a structure with one of the listed tokens as one of its values…
2. How to write an alert definition in json format so that we can install those alerts to the Ambari Server in order to get the alerts.
This point is not covered either. See my prior posting. Considering that we only have four API calls to play with (GET, POST, PUT, DELETE), a list of the required JSON fields and their expected values would be appropriate.
Good Samaritan Andrew Sears provided the link to Ambari Cheat Sheet site… the example lists a shell script to be used and python script with the identical name in the json to be used in API call… for more confusion I presume.
On the other hand, it lists JSON that is relatively easy for the initiated to “comprehend”.
Still what is the meaning of the fields "ignore_host"? "component_name"? "service_name"? "scope"?
In conclusion: for the article to be helpful it needs more actual details.
Regards,
Alex Feldman
Created on 08-11-2016 07:32 PM
@Jay SenSharma Hi, I had almost finished developing my script when I found this. Why can't we simply run df -h from Python and parse the result instead of doing all the calculations? Is there any problem with this approach?
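One way to try that approach is sketched below in Python. It uses df -P rather than df -h, because the POSIX output format keeps each filesystem on a single line with a fixed column layout and reports sizes in blocks instead of human-readable units, which makes parsing more robust. parse_df and df_usage are hypothetical helper names, and the sketch assumes device names contain no spaces.

```python
import subprocess


def parse_df(output):
    """Parse `df -P` output into {mount_point: percent_used}."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        # Capacity is column 5; the mount point may itself contain spaces.
        capacity, mount_point = fields[4], " ".join(fields[5:])
        if capacity.endswith("%") and capacity[:-1].isdigit():
            usage[mount_point] = int(capacity[:-1])
    return usage


def df_usage():
    """Run `df -P` and parse it (raises CalledProcessError on failure)."""
    output = subprocess.check_output(["df", "-P"], universal_newlines=True)
    return parse_df(output)
```

The trade-off is that this depends on an external command and its exact output format and spawns a process per check, whereas os.statvfs reads the same numbers directly from the kernel.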
Created on 07-31-2017 08:40 AM
Thanks for the awesome article.
When I follow your example, everything works fine. However, if I change a script and restart ambari-server, the changes are not propagated to the ambari-agents; I have to restart the agents manually to trigger a sync.
Is there a sync interval setting for the agents?
Created on 07-31-2017 09:10 AM
Actually, we also need to restart the ambari-agents on all the hosts, so that during the restart the agents pull the scripts (or changes to the scripts) from the Ambari Server. I have updated Step-3 as follows:
Step-3). Now restart the Ambari-Server.
We also need to restart the ambari-agent on each host so that the agents pull the script from the Ambari Server. When the agents restart, each one fetches "test_alert_disk_space.py" and stores it in the ambari-agent cache directory "/var/lib/ambari-agent/cache/host_scripts" on its host.
Created on 07-31-2017 11:19 AM
Thanks Jay for your awesome support,
It would be very helpful if you could provide information on how to access the Ambari configuration in service checks. When I have Ambari print the configurations and parameters arguments of the execute function, configurations is empty and parameters contains my parameters as well as "kerberos.kinit.timer": 14400000.
EDIT: I found the answer by digging through the Ambari source code. If anyone else is struggling, see: Ambari @GitHub
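For readers with the same question: as far as we can tell from Ambari's bundled script alerts, configurations stays empty unless the script declares the properties it needs in get_tokens(). Each token has the form {{config-type/property}}, and the agent then fills configurations with the values, keyed by those same token strings. A minimal sketch, using hdfs-site/dfs.namenode.http-address purely as an example property; the messages and result codes chosen here are illustrative.

```python
# Token naming the configuration property this alert wants; the agent is
# assumed to pass its value in `configurations` under the same key.
NN_HTTP_ADDRESS_KEY = "{{hdfs-site/dfs.namenode.http-address}}"


def get_tokens():
    """Declare which configuration values this alert needs."""
    return (NN_HTTP_ADDRESS_KEY,)


def execute(configurations=None, parameters=None, host_name=None):
    """Report the configured NameNode HTTP address, if it was provided."""
    configurations = configurations or {}
    if NN_HTTP_ADDRESS_KEY not in configurations:
        return "UNKNOWN", ["%s is not available" % NN_HTTP_ADDRESS_KEY]
    address = configurations[NN_HTTP_ADDRESS_KEY]
    return "OK", ["NameNode HTTP address is %s" % address]
```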
Created on 09-08-2017 02:21 AM
Am I the only one who can't extract the downloaded custom-alerts.zip?
Created on 09-08-2017 02:32 AM
I just downloaded "4811-custom-alerts.zip" and can extract it properly. How are you downloading it?
$ md5 4811-custom-alerts.zip
MD5 (4811-custom-alerts.zip) = a33f105860d07e05149b68960a0ea0c9
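Where the md5 command is not available (on Linux the equivalent is usually md5sum), the checksum can also be computed with Python's standard library and compared with the value above. md5_of_file is a hypothetical helper name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If md5_of_file("4811-custom-alerts.zip") does not equal "a33f105860d07e05149b68960a0ea0c9", the download is most likely incomplete or corrupted.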
