Created on 01-26-2016 09:10 PM - edited 08-17-2019 01:26 PM
In order to check the status and stability of your cluster, it makes sense to run the service checks that are included in Ambari. Usually each Ambari service provides its own service check, but there might be services that don't include a service check at all. To run a service check, select the service (e.g. HDFS) in Ambari and click "Run Service Check" in the "Actions" drop-down menu.
Service checks can also be started via the Ambari API, and it is even possible to start all available service checks with a single API command. To bulk run these checks, use the same API that is used to trigger a rolling restart of DataNodes (request_schedules). The "request_schedules" API runs all defined commands in the specified order, and you can even specify a pause between the commands.
Available Service Checks:
Service Name | service_name | Command |
---|---|---|
HDFS | HDFS | HDFS_SERVICE_CHECK |
YARN | YARN | YARN_SERVICE_CHECK |
MapReduce2 | MAPREDUCE2 | MAPREDUCE2_SERVICE_CHECK |
HBase | HBASE | HBASE_SERVICE_CHECK |
Hive | HIVE | HIVE_SERVICE_CHECK |
WebHCat | WEBHCAT | WEBHCAT_SERVICE_CHECK |
Pig | PIG | PIG_SERVICE_CHECK |
Falcon | FALCON | FALCON_SERVICE_CHECK |
Storm | STORM | STORM_SERVICE_CHECK |
Oozie | OOZIE | OOZIE_SERVICE_CHECK |
ZooKeeper | ZOOKEEPER | ZOOKEEPER_QUORUM_SERVICE_CHECK |
Tez | TEZ | TEZ_SERVICE_CHECK |
Sqoop | SQOOP | SQOOP_SERVICE_CHECK |
Ambari Metrics | AMBARI_METRICS | AMBARI_METRICS_SERVICE_CHECK |
Atlas | ATLAS | ATLAS_SERVICE_CHECK |
Kafka | KAFKA | KAFKA_SERVICE_CHECK |
Knox | KNOX | KNOX_SERVICE_CHECK |
Spark | SPARK | SPARK_SERVICE_CHECK |
SmartSense | SMARTSENSE | SMARTSENSE_SERVICE_CHECK |
Ranger | RANGER | RANGER_SERVICE_CHECK |
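Every command in the table except ZooKeeper's follows the pattern `<SERVICE_NAME>_SERVICE_CHECK`. The lookup sketch below encodes that pattern. The names `TABLE_SERVICES`, `COMMAND_OVERRIDES` and `check_command` are illustrative. The fallback for services that are not in the table is an assumption and may not hold for every stack.

```python
# Service names from the table above (the service_name column).
TABLE_SERVICES = (
    "HDFS", "YARN", "MAPREDUCE2", "HBASE", "HIVE", "WEBHCAT", "PIG",
    "FALCON", "STORM", "OOZIE", "ZOOKEEPER", "TEZ", "SQOOP",
    "AMBARI_METRICS", "ATLAS", "KAFKA", "KNOX", "SPARK", "SMARTSENSE", "RANGER",
)

# Commands that do not follow the <SERVICE_NAME>_SERVICE_CHECK pattern.
COMMAND_OVERRIDES = {"ZOOKEEPER": "ZOOKEEPER_QUORUM_SERVICE_CHECK"}


def check_command(service_name: str) -> str:
    """Return the service-check command for an Ambari service_name.

    Services outside the table fall back to the naming pattern, which is
    an assumption and may not hold for every stack.
    """
    key = service_name.upper()
    return COMMAND_OVERRIDES.get(key, f"{key}_SERVICE_CHECK")


# service_name -> command, reproducing the table.
SERVICE_CHECKS = {name: check_command(name) for name in TABLE_SERVICES}
```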
Note: Make sure you replace <user>, <password>, <clustername> and <ambari-server> with the actual values.
Start a single service check via the Ambari API (e.g. HDFS Service Check):
curl -ivk -H "X-Requested-By: ambari" -u <user>:<password> -X POST -d @payload http://<ambari-server>:8080/api/v1/clusters/<clustername>/requests
Payload:
{ "RequestInfo":{ "context":"HDFS Service Check", "command":"HDFS_SERVICE_CHECK" }, "Requests/resource_filters":[ { "service_name":"HDFS" } ] }
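If you would rather script this than call curl, the sketch below builds the same POST using only Python's standard library. The function names are hypothetical. The default port 8080 and the `X-Requested-By` header are taken from the curl command above, and basic auth matches curl's `-u`. The request is only prepared here; sending it is left to the caller.

```python
import base64
import json
import urllib.request


def service_check_payload(service_name, command, context=None):
    """Build the JSON body for POST /api/v1/clusters/<clustername>/requests."""
    return {
        "RequestInfo": {
            "context": context or f"{service_name} Service Check",
            "command": command,
        },
        "Requests/resource_filters": [{"service_name": service_name}],
    }


def build_request(ambari_server, cluster, user, password, payload, port=8080):
    """Prepare (but do not send) the same POST the curl command issues."""
    url = f"http://{ambari_server}:{port}/api/v1/clusters/{cluster}/requests"
    # Equivalent of curl's -u user:password (HTTP basic auth).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "X-Requested-By": "ambari",  # Ambari rejects modifying calls without it
            "Authorization": f"Basic {token}",
        },
    )
```

Passing the result to `urllib.request.urlopen(req)` sends it. Splitting payload construction from sending keeps the JSON easy to inspect or reuse.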
Start bulk service checks via the Ambari API (e.g. HDFS, YARN and MapReduce2 Service Checks):
curl -ivk -H "X-Requested-By: ambari" -u <user>:<password> -X POST -d @payload http://<ambari-server>:8080/api/v1/clusters/<clustername>/request_schedules
Payload:
```json
[ { "RequestSchedule": { "batch": [
    { "requests": [
        { "order_id": 1, "type": "POST", "uri": "/api/v1/clusters/<clustername>/requests",
          "RequestBodyInfo": {
            "RequestInfo": { "context": "HDFS Service Check (batch 1 of 3)", "command": "HDFS_SERVICE_CHECK" },
            "Requests/resource_filters": [ { "service_name": "HDFS" } ] } },
        { "order_id": 2, "type": "POST", "uri": "/api/v1/clusters/<clustername>/requests",
          "RequestBodyInfo": {
            "RequestInfo": { "context": "YARN Service Check (batch 2 of 3)", "command": "YARN_SERVICE_CHECK" },
            "Requests/resource_filters": [ { "service_name": "YARN" } ] } },
        { "order_id": 3, "type": "POST", "uri": "/api/v1/clusters/<clustername>/requests",
          "RequestBodyInfo": {
            "RequestInfo": { "context": "MapReduce Service Check (batch 3 of 3)", "command": "MAPREDUCE2_SERVICE_CHECK" },
            "Requests/resource_filters": [ { "service_name": "MAPREDUCE2" } ] } }
    ] },
    { "batch_settings": { "batch_separation_in_seconds": 1, "task_failure_tolerance": 1 } }
] } } ]
```
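Writing this payload by hand gets tedious for more than a few services. The sketch below generates the same structure from a list of `(service_name, command)` pairs, such as the rows of the service check table. The function name is hypothetical. The defaults (1 second between commands, failure tolerance of 1) simply mirror the payload above.

```python
import json


def bulk_service_check_payload(cluster, services, separation_seconds=1, failure_tolerance=1):
    """Build a request_schedules body that runs one service check per service, in order.

    services: list of (service_name, command) pairs, e.g. ("HDFS", "HDFS_SERVICE_CHECK").
    """
    total = len(services)
    requests = []
    for i, (service_name, command) in enumerate(services, start=1):
        requests.append({
            "order_id": i,  # execution order within the batch
            "type": "POST",
            "uri": f"/api/v1/clusters/{cluster}/requests",
            "RequestBodyInfo": {
                "RequestInfo": {
                    "context": f"{service_name} Service Check (batch {i} of {total})",
                    "command": command,
                },
                "Requests/resource_filters": [{"service_name": service_name}],
            },
        })
    return [{
        "RequestSchedule": {
            "batch": [
                {"requests": requests},
                {"batch_settings": {
                    "batch_separation_in_seconds": separation_seconds,
                    "task_failure_tolerance": failure_tolerance,
                }},
            ]
        }
    }]
```

To use it with the curl command, write the result to the `payload` file. For example: `with open("payload", "w") as f: json.dump(bulk_service_check_payload("<clustername>", [("HDFS", "HDFS_SERVICE_CHECK"), ("YARN", "YARN_SERVICE_CHECK")]), f)`.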
The API returns the ID of the newly created request schedule:
{ "resources" : [ { "href" : "http://<ambari-server>:8080/api/v1/clusters/<clustername>/request_schedules/68", "RequestSchedule" : { "id" : 68 } } ] }
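To follow up on the schedule from a script, you first need its ID. The minimal sketch below assumes the response shape shown above: a `resources` list whose entries carry `RequestSchedule.id`. The function name is hypothetical.

```python
import json


def schedule_ids(response_text):
    """Extract request-schedule IDs from the request_schedules POST response."""
    body = json.loads(response_text)
    # Each created schedule is listed under "resources" with its numeric id.
    return [res["RequestSchedule"]["id"] for res in body.get("resources", [])]
```

The matching `href` in each entry points at the schedule resource itself. It can presumably be queried with a GET for details, but check what your Ambari version returns there.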
[Screenshot: the scheduled service checks as they appear in Ambari]
For a payload that runs all available service checks at once, see this gist: https://gist.github.com/mr-jstraub/0b55de318eeae6695c3f#payload-to-run-all-service-checks
Created on 05-20-2016 04:50 AM
This is what you need to start creating automated cluster health checks. You can parse the response from the curl command and use the results to trigger monitoring tools.
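As one way to feed such results into a monitoring tool, the sketch below maps an Ambari request's status to a standard Nagios plugin exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It assumes the body of `GET /api/v1/clusters/<clustername>/requests/<id>` contains a `Requests` object with `request_status` and `request_context`. The listed state names are also an assumption, so adjust them to what your Ambari version reports.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed Ambari request states; anything else is reported as UNKNOWN.
_STATE_TO_EXIT = {
    "COMPLETED": OK,
    "PENDING": WARNING,
    "QUEUED": WARNING,
    "IN_PROGRESS": WARNING,
    "FAILED": CRITICAL,
    "TIMEDOUT": CRITICAL,
    "ABORTED": CRITICAL,
}


def nagios_status(request_json):
    """Map a GET /requests/<id> response body to an (exit_code, message) pair."""
    try:
        info = json.loads(request_json)["Requests"]
        state = info["request_status"]
    except (ValueError, KeyError, TypeError):
        return UNKNOWN, "UNKNOWN: could not parse Ambari response"
    code = _STATE_TO_EXIT.get(state, UNKNOWN)
    label = ("OK", "WARNING", "CRITICAL", "UNKNOWN")[code]
    context = info.get("request_context", "service check")
    return code, f"{label}: {context} is {state}"
```

A wrapper script would print the message and call `sys.exit(code)`, which is all a Nagios-compatible scheduler needs.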
Created on 09-27-2016 06:23 PM
I've published a CLI tool that handles all of this more easily, including auto-generating the payload and inferring the cluster name and the services to check. It has --help with lots of options, including --wait, which tracks the progress of the request and returns only when it is complete, and --cancel, which stops any outstanding service checks if you accidentally launch too many while playing with the tool 🙂
You can find it on my GitHub here:
https://github.com/harisekhon/pytools
./ambari_trigger_service_checks.py --help
examples:
./ambari_trigger_service_checks.py --all
./ambari_trigger_service_checks.py --cancel
./ambari_trigger_service_checks.py --services hdfs,yarn --wait
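The --wait behaviour described above amounts to polling the request until it finishes. This is a generic sketch of such a loop, not code from that tool. `fetch_status` is a hypothetical callable you supply, for example one that GETs `/api/v1/clusters/<clustername>/requests/<id>` and returns `Requests/request_status`. The set of terminal states is an assumption.

```python
import time

# Assumed terminal states of an Ambari request (see Requests/request_status).
TERMINAL_STATES = {"COMPLETED", "FAILED", "TIMEDOUT", "ABORTED"}


def wait_for_request(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the request reaches a terminal state or the timeout expires.

    fetch_status: hypothetical zero-argument callable returning the current status string.
    Returns the final status, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request still {status} after {timeout} s")
        sleep(interval)  # injectable so tests can skip the real delay
```

Passing the fetch function in, rather than hard-coding the HTTP call, keeps the loop testable and lets you reuse it for request schedules as well as single requests.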
Created on 09-27-2016 07:41 PM
I've actually already published Nagios plugins that integrate with the Ambari API. They can retrieve service and host states, health and alerts, and even detect stale configs. You can run them as-is, using the option switches, in any standard open-source monitoring platform that supports Nagios plugins; see here:
https://github.com/harisekhon/nagios-plugins
If you also want to proactively trigger service checks, you can use the tool I wrote specifically for that, which I mentioned in my other comment on this page.
Created on 03-09-2018 05:41 PM
@Jonas Straub - Nice article!
Can you please update the commands with the following additional service checks?
RANGER_KMS_SERVICE_CHECK, AMBARI_INFRA_SERVICE_CHECK, KERBEROS_SERVICE_CHECK, SLIDER_SERVICE_CHECK