To check the status and stability of your cluster, it makes sense to run the service checks that are included in Ambari. Usually each Ambari service provides its own service check, but there might be services that won't include any service check at all. To run a service check, select the service (e.g. HDFS) in Ambari and click "Run Service Check" in the "Actions" dropdown menu.

Service checks can also be started via the Ambari API, and it is possible to start all available service checks with a single API command. To run these checks in bulk, use the same API method that is used to trigger a rolling restart of DataNodes (request_schedules). The "request_schedules" API starts all defined commands in the specified order; it's even possible to specify a pause between the commands.

Available Service Checks:

Service Name     service_name     Command
HDFS             HDFS             HDFS_SERVICE_CHECK
YARN             YARN             YARN_SERVICE_CHECK
MapReduce2       MAPREDUCE2       MAPREDUCE2_SERVICE_CHECK
HBase            HBASE            HBASE_SERVICE_CHECK
Hive             HIVE             HIVE_SERVICE_CHECK
WebHCat          WEBHCAT          WEBHCAT_SERVICE_CHECK
Pig              PIG              PIG_SERVICE_CHECK
Falcon           FALCON           FALCON_SERVICE_CHECK
Storm            STORM            STORM_SERVICE_CHECK
Oozie            OOZIE            OOZIE_SERVICE_CHECK
ZooKeeper        ZOOKEEPER        ZOOKEEPER_QUORUM_SERVICE_CHECK
Tez              TEZ              TEZ_SERVICE_CHECK
Sqoop            SQOOP            SQOOP_SERVICE_CHECK
Ambari Metrics   AMBARI_METRICS   AMBARI_METRICS_SERVICE_CHECK
Atlas            ATLAS            ATLAS_SERVICE_CHECK
Kafka            KAFKA            KAFKA_SERVICE_CHECK
Knox             KNOX             KNOX_SERVICE_CHECK
Spark            SPARK            SPARK_SERVICE_CHECK
SmartSense       SMARTSENSE       SMARTSENSE_SERVICE_CHECK
Ranger           RANGER           RANGER_SERVICE_CHECK
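
Every command in the list above follows the pattern <service_name>_SERVICE_CHECK, with ZooKeeper as the one exception. The following Python sketch derives the command from the service_name on that assumption. The helper name service_check_command and the SPECIAL_COMMANDS dictionary are made up for this illustration and are not part of Ambari; services not in the list may use other names.

```python
# Commands that do not follow the <service_name>_SERVICE_CHECK pattern.
# Only the ZooKeeper exception from the list above is known here.
SPECIAL_COMMANDS = {"ZOOKEEPER": "ZOOKEEPER_QUORUM_SERVICE_CHECK"}


def service_check_command(service_name: str) -> str:
    """Return the service check command for an Ambari service_name."""
    name = service_name.upper()
    return SPECIAL_COMMANDS.get(name, f"{name}_SERVICE_CHECK")


if __name__ == "__main__":
    for service in ("HDFS", "ZOOKEEPER", "AMBARI_METRICS"):
        print(service, "->", service_check_command(service))
```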

Note: Make sure you replace <user>, <password>, <clustername> and <ambari-server> with the actual values.

Start single service check via Ambari API (e.g. HDFS Service Check):

curl -ivk -H "X-Requested-By: ambari" -u <user>:<password> -X POST -d @payload http://<ambari-server>:8080/api/v1/clusters/<clustername>/requests

Payload:

{
   "RequestInfo":{
      "context":"HDFS Service Check",
      "command":"HDFS_SERVICE_CHECK"
   },
   "Requests/resource_filters":[
      {
         "service_name":"HDFS"
      }
   ]
}
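
The same request can be prepared from a script. The sketch below uses only the Python standard library to build the payload, the "X-Requested-By" header and the basic-auth header that the curl command sends. The function name build_service_check_request and the example host, cluster and credentials are placeholders invented for this illustration. The request is only constructed here; it is not sent.

```python
import base64
import json
import urllib.request


def build_service_check_request(base_url, cluster, user, password,
                                service_name, command):
    """Build (but do not send) the POST request for one service check."""
    payload = {
        "RequestInfo": {
            "context": f"{service_name} Service Check",
            "command": command,
        },
        "Requests/resource_filters": [{"service_name": service_name}],
    }
    # Equivalent of curl's -u <user>:<password> (HTTP basic auth).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/v1/clusters/{cluster}/requests",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "X-Requested-By": "ambari",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own server, cluster and login.
    req = build_service_check_request(
        "http://ambari.example.com:8080", "mycluster",
        "admin", "secret", "HDFS", "HDFS_SERVICE_CHECK",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode())
    # To actually submit the check: urllib.request.urlopen(req)
```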

Start bulk service checks via Ambari API (e.g. HDFS, YARN, MapReduce2 Service Checks):

curl -ivk -H "X-Requested-By: ambari" -u <user>:<password> -X POST -d @payload http://<ambari-server>:8080/api/v1/clusters/<clustername>/request_schedules

Payload:

[
   {
      "RequestSchedule":{
         "batch":[
            {
               "requests":[
                  {
                     "order_id":1,
                     "type":"POST",
                     "uri":"/api/v1/clusters/<clustername>/requests",
                     "RequestBodyInfo":{
                        "RequestInfo":{
                           "context":"HDFS Service Check (batch 1 of 3)",
                           "command":"HDFS_SERVICE_CHECK"
                        },
                        "Requests/resource_filters":[
                           {
                              "service_name":"HDFS"
                           }
                        ]
                     }
                  },
                  {
                     "order_id":2,
                     "type":"POST",
                     "uri":"/api/v1/clusters/<clustername>/requests",
                     "RequestBodyInfo":{
                        "RequestInfo":{
                           "context":"YARN Service Check (batch 2 of 3)",
                           "command":"YARN_SERVICE_CHECK"
                        },
                        "Requests/resource_filters":[
                           {
                              "service_name":"YARN"
                           }
                        ]
                     }
                  },
                  {
                     "order_id":3,
                     "type":"POST",
                     "uri":"/api/v1/clusters/<clustername>/requests",
                     "RequestBodyInfo":{
                        "RequestInfo":{
                           "context":"MapReduce Service Check (batch 3 of 3)",
                           "command":"MAPREDUCE2_SERVICE_CHECK"
                        },
                        "Requests/resource_filters":[
                           {
                              "service_name":"MAPREDUCE2"
                           }
                        ]
                     }
                  }
               ]
            },
            {
               "batch_settings":{
                  "batch_separation_in_seconds":1,
                  "task_failure_tolerance":1
               }
            }
         ]
      }
   }
]
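
Writing this payload by hand gets tedious as the number of services grows, since the order_id and the "batch n of m" context have to stay in sync. As a rough sketch, the structure above could be generated like this. The function name build_bulk_payload and its parameters are made up for this illustration; the JSON layout mirrors the payload shown above.

```python
import json


def build_bulk_payload(cluster, checks, separation_seconds=1,
                       failure_tolerance=1):
    """Build a request_schedules payload.

    checks is an ordered list of (service_name, command) tuples.
    """
    total = len(checks)
    requests = []
    for order_id, (service_name, command) in enumerate(checks, start=1):
        requests.append({
            "order_id": order_id,
            "type": "POST",
            "uri": f"/api/v1/clusters/{cluster}/requests",
            "RequestBodyInfo": {
                "RequestInfo": {
                    "context": f"{service_name} Service Check "
                               f"(batch {order_id} of {total})",
                    "command": command,
                },
                "Requests/resource_filters": [
                    {"service_name": service_name}
                ],
            },
        })
    return [{
        "RequestSchedule": {
            "batch": [
                {"requests": requests},
                {"batch_settings": {
                    "batch_separation_in_seconds": separation_seconds,
                    "task_failure_tolerance": failure_tolerance,
                }},
            ]
        }
    }]


if __name__ == "__main__":
    payload = build_bulk_payload("mycluster", [
        ("HDFS", "HDFS_SERVICE_CHECK"),
        ("YARN", "YARN_SERVICE_CHECK"),
        ("MAPREDUCE2", "MAPREDUCE2_SERVICE_CHECK"),
    ])
    # Write the result to a file named "payload" to use it with curl -d @payload.
    print(json.dumps(payload, indent=3))
```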

The API returns the following response:

{
  "resources" : [
    {
      "href" : "http://<ambari-server>:8080/api/v1/clusters/<clustername>/request_schedules/68",
      "RequestSchedule" : {
        "id" : 68
      }
    }
  ]
}
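
To follow up on a schedule from a script, the id can be read from this response. A minimal sketch, assuming the input is the JSON body only (the -i flag in the curl command above also prints the HTTP response headers, which would have to be dropped first). The function name schedule_ids is made up for this illustration.

```python
import json


def schedule_ids(response_body: str) -> list:
    """Extract the request schedule ids from a request_schedules response."""
    body = json.loads(response_body)
    return [res["RequestSchedule"]["id"] for res in body.get("resources", [])]


if __name__ == "__main__":
    sample = """
    {
      "resources" : [
        {
          "href" : "http://ambari.example.com:8080/api/v1/clusters/mycluster/request_schedules/68",
          "RequestSchedule" : { "id" : 68 }
        }
      ]
    }
    """
    print(schedule_ids(sample))
```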

This is what it looks like in Ambari:

[Screenshot: 1573-screen-shot-2016-01-23-at-94120-am.png]

Payload to run all Service Checks

Please see this gist: https://gist.github.com/mr-jstraub/0b55de318eeae6695c3f#payload-to-run-all-service-checks

Comments
Contributor

This is what you need to start creating automated cluster health checks. You can parse the return from the curl command and use the results to trigger monitoring tools.

Contributor

I've published a CLI tool to handle all of this more easily including auto-generating the payload, inferring the cluster name and services to check etc. It has --help with lots of options, including features for --wait which tracks the progress status of the request and returns only when complete, and --cancel to stop any outstanding service checks if you accidentally launch too many by playing with the tool 🙂

You can find it on my GitHub here:

https://github.com/harisekhon/pytools

./ambari_trigger_service_checks.py --help

examples:

./ambari_trigger_service_checks.py --all

./ambari_trigger_service_checks.py --cancel

./ambari_trigger_service_checks.py --services hdfs,yarn --wait

Contributor

I've actually already published Nagios Plugins that integrate with the Ambari API which can retrieve the service & host states, health, alerts, even detect stale configs. You can just run them as is using the option switches in any normal open source monitoring platform that supports nagios plugins, see here:

https://github.com/harisekhon/nagios-plugins

If you want to proactively trigger service checks as well you can also use the tool I wrote specifically for that which I mentioned in the other comment on this page.

Rising Star

@Jonas Straub - Nice article!

Can you please update the commands with the following additional service checks?

RANGER_KMS_SERVICE_CHECK, AMBARI_INFRA_SERVICE_CHECK, KERBEROS_SERVICE_CHECK, SLIDER_SERVICE_CHECK