Auto-recovery in Ambari restarts cluster components automatically when they fail, without the need for human intervention.

Ambari 2.4.0 introduced dynamic auto-recovery, which allows auto-start properties to be configured without restarting ambari-agent or ambari-server. Currently, the simplest way to manage the auto-recovery features within Ambari is via the REST API (documented in this article), although ongoing work in the community will bring the feature to the UI: https://issues.apache.org/jira/browse/AMBARI-2330

Check Auto-Recovery Settings

To check if auto-recovery is enabled for all components, run the following command on the Ambari server node:

curl -u admin:<password> -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/clusters/<cluster_name>/components?fields=ServiceComponentInfo/componen...

Note: replace <password> and <cluster_name> with your own values.

The output of the above command will look something like this:

...
  "items" : [
    {
      "href" : "http://localhost:8080/api/v1/clusters/horton/components/APP_TIMELINE_SERVER",
      "ServiceComponentInfo" : {
        "category" : "MASTER",
        "cluster_name" : "horton",
        "component_name" : "APP_TIMELINE_SERVER",
        "recovery_enabled" : "false",
        "service_name" : "YARN"
      }
    },
    {
      "href" : "http://localhost:8080/api/v1/clusters/horton/components/DATANODE",
      "ServiceComponentInfo" : {
        "category" : "SLAVE",
        "cluster_name" : "horton",
        "component_name" : "DATANODE",
        "recovery_enabled" : "false",
        "service_name" : "HDFS"
      }
    },
...

Notice the "recovery_enabled" : "false" flag on each component.
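On a large cluster it can be easier to list only the components that still have recovery disabled. The following is a minimal sketch, not part of Ambari itself: list_recovery_disabled is a hypothetical helper name, and it assumes the pretty-printed output shown above, with one JSON field per line.

```shell
# Hypothetical helper (not an Ambari tool): reads the component listing on
# stdin and prints the component_name of every entry whose recovery_enabled
# flag is "false". Assumes one JSON field per line, as in the sample output.
list_recovery_disabled() {
  awk -F'"' '
    /"component_name"/   { name = $4 }      # remember the current component
    /"recovery_enabled"/ { enabled = $4 }   # remember its recovery flag
    /}/ {
      # A closing brace ends the current object: report it, then reset.
      if (name != "" && enabled == "false") print name
      name = ""; enabled = ""
    }
  '
}

# Usage: pipe the output of the check command above into the helper, e.g.
#   <check command from above> | list_recovery_disabled
```

Because the helper matches line by line rather than parsing JSON, it would need adjusting if the API output were minified onto a single line.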

Enable Auto-Recovery for HDP Components

To enable auto-recovery for a single component (in this case HBASE_REGIONSERVER):

curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(HBASE_REGIONSERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

To enable auto-recovery for multiple HDP components:

curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(APP_TIMELINE_SERVER,DATANODE,HBASE_MASTER,HBASE_REGIONSERVER,HISTORYSERVER,HIVE_METASTORE,HIVE_SERVER,INFRA_SOLR,LIVY_SERVER,LOGSEARCH_LOGFEEDER,LOGSEARCH_SERVER,METRICS_COLLECTOR,METRICS_GRAFANA,METRICS_MONITOR,MYSQL_SERVER,NAMENODE,NODEMANAGER,RESOURCEMANAGER,SECONDARY_NAMENODE,WEBHCAT_SERVER,ZOOKEEPER_SERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'
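Long component lists like the one above are easy to mistype, so it may help to build the URL and request body from arguments. This is a sketch under stated assumptions: the function names and the AMBARI_PASSWORD environment variable are hypothetical, the host and port are the localhost:8080 used throughout this article, and passing "false" to turn recovery back off is assumed to mirror the "true" case.

```shell
# Hypothetical helpers; names and AMBARI_PASSWORD are illustrative only.

# Build the component_name.in(...) URL for a cluster and a list of components.
recovery_url() (
  cluster="$1"
  shift
  IFS=,   # makes "$*" join the remaining arguments with commas
  printf 'http://localhost:8080/api/v1/clusters/%s/components?ServiceComponentInfo/component_name.in(%s)\n' "$cluster" "$*"
)

# Build the request body; $1 is "true" or "false".
recovery_payload() {
  printf '{"ServiceComponentInfo" : {"recovery_enabled":"%s"}}\n' "$1"
}

# Send the PUT request: set_recovery <true|false> <cluster_name> <component>...
set_recovery() (
  enabled="$1"
  cluster="$2"
  shift 2
  curl -u "admin:${AMBARI_PASSWORD}" -H 'X-Requested-By: ambari' -X PUT \
    "$(recovery_url "$cluster" "$@")" \
    -d "$(recovery_payload "$enabled")"
)

# Usage (assumes AMBARI_PASSWORD is exported in the current shell):
#   set_recovery true <cluster_name> DATANODE NODEMANAGER ZOOKEEPER_SERVER
```

Keeping the URL and payload builders separate from the curl call means both can be printed and checked before anything is sent to the server.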

Enable Auto-Recovery for HDF Components

The process is the same for an Ambari-managed HDF cluster. Here is an example of enabling auto-recovery for the HDF services:

curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(NIFI_MASTER,ZOOKEEPER_SERVER,KAFKA_BROKER,INFRA_SOLR,LOGSEARCH_LOGFEEDER,LOGSEARCH_SERVER,METRICS_COLLECTOR,METRICS_GRAFANA,METRICS_MONITOR)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

Using an older version of Ambari?

If you're using a version of Ambari older than 2.4.0, see the following Ambari documentation for details on how to enable auto-recovery via the ambari.properties file:

https://cwiki.apache.org/confluence/display/AMBARI/Recovery%3A+auto+start+components

Comments

I got

{  "status": 500,
  "message": "Server Error"
}

on HDP 2.5 when I tried this solution.