Created on 12-15-2016 03:11 PM
Auto-recovery in Ambari restarts cluster components automatically when a component fails, without the need for human intervention.
Ambari 2.4.0 introduced dynamic auto-recovery, which allows auto-start properties to be configured without restarting ambari-agent or ambari-server. Currently, the simplest way to manage the auto-recovery features within Ambari is via the REST API (documented in this article), although ongoing work in the community will bring the feature to the UI: https://issues.apache.org/jira/browse/AMBARI-2330
To check whether auto-recovery is enabled for each component, run the following command on the Ambari server node:
curl -u admin:<password> -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/clusters/<cluster_name>/components?fields=ServiceComponentInfo/componen...
Note that you will need to replace <password> and <cluster_name> with your own values.
The output of the above command will look something like this:
...
"items" : [
  {
    "href" : "http://localhost:8080/api/v1/clusters/horton/components/APP_TIMELINE_SERVER",
    "ServiceComponentInfo" : {
      "category" : "MASTER",
      "cluster_name" : "horton",
      "component_name" : "APP_TIMELINE_SERVER",
      "recovery_enabled" : "false",
      "service_name" : "YARN"
    }
  },
  {
    "href" : "http://localhost:8080/api/v1/clusters/horton/components/DATANODE",
    "ServiceComponentInfo" : {
      "category" : "SLAVE",
      "cluster_name" : "horton",
      "component_name" : "DATANODE",
      "recovery_enabled" : "false",
      "service_name" : "HDFS"
    }
  },
...
Notice the "recovery_enabled" : "false" flag on each component.
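On a large cluster the response is long, so it can help to reduce it to one line per component. The following is a minimal sketch, not part of Ambari: recovery_status is a hypothetical helper name, and it assumes that "component_name" appears before "recovery_enabled" within each item, as it does in the sample output above.

```shell
# Hypothetical helper (not part of Ambari): reads a components response on
# stdin and prints one "COMPONENT_NAME recovery_enabled" pair per line.
# Assumes "component_name" precedes "recovery_enabled" in each item,
# as in the sample output shown in this article.
recovery_status() {
  # Pull out only the two key/value pairs of interest, one per line,
  # regardless of how the JSON is wrapped.
  grep -oE '"(component_name|recovery_enabled)"[[:space:]]*:[[:space:]]*"[^"]*"' |
    awk -F'"' '
      $2 == "component_name"   { name = $4 }        # remember the current component
      $2 == "recovery_enabled" { print name, $4 }   # emit it with its flag
    '
}

# Usage (pipe the GET request from this article into the helper):
#   curl -s -u admin:<password> -H 'X-Requested-By: ambari' -X GET '<components URL>' | recovery_status
```

Filtering the result with grep 'false$' would then list only the components that still have auto-recovery disabled.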
To enable auto-recovery for a single component (in this case HBASE_REGIONSERVER):
curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(HBASE_REGIONSERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'
To enable auto-recovery for multiple HDP components:
curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(APP_TIMELINE_SERVER,DATANODE,HBASE_MASTER,HBASE_REGIONSERVER,HISTORYSERVER,HIVE_METASTORE,HIVE_SERVER,INFRA_SOLR,LIVY_SERVER,LOGSEARCH_LOGFEEDER,LOGSEARCH_SERVER,METRICS_COLLECTOR,METRICS_GRAFANA,METRICS_MONITOR,MYSQL_SERVER,NAMENODE,NODEMANAGER,RESOURCEMANAGER,SECONDARY_NAMENODE,WEBHCAT_SERVER,ZOOKEEPER_SERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'
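Long component lists like the one above are easy to mistype, so one option is to build the URL and request body from separate arguments. This is only a sketch under stated assumptions: recovery_url, recovery_body, and the AMBARI_URL variable are hypothetical names introduced here, and sending "false" to turn auto-recovery back off is assumed to mirror the "true" case shown in this article.

```shell
# Hypothetical helper: joins the component names with commas and prints the
# request URL in the same form as the commands in this article.
# AMBARI_URL is an assumed variable; it defaults to http://localhost:8080.
recovery_url() {
  local cluster="$1"
  shift
  local IFS=','   # "$*" joins the remaining arguments with the first IFS character
  printf '%s/api/v1/clusters/%s/components?ServiceComponentInfo/component_name.in(%s)\n' \
    "${AMBARI_URL:-http://localhost:8080}" "$cluster" "$*"
}

# Hypothetical helper: prints the request body for the given state.
# "true" matches the article; "false" is assumed to disable auto-recovery.
recovery_body() {
  printf '{"ServiceComponentInfo" : {"recovery_enabled":"%s"}}\n' "$1"
}

# Usage:
#   curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT \
#     "$(recovery_url <cluster_name> DATANODE NODEMANAGER)" -d "$(recovery_body true)"
```

Keeping the component names as separate arguments also makes it straightforward to store them in a file or array and reuse the same list across clusters.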
The process is the same for an Ambari-managed HDF cluster. Here is an example of enabling auto-recovery for the HDF services:
curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(NIFI_MASTER,ZOOKEEPER_SERVER,KAFKA_BROKER,INFRA_SOLR,LOGSEARCH_LOGFEEDER,LOGSEARCH_SERVER,METRICS_COLLECTOR,METRICS_GRAFANA,METRICS_MONITOR)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'
If you're using a version of Ambari older than 2.4.0, see the following Ambari documentation for details on how to enable auto-recovery via the ambari.properties file:
https://cwiki.apache.org/confluence/display/AMBARI/Recovery%3A+auto+start+components
Created on 01-31-2017 03:17 PM
I got the following error on HDP 2.5 when I tried this solution:
{
  "status": 500,
  "message": "Server Error"
}