Created 03-07-2016 05:02 PM
If a node running ambari-agent reboots, or one component (a Kafka broker, for instance) fails due to an out-of-memory issue, is it possible to restart the service automatically, without human intervention?
Supervisord or systemctl can handle this kind of issue, but I am not sure Ambari supports it. I was reading the following JIRA, which looks related to this subject:
I ran the test with ambari-2.2.0, and once the VM was up again, the components did not recover automatically. I need to restart them manually from Ambari...
I think this feature is important to have, especially in a production environment!
Created 03-07-2016 05:09 PM
Created 03-07-2016 05:16 PM
@Artem Ervits I think this command makes ambari-agent restart after a reboot, but not the components running on the host, right? Or am I wrong?
Created 03-07-2016 05:38 PM
@Ali Gouta that's correct: you can have the agent restart on its own, but for components it's a lot more involved, and @vpoornalingam's answer touches on that.
Created 03-07-2016 05:10 PM
It isn't possible to do this directly. But you could use the Ambari API to check the current status of the services on a given node and restart them through the API if required.
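A rough sketch of that approach in Python, in case it helps. It assumes the usual Ambari REST layout (`/api/v1/clusters/<cluster>/hosts/<host>/host_components`), that a stopped component reports the state `INSTALLED`, and that basic auth plus the `X-Requested-By` header are accepted. The function names, the `wanted` set, and the host and cluster names in the usage note are made up for illustration. Client components also sit in `INSTALLED`, which is why the sketch only looks at components you list explicitly.

```python
import base64
import json
import urllib.request


def components_url(base_url, cluster, host):
    # Ask only for the state field of every component on the host.
    return (f"{base_url}/api/v1/clusters/{cluster}/hosts/{host}"
            "/host_components?fields=HostRoles/state")


def stopped_components(response, wanted):
    # Keep components we care about that Ambari reports as INSTALLED (stopped).
    return [
        item["HostRoles"]["component_name"]
        for item in response.get("items", [])
        if item["HostRoles"].get("component_name") in wanted
        and item["HostRoles"].get("state") == "INSTALLED"
    ]


def start_request_body(component):
    # Payload asking Ambari to move one host component to STARTED.
    return {
        "RequestInfo": {"context": f"Auto-start {component}"},
        "Body": {"HostRoles": {"state": "STARTED"}},
    }


def call(url, user, password, method="GET", body=None):
    # Small helper around urllib; sends basic auth and the X-Requested-By header.
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("X-Requested-By", "ambari")
    with urllib.request.urlopen(request) as reply:
        text = reply.read().decode()
    return json.loads(text) if text else {}


def start_stopped(base_url, cluster, host, user, password, wanted):
    # Check the host, then ask Ambari to start each stopped component.
    found = call(components_url(base_url, cluster, host), user, password)
    started = []
    for name in stopped_components(found, wanted):
        url = (f"{base_url}/api/v1/clusters/{cluster}/hosts/{host}"
               f"/host_components/{name}")
        call(url, user, password, method="PUT", body=start_request_body(name))
        started.append(name)
    return started
```

You could run something like `start_stopped("http://ambari-host:8080", "mycluster", "node1.example.com", "admin", "admin", {"KAFKA_BROKER", "NODEMANAGER"})` from cron on a schedule. It does not wait for the start requests to finish or retry on failure, so treat it as a starting point only.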
Created 03-07-2016 07:01 PM
The auto-restart functionality in AMBARI-10029 requires a few Ambari config changes. Did you make the appropriate config changes?
Created 03-07-2016 07:37 PM
@jluniya, ah, I have just opened the PDF. It suggests tweaking some properties on the agents, such as recovery.type, recovery.lifetime_max_count, recovery.max_count, ... Is this the answer? I'm eager to give it a try tomorrow!
Created 04-22-2016 10:08 PM
In Ambari 2.2.1.0 this feature is not working on my setup.
First I tried the following settings in my ambari.properties file:
recovery.lifetime_max_count=4
recovery.retry_interval=1
recovery.max_count=5
recovery.type=AUTO_START
recovery.window_in_minutes=20
After I rebooted one of the nodes, it came back up with the Ambari agent running and connected to the Ambari Server, but none of the components started.
Then I added the following setting as well, and still no luck:
recovery.enabled_components=METRICS_COLLECTOR,NAMENODE,DATANODE,ZKFC,JOURNANODE,RESOURCEMANAGER,NODEMANAGER,APP_TIMELINE_SERVER,ZOOKEEPER_SERVER,HISTORYSERVER,SPARK_JOBHISTORYSERVER
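For anyone comparing setups, here is a small Python sketch that pulls the recovery.* entries out of an ambari.properties file, so you can check what is actually set (and spot a misspelled component name) before rebooting a node. It only reads key=value lines and does not talk to Ambari; the function names are made up for illustration.

```python
def recovery_settings(properties_text):
    # Collect every recovery.* key from ambari.properties-style text.
    settings = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without a value
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("recovery."):
            settings[key] = value.strip()
    return settings


def enabled_components(settings):
    # Split recovery.enabled_components into a clean list of names.
    raw = settings.get("recovery.enabled_components", "")
    return [name.strip() for name in raw.split(",") if name.strip()]
```

To use it, read the file (on a default install I would expect it at /etc/ambari-server/conf/ambari.properties, but check your own path), pass the text to `recovery_settings`, and print the result together with `enabled_components(...)`.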
Created 04-23-2016 10:19 AM
I actually tested it with Ambari 2.2.0. I did not add the properties recovery.lifetime_max_count, recovery.window_in_minutes, ...; I just used the predefined properties in ambari.properties. I tested it with KAFKA_BROKER, NODEMANAGER, and ZOOKEEPER_SERVER: I killed the Kafka, ZooKeeper, and NodeManager processes manually, and it worked! On Monday, I will test it with Ambari 2.2.1.