Restarting Components/services after rebooting a node or service failure

Contributor

If a node running ambari-agent reboots, or a single component (one Kafka broker, for instance) fails because of an out-of-memory error, is it possible to restart the service automatically, without human intervention?

Supervisord or systemctl can handle this kind of issue, but I am not sure Ambari supports it. I was reading the following JIRA, which looks related to this subject:

AMBARI-10029

I ran the test with ambari-2.2.0, and once the VM was up again, the components did not recover automatically. I had to restart them manually from Ambari.

I think this feature is important to have, especially in a production environment!

Accepted Solution

Re: Restarting Components/services after rebooting a node or service failure

Contributor

I actually tested it with Ambari 2.2.0. I did not add the properties recovery.lifetime_max_count, recovery.window_in_minutes, and so on; I just used the predefined properties in ambari.properties. I tested it with KAFKA_BROKER, NODEMANAGER, and ZOOKEEPER_SERVER: I killed the Kafka, ZooKeeper, and NodeManager processes manually, and it worked! On Monday, I will test it with Ambari 2.2.1.


Replies

Re: Restarting Components/services after rebooting a node or service failure

Mentor

@Ali Gouta what about chkconfig? If you set

chkconfig ambari-agent on

wouldn't it do it for you?

Re: Restarting Components/services after rebooting a node or service failure

Contributor

@Artem Ervits I think this command makes ambari-agent restart after a reboot, but not the components running on the host, right? Or am I wrong?

Re: Restarting Components/services after rebooting a node or service failure

Mentor

@Ali Gouta that's correct, you can have the agent restart on its own, but for components it's a lot more involved, and @vpoornalingam's answer touches on that.

Re: Restarting Components/services after rebooting a node or service failure

@Ali Gouta

Right, it isn't possible to do this directly. But you could use the Ambari REST API to check the current state of the components on a given node and start them through the same API if required.
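To make the API approach more concrete, here is a minimal Python sketch of what such a check could look like. It is not a tested production script. It assumes the usual host_components resource under /api/v1/clusters, that a stopped component reports the state INSTALLED and a running one STARTED, and that basic authentication is in use. The server URL, cluster name, host name, credentials, and the helper function names are all placeholders for illustration. Client components also sit in the INSTALLED state, so the sketch only looks at an explicit list of components you want kept running.

```python
import base64
import json
import urllib.request

STOPPED_STATE = "INSTALLED"  # assumption: state Ambari reports for a stopped component
STARTED_STATE = "STARTED"


def auth_headers(user, password):
    """Headers for basic auth plus the X-Requested-By header Ambari expects on writes."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}", "X-Requested-By": "ambari"}


def host_components_url(base_url, cluster, host):
    """URL of the host_components collection for one host (names are placeholders)."""
    return f"{base_url}/api/v1/clusters/{cluster}/hosts/{host}/host_components"


def stopped_components(payload, watched):
    """Return the watched components that the API response reports as stopped."""
    return sorted(
        item["HostRoles"]["component_name"]
        for item in payload.get("items", [])
        if item["HostRoles"]["component_name"] in watched
        and item["HostRoles"]["state"] == STOPPED_STATE
    )


def build_start_request(base_url, cluster, host, component, user, password):
    """Build (but do not send) the PUT request asking Ambari to start one component."""
    body = {
        "RequestInfo": {"context": f"Auto-start {component}"},
        "Body": {"HostRoles": {"state": STARTED_STATE}},
    }
    return urllib.request.Request(
        f"{host_components_url(base_url, cluster, host)}/{component}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers=auth_headers(user, password),
    )


def start_stopped_components(base_url, cluster, host, watched, user, password):
    """Query the current states, then request a start for each stopped watched component."""
    query = urllib.request.Request(
        host_components_url(base_url, cluster, host) + "?fields=HostRoles/state",
        headers=auth_headers(user, password),
    )
    with urllib.request.urlopen(query) as resp:
        payload = json.load(resp)
    for component in stopped_components(payload, watched):
        request = build_start_request(base_url, cluster, host, component, user, password)
        with urllib.request.urlopen(request) as resp:
            print(component, resp.status)
```

You could run something like start_stopped_components("http://ambari-host:8080", "mycluster", "worker1.example.com", {"KAFKA_BROKER", "NODEMANAGER"}, "admin", "...") from cron or at boot, once the agent has registered with the server again. All of those argument values are made up.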

Re: Restarting Components/services after rebooting a node or service failure

Contributor

@Ali Gouta

The auto-restart functionality in AMBARI-10029 requires a few Ambari config changes. Did you make the appropriate config changes?

Re: Restarting Components/services after rebooting a node or service failure

Contributor

@jluniya, ah, I have just opened the PDF. It suggests tweaking some properties on the agents, such as recovery.type, recovery.lifetime_max_count, recovery.max_count, ... Is this the answer? I am eager to give it a try tomorrow!

Re: Restarting Components/services after rebooting a node or service failure

New Contributor

In Ambari 2.2.1.0 this feature is not working on my setup.

First I tried the following settings in my ambari.properties file:

recovery.lifetime_max_count=4
recovery.retry_interval=1
recovery.max_count=5
recovery.type=AUTO_START
recovery.window_in_minutes=20

After I rebooted one of the nodes, it came back up with the Ambari agent running and connected to the Ambari server, but none of the components started.

Then I added the following setting as well, and still no luck:

recovery.enabled_components=METRICS_COLLECTOR,NAMENODE,DATANODE,ZKFC,JOURNANODE,RESOURCEMANAGER,NODEMANAGER,APP_TIMELINE_SERVER,ZOOKEEPER_SERVER,HISTORYSERVER,SPARK_JOBHISTORYSERVER
