How can I get to know that the automatic component restart happened?

I'm using Ambari 2.6.2.2, which has a "Service Auto Start Configuration" feature that restarts a component when it goes down unexpectedly.

However, I could not find an automatic "Restart" operation in the operations history after the automatic restart took place.

How can I tell that an automatic component restart happened?

1 ACCEPTED SOLUTION

@Takefumi Oide

The Ambari UI operation log does not show operations that Ambari performs internally via the agents. Only operations explicitly performed by a user appear there.

The script "/usr/lib/ambari-agent/lib/ambari_agent/RecoveryManager.py" is responsible for the recovery of service components.

For example, when we kill the AMS collector and Auto Restart is enabled for this component, messages like the following appear in the agent log and show that the "AUTO_EXECUTION_COMMAND" was performed.

# grep 'Adding recovery command START for component' /var/log/ambari-agent/ambari-agent.log
INFO 2018-10-09 06:33:52,324 Controller.py:410 - Adding recovery command START for component METRICS_COLLECTOR
.
.
INFO 2018-10-09 06:33:52,325 ActionQueue.py:113 - Adding AUTO_EXECUTION_COMMAND for role METRICS_COLLECTOR for service AMBARI_METRICS of cluster NewCluster to the queue.
.
INFO 2018-10-09 06:36:25,643 RecoveryManager.py:185 - current status is set to STARTED for METRICS_COLLECTOR

Or simply grep the agent log for that script's name:

# grep 'RecoveryManager.py'  /var/log/ambari-agent/ambari-agent.log
INFO 2018-10-09 06:33:52,310 RecoveryManager.py:255 - METRICS_COLLECTOR needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-10-09 06:36:25,643 RecoveryManager.py:185 - current status is set to STARTED for METRICS_COLLECTOR
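
If you need to check this regularly, the two greps above could be wrapped in small shell helpers. This is only a sketch: the function names `recovery_events` and `recovery_counts` are made up for illustration, and the patterns are taken solely from the sample log lines in this thread, so other Ambari versions may word these messages differently.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ambari) for spotting auto-recovery
# activity in an Ambari agent log. The patterns come only from the sample
# log lines quoted in this thread.

# Print every log line that indicates the agent scheduled a recovery.
recovery_events() {
    log_file="${1:-/var/log/ambari-agent/ambari-agent.log}"
    grep -E 'Adding recovery command START for component|Adding AUTO_EXECUTION_COMMAND for role|needs recovery, desired =' "$log_file"
}

# Count recovery START commands per component.
# The component name is the last field of the matching log line.
recovery_counts() {
    log_file="${1:-/var/log/ambari-agent/ambari-agent.log}"
    grep 'Adding recovery command START for component' "$log_file" \
        | awk '{ count[$NF]++ } END { for (c in count) print c, count[c] }'
}
```

After sourcing the file, running `recovery_counts` with no argument reads the default agent log path used above and prints one line per component with the number of recovery starts found. Pass a different path as the first argument to inspect a rotated log.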
