Created 10-09-2018 06:11 AM
I'm using Ambari 2.6.2.2, which has a "Service Auto Start Configuration" feature that restarts a component automatically when it goes down unexpectedly.
However, I could not find an automatic "Restart" operation in the operations history after the automatic restart had taken place.
How can I tell that an automatic component restart happened?
Created 10-09-2018 06:41 AM
The Ambari UI operation log does not show operations that Ambari performs internally via the agents. Only operations explicitly performed by a user appear there.
The script "/usr/lib/ambari-agent/lib/ambari_agent/RecoveryManager.py" is responsible for recovering service components.
For example, if we kill the AMS collector while Auto Start is enabled for that component, messages like the following appear in the agent log and show that the "AUTO_EXECUTION_COMMAND" was performed:
# grep 'Adding recovery command START for component' /var/log/ambari-agent/ambari-agent.log
INFO 2018-10-09 06:33:52,324 Controller.py:410 - Adding recovery command START for component METRICS_COLLECTOR
.
.
INFO 2018-10-09 06:33:52,325 ActionQueue.py:113 - Adding AUTO_EXECUTION_COMMAND for role METRICS_COLLECTOR for service AMBARI_METRICS of cluster NewCluster to the queue.
.
INFO 2018-10-09 06:36:25,643 RecoveryManager.py:185 - current status is set to STARTED for METRICS_COLLECTOR
Or just grep for that script's name:
# grep 'RecoveryManager.py' /var/log/ambari-agent/ambari-agent.log
INFO 2018-10-09 06:33:52,310 RecoveryManager.py:255 - METRICS_COLLECTOR needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-10-09 06:36:25,643 RecoveryManager.py:185 - current status is set to STARTED for METRICS_COLLECTOR
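If you want a timeline rather than raw grep output, something like the following might help. It is only a rough sketch: the regular expressions are inferred from the sample log lines in this post (not from the Ambari source), so the message wording may differ in other agent versions, and the function name recovery_events is made up for illustration.

```python
import re

# Assumed line layout, taken from the sample lines in this thread:
#   LEVEL YYYY-MM-DD HH:MM:SS,mmm File.py:NNN - message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<src>[\w.]+):\d+ - (?P<msg>.*)$"
)

# Message patterns, also assumed from the samples above.
NEEDS_RE = re.compile(r"^(?P<comp>\w+) needs recovery")
QUEUED_RE = re.compile(
    r"^Adding recovery command START for component (?P<comp>\w+)"
)
STATUS_RE = re.compile(
    r"^current status is set to (?P<state>\w+) for (?P<comp>\w+)"
)


def recovery_events(lines):
    """Return (timestamp, component, event) tuples for recovery-related lines.

    Lines that do not match the assumed layout or messages are skipped.
    """
    events = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue
        ts, msg = parsed.group("ts"), parsed.group("msg")

        m = NEEDS_RE.match(msg)
        if m:
            events.append((ts, m.group("comp"), "needs recovery"))
            continue
        m = QUEUED_RE.match(msg)
        if m:
            events.append((ts, m.group("comp"), "START queued"))
            continue
        m = STATUS_RE.match(msg)
        if m:
            events.append((ts, m.group("comp"), "status " + m.group("state")))
    return events
```

To use it on a host, pass it an open file, for example recovery_events(open("/var/log/ambari-agent/ambari-agent.log")), and print the returned tuples in order.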
Created 10-09-2018 07:51 AM
Thanks, @Jay Kumar SenSharma!