I have set up a new server using Ambari 18.104.22.168 on a CentOS 6.10 server. It is essentially the simplest Ambari setup possible, with the server also acting as the remote agent and a single Hadoop node running on the same server. I have set all the services in the Ambari web platform to restart automatically upon reboot, but rebooting the CentOS server does not result in the services restarting automatically in the Ambari web platform. I have to restart them manually. I have read and implemented the instructions outlined in:
The services still do not automatically restart after rebooting the server. Any ideas on how to figure out why the services do not automatically restart?
When your host rebooted, did you notice whether the Ambari Agent came up?
The Ambari Agent is what actually starts the components present on the host, so please make sure the agent is configured as an operating system service, meaning that as soon as the host is rebooted, the agent service also comes up.
To verify whether the agent is performing recovery, you should grep for the following in your agent logs:
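As a rough sketch of how to check this on CentOS 6 (which uses SysV init), the helper below asks chkconfig whether the agent is enabled at boot. The function name agent_enabled_on_boot is made up for illustration, and it assumes the agent was installed with its usual "ambari-agent" init script and that the host boots into runlevel 3.

```shell
# Hypothetical helper: succeeds if the ambari-agent init script is enabled
# for runlevel 3 (assumed to be the runlevel this host boots into).
agent_enabled_on_boot() {
  # chkconfig --list prints one line per service, e.g.
  #   ambari-agent  0:off 1:off 2:on 3:on 4:on 5:on 6:off
  chkconfig --list ambari-agent 2>/dev/null | grep -q '3:on'
}
```

If the check fails, running "chkconfig ambari-agent on" as root should enable the service for the standard runlevels, and "service ambari-agent status" shows whether it is currently running.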
# grep 'Adding recovery command START for component' /var/log/ambari-agent/ambari-agent.log
Yes, the Ambari agent came up and the log reports the 'Adding recovery command' entries for the individual components. This is strange: two days ago, I tried rebooting the server and waited an hour, and no components started within that time frame, so I assumed it was a failure. I tried again this morning and the components are now starting up within about 20 minutes, and nothing related has been changed on the server that I know of in that time frame.
I guess my only question, then, is whether this is a normal amount of time for the system to start all the components. The software is running on a VM using 8 cores of a Xeon E5-2683V3 processor and 64 GB of RAM. Are there any configuration changes I can make to speed up the starting of the components?
As mentioned earlier, in the case of auto-recovery, the Ambari Server compares the "desired state" and the current state of the host components, and if they do not match, it sends a "Recovery" instruction to the agents.
At this point you should be able to see the exact time at which the Ambari Server sent the "Recovery" command to the agent on the problematic host in "ambari-server.log".
Then you can check "/var/log/ambari-agent/ambari-agent.log" to see when the agent actually received the recovery command and what action it performed.
This way we can find out how much time the communication between the agent and the server took before the actual start of the component was initiated.
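Once you have the two timestamps (one from the server log, one from the agent log), a small helper like the following can compute the delay between them. This is only a sketch: elapsed_seconds is a made-up name, it assumes GNU date as shipped with CentOS, and it assumes the timestamps look like "2019-01-15 09:02:11,345" (the sample values are invented, not taken from a real log).

```shell
# Hypothetical helper: number of seconds between two log timestamps.
# Assumes GNU date and timestamps of the form "YYYY-MM-DD HH:MM:SS,mmm".
elapsed_seconds() {
  # Strip the ",milliseconds" suffix, which date -d does not parse.
  start=$(date -u -d "${1%,*}" +%s)
  end=$(date -u -d "${2%,*}" +%s)
  echo $(( end - start ))
}
```

For example, elapsed_seconds "2019-01-15 09:02:11,345" "2019-01-15 09:05:40,120" prints 209, i.e. roughly three and a half minutes between the two events.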
From that point in the agent log, we can then check how long each individual component took to restart; for example, the NameNode might have taken longer, or the DataNode, etc.
If the host is slow, looking at the individual component startup times gives us some idea of which one is taking the most time.
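To get a quick per-component view, something like the sketch below lists when the agent queued each START recovery command. The function name recovery_timeline is hypothetical, and it assumes each agent log line starts with the level, date and time and ends with the component name, for example "INFO 2019-01-15 09:02:11,345 RecoveryManager.py:255 - Adding recovery command START for component NAMENODE"; adjust the awk fields if your log layout differs.

```shell
# Hypothetical helper: print "date time COMPONENT" for every START recovery
# command the agent queued. Assumes fields 2 and 3 of each log line are the
# date and time, and the last field is the component name.
recovery_timeline() {
  log_file="${1:-/var/log/ambari-agent/ambari-agent.log}"
  grep 'Adding recovery command START for component' "$log_file" |
    awk '{ print $2, $3, $NF }'
}
```

Large gaps between consecutive entries are only a rough hint of where the time goes; the individual service logs are still the place to confirm how long a given component actually took to start.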