Several services are failing in Ambari. When I restart them manually, they appear to be up and running for a few minutes before failing again. I'd like to know how to start debugging to find the real cause of the issue, and to get an action plan so that the cluster becomes stable. This is a screenshot I got in Ambari this morning.
I restarted the services shown in red manually, one at a time, making sure each one was working before moving on to the next:

- I restarted YARN, and it is up and running.
- I restarted Hive, and it seems to be up and running too.
- I restarted HBase. It failed the first time, so I restarted it a second time and it seemed to be up and running. After a few minutes it started failing again: "Connection failed [Errno 111] connection refused to server_ip:16000"
- Sometimes HBase seems to be up and running, so I try to restart ZooKeeper, but I never get it up. By the time I try to restart ZooKeeper, either Hive or HBase has started failing again.
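To confirm outside of Ambari whether the port from the alert is actually accepting connections, I could run something like the following minimal Python sketch. The function name `port_is_open` is my own, and `server_ip` is the same placeholder as in the alert above; it only tests TCP reachability and says nothing about why the service stopped.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (Errno 111), timeouts, and
        # name-resolution failures, all of which are OSError subclasses.
        return False


# Hypothetical usage; replace "server_ip" with the real host:
#   print(port_is_open("server_ip", 16000))  # port taken from the alert
```

If this returns False while Ambari shows the service as started, the process has most likely exited or is not listening on that port, so the service's own log on that host would be the next place to look.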
I know this is a very open-ended question, but where should I start looking for details on what failed, and how should I proceed to fix it?