Services failing on a daily basis in cluster, where do I start

Hello community!

Several services are failing in Ambari. When I restart them manually, they seem to be up and running for a few minutes before failing again. I'd like to know how to start debugging to find the real cause of the issue and put together an action plan so that the project becomes stable. This is a screenshot I took in Ambari this morning.

109757-ambari-capture.png


I restarted the services shown in red manually, making sure each one worked before moving on to the next:
- I restarted YARN, and it's up and running.
- I restarted Hive, and it seems to be up and running too.
- I restarted HBase. It failed the first time; I restarted it a second time and it seemed to be up and running, but after a few minutes it started failing again with "Connection failed [Errno 111] connection refused to server_ip:16000".
- Sometimes HBase seems to be up and running, so I try to restart ZooKeeper, but I never get it up. By the time I try to restart ZooKeeper, either Hive or HBase has started failing again.
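
For context on the HBase alert: Errno 111 is the Linux "connection refused" error, meaning nothing was accepting TCP connections on that port when the check ran. A minimal sketch of an equivalent probe is below, assuming Python 3 is available on a cluster host. The `port_is_open` helper is hypothetical, `server_ip` is the placeholder from the error message, and 2181 is only the usual ZooKeeper client port, which may differ on this cluster.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (Errno 111), timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host taken from the error message; replace with a real address.
    host = "server_ip"
    # 16000 comes from the alert above; 2181 is an assumed ZooKeeper client port.
    for name, port in [("HBase Master", 16000), ("ZooKeeper", 2181)]:
        if port_is_open(host, port):
            state = "accepting connections"
        else:
            state = "refused or unreachable"
        print(f"{name} {host}:{port} -> {state}")
```

Running this every minute or so while a service flips between green and red would show whether the process is actually going away or only the alert is flapping.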


I know it's a wide open question, but where should I start looking for details on what failed and how should I proceed to fix it?

Thanks in advance for your help!