I installed Ambari six or seven months ago and everything worked fine until yesterday. Suddenly, the HBase Master failed and it cannot be restarted. I tried stopping and restarting all the services, and also stopping Ambari and starting everything manually, but nothing seems to work.
The problem is that when I restart the service from the UI, it appears to start normally but fails again after 5 to 6 seconds. There is no error in the logs, and the only thing I see in the alerts is "Connection failed: [Errno 111] Connection refused" for HBase, the HBase Master process, and the HBase RegionServer process.
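Errno 111 means nothing is listening on the port that the alert probes, which fits a Master that exits a few seconds after starting. A minimal sketch for checking this by hand is below; the function name `port_is_open` is hypothetical, and the port to test is an assumption (16000 is the default Master RPC port in HBase 1.0 and later, but check `hbase.master.port` in your hbase-site.xml).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value
            # (for example 111, ECONNREFUSED) on an ordinary failure.
            return sock.connect_ex((host, port)) == 0
    except OSError:
        # Covers name-resolution failures and similar socket errors.
        return False
```

For example, running `port_is_open("hbase3.server", 16000)` in a loop right after pressing Start in the UI would show whether the Master ever opens its port before it goes down again.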
The log file just prints the following information:
Mon Oct 22 14:12:42 EEST 2018 Starting master on hbase3.server
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 47976
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 47976
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
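That output is only the `ulimit -a` dump written at startup, not an error message. A small sketch for turning it into a dictionary, so the limits can be compared between nodes or between a working and a failing cluster, might look like this. The name `parse_ulimits` is hypothetical, and the sample string is simply the output quoted above. Whether any of these limits (for instance the open-files limit of 1024) is related to the failure is not established here.

```python
import re

# Matches "(blocks, -c) 0", "(-e) 0" and "(512 bytes, -p) 8",
# capturing the flag letter and the value that follows it.
_LIMIT_RE = re.compile(r"\((?:[^()]*,\s*)?-(\w)\)\s+(\S+)")


def parse_ulimits(text: str) -> dict:
    """Map each ulimit flag letter to an int, or None for 'unlimited'."""
    limits = {}
    for flag, value in _LIMIT_RE.findall(text):
        if value == "unlimited":
            limits[flag] = None
        elif value.isdigit():
            limits[flag] = int(value)
    return limits


SAMPLE = (
    "core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited "
    "scheduling priority (-e) 0 file size (blocks, -f) unlimited "
    "pending signals (-i) 47976 max locked memory (kbytes, -l) 64 "
    "max memory size (kbytes, -m) unlimited open files (-n) 1024 "
    "pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 "
    "real-time priority (-r) 0 stack size (kbytes, -s) 8192 "
    "cpu time (seconds, -t) unlimited max user processes (-u) 47976 "
    "virtual memory (kbytes, -v) unlimited file locks (-x) unlimited"
)
```

Calling `parse_ulimits(SAMPLE)["n"]` gives the open-files limit, and the same function works whether the dump is on one line or split across several.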
My server's available memory is 8 GB, and the HBase Master maximum memory is set to 1 GB by default. The RegionServer's maximum value for -Xmn is set to 4000 MB.
Do you think I should increase the Master's maximum memory?
Can you check the output of 'free -m' on the node where the HBase Master is running? Are both the HBase Master and the RegionServer running on the same node? If 'free -m' reports less than 1 GB, there is not enough memory for HBase to start. Just for testing, you can stop some other services running on that node and try starting the HBase Master.
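As a rough sketch of that check, the snippet below parses pasted `free -m` output and compares the usable memory with the 1 GB mentioned above. The function names and the idea of falling back from the "available" column to "free" on older `free` versions are assumptions for illustration, not part of any HBase or Ambari tooling.

```python
def parse_free_mem(text: str) -> dict:
    """Return the 'Mem:' row of `free -m` output keyed by column header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    headers = lines[0].split()
    for line in lines[1:]:
        parts = line.split()
        if parts[0] == "Mem:":
            # Pair each header (total, used, free, ...) with its value in MB.
            return dict(zip(headers, map(int, parts[1:])))
    raise ValueError("no 'Mem:' row found")


def enough_memory(text: str, needed_mb: int = 1024) -> bool:
    """True if the node reports at least needed_mb of usable memory."""
    mem = parse_free_mem(text)
    # Newer versions of free print an 'available' column; older ones do not.
    usable = mem.get("available", mem["free"])
    return usable >= needed_mb
```

Paste the real output of `free -m` into `enough_memory(...)`. Since the RegionServer runs on the same node, the memory it already holds counts against what is left for the Master.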
'free -m' returns the result that I posted earlier. Both the HBase Master and the RegionServer are running on the same machine. The only services I installed are HDFS, HBase, ZooKeeper, and Ambari Metrics, which was required.
I went back through the logs, and it seems that two days ago the NameNode went down and the server returned "failed on connection exception: java.net.ConnectException: Connection refused;".
However, the NameNode is now up and running. Do you think this might be related, since the alert says that the HBase issue has existed for two days?
HBase depends on HDFS. Since the NameNode was down, HBase might have stopped, but now that the NameNode is up and running, the HBase Master should start properly. Did you check all the memory settings and the HBase Master logs?
Yes. Everything was checked, as posted earlier, and it does not produce any error. I also tried increasing the max heap size, but nothing changed. The HBase Master starts and shuts down immediately.
I also tried to start the Master manually from the CLI, but it always returns "Error: Could not find or load main class exists". The problem is that everything worked fine until today.
Hi, do you have any suggestions on this? Do you think it has to do with an update or something like that? I ask because the exact same issue occurred on another Ambari instance that we run in a completely different cluster.