HBase Master Fails

Hi All,

I have installed the HBase Master service and the RegionServer on separate nodes. The HBase Master starts successfully, but it goes down within a few minutes.

I examined the log /var/log/hbase/hbase-hbase-master-chd104746.com.log, and it shows the errors below (part of the error log file is attached).

2017-08-09 17:27:18,533 INFO [chd104746.com,16000,1502279420790_ChoreService_1-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-08-09 17:27:18,544 ERROR [chd104746.com,16000,1502279420790_ChoreService_1] master.BackupLogCleaner: Failed to get hbase:backup table, therefore will keep all files
org.apache.hadoop.hbase.TableNotFoundException: hbase:backup
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1264)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)

....

.....

2017-08-09 17:28:18,374 FATAL [chd104746:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
    at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)

......

......

....

2017-08-09 17:28:18,401 FATAL [chd104746:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.backup.master.BackupController]
2017-08-09 17:28:18,401 FATAL [chd104746:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
    at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
    at java.lang.Thread.run(Thread.java:745)
2017-08-09 17:28:18,401 INFO [chd104746:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
2017-08-09 17:28:18,401 INFO [master/chd104746.com/10.100.204.22:16000] regionserver.HRegionServer: Stopping infoServer

Could anyone please help with this?

Thanks,

Vijay

hbase-master-error.txt

6 REPLIES

If this is not a production cluster (and it is not doing replication or anything else that depends on ZooKeeper), can you try cleaning your ZooKeeper data? There may be znodes for non-existent tables.

bin/hbase clean --cleanZk

Ankit - I tried that and got the message below.

ZNode(s) [hdp.node4.com,16020,1502289449381] of regionservers are not expired. Exiting without cleaning hbase data.

Before running the above command, please stop the RegionServer and the Master if they are not already down (and keep ZooKeeper running).
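
The order of operations described above can be sketched as a dry run. This is only an illustration: it assumes `hbase-daemon.sh` and `hbase` are on the PATH, and on a managed cluster the services would more likely be stopped from the management console. The function name `clean_zk_steps` and the `RUN` variable are hypothetical helpers, and each command has to be run on the node named in its comment.

```shell
#!/bin/sh
# Dry run by default: RUN=echo prints each command instead of executing it.
# Export RUN as an empty string to execute the commands for real.
RUN="${RUN-echo}"

clean_zk_steps() {
    # 1. On every RegionServer node: stop the RegionServer.
    $RUN hbase-daemon.sh stop regionserver
    # 2. On the Master node: stop the Master.
    $RUN hbase-daemon.sh stop master
    # 3. With ZooKeeper still running: remove HBase's znodes.
    $RUN hbase clean --cleanZk
}

clean_zk_steps
```

The RegionServer's znode may linger for a short time after the process stops, until its ZooKeeper session expires, so the clean step may need to be retried a little later.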

Ankit - No luck. As you suggested, I stopped the RegionServer and the Master and then ran hbase clean --cleanZk.

I am getting the same error again.

Thanks

Do you see any errors in the RegionServer logs? It seems the hbase:namespace table is not getting assigned for some reason.
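
One way to pull the relevant lines out of a RegionServer log is sketched below. The function name `recent_errors` is hypothetical, and the log path in the usage comment is an assumption modelled on the master log name quoted in the question; adjust it to whatever file actually exists under /var/log/hbase on the RegionServer node.

```shell
#!/bin/sh
# Print the most recent ERROR/FATAL entries from a log file.
# $1: path to the log file
# $2: how many matching lines to show (defaults to 20)
recent_errors() {
    grep -E ' (ERROR|FATAL) ' "$1" | tail -n "${2:-20}"
}

# Usage (hypothetical path, replace <hostname> with the real one):
# recent_errors /var/log/hbase/hbase-hbase-regionserver-<hostname>.log 50
```

The pattern matches the log level as a whole word between spaces, which is how it appears in the master log lines quoted in the question.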

I am also getting the same error.

2018-05-03 15:23:01,869 ERROR [xxxxxxxx.eastus.cloudapp.azure.com,16000,1525360743919_ChoreService_1] master.BackupLogCleaner: Failed to get hbase:backup table, therefore will keep all files

I've observed that on start or restart, the active HBase Master turns into a secondary HBase Master and then stops.

The garbage collection log shows the following:

Memory: 4k page, physical 8157984k(4535280k free), swap 0k(0k free)
CommandLine flags: -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log -XX:InitialHeapSize=130527744 -XX:MaxHeapSize=1073741824 -XX:MaxNewSize=174485504 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
2018-05-08T09:59:21.106+0000: 0.786: [GC (Allocation Failure) 2018-05-08T09:59:21.106+0000: 0.786: [ParNew: 34432K->4288K(38720K), 0.0066260 secs] 34432K->5497K(124736K), 0.0067169 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2018-05-08T09:59:21.819+0000: 1.500: [GC (Allocation Failure) 2018-05-08T09:59:21.819+0000: 1.500: [ParNew: 38720K->3327K(38720K), 0.0128638 secs] 39929K->6553K(124736K), 0.0129458 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2018-05-08T09:59:22.098+0000: 1.778: [GC (Allocation Failure) 2018-05-08T09:59:22.098+0000: 1.778: [ParNew: 37759K->4287K(38720K), 0.0067072 secs] 40985K->10042K(124736K), 0.0067618 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2018-05-08T09:59:22.457+0000: 2.137: [GC (Allocation Failure) 2018-05-08T09:59:22.457+0000: 2.137: [ParNew: 38719K->3036K(38720K), 0.0071463 secs] 44474K->9248K(124736K), 0.0072080 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]

Please share your comments if you have already found a resolution for this issue.