Created 08-09-2017 12:40 PM
Hi All,
I have installed the HBase Master service and the RegionServer on separate nodes. The HBase Master starts successfully, but it goes down within a few minutes.
I examined the log /var/log/hbase/hbase-hbase-master-chd104746.com.log and it shows the errors below. (Part of the error log file is attached.)
2017-08-09 17:27:18,533 INFO [chd104746.com,16000,1502279420790_ChoreService_1-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-08-09 17:27:18,544 ERROR [chd104746.com,16000,1502279420790_ChoreService_1] master.BackupLogCleaner: Failed to get hbase:backup table, therefore will keep all files
org.apache.hadoop.hbase.TableNotFoundException: hbase:backup
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1264)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
....
.....
2017-08-09 17:28:18,374 FATAL [chd104746:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
......
......
....
2017-08-09 17:28:18,401 FATAL [chd104746:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.backup.master.BackupController]
2017-08-09 17:28:18,401 FATAL [chd104746:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)
2017-08-09 17:28:18,401 INFO [chd104746:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
2017-08-09 17:28:18,401 INFO [master/chd104746.com/10.100.204.22:16000] regionserver.HRegionServer: Stopping infoServer
Could anyone please help with this?
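For reference, this is the check I can run from the HBase shell to confirm whether the hbase:backup system table exists at all (just a sketch with the stock bin/hbase client; it will only answer while the Master is still up):

# list the system tables under the 'hbase' namespace (hbase:backup should appear if it exists)
echo "list_namespace_tables 'hbase'" | bin/hbase shell
# confirm the hbase:namespace table itself is reachable
echo "describe 'hbase:namespace'" | bin/hbase shell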
Thanks,
Vijay
Created 08-09-2017 02:39 PM
If it is not a production cluster (and is not doing replication or anything else that depends on ZooKeeper), can you try cleaning your ZooKeeper data? It is possible that there are znodes for non-existent tables.
bin/hbase clean --cleanZk
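If you want to look at what is actually in ZooKeeper before wiping it, something like this should work (a sketch; the parent znode is usually /hbase, but check zookeeper.znode.parent in your hbase-site.xml, e.g. /hbase-unsecure on unsecured HDP):

# open the ZooKeeper CLI bundled with HBase
bin/hbase zkcli
# inside the CLI, list the HBase znodes (adjust the parent path to your setup)
ls /hbase
ls /hbase/table
ls /hbase/rs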
Created 08-09-2017 03:02 PM
Ankit - I have tried it and am getting the message below.
ZNode(s) [hdp.node4.com,16020,1502289449381] of regionservers are not expired. Exiting without cleaning hbase data.
Created 08-09-2017 03:04 PM
Before running the above command, please take the RegionServer(s) and Master down if they are not already (and keep ZooKeeper running).
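Roughly this order, as a sketch using the stock scripts (if the cluster is managed through Ambari, stop the services from the Ambari UI instead):

# on each RegionServer node
bin/hbase-daemon.sh stop regionserver
# on the Master node
bin/hbase-daemon.sh stop master
# leave ZooKeeper running, then from the Master node
bin/hbase clean --cleanZk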
Created 08-10-2017 07:08 AM
Ankit - No luck. As you suggested, I brought the RegionServer(s) and Master down and ran hbase clean --cleanZk.
I'm still getting the same error.
Thanks
Created 08-10-2017 11:21 AM
Do you have any errors in the RegionServer logs? (It seems the hbase:namespace table is not getting assigned somehow.)
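A couple of quick checks that may help narrow it down (the log path and host names below are only examples, adjust them to your cluster):

# look for open/assignment failures of the namespace region in the RegionServer log
grep -i "hbase:namespace" /var/log/hbase/hbase-hbase-regionserver-*.log | tail -n 50
# see what hbase:meta has recorded for the namespace region (HBase shell)
echo "scan 'hbase:meta', {STARTROW => 'hbase:namespace', LIMIT => 5}" | bin/hbase shell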
Created 05-08-2018 10:44 AM
I have also received the same error.
2018-05-03 15:23:01,869 ERROR [xxxxxxxx.eastus.cloudapp.azure.com,16000,1525360743919_ChoreService_1] master.BackupLogCleaner: Failed to get hbase:backup table, therefore will keep all files
I've observed that upon start/restart, the Active HBase Master turns into the Standby HBase Master and then gets stopped.
The garbage collection log shows the following:
Memory: 4k page, physical 8157984k(4535280k free), swap 0k(0k free)
CommandLine flags: -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log -XX:InitialHeapSize=130527744 -XX:MaxHeapSize=1073741824 -XX:MaxNewSize=174485504 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
2018-05-08T09:59:21.106+0000: 0.786: [GC (Allocation Failure) 2018-05-08T09:59:21.106+0000: 0.786: [ParNew: 34432K->4288K(38720K), 0.0066260 secs] 34432K->5497K(124736K), 0.0067169 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2018-05-08T09:59:21.819+0000: 1.500: [GC (Allocation Failure) 2018-05-08T09:59:21.819+0000: 1.500: [ParNew: 38720K->3327K(38720K), 0.0128638 secs] 39929K->6553K(124736K), 0.0129458 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2018-05-08T09:59:22.098+0000: 1.778: [GC (Allocation Failure) 2018-05-08T09:59:22.098+0000: 1.778: [ParNew: 37759K->4287K(38720K), 0.0067072 secs] 40985K->10042K(124736K), 0.0067618 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2018-05-08T09:59:22.457+0000: 2.137: [GC (Allocation Failure) 2018-05-08T09:59:22.457+0000: 2.137: [ParNew: 38719K->3036K(38720K), 0.0071463 secs] 44474K->9248K(124736K), 0.0072080 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
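The GC log itself does not show obvious memory pressure (short ParNew collections well under the 1 GB max heap), so I am also pulling the Master log lines around the shutdown. This is just the grep I am using, with the log path copied from the earlier post (adjust the host name in the file name):

grep -nE "Failed to become active master|Master server abort|BackupLogCleaner" /var/log/hbase/hbase-hbase-master-*.log | tail -n 20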
Please share your comments if you've already found a resolution for this issue.