Created on 05-13-2016 08:05 AM - edited 08-18-2019 06:25 AM
Hello,
The HBase master won't start and keeps spitting out the error below:
2016-05-13 03:56:53,431 FATAL [ip-172-31-21-148:16000.activeMasterManager] master.HMaster: Failed to become active master java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
It appears to start okay in the Ambari view, but it shows as stopped after about 3 minutes.
FYI, it's a vanilla HDP-2.4.0.0-169 install plus some configuration changes for disks and HDFS, but I haven't created anything in HBase.
Attached is a screenshot of the HBase view in Ambari.
Below is the log at the time of failure:
/var/log/hbase/hbase-hbase-master-serverhost.log
Could you please advise on where to look to get it started?
Thanks,
2016-05-13 03:56:41,651 INFO [ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935_splitLogManager__ChoreService_1] master.SplitLogManager: total tasks = 1 unassigned = 0 tasks={/hbase-unsecure/splitWAL/WALs%2Fip-172-31-25-148.ap-southeast-2.compute.internal%2C16020%2C1463025987708-splitting%2Fip-172-31-25-148.ap-southeast-2.compute.internal%252C16020%252C1463025987708.default.1463030679002=last_update = 1463126191938 last_version = 399 cur_worker_name = ip-172-31-28-128.ap-southeast-2.compute.internal,16020,1463125906252 status = in_progress incarnation = 1 resubmits = 1 batch = installed = 1 done = 0 error = 0}
2016-05-13 03:56:47,651 INFO [ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935_splitLogManager__ChoreService_1] master.SplitLogManager: total tasks = 1 unassigned = 0 tasks={/hbase-unsecure/splitWAL/WALs%2Fip-172-31-25-148.ap-southeast-2.compute.internal%2C16020%2C1463025987708-splitting%2Fip-172-31-25-148.ap-southeast-2.compute.internal%252C16020%252C1463025987708.default.1463030679002=last_update = 1463126191938 last_version = 399 cur_worker_name = ip-172-31-28-128.ap-southeast-2.compute.internal,16020,1463125906252 status = in_progress incarnation = 1 resubmits = 1 batch = installed = 1 done = 0 error = 0}
2016-05-13 03:56:53,431 FATAL [ip-172-31-21-148:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
    at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1005)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:799)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:191)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1783)
    at java.lang.Thread.run(Thread.java:745)
2016-05-13 03:56:53,433 FATAL [ip-172-31-21-148:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: []
2016-05-13 03:56:53,433 FATAL [ip-172-31-21-148:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
    at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1005)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:799)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:191)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1783)
    at java.lang.Thread.run(Thread.java:745)
2016-05-13 03:56:53,433 INFO [ip-172-31-21-148:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
2016-05-13 03:56:53,433 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] regionserver.HRegionServer: Stopping infoServer
2016-05-13 03:56:53,456 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
2016-05-13 03:56:53,500 WARN [MASTER_SERVER_OPERATIONS-ip-172-31-21-148:16000-1] master.SplitLogManager: Stopped while waiting for log splits to be completed
2016-05-13 03:56:53,500 WARN [MASTER_SERVER_OPERATIONS-ip-172-31-21-148:16000-1] master.SplitLogManager: error while splitting logs in [hdfs://ip-172-31-23-165.ap-southeast-2.compute.internal:8020/apps/hbase/data/WALs/ip-172-31-25-148.ap-southeast-2.compute.internal,16020,1463025987708-splitting] installed = 1 but only 0 done
2016-05-13 03:56:53,501 ERROR [MASTER_SERVER_OPERATIONS-ip-172-31-21-148:16000-1] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for ip-172-31-25-148.ap-southeast-2.compute.internal,16020,1463025987708, will retry
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:357)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:220)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://ip-172-31-23-165.ap-southeast-2.compute.internal:8020/apps/hbase/data/WALs/ip-172-31-25-148.ap-southeast-2.compute.internal,16020,1463025987708-splitting] Task = installed = 1 done = 0 error = 0
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:391)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:364)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:286)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:213)
    ... 4 more
2016-05-13 03:56:53,502 ERROR [MASTER_SERVER_OPERATIONS-ip-172-31-21-148:16000-3] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: Server is stopped
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:194)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-05-13 03:56:53,556 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] procedure2.ProcedureExecutor: Stopping the procedure executor
2016-05-13 03:56:53,557 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store
2016-05-13 03:56:53,656 INFO [ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935_splitLogManager__ChoreService_1] master.SplitLogManager$TimeoutMonitor: Chore: SplitLogManager Timeout Monitor was stopped
2016-05-13 03:56:53,691 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] regionserver.HRegionServer: stopping server ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935
2016-05-13 03:56:53,691 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x154a916c5040001
2016-05-13 03:56:53,692 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] zookeeper.ZooKeeper: Session: 0x154a916c5040001 closed
2016-05-13 03:56:53,693 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-05-13 03:56:53,703 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] regionserver.HRegionServer: stopping server ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935; all regions closed.
2016-05-13 03:56:53,703 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] hbase.ChoreService: Chore service for: ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935 had [[ScheduledChore: Name: ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935-BalancerChore Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935-ClusterStatusChore Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935-RegionNormalizerChore Period: 1800000 Unit: MILLISECONDS], [ScheduledChore: Name: HFileCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: CatalogJanitor-ip-172-31-21-148:16000 Period: 300000 Unit: MILLISECONDS]] on shutdown
2016-05-13 03:56:53,706 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x154a916c5040002
2016-05-13 03:56:53,707 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] zookeeper.ZooKeeper: Session: 0x154a916c5040002 closed
2016-05-13 03:56:53,707 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] hbase.ChoreService: Chore service for: ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935_splitLogManager_ had [] on shutdown
2016-05-13 03:56:53,707 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2016-05-13 03:56:53,707 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] ipc.RpcServer: Stopping server on 16000
2016-05-13 03:56:53,707 INFO [ip-172-31-21-148:16000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-05-13 03:56:53,716 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] zookeeper.RecoverableZooKeeper: Node /hbase-unsecure/rs/ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935 already deleted, retry=false
2016-05-13 03:56:53,718 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] zookeeper.ZooKeeper: Session: 0x154a916c5040000 closed
2016-05-13 03:56:53,718 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] regionserver.HRegionServer: stopping server ip-172-31-21-148.ap-southeast-2.compute.internal,16000,1463125847935; zookeeper connection closed.
2016-05-13 03:56:53,718 INFO [master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000] regionserver.HRegionServer: master/ip-172-31-21-148.ap-southeast-2.compute.internal/172.31.21.148:16000 exiting
2016-05-13 03:56:53,718 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Created 05-13-2016 08:45 AM
This seems to be an issue with stale WAL splits. Try removing the WALs from the HDFS location below, then restart the HBase master and RegionServer services.
/apps/hbase/data/WALs/
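The cleanup suggested above can be sketched as follows. This is only a sketch: the paths come from the log in the question, the backup location is an arbitrary choice, and moving the stale "-splitting" directories aside is safer than deleting them outright in case you need to recover them.

```shell
# Inspect what is left under the WAL root; stale split attempts show up
# as directories with a "-splitting" suffix (as seen in the log above):
hdfs dfs -ls /apps/hbase/data/WALs/

# Move the stale "-splitting" directories aside as the hbase user.
# The glob is quoted so HDFS (not the local shell) expands it;
# /tmp/hbase-wal-backup is an arbitrary, hypothetical backup path:
sudo -u hbase hdfs dfs -mkdir -p /tmp/hbase-wal-backup
sudo -u hbase hdfs dfs -mv '/apps/hbase/data/WALs/*-splitting' /tmp/hbase-wal-backup/
```

After that, restart the HBase master and RegionServers from Ambari and watch the master log to confirm it gets past namespace table assignment.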
Created 12-29-2016 08:57 AM
In our experience, this seems to be related to https://issues.apache.org/jira/browse/HBASE-14729, which is fixed in HBase 2.0.0 (we are running 1.1.2 with HDP 2.5).
Created 01-03-2017 02:20 AM
Please re-read the description on HBASE-14729. No code changes were made by that JIRA issue; it was closed as a duplicate of https://issues.apache.org/jira/browse/HBASE-14223, which is still outstanding.
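Until the underlying issue is fixed, one stopgap that is sometimes suggested is giving the master more time to assign the namespace table, since the FATAL line above shows it gave up after 300000 ms. A sketch of the hbase-site.xml change, assuming the `hbase.master.namespace.init.timeout` property used by HBase 1.x (verify the property name against your HBase version before relying on it):

```xml
<property>
  <!-- Assumed HBase 1.x property; default is 300000 ms (5 minutes). -->
  <name>hbase.master.namespace.init.timeout</name>
  <value>900000</value>
</property>
```

Note this only buys time for WAL splitting to finish; if the split tasks are genuinely stuck, the stale-WAL cleanup above is still needed.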
Created 11-28-2018 03:01 PM
So what's the best way to resolve this issue for production?