Created 01-29-2019 04:31 AM
HBase master stops after running for a few seconds; the RegionServers keep running.
HBase master logs:
WARN [slnxhadoop01:16000.activeMasterManager] master.SplitLogManager: error while splitting logs in [hdfs://slnxhadoop01.noid.in.sopra:8020/apps/hbase/data/WALs/slnxhadoop04.dhcp.noid.in.sopra,16020,1536839826364-splitting] installed = 1 but only 0 done
2019-01-10 16:48:57,344 FATAL [slnxhadoop01:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: error or interrupted while splitting logs in [hdfs://slnxhadoop01.noid.in.sopra:8020/apps/hbase/data/WALs/slnxhadoop04.dhcp.noid.in.sopra,16020,1536839826364-splitting] Task = installed = 1 done = 0 error = 1
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:429)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:339)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:330)
	at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1203)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:806)
	at org.apache.hadoop.hbase.master.HMaster.access$900(HMaster.java:225)
	at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:2038)
	at java.lang.Thread.run(Thread.java:745)
2019-01-10 16:48:57,345 FATAL [slnxhadoop01:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: []
2019-01-10 16:48:57,345 FATAL [slnxhadoop01:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: error or interrupted while splitting logs in [hdfs://slnxhadoop01.noid.in.sopra:8020/apps/hbase/data/WALs/slnxhadoop04.dhcp.noid.in.sopra,16020,1536839826364-splitting] Task = installed = 1 done = 0 error = 1
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:429)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:339)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:330)
	at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1203)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:806)
	at org.apache.hadoop.hbase.master.HMaster.access$900(HMaster.java:225)
	at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:2038)
	at java.lang.Thread.run(Thread.java:745)
2019-01-10 16:48:57,345 INFO [slnxhadoop01:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
2019-01-10 16:48:57,346 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: Stopping infoServer
2019-01-10 16:48:57,400 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
2019-01-10 16:48:57,403 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] procedure2.ProcedureExecutor: Stopping the procedure executor
2019-01-10 16:48:57,403 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store
2019-01-10 16:48:57,417 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: stopping server slnxhadoop01.noid.in.sopra,16000,1547117834090
2019-01-10 16:48:57,417 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x36691ecadd9004c
2019-01-10 16:48:57,422 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] zookeeper.ZooKeeper: Session: 0x36691ecadd9004c closed
2019-01-10 16:48:57,422 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2019-01-10 16:48:57,423 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: stopping server slnxhadoop01.noid.in.sopra,16000,1547117834090; all regions closed.
2019-01-10 16:48:57,423 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] hbase.ChoreService: Chore service for: slnxhadoop01.noid.in.sopra,16000,1547117834090 had [[ScheduledChore: Name: HFileCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit: MILLISECONDS]] on shutdown
2019-01-10 16:48:57,427 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x36691ecadd9004d
2019-01-10 16:48:57,428 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] zookeeper.ZooKeeper: Session: 0x36691ecadd9004d closed
2019-01-10 16:48:57,428 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] hbase.ChoreService: Chore service for: slnxhadoop01.noid.in.sopra,16000,1547117834090_splitLogManager_ had [[ScheduledChore: Name: SplitLogManager Timeout Monitor Period: 1000 Unit: MILLISECONDS]] on shutdown
2019-01-10 16:48:57,428 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2019-01-10 16:48:57,428 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] ipc.RpcServer: Stopping server on 16000
2019-01-10 16:48:57,428 INFO [slnxhadoop01:16000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2019-01-10 16:48:57,429 INFO [RpcServer.listener,port=16000] ipc.RpcServer: RpcServer.listener,port=16000: stopping
2019-01-10 16:48:57,430 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2019-01-10 16:48:57,430 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2019-01-10 16:48:57,436 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] zookeeper.RecoverableZooKeeper: Node /hbase-unsecure/rs/slnxhadoop01.noid.in.sopra,16000,1547117834090 already deleted, retry=false
2019-01-10 16:48:57,437 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] zookeeper.ZooKeeper: Session: 0x36691ecadd9004b closed
2019-01-10 16:48:57,437 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: stopping server slnxhadoop01.noid.in.sopra,16000,1547117834090; zookeeper connection closed.
2019-01-10 16:48:57,437 INFO [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000 exiting
2019-01-10 16:48:57,438 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Created 01-29-2019 07:32 AM
The WAL is the write-ahead log for all puts/deletes executed against a table; it is what gives HBase durable writes. If this is not a production environment, you can actually delete the file mentioned in the log.
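Before deleting anything, it may be worth confirming what is actually sitting in the splitting directory named in the error (path taken from your log above):

$ hdfs dfs -ls -R /apps/hbase/data/WALs/slnxhadoop04.dhcp.noid.in.sopra,16020,1536839826364-splitting

If the directory is empty or only contains zero-length files, it is most likely a leftover from an earlier failed split.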
This could be an issue with stale WAL splits. Try removing the WALs from the HDFS location below:
$ hdfs dfs -rm -r -skipTrash /apps/hbase/data/WALs/*
Then restart the Master and region servers.
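If you would rather not delete the files outright, a less destructive variant (just a sketch; /tmp/hbase-wal-backup is an arbitrary scratch path of your own choosing) is to move the WALs aside so they can be restored later if needed:

$ hdfs dfs -mkdir -p /tmp/hbase-wal-backup
$ hdfs dfs -mv /apps/hbase/data/WALs/* /tmp/hbase-wal-backup/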
Created 01-29-2019 02:37 PM
Please be aware that the above may cause data loss, and may not be something you want to do on any system.
You'll have to look at the rest of the Master log, and potentially a RegionServer log, to understand why the log splitting failed. Turning on DEBUG logging may shed some more light on the situation, but would likely not be required. With only the information provided, it isn't possible to say what went wrong.
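If you do decide to enable DEBUG logging, one way (an example only, assuming the stock log4j.properties that ships with HBase; apply on the Master and RegionServers, then restart them) is to raise the level for the split-related classes:

# Broad option: all of HBase at DEBUG (very verbose)
log4j.logger.org.apache.hadoop.hbase=DEBUG
# Narrower option: the class from the stack trace above plus the RegionServer-side split worker
log4j.logger.org.apache.hadoop.hbase.master.SplitLogManager=DEBUG
log4j.logger.org.apache.hadoop.hbase.regionserver.SplitLogWorker=DEBUG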