
HBase master stops after restarting

Explorer

HBase master stopped after running for a few seconds. Region servers are running.

HBase master logs:

 WARN  [slnxhadoop01:16000.activeMasterManager] master.SplitLogManager: error while splitting logs in [hdfs://slnxhadoop01.noid.in.sopra:8020/apps/hbase/data/WALs/slnxhadoop04.dhcp.noid.in.sopra,16020,1536839826364-splitting] installed = 1 but only 0 done
2019-01-10 16:48:57,344 FATAL [slnxhadoop01:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: error or interrupted while splitting logs in [hdfs://slnxhadoop01.noid.in.sopra:8020/apps/hbase/data/WALs/slnxhadoop04.dhcp.noid.in.sopra,16020,1536839826364-splitting] Task = installed = 1 done = 0 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:429)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:339)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:330)
        at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1203)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:806)
        at org.apache.hadoop.hbase.master.HMaster.access$900(HMaster.java:225)
        at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:2038)
        at java.lang.Thread.run(Thread.java:745)
2019-01-10 16:48:57,345 FATAL [slnxhadoop01:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: []
2019-01-10 16:48:57,345 FATAL [slnxhadoop01:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: error or interrupted while splitting logs in [hdfs://slnxhadoop01.noid.in.sopra:8020/apps/hbase/data/WALs/slnxhadoop04.dhcp.noid.in.sopra,16020,1536839826364-splitting] Task = installed = 1 done = 0 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:429)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:339)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:330)
        at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1203)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:806)
        at org.apache.hadoop.hbase.master.HMaster.access$900(HMaster.java:225)
        at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:2038)
        at java.lang.Thread.run(Thread.java:745)
2019-01-10 16:48:57,345 INFO  [slnxhadoop01:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
2019-01-10 16:48:57,346 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: Stopping infoServer
2019-01-10 16:48:57,400 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
2019-01-10 16:48:57,403 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] procedure2.ProcedureExecutor: Stopping the procedure executor
2019-01-10 16:48:57,403 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store
2019-01-10 16:48:57,417 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: stopping server slnxhadoop01.noid.in.sopra,16000,1547117834090
2019-01-10 16:48:57,417 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x36691ecadd9004c
2019-01-10 16:48:57,422 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] zookeeper.ZooKeeper: Session: 0x36691ecadd9004c closed
2019-01-10 16:48:57,422 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2019-01-10 16:48:57,423 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: stopping server slnxhadoop01.noid.in.sopra,16000,1547117834090; all regions closed.
2019-01-10 16:48:57,423 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] hbase.ChoreService: Chore service for: slnxhadoop01.noid.in.sopra,16000,1547117834090 had [[ScheduledChore: Name: HFileCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit: MILLISECONDS]] on shutdown
2019-01-10 16:48:57,427 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x36691ecadd9004d
2019-01-10 16:48:57,428 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] zookeeper.ZooKeeper: Session: 0x36691ecadd9004d closed
2019-01-10 16:48:57,428 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] hbase.ChoreService: Chore service for: slnxhadoop01.noid.in.sopra,16000,1547117834090_splitLogManager_ had [[ScheduledChore: Name: SplitLogManager Timeout Monitor Period: 1000 Unit: MILLISECONDS]] on shutdown
2019-01-10 16:48:57,428 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2019-01-10 16:48:57,428 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] ipc.RpcServer: Stopping server on 16000
2019-01-10 16:48:57,428 INFO  [slnxhadoop01:16000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2019-01-10 16:48:57,429 INFO  [RpcServer.listener,port=16000] ipc.RpcServer: RpcServer.listener,port=16000: stopping
2019-01-10 16:48:57,430 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2019-01-10 16:48:57,430 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2019-01-10 16:48:57,436 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] zookeeper.RecoverableZooKeeper: Node /hbase-unsecure/rs/slnxhadoop01.noid.in.sopra,16000,1547117834090 already deleted, retry=false
2019-01-10 16:48:57,437 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] zookeeper.ZooKeeper: Session: 0x36691ecadd9004b closed
2019-01-10 16:48:57,437 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: stopping server slnxhadoop01.noid.in.sopra,16000,1547117834090; zookeeper connection closed.
2019-01-10 16:48:57,437 INFO  [master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000] regionserver.HRegionServer: master/slnxhadoop01.noid.in.sopra/172.26.50.102:16000 exiting
2019-01-10 16:48:57,438 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2 REPLIES

Master Mentor

@Shobhna Dhami

The WAL is the write-ahead log for all puts/deletes executed against a table; it is what gives HBase durable writes. If this is not a production environment, you can simply delete the files mentioned in the log.
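Before deleting anything, you may want to list the WALs directory first to confirm which per-server WAL directories (including any leftover -splitting directories like the one in your error) are actually present, for example:

$ hdfs dfs -ls /apps/hbase/data/WALs/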

This could be an issue with stale WAL splits. Try removing the WALs from the HDFS location below:

# WARNING: removes every server's WAL directory; any edits not yet flushed to HFiles are lost
$ hdfs dfs -rm -r -skipTrash /apps/hbase/data/WALs/*

Then restart the Master and region servers.
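A narrower alternative, if you would rather not wipe every WAL, is to remove only the stale -splitting directory named in the error (the path below is copied from your log):

$ hdfs dfs -rm -r -skipTrash /apps/hbase/data/WALs/slnxhadoop04.dhcp.noid.in.sopra,16020,1536839826364-splitting

That discards only the un-split logs from that one RegionServer instead of the whole cluster's WALs.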

Super Guru

Please be aware that the above can cause data loss, since any edits that were only in those WALs and never flushed to HFiles will be discarded. It may not be something you want to do on any system.

You'll have to look at the rest of the Master log, and potentially a RegionServer log, to understand why the log splitting failed. Turning on DEBUG logging may shed more light on the situation, though it likely won't be required. With only the information provided, it's impossible to say what went wrong.
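For reference, one way to scope DEBUG to just the split-log machinery is via log4j.properties under the HBase conf directory (a sketch: the SplitLogManager class name comes straight from your stack trace, while the coordination package is an assumption about where the worker-side split coordination logs live):

# In conf/log4j.properties on the master node, then restart the master
log4j.logger.org.apache.hadoop.hbase.master.SplitLogManager=DEBUG
log4j.logger.org.apache.hadoop.hbase.coordination=DEBUG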