
master.HMasterCommandLine: Master exiting

New Contributor

2021-10-27 14:10:44,606 INFO [StoreOpener-1595e783b53d99cd5eef43b6debb2682-1] regionserver.HStore: 1595e783b53d99cd5eef43b6debb2682/proc created, memstore type=DefaultMemStore, storagePolicy=HOT, verifyBulkLoads=false, parallelPutCountPrintThreshold=50, encoding=NONE, compression=NONE
2021-10-27 14:10:44,614 INFO [master/ubuntu22:16000:becomeActiveMaster] regionserver.HRegion: Replaying edits from hdfs://ha-cluster:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/recovered.wals/ubuntu22.mcloud.com%2C16000%2C1635246532227.1635287061008
2021-10-27 14:10:44,652 INFO [master/ubuntu22:16000:becomeActiveMaster] regionserver.HRegion: Replaying edits from hdfs://ha-cluster:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/recovered.wals/ubuntu22.mcloud.com%2C16000%2C1635246532227.1635287511345
2021-10-27 14:10:44,657 WARN [master/ubuntu22:16000:becomeActiveMaster] regionserver.HRegion: Failed initialize of region= master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back memstore
java.io.EOFException: Cannot seek after EOF
at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1648)
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:66)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initInternal(ProtobufLogReader.java:211)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initReader(ProtobufLogReader.java:173)
at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:64)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:323)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:305)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:429)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4863)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4769)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1013)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7497)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7455)
at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:269)
at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:309)
at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:948)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:621)
at java.lang.Thread.run(Thread.java:748)
2021-10-27 14:10:44,673 INFO [master/ubuntu22:16000:becomeActiveMaster] regionserver.HRegion: Drop memstore for Store proc in region master:store,,1.1595e783b53d99cd5eef43b6debb2682. , dropped memstoresize: [dataSize=0, getHeapSize=256, getOffHeapSize=0, getCellsCount=0 }
2021-10-27 14:10:44,673 INFO [master/ubuntu22:16000:becomeActiveMaster] regionserver.HRegion: Closing region master:store,,1.1595e783b53d99cd5eef43b6debb2682.
2021-10-27 14:10:44,697 INFO [master/ubuntu22:16000:becomeActiveMaster] regionserver.HRegion: Closed master:store,,1.1595e783b53d99cd5eef43b6debb2682.
2021-10-27 14:10:44,702 ERROR [master/ubuntu22:16000:becomeActiveMaster] master.HMaster: Failed to become active master
java.io.EOFException: Cannot seek after EOF
at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1648)
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:66)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initInternal(ProtobufLogReader.java:211)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initReader(ProtobufLogReader.java:173)
at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:64)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:323)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:305)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:429)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4863)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4769)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1013)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7497)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7455)
at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:269)
at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:309)
at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:948)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:621)
at java.lang.Thread.run(Thread.java:748)
2021-10-27 14:10:44,702 ERROR [master/ubuntu22:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master ubuntu22.mcloud.com,16000,1635324039444: Unhandled exception. Starting shutdown. *****
java.io.EOFException: Cannot seek after EOF
at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1648)
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:66)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initInternal(ProtobufLogReader.java:211)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initReader(ProtobufLogReader.java:173)
at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:64)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:323)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:305)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:429)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4863)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4769)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1013)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7497)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7455)
at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:269)
at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:309)
at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:948)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:621)
at java.lang.Thread.run(Thread.java:748)
2021-10-27 14:10:44,703 INFO [master/ubuntu22:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'ubuntu22.mcloud.com,16000,1635324039444' *****
2021-10-27 14:10:44,703 INFO [master/ubuntu22:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/ubuntu22:16000:becomeActiveMaster
2021-10-27 14:10:44,904 INFO [ubuntu22:16000.splitLogManager..Chore.1] hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor was stopped
2021-10-27 14:10:45,513 INFO [master/ubuntu22:16000] ipc.NettyRpcServer: Stopping server on /10.13.10.22:16000
2021-10-27 14:10:45,634 INFO [master/ubuntu22:16000] regionserver.HRegionServer: Stopping infoServer
2021-10-27 14:10:45,681 INFO [master/ubuntu22:16000] handler.ContextHandler: Stopped o.e.j.w.WebAppContext@7f5538a1{/,null,UNAVAILABLE}{file:/usr/local/share/packages/paresh/hbase-2.3.5/hbase-webapps/master}
2021-10-27 14:10:45,687 INFO [master/ubuntu22:16000] server.AbstractConnector: Stopped ServerConnector@574a89e2{HTTP/1.1,[http/1.1]}{0.0.0.0:60010}
2021-10-27 14:10:45,687 INFO [master/ubuntu22:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@262816a8{/static,file:///usr/local/share/packages/paresh/hbase-2.3.5/hbase-webapps/static/,UNAVAILABLE}
2021-10-27 14:10:45,688 INFO [master/ubuntu22:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@60f70249{/logs,file:///usr/local/share/packages/paresh/hbase-2.3.5/logs/,UNAVAILABLE}
2021-10-27 14:10:45,691 INFO [master/ubuntu22:16000] regionserver.HRegionServer: aborting server ubuntu22.mcloud.com,16000,1635324039444
2021-10-27 14:10:45,691 INFO [master/ubuntu22:16000] regionserver.HRegionServer: stopping server ubuntu22.mcloud.com,16000,1635324039444; all regions closed.
2021-10-27 14:10:45,691 INFO [master/ubuntu22:16000] hbase.ChoreService: Chore service for: master/ubuntu22:16000 had [] on shutdown
2021-10-27 14:10:45,693 WARN [master/ubuntu22:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2021-10-27 14:10:45,693 INFO [master/ubuntu22:16000] hbase.ChoreService: Chore service for: ubuntu22:16000.splitLogManager. had [] on shutdown
2021-10-27 14:10:45,805 INFO [ReadOnlyZKClient-ubuntu19.mcloud.com:2181,ubuntu20.mcloud.com:2181,ubuntu22.mcloud.com:2181,ubuntu24.mcloud.com:2181,ubuntu25.mcloud.com:2181@0x45c8633c] zookeeper.ZooKeeper: Session: 0x37cc0997a7f0006 closed
2021-10-27 14:10:45,805 INFO [ReadOnlyZKClient-ubuntu19.mcloud.com:2181,ubuntu20.mcloud.com:2181,ubuntu22.mcloud.com:2181,ubuntu24.mcloud.com:2181,ubuntu25.mcloud.com:2181@0x45c8633c-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x37cc0997a7f0006
2021-10-27 14:10:45,904 INFO [master/ubuntu22:16000] zookeeper.ZooKeeper: Session: 0x27cc0997a4d0005 closed
2021-10-27 14:10:45,904 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x27cc0997a4d0005
2021-10-27 14:10:45,904 INFO [master/ubuntu22:16000] regionserver.HRegionServer: Exiting; stopping=ubuntu22.mcloud.com,16000,1635324039444; zookeeper connection closed.
2021-10-27 14:10:45,905 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:261)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:149)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3085)

 

 

Solutions tried so far:

Removed WALs and MasterProcWALs from /hbase.

Removed /hbase/MasterData/WALs.

Removed the /hbase znode from ZooKeeper using zkCli.

 

Please suggest how to get the Master to start.

1 REPLY

Super Collaborator

Hello @paresh 

 

Thanks for using Cloudera Community. Based on the trace you shared, the HBase Master fails to become active because the replay of "recovered.wals" for region "1595e783b53d99cd5eef43b6debb2682" (master:store) is interrupted by an EOFException. Complete logs would help confirm this.

 

When a region is opened, any contents of "recovered.wals" are replayed by reading them and pushing the edits to the MemStore. Once the edits are persisted from the MemStore to an HFile, the "recovered.wals" files are removed.
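
To see which recovered-edits file the reader fails on, it may help to list the directory and look at the file sizes. The log shows the exception right after replay of the second file begins. A seek past EOF is often a sign of an empty or truncated file, but that is an assumption here, not something the log proves. Below is a minimal Python sketch that parses `hdfs dfs -ls` output; the helper names are made up for illustration.

```python
import subprocess


def parse_ls(output):
    """Parse `hdfs dfs -ls` output into (size_in_bytes, path) pairs for files."""
    entries = []
    for line in output.splitlines():
        # Columns: permissions, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        # Skip the "Found N items" header, blank lines, and directories.
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue
        entries.append((int(fields[4]), fields[7]))
    return entries


def empty_files(output):
    """Return the paths of zero-length files in `hdfs dfs -ls` output."""
    return [path for size, path in parse_ls(output) if size == 0]


def list_dir(path):
    """Run `hdfs dfs -ls <path>` and return its stdout (needs the hdfs CLI)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On a cluster node, `empty_files(list_dir(".../recovered.wals"))` with the full path from the log would return any zero-length files in that directory.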

 

At the risk of data loss, you may try the following: stop HBase > sideline the recovered edits under "hdfs://ha-cluster:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/recovered.wals" together with MasterProcWALs (to avoid replaying any associated procedure ID) > start HBase > verify the status. Note that the data loss would come from removing recovered edits that have not yet been persisted.
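
The sideline step above could be scripted roughly as follows. This is a sketch, not a tested procedure: the sideline directory `hdfs://ha-cluster:8020/hbase-sideline` and the `/hbase/MasterProcWALs` location are assumptions, and the helper names are hypothetical. The script only prints the commands unless `execute=True` is passed, and it should be run only with HBase stopped.

```python
import shlex
import subprocess
from datetime import datetime, timezone

# Path taken from the log in this thread.
RECOVERED_WALS = (
    "hdfs://ha-cluster:8020/hbase/MasterData/data/master/store/"
    "1595e783b53d99cd5eef43b6debb2682/recovered.wals"
)
# Assumed location; drop this entry if the directory does not exist.
MASTER_PROC_WALS = "hdfs://ha-cluster:8020/hbase/MasterProcWALs"
# Hypothetical sideline directory outside the HBase root directory.
SIDELINE_ROOT = "hdfs://ha-cluster:8020/hbase-sideline"


def sideline_commands(sources, sideline_root, stamp):
    """Build hdfs commands that move each source under a dated sideline dir."""
    target_dir = f"{sideline_root.rstrip('/')}/{stamp}"
    commands = [["hdfs", "dfs", "-mkdir", "-p", target_dir]]
    for source in sources:
        name = source.rstrip("/").rsplit("/", 1)[-1]
        # Move (not delete), so the files can be restored or inspected later.
        commands.append(["hdfs", "dfs", "-mv", source, f"{target_dir}/{name}"])
    commands.append(["hdfs", "dfs", "-ls", "-R", target_dir])
    return commands


def run(commands, execute=False):
    """Print each command; run it only when execute=True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run(sideline_commands([RECOVERED_WALS, MASTER_PROC_WALS], SIDELINE_ROOT, stamp))
```

Moving the files instead of deleting them keeps the unpersisted edits available in case they need to be examined or restored.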

 

Additionally, you may use WALPlayer [1] to replay the contents of the recovered edits.

 

Regards, Smarak

 

[1] https://hbase.apache.org/book.html#walplayer