HBase RegionServers shut down after a few hours

Explorer

I installed a new HDFS + HBase cluster, but a few hours after starting it, all the HBase RegionServers shut down. There isn't any data being written.

Versions:

HBase 1.2.0-cdh5.10.0

Hadoop 2.6.0-cdh5.10.0

ZooKeeper 3.4.5-cdh5.10

2021-11-09 21:15:51,928 WARN  [ResponseProcessor for block BP-899263853-9.180.152.33-1634099649590:blk_1073742680_2626] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-899263853-9.180.152.33-1634099649590:blk_1073742680_2626
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2272)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1075)
2021-11-09 21:15:51,928 WARN  [DataStreamer for file /hbase/WALs/jfhbase03,60020,1636456250380/jfhbase03%2C60020%2C1636456250380.default.1636463451629 block BP-899263853-9.180.152.33-1634099649590:blk_1073742680_2626] hdfs.DFSClient: Error Recovery for block BP-899263853-9.180.152.33-1634099649590:blk_1073742680_2626 in pipeline DatanodeInfoWithStorage[9.180.152.40:50010,DS-04c3a0da-36df-41b1-98ae-628c357fad41,DISK], DatanodeInfoWithStorage[9.180.152.33:50010,DS-e9e87481-2dc1-46aa-9685-c9e5bd920f9f,DISK], DatanodeInfoWithStorage[9.180.152.39:50010,DS-2cf89d04-34d8-4623-82b2-58daa7ac3e0c,DISK]: bad datanode DatanodeInfoWithStorage[9.180.152.40:50010,DS-04c3a0da-36df-41b1-98ae-628c357fad41,DISK]
2021-11-09 21:16:52,007 WARN  [ResponseProcessor for block BP-899263853-9.180.152.33-1634099649590:blk_1073742680_2634] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-899263853-9.180.152.33-1634099649590:blk_1073742680_2634
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2272)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1075)
2021-11-09 21:16:52,008 WARN  [DataStreamer for file /hbase/WALs/jfhbase03,60020,1636456250380/jfhbase03%2C60020%2C1636456250380.default.1636463451629 block BP-899263853-9.180.152.33-1634099649590:blk_1073742680_2634] hdfs.DFSClient: Error recovering pipeline for writing BP-899263853-9.180.152.33-1634099649590:blk_1073742680_2634. Already retried 5 times for the same packet.
2021-11-09 21:17:51,743 INFO  [main-EventThread] replication.ReplicationTrackerZKImpl: /hbase/rs/jfhbase10,60020,1636456250396 znode expired, triggering replicatorRemoved event
2021-11-09 21:18:24,288 INFO  [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving jfhbase10,60020,1636456250396's WALs to my queue
2021-11-09 21:20:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=779, evicted=0, evictedPerRun=0.0
2021-11-09 21:25:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=809, evicted=0, evictedPerRun=0.0
2021-11-09 21:30:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=839, evicted=0, evictedPerRun=0.0
2021-11-09 21:35:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=869, evicted=0, evictedPerRun=0.0
2021-11-09 21:40:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=899, evicted=0, evictedPerRun=0.0
2021-11-09 21:45:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=929, evicted=0, evictedPerRun=0.0
2021-11-09 21:50:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=959, evicted=0, evictedPerRun=0.0
2021-11-09 21:55:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=989, evicted=0, evictedPerRun=0.0
2021-11-09 22:00:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=1019, evicted=0, evictedPerRun=0.0
2021-11-09 22:05:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=1049, evicted=0, evictedPerRun=0.0
2021-11-09 22:10:50,426 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.64 MB, freeSize=6.32 GB, max=6.33 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=1079, evicted=0, evictedPerRun=0.0
2021-11-09 22:10:50,428 INFO  [MobFileCache #0] mob.MobFileCache: MobFileCache Statistics, access: 0, miss: 0, hit: 0, hit ratio: 0%, evicted files: 0
2021-11-09 22:10:51,722 ERROR [sync.3] wal.FSHLog: Error syncing, request close of WAL
java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1230)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:721)
2021-11-09 22:10:51,722 WARN  [regionserver/jfhbase03/9.180.152.43:60020.logRoller] wal.FSHLog: Failed sync-before-close but no outstanding appends; closing WAL: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
2021-11-09 22:10:51,722 WARN  [regionserver/jfhbase03/9.180.152.43:60020.logRoller] wal.ProtobufLogWriter: Failed to write trailer, non-fatal, continuing...
java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1230)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:721)
2021-11-09 22:10:51,722 ERROR [regionserver/jfhbase03/9.180.152.43:60020.logRoller] wal.FSHLog: Failed close of WAL writer hdfs://JFHbaseHDFS/hbase/WALs/jfhbase03,60020,1636456250380/jfhbase03%2C60020%2C1636456250380.default.1636463451629, unflushedEntries=0
java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1230)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:721)
2021-11-09 22:10:51,722 FATAL [regionserver/jfhbase03/9.180.152.43:60020.logRoller] regionserver.HRegionServer: ABORTING region server jfhbase03,60020,1636456250380: Failed log close in log roller
org.apache.hadoop.hbase.regionserver.wal.FailedLogCloseException: hdfs://JFHbaseHDFS/hbase/WALs/jfhbase03,60020,1636456250380/jfhbase03%2C60020%2C1636456250380.default.1636463451629, unflushedEntries=0
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:886)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:703)
        at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:148)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1230)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:721)
2021-11-09 22:10:51,722 FATAL [regionserver/jfhbase03/9.180.152.43:60020.logRoller] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2021-11-09 22:10:51,742 INFO  [regionserver/jfhbase03/9.180.152.43:60020.logRoller] regionserver.HRegionServer: STOPPED: Failed log close in log roller
2021-11-09 22:10:51,742 INFO  [regionserver/jfhbase03/9.180.152.43:60020.logRoller] regionserver.LogRoller: LogRoller exiting.
2021-11-09 22:10:51,742 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2021-11-09 22:10:51,742 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.HRegionServer: Stopping infoServer
2021-11-09 22:10:51,742 INFO  [SplitLogWorker-jfhbase03:60020] regionserver.SplitLogWorker: SplitLogWorker interrupted. Exiting. 
2021-11-09 22:10:51,743 INFO  [SplitLogWorker-jfhbase03:60020] regionserver.SplitLogWorker: SplitLogWorker jfhbase03,60020,1636456250380 exiting
2021-11-09 22:10:51,743 INFO  [regionserver/jfhbase03/9.180.152.43:60020] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60030
2021-11-09 22:10:51,843 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.HeapMemoryManager: Stoping HeapMemoryTuner chore.
2021-11-09 22:10:51,843 INFO  [regionserver/jfhbase03/9.180.152.43:60020] flush.RegionServerFlushTableProcedureManager: Stopping region server flush procedure manager abruptly.
2021-11-09 22:10:51,843 INFO  [MemStoreFlusher.0] regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
2021-11-09 22:10:51,843 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
2021-11-09 22:10:51,843 INFO  [regionserver/jfhbase03/9.180.152.43:60020] snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager abruptly.
2021-11-09 22:10:51,844 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.HRegionServer: aborting server jfhbase03,60020,1636456250380
2021-11-09 22:10:51,844 INFO  [regionserver/jfhbase03/9.180.152.43:60020] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x17d046470ac0004
2021-11-09 22:10:51,847 INFO  [regionserver/jfhbase03/9.180.152.43:60020] zookeeper.ZooKeeper: Session: 0x17d046470ac0004 closed
2021-11-09 22:10:51,847 INFO  [regionserver/jfhbase03/9.180.152.43:60020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2021-11-09 22:10:51,848 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.HRegionServer: stopping server jfhbase03,60020,1636456250380; all regions closed.
2021-11-09 22:10:51,848 WARN  [regionserver/jfhbase03/9.180.152.43:60020] wal.ProtobufLogWriter: Failed to write trailer, non-fatal, continuing...
java.nio.channels.ClosedChannelException
        at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1940)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
        at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
        at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:80)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.writeWALTrailer(ProtobufLogWriter.java:157)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.close(ProtobufLogWriter.java:130)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.shutdown(FSHLog.java:1068)
        at org.apache.hadoop.hbase.wal.DefaultWALProvider.shutdown(DefaultWALProvider.java:114)
        at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:221)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1321)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1070)
        at java.lang.Thread.run(Thread.java:748)
2021-11-09 22:10:51,859 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.Leases: regionserver/jfhbase03/9.180.152.43:60020 closing leases
2021-11-09 22:10:51,859 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.Leases: regionserver/jfhbase03/9.180.152.43:60020 closed leases
2021-11-09 22:10:51,859 INFO  [regionserver/jfhbase03/9.180.152.43:60020] hbase.ChoreService: Chore service for: jfhbase03,60020,1636456250380 had [[ScheduledChore: Name: jfhbase03,60020,1636456250380-MemstoreFlusherChore Period: 10000 Unit: MILLISECONDS], [ScheduledChore: Name: MovedRegionsCleaner for region jfhbase03,60020,1636456250380 Period: 120000 Unit: MILLISECONDS]] on shutdown
2021-11-09 22:10:51,859 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.CompactSplitThread: Waiting for Split Thread to finish...
2021-11-09 22:10:51,859 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.CompactSplitThread: Waiting for Merge Thread to finish...
2021-11-09 22:10:51,859 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.CompactSplitThread: Waiting for Large Compaction Thread to finish...
2021-11-09 22:10:51,859 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.CompactSplitThread: Waiting for Small Compaction Thread to finish...
2021-11-09 22:10:51,866 INFO  [regionserver/jfhbase03/9.180.152.43:60020] ipc.RpcServer: Stopping server on 60020
2021-11-09 22:10:51,866 INFO  [RpcServer.listener,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: stopping
2021-11-09 22:10:51,866 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2021-11-09 22:10:51,866 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2021-11-09 22:10:51,874 INFO  [regionserver/jfhbase03/9.180.152.43:60020] zookeeper.ZooKeeper: Session: 0x37d04643ce40000 closed
2021-11-09 22:10:51,874 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2021-11-09 22:10:51,874 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.HRegionServer: stopping server jfhbase03,60020,1636456250380; zookeeper connection closed.
2021-11-09 22:10:51,874 INFO  [regionserver/jfhbase03/9.180.152.43:60020] regionserver.HRegionServer: regionserver/jfhbase03/9.180.152.43:60020 exiting
2021-11-09 22:10:51,874 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2676)
2021-11-09 22:10:51,875 INFO  [Thread-6] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@ea9b7c6
2021-11-09 22:10:51,875 INFO  [Thread-6] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2021-11-09 22:10:51,885 INFO  [Thread-6] regionserver.ShutdownHook: Shutdown hook finished.
Wed Nov 10 15:35:54 CST 2021 Starting regionserver on jfhbase03
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256263
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8

Can anyone help, please?

6 Replies

Super Collaborator

Hello @xgxshtc 

 

Thanks for using the Cloudera Community. Based on the post, the RegionServer is reporting an EOFException and a "bad datanode" error on its WAL under "/hbase/WALs", which suggests the underlying HDFS blocks are having issues. To fix this, your team can sideline the contents of "/hbase/WALs" (specifically "/hbase/WALs/jfhbase03,60020,1636456250380") and restart the affected RegionServer. If all RegionServers are impacted, sideline each of the RegionServer directories under "/hbase/WALs". Note that the WALs hold edits not yet persisted to disk, so sidelining the WAL directories (one WAL directory per RegionServer) may incur data loss.
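If it helps, a minimal sketch of that sideline step (the "/hbase/WALs-sideline" target is only an assumed example path; adjust it for your environment):

# Stop the affected RegionServer first, then move its WAL directory aside
hdfs dfs -mkdir -p /hbase/WALs-sideline
hdfs dfs -mv "/hbase/WALs/jfhbase03,60020,1636456250380" /hbase/WALs-sideline/
# After restart, the RegionServer creates a fresh WAL directory under /hbase/WALs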

 

Additionally, review the HDFS fsck output for the WAL path "/hbase/WALs" and fix any corrupt or missing blocks. Once fsck reports "/hbase/WALs" as healthy, your team can restart the HBase RegionServers.
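For example, the check could look like this (a sketch using standard fsck options):

# Report files, blocks and their locations under the HBase WAL path
hdfs fsck /hbase/WALs -files -blocks -locations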

 

Regards, Smarak

Explorer

Thanks for replying. I have some questions. First, I tried reinstalling the whole cluster (deleting all its contents), but the same thing happened again. Also, what do you mean by "sideline the contents"? When I restart the RegionServer, the log shows this:

 

2021-11-15 10:02:17,440 WARN  [ResponseProcessor for block BP-1448520106-9.180.152.33-1636618223804:blk_1073742122_1743] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-1448520106-9.180.152.33-1636618223804:blk_1073742122_1743
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2272)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1075)
2021-11-15 10:02:17,440 WARN  [DataStreamer for file /hbase/WALs/jfhbase03,60020,1636941117142/jfhbase03%2C60020%2C1636941117142..meta.1636941557288.meta block BP-1448520106-9.180.152.33-1636618223804:blk_1073742122_1743] hdfs.DFSClient: Error Recovery for block BP-1448520106-9.180.152.33-1636618223804:blk_1073742122_1743 in pipeline DatanodeInfoWithStorage[9.180.152.30:50010,DS-c7dc93d4-3561-4673-b49a-5e78b685ddb0,DISK], DatanodeInfoWithStorage[9.180.152.40:50010,DS-e7a5fd42-9133-4d9b-ba12-8c5349e51249,DISK]: bad datanode DatanodeInfoWithStorage[9.180.152.30:50010,DS-c7dc93d4-3561-4673-b49a-5e78b685ddb0,DISK]

 

Super Collaborator

Hello @xgxshtc 

 

Thanks for the update. By "sideline", we mean ensuring the HBase WAL directory "/hbase/WALs" is empty. Let us know if the steps below help:

  • Stop the HBase RegionServers,
  • Sideline the WAL directory contents, i.e. there should not be any RegionServer directories left under "/hbase/WALs" (a quick check is shown after this list),
  • Restart the HBase RegionServers.
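As a sketch of that check (assuming the WAL directories were sidelined or removed as discussed above):

# Confirm nothing is left under the WAL root before restarting the RegionServers
hdfs dfs -ls -R /hbase/WALs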

 

Additionally, you reinstalled the cluster and still observed the same issue, which likely indicates the HDFS state is unhealthy. Could you review the HDFS fsck output for the HBase WAL directory [1] to confirm whether the blocks associated with the HBase WAL files are healthy?

 

Regards, Smarak

 

[1] https://hadoop.apache.org/docs/r1.2.1/commands_manual.html#fsck

 

Explorer

When I stop the HBase RegionServers, /hbase/WALs is not empty. So I deleted the contents under /hbase/WALs and ran "hdfs fsck /hbase/WALs", which shows the output below; I think it means the path is healthy. Then I started the HBase RegionServers, but the problem is not solved.

 

Connecting to namenode via http://jfhbase01:50070/fsck?ugi=mqq&path=%2Fhbase%2FWALs
FSCK started by xgx (auth:SIMPLE) from /9.180.152.33 for path /hbase/WALs at Mon Nov 15 16:31:30 CST 2021
Status: HEALTHY
 Total size:    0 B (Total open files size: 1069 B)
 Total dirs:    10
 Total files:   0
 Total symlinks:                0 (Files currently being written: 10)
 Total blocks (validated):      0 (Total open file blocks (not validated): 10)
 Minimally replicated blocks:   0
 Over-replicated blocks:        0
 Under-replicated blocks:       0
 Mis-replicated blocks:         0
 Default replication factor:    3
 Average block replication:     0.0
 Corrupt blocks:                0
 Missing replicas:              0
 Number of data-nodes:          10
 Number of racks:               1
FSCK ended at Mon Nov 15 16:31:30 CST 2021 in 0 milliseconds


The filesystem under path '/hbase/WALs' is HEALTHY

 

Super Collaborator

Hello @xgxshtc 

 

Thanks for the update. If you try to access the WAL file (the one for which the DFSOutputStream reports Premature EOF) via "hdfs dfs -cat/-head", does the command run successfully?

 

The fsck command can also be run with "-openforwrite" to show the details of the 10 files currently open for write.
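For instance (a sketch; the WAL path below is a placeholder for the file named in the Premature EOF message, and on older releases without "hdfs dfs -head" you can pipe "-cat" through head instead):

# Read the beginning of the WAL file reported in the Premature EOF error
hdfs dfs -cat /hbase/WALs/<regionserver-dir>/<wal-file> | head -c 1024

# List the files still open for write under the WAL root
hdfs fsck /hbase/WALs -openforwrite -files -blocks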

 

Regards, Smarak

Explorer

This is what I see (via "hdfs dfs -cat /hbase/WALs/jfhbase04,60020,1637139292945/jfhbase04%2C60020%2C1637139292945.default.1637142895380"):

21/11/17 18:44:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
PWAL"ProtobufLogWriter*5org.apache.hadoop.hbase.regionserver.wal.WALCellCodecD
 281aa8ab0c063ad4f5cddab6b97f1296test:tb_deposit_log    * METAFAMILYHBASE::REGION_EVENT}-V5test:tb_deposit_log 281aa8ab0c063ad4f5cddab6b97f1296 *

infoinfo2
        jfhbase04test:tb_deposit_log,,1637143163577.281aa8ab0c063ad4f5cddab6b97f1296.

It looks like data being written. Will it release the contents and be written to disk later?
