Thanks for using Cloudera Community. Based on the post, the RegionServer is reporting an EOFException ("bad datanode") while replaying WALs under "/hbase/WALs", which suggests the underlying HDFS blocks have issues. To fix this, your team can sideline the contents of "/hbase/WALs" (specifically "/hbase/WALs/jfhbase03,60020,1636456250380") and restart the affected RegionServer. If all RegionServers are impacted, sideline each RegionServer's directory under "/hbase/WALs". Note that the WALs hold edits not yet persisted to disk, so sidelining the WAL directories (one WAL directory per RegionServer) may incur data loss.
Additionally, review the HDFS fsck report for the WAL files under "/hbase/WALs" and fix any corrupt or missing blocks. Once fsck reports "/hbase/WALs" as healthy, your team can restart the HBase RegionServers.
Thanks for replying. I have some questions. First, I tried reinstalling the whole cluster (deleting all contents), but the same thing happened again. Also, what do you mean by "sideline the contents"? When I restart the RegionServer, the logs show this:
2021-11-15 10:02:17,440 WARN [ResponseProcessor for block BP-1448520106-188.8.131.52-1636618223804:blk_1073742122_1743] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1448520106-184.108.40.206-1636618223804:blk_1073742122_1743
java.io.EOFException: Premature EOF: no length prefix available
2021-11-15 10:02:17,440 WARN [DataStreamer for file /hbase/WALs/jfhbase03,60020,1636941117142/jfhbase03%2C60020%2C1636941117142..meta.1636941557288.meta block BP-1448520106-220.127.116.11-1636618223804:blk_1073742122_1743] hdfs.DFSClient: Error Recovery for block BP-1448520106-18.104.22.168-1636618223804:blk_1073742122_1743 in pipeline DatanodeInfoWithStorage[22.214.171.124:50010,DS-c7dc93d4-3561-4673-b49a-5e78b685ddb0,DISK], DatanodeInfoWithStorage[126.96.36.199:50010,DS-e7a5fd42-9133-4d9b-ba12-8c5349e51249,DISK]: bad datanode DatanodeInfoWithStorage[188.8.131.52:50010,DS-c7dc93d4-3561-4673-b49a-5e78b685ddb0,DISK]
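To see whether the same DataNode keeps being flagged, the "bad datanode" entries in a RegionServer log can be tallied. The following is a minimal sketch, assuming the log lines have the same shape as the ones above; the function name `bad_datanodes` and the log path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Count how often each DataNode is reported as "bad datanode".
# Reads log text on stdin and prints "<count> <address:port>" per DataNode,
# most frequently flagged first.
bad_datanodes() {
  grep -o 'bad datanode DatanodeInfoWithStorage\[[^],]*' \
    | sed 's/.*\[//' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'
}

# Usage (the log path is an assumption, adjust to your installation):
#   bad_datanodes < /var/log/hbase/regionserver.log
```

If one address dominates the tally, that DataNode's own log is probably the next place to look; if the entries are spread evenly across nodes, the cause is more likely cluster-wide.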
Thanks for the update. By "sideline", we meant ensuring the HBase WAL directory "/hbase/WALs" is empty. Let us know if the steps below help:
Stop the HBase RegionServers,
Sideline the WAL directory contents, i.e. there shouldn't be any directories left under "/hbase/WALs",
Restart the HBase RegionServers.
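The sideline step can be sketched as a move rather than a delete, so the WAL files stay available for later inspection. This is only a dry-run sketch: it prints the `hdfs dfs` commands instead of running them. The backup location `/hbase_wal_sideline`, the timestamp suffix and the function name are hypothetical, and the RegionServers are assumed to be stopped already.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the HDFS commands that would move each RegionServer
# WAL directory out of the WAL root. Nothing is executed against HDFS.
WAL_ROOT="/hbase/WALs"                 # WAL root from this thread
SIDELINE_ROOT="/hbase_wal_sideline"    # hypothetical backup location

# Args: a suffix for the backup directory, then one WAL subdirectory per argument.
print_sideline_commands() {
  local stamp="$1"; shift
  echo "hdfs dfs -mkdir -p '${SIDELINE_ROOT}/${stamp}'"
  local dir
  for dir in "$@"; do
    echo "hdfs dfs -mv '${dir}' '${SIDELINE_ROOT}/${stamp}/'"
  done
}

# Example with the directory named earlier in the thread:
print_sideline_commands 20211115 "${WAL_ROOT}/jfhbase03,60020,1636456250380"
```

Review the printed commands before running them as a user with write access to both paths. The earlier warning still applies: edits that exist only in the sidelined WALs will not be replayed.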
Additionally, you reinstalled the cluster and still observed the issue, which likely indicates that the HDFS state is unhealthy. Could you review the HDFS fsck output for the HBase WAL directory to confirm whether the blocks associated with the HBase WAL files are healthy?
When I stopped the HBase RegionServers, /hbase/WALs was not empty, so I deleted its contents and ran "hdfs fsck /hbase/WALs", which showed the output below. I think it means healthy. Then I started the HBase RegionServers, but the problem was not solved.
Connecting to namenode via http://jfhbase01:50070/fsck?ugi=mqq&path=%2Fhbase%2FWALs
FSCK started by xgx (auth:SIMPLE) from /184.108.40.206 for path /hbase/WALs at Mon Nov 15 16:31:30 CST 2021
Total size: 0 B (Total open files size: 1069 B)
Total dirs: 10
Total files: 0
Total symlinks: 0 (Files currently being written: 10)
Total blocks (validated): 0 (Total open file blocks (not validated): 10)
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 3
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 10
Number of racks: 1
FSCK ended at Mon Nov 15 16:31:30 CST 2021 in 0 milliseconds
The filesystem under path '/hbase/WALs' is HEALTHY
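Note that this report validated 0 blocks and lists 10 open-file blocks as "not validated", so HEALTHY here says little about the WAL files currently being written. By default fsck skips files that are open for write; adding `-openforwrite` (optionally with `-files -blocks -locations`) includes them. The sketch below pulls the two counts out of a saved fsck summary; the function name `fsck_coverage` is hypothetical, and it assumes the summary line has the shape shown above.

```shell
#!/usr/bin/env bash
# Reads `hdfs fsck` summary text on stdin and prints how many blocks were
# validated versus skipped because their files were open for write.
fsck_coverage() {
  awk '
    /Total blocks \(validated\):/ {
      validated = $4
      skipped = 0
      # Pull the number that follows "not validated):" if it is present.
      if (match($0, /not validated\): *[0-9]+/)) {
        s = substr($0, RSTART, RLENGTH)
        sub(/.*: */, "", s)
        skipped = s
      }
      print "validated=" validated " open_not_validated=" skipped
    }'
}

# Usage against a live cluster (not executed here):
#   hdfs fsck /hbase/WALs -openforwrite -files -blocks -locations
#   hdfs fsck /hbase/WALs | fsck_coverage
```

For the report above this would print `validated=0 open_not_validated=10`.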
I can see the following (via "hdfs dfs -cat /hbase/WALs/jfhbase04,60020,1637139292945/jfhbase04%2C60020%2C1637139292945.default.1637142895380"):
21/11/17 18:44:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
281aa8ab0c063ad4f5cddab6b97f1296test:tb_deposit_log * METAFAMILYHBASE::REGION_EVENT}-V5test:tb_deposit_log 281aa8ab0c063ad4f5cddab6b97f1296 *
This looks like data being written. Will the contents be released and written to disk later?