Created on 05-29-2025 11:40 AM - edited 05-29-2025 03:45 PM
I have a 3-node cluster (ml1, ml2, and vortex). ml1 is the NameNode; ml2 and vortex are DataNodes. Last week we had a power shutdown at our school, and since then I have not been able to access the files in HDFS.
When I try to view the "Head of the file", I get an error: "Couldn't find datanode to read file from. Forbidden".
The reported file size is correct, so my understanding is that the data is still there somewhere. The output of the command hdfs dfsadmin -report is below:
Configured Capacity: 2088742875136 (1.90 TB)
Present Capacity: 1673267032064 (1.52 TB)
DFS Remaining: 942202814464 (877.49 GB)
DFS Used: 731064217600 (680.86 GB)
DFS Used%: 43.69%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 6303
Missing blocks (with replication factor 1): 6303
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 10.3.0.38:9866 (vortex.serv)
Hostname: vortex.dbwks.erau.edu
Decommission Status : Normal
Configured Capacity: 1031987826688 (961.11 GB)
DFS Used: 531036213248 (494.57 GB)
Non DFS Used: 5448654848 (5.07 GB)
DFS Remaining: 443057381376 (412.63 GB)
DFS Used%: 51.46%
DFS Remaining%: 42.93%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Thu May 29 13:42:03 EDT 2025
Last Block Report: Never
Num of Blocks: 0

Name: 10.3.0.58:9866 (wxml2.serv)
Hostname: wxml2.db.erau.edu
Decommission Status : Normal
Configured Capacity: 1056755048448 (984.18 GB)
DFS Used: 200028004352 (186.29 GB)
Non DFS Used: 303877742592 (283.01 GB)
DFS Remaining: 499145433088 (464.87 GB)
DFS Used%: 18.93%
DFS Remaining%: 47.23%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Thu May 29 13:42:04 EDT 2025
Last Block Report: Thu May 29 12:56:07 EDT 2025
Num of Blocks: 3519
When I run the command hdfs fsck /hdfs_nifi/tfms/flightData/flModify/2023_03.csv -files -blocks -locations, I get the error below:
FileSystem is inaccessible due to:
java.io.FileNotFoundException: File does not exist: /hdfs_nifi/tfms/flightData/flModify/2023_03.csv
DFSck exiting.
But the file is visible under the above path. If I run the command hdfs dfs -ls /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/, I get the following:
Found 42 items
-rw-r--r-- 1 root supergroup 60193632 2025-04-26 21:01 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_01.csv
-rw-r--r-- 1 root supergroup 120930719 2025-03-29 07:16 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_02.csv
-rw-r--r-- 1 root supergroup 121990303 2025-04-30 16:37 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_03.csv
-rw-r--r-- 1 root supergroup 116003694 2025-04-01 18:15 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_04.csv
-rw-r--r-- 1 root supergroup 123227494 2025-05-04 08:24 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_05.csv
I ran hdfs fsck -list-corruptfileblocks, selected a random block from the output, and then ran find /data/hadoop_dir/datanode_dir/ -type f -iname blk_1073786680. When I go to the location it returns (/data/hadoop_dir/datanode_dir/current/BP-1329719554-155.31.114.228-1716835341630/current/finalized/subdir0/subdir15/blk_1073786680), I do see the block file there, so why am I not able to access the data through HDFS?
Also, on one of the datanodes (vortex) I get the following warning:
2025-05-29 17:43:47,616 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x40e0062fac3f698a with lease ID 0x8d10044dcd041620 to namenode: wxml1.serv/10.3.0.48:8020, containing 1 storage report(s), of which we sent 0. The reports had 6312 total blocks and used 0 RPC(s). This took 1 msecs to generate and 2 msecs for RPC and NN processing. Got back no commands.
2025-05-29 17:43:47,616 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.runBlockOp(BlockManager.java:5558)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1651)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:182)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:34769)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1246)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1169)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3203)
Caused by: java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
at org.apache.hadoop.thirdparty.protobuf.IterableByteBufferInputStream.read(IterableByteBufferInputStream.java:143)
at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.read(CodedInputStream.java:2080)
I also checked with the command: find $HADOOP_HOME -name "protobuf*.jar"
It returns the following on all three machines:
/apache/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar
I would really appreciate it if someone could suggest something about this issue, i.e., is there a way to get my data back? I really need help with this.
Created 06-03-2025 07:40 AM
Hi @G_B
It could be an issue with your JDK version. That NoSuchMethodError on java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer; is the classic symptom of bytecode built for Java 9+ running on a Java 8 JVM, and since it comes back wrapped in a RemoteException from the blockReport RPC, the NameNode's JVM is the one hitting it. Compare the JDK versions across your nodes and upgrade/downgrade accordingly.
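A quick way to compare, assuming you have ssh access between the nodes and that Hadoop lives under /apache/hadoop-3.4.0 as in your find output (adjust hostnames and paths to match your cluster):

for host in wxml1.serv wxml2.serv vortex.serv; do
  echo "== $host =="
  ssh "$host" 'java -version 2>&1 | head -1; grep "^export JAVA_HOME" /apache/hadoop-3.4.0/etc/hadoop/hadoop-env.sh'
done

The JVM reported on the NameNode host is the one to look at most closely, since the stack trace originates on the NameNode side.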
Created 01-09-2026 11:30 PM
@G_B FYI
➤ Based on your report, your data is physically safe on the disks, but the HDFS metadata link is broken because your DataNodes cannot talk to your NameNode properly.
Here is the breakdown of why this is happening and how to fix it.
1. The Result: "Missing Blocks" and "Forbidden" Errors
Missing Blocks (6303): Your NameNode knows the files should exist (metadata is loaded), but because the DataNode blockReport failed due to the Java error, the DataNodes haven't told the NameNode which blocks they are holding.
Num of Blocks: 0: Look at the vortex entry in your dfsadmin report: Num of Blocks: 0 and Last Block Report: Never. The NameNode thinks that node is empty because its block report never succeeded.
Head of file / Forbidden: Since the NameNode thinks there are 0 blocks available, it tells your client "I have no DataNodes to give you for this file."
2. Solution Step 1: Restart all Hadoop services (NameNode first, then DataNodes).
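A minimal restart sketch, assuming you run the daemons by hand with the Hadoop 3 scripts rather than through a cluster manager (run as the user that owns the HDFS processes):

# on the NameNode host
hdfs --daemon stop namenode
hdfs --daemon start namenode

# on each DataNode host, once the NameNode is back up
hdfs --daemon stop datanode
hdfs --daemon start datanode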
3. Solution Step 2: Clear the "Lease" and Trigger Block Reports (a command sketch follows this list)
=> Check the DataNode logs. You want to see: Successfully sent block report.
=> Once the reports are successful, run the report again: hdfs dfsadmin -report.
=> The "Missing Blocks" count should start dropping toward zero, and Num of Blocks should increase.
4. Troubleshooting the "File Not Found" in fsck
The reason fsck said the file didn't exist while ls showed it is likely due to NameNode Safe Mode.
When a NameNode starts up and sees 6,000+ missing blocks, it often enters Safe Mode to prevent data loss.
Check if you are in safe mode: hdfs dfsadmin -safemode get
If it is ON, do not leave it manually until your DataNodes have finished reporting their blocks.
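For reference, a minimal check sequence (only leave safe mode by hand once the block reports have gone through and the missing-block count has dropped to zero):

hdfs dfsadmin -safemode get
# after the DataNodes have reported and "Missing blocks" is back to 0:
hdfs dfsadmin -safemode leave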