Member since: 06-05-2024
Posts: 2
Kudos Received: 0
Solutions: 0
05-29-2025 11:40 AM
I have a 3-node cluster (ml1, ml2, and vortex); ml1 is the namenode, ml2 and vortex are datanodes. Last week we had a power shutdown at our school, and since then I have not been able to access the files in HDFS. When I try to view the "Head of the file" I get the error: "Couldn't find datanode to read file from. Forbidden". The size of the file is reported correctly, so my understanding is that the file is still there somewhere.

The output of hdfs dfsadmin -report is below:

Configured Capacity: 2088742875136 (1.90 TB)
Present Capacity: 1673267032064 (1.52 TB)
DFS Remaining: 942202814464 (877.49 GB)
DFS Used: 731064217600 (680.86 GB)
DFS Used%: 43.69%

Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 6303
Missing blocks (with replication factor 1): 6303
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0

Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.3.0.38:9866 (vortex.serv)
Hostname: vortex.dbwks.erau.edu
Decommission Status : Normal
Configured Capacity: 1031987826688 (961.11 GB)
DFS Used: 531036213248 (494.57 GB)
Non DFS Used: 5448654848 (5.07 GB)
DFS Remaining: 443057381376 (412.63 GB)
DFS Used%: 51.46%
DFS Remaining%: 42.93%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Thu May 29 13:42:03 EDT 2025
Last Block Report: Never
Num of Blocks: 0

Name: 10.3.0.58:9866 (wxml2.serv)
Hostname: wxml2.db.erau.edu
Decommission Status : Normal
Configured Capacity: 1056755048448 (984.18 GB)
DFS Used: 200028004352 (186.29 GB)
Non DFS Used: 303877742592 (283.01 GB)
DFS Remaining: 499145433088 (464.87 GB)
DFS Used%: 18.93%
DFS Remaining%: 47.23%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Thu May 29 13:42:04 EDT 2025
Last Block Report: Thu May 29 12:56:07 EDT 2025
Num of Blocks: 3519

Note that vortex shows "Last Block Report: Never" and "Num of Blocks: 0" even though its DFS Used is 494.57 GB.

When I run:

hdfs fsck /hdfs_nifi/tfms/flightData/flModify/2023_03.csv -files -blocks -locations

I get the error below:

FileSystem is inaccessible due to:
java.io.FileNotFoundException: File does not exist: /hdfs_nifi/tfms/flightData/flModify/2023_03.csv
DFSck exiting.

But the file is visible under the above path.
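In case it helps narrow things down, these are the next checks I plan to run (a sketch; I am assuming a plain, non-kerberized HDFS, and the directory is the parent of the file above):

# is the namenode still in safe mode after the power loss?
hdfs dfsadmin -safemode get

# fsck on the parent directory instead of the single file, to see
# whether the whole tree is invisible to fsck or just this one file
hdfs fsck /hdfs_nifi/tfms/flightData/flModify -files -blocks -locations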
If I run:

hdfs dfs -ls /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/

I get the following:

Found 42 items
-rw-r--r-- 1 root supergroup  60193632 2025-04-26 21:01 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_01.csv
-rw-r--r-- 1 root supergroup 120930719 2025-03-29 07:16 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_02.csv
-rw-r--r-- 1 root supergroup 121990303 2025-04-30 16:37 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_03.csv
-rw-r--r-- 1 root supergroup 116003694 2025-04-01 18:15 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_04.csv
-rw-r--r-- 1 root supergroup 123227494 2025-05-04 08:24 /hdfs_nifi/tfms/flowInformation/Reroute/rrRouteData/2022_05.csv

I ran hdfs fsck -list-corruptfileblocks, selected a random block from the output, and ran:

find /data/hadoop_dir/datanode_dir/ -type f -iname blk_1073786680

When I go to the location it returns (/data/hadoop_dir/datanode_dir/current/BP-1329719554-155.31.114.228-1716835341630/current/finalized/subdir0/subdir15/blk_1073786680), I do see the block file on disk. So why am I not able to access the data through HDFS?

Also, on one of the datanodes (vortex) I get the following warning:

2025-05-29 17:43:47,616 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x40e0062fac3f698a with lease ID 0x8d10044dcd041620 to namenode: wxml1.serv/10.3.0.48:8020, containing 1 storage report(s), of which we sent 0. The reports had 6312 total blocks and used 0 RPC(s). This took 1 msecs to generate and 2 msecs for RPC and NN processing. Got back no commands.
2025-05-29 17:43:47,616 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.runBlockOp(BlockManager.java:5558)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1651)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:182)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:34769)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1246)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1169)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3203)
Caused by: java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
        at org.apache.hadoop.thirdparty.protobuf.IterableByteBufferInputStream.read(IterableByteBufferInputStream.java:143)
        at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.read(CodedInputStream.java:2080)

So vortex's block report is failing, which seems consistent with the namenode showing "Last Block Report: Never" and "Num of Blocks: 0" for that node and counting 6303 missing blocks.

I also ran:

find $HADOOP_HOME -name "protobuf*.jar"

which returns the following on all three machines:
/apache/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar

I would really appreciate any suggestions on this issue, i.e., is there a way to get my data back? I really need help with this.
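From what I have read, a NoSuchMethodError on java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer; usually means bytecode compiled against Java 9 or newer is running on a Java 8 JVM (Java 9 added covariant ByteBuffer overrides for position and friends). So one comparison I plan to make on all three nodes is roughly the following (a sketch; I am assuming the daemons pick up JAVA_HOME from hadoop-env.sh):

# JVM visible on the PATH
java -version
# JVM the shell exports
echo $JAVA_HOME
# JVM the Hadoop daemons are configured to use
grep -i JAVA_HOME /apache/hadoop-3.4.0/etc/hadoop/hadoop-env.sh

If the namenode runs Java 8 while the Hadoop 3.4.0 jars (or the shaded org.apache.hadoop.thirdparty protobuf) were built for a newer JVM, that would match the stack trace above.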
Labels: HDFS
06-05-2024 08:27 AM
Hello,

We have a 3-node cluster where one of the nodes has only 64 cores and 128 GB RAM, while the other two machines each have 128 cores and 128 GB RAM. I am using Round Robin load balancing in the flow (ListFile --> LB --> FetchFile), so the load is divided (almost) uniformly among all the machines, which is the expected behavior. However, the load average on the smaller machine is exceeding 64 (this machine is sweating!), whereas the load average on the other two machines is around 50.

So my question is: is there a way in NiFi to distribute load so that the load on the smaller machine stays below 64, while the other two machines take on some more work?

I tried using the DistributeLoad processor, but I am not sure which strategy to use: Round Robin would distribute load equally, and Next Available would distribute to whichever node is available next. How do I configure DistributeLoad so that it divides load equally between the two bigger machines but sends less to the smaller one? A sketch of what I am considering is below.

I would really appreciate your suggestions. Thanks!
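For reference, this is the weighted DistributeLoad configuration I am considering, based on my reading of the processor docs (the weights and the relationship-to-node mapping are my own guesses, so please correct me if the dynamic properties do not work this way):

Distribution Strategy   : round robin
Number of Relationships : 3

Dynamic properties (relationship number : weight):
  1 : 2    (first 128-core machine)
  2 : 2    (second 128-core machine)
  3 : 1    (64-core machine)

My understanding is that, per round-robin cycle, relationships 1 and 2 would each receive 2 FlowFiles and relationship 3 would receive 1, so the smaller node would get 20% of the work instead of a third. I believe I would still need something like Remote Process Groups to route each relationship to a specific node, since the load-balanced connection strategies (round robin, single node, partition by attribute) do not seem to support weights.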
Labels: Apache NiFi