
Namenode | No node to choose | Failed to choose from the next rack

Contributor

Hello,

I'm seeing these logs in the namenode. It seems it can't connect to the datanodes that we have:

2024-04-26 14:45:21,470 DEBUG net.NetworkTopology: No node to choose.
2024-04-26 14:45:21,470 DEBUG blockmanagement.BlockPlacementPolicy: Failed to choose from the next rack (location = /rack-10.128.44.130), retry choosing randomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:829)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:717)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:660)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:636)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:511)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:414)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:463)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:46)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1858)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1810)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4643)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4510)
at java.lang.Thread.run(Thread.java:748)
2024-04-26 14:45:21,470 DEBUG net.NetworkTopology: Choosing random from 0 available nodes on node /, scope=, excludedScope=null, excludeNodes=[10.128.43.221:9866, 10.128.43.204:9866, 10.128.44.130:9866]. numOfDatanodes=3.

 

But when I ran the command to check the datanode status, all three datanodes show up as live:

Live datanodes (3):

Name: 10.128.43.204:9866 (10-128-43-204.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-0.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.43.204
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60179288064 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996488970240 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:07:59 UTC 2024
Last Block Report: Fri Apr 26 12:21:53 UTC 2024
Num of Blocks: 1777840


Name: 10.128.43.221:9866 (10-128-43-221.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-2.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.43.221
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60178792448 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996489465856 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:08:00 UTC 2024
Last Block Report: Fri Apr 26 14:22:23 UTC 2024
Num of Blocks: 1777840


Name: 10.128.44.130:9866 (10-128-44-130.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-1.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.44.130
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60182228992 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996486029312 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:08:00 UTC 2024
Last Block Report: Fri Apr 26 13:56:23 UTC 2024
Num of Blocks: 1777840


I hope you can help us fix this blocker.

Thank you in advance!

1 REPLY

Expert Contributor

Hi Team, it looks like you have 3 DataNodes with a rack topology. Have you checked how many racks there are as per rack awareness? With rack awareness, a block gets one replica on one DataNode in each rack, or at most two replicas in the same rack on different DataNodes.
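To see how many racks the NameNode currently knows about and which DataNodes are mapped to each rack, you can print the topology with the standard dfsadmin option (the rack names shown will depend on your topology script or mapping):

hdfs dfsadmin -printTopology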

You can also try triggering the block report manually from the DN to the NN:

hdfs dfsadmin -triggerBlockReport <datanode>:<ipc_port>
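For example, against the first DataNode in your report (this assumes the default DataNode IPC port 9867; check dfs.datanode.ipc.address in hdfs-site.xml if you have overridden it):

hdfs dfsadmin -triggerBlockReport 10.128.43.204:9867   # 9867 = default DataNode IPC port, not the 9866 data transfer port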