
second replica is not found while writing a simple file to HDFS

I am trying to load a simple file to an HDP Hadoop cluster using the HDFS client, and I got the following exception.
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /home/maria_dev/read_write_hdfs_example.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
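
The write itself is nothing special, just the standard FileSystem create-and-write pattern. A rough sketch of what my client does (the NameNode URI and class name here are illustrative, not my exact code):

import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadWriteHdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI is illustrative; the real sandbox address may differ.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://sandbox-hdp.hortonworks.com:8020"), conf, "maria_dev");

        // Create the file (overwriting if it exists) and write a small text payload.
        Path target = new Path("/home/maria_dev/read_write_hdfs_example.txt");
        try (FSDataOutputStream out = fs.create(target, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}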
I looked into the NameNode logs and enabled DEBUG log level for the NetworkTopology and BlockPlacementPolicy components. After enabling the logs, I found that the data node 172.18.0.2:50010 is being excluded, and since I am running only one datanode, it is unable to find a second replica.
 
2023-04-28 06:00:22,188 DEBUG net.NetworkTopology (NetworkTopology.java:chooseRandom(780)) - Choosing random from 1 available nodes on node /default-rack, scope=/default-rack, excludedScope=null, excludeNodes=[]
2023-04-28 06:00:22,188 DEBUG net.NetworkTopology (NetworkTopology.java:chooseRandom(796)) - chooseRandom returning 172.18.0.2:50010
2023-04-28 06:00:22,189 INFO hdfs.StateChange (FSNamesystem.java:logAllocatedBlock(3866)) - BLOCK* allocate blk_1073743107_2310, replicas=172.18.0.2:50010 for /home/maria_dev/read_write_hdfs_example.txt
2023-04-28 06:00:24,972 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1653)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2023-04-28 06:00:24,986 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:logJSON(179)) - Flushing HDFS audit. Event Size:2
2023-04-28 06:00:25,060 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3759)) - DIR* completeFile: /spark2-history/.e3751543-0a05-4c2b-af27-f3a7c02666b2 is closed by DFSClient_NONMAPREDUCE_-282543677_1
2023-04-28 06:00:27,974 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1653)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2023-04-28 06:00:27,988 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(310)) - Audit Status Log: name=hdfs.async.batch.hdfs, interval=01:00.313 minutes, events=23, succcessCount=23, totalEvents=878, totalSuccessCount=876, totalDeferredCount=2
2023-04-28 06:00:27,988 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:logJSON(179)) - Flushing HDFS audit. Event Size:3
2023-04-28 06:00:30,975 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1653)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2023-04-28 06:00:33,976 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1653)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2023-04-28 06:00:35,073 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3759)) - DIR* completeFile: /spark2-history/.9aa45e6b-dff7-4f6d-b25d-a662930c6797 is closed by DFSClient_NONMAPREDUCE_-282543677_1
2023-04-28 06:00:36,976 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1653)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2023-04-28 06:00:36,989 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:logJSON(179)) - Flushing HDFS audit. Event Size:3
2023-04-28 06:00:39,977 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1653)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2023-04-28 06:00:42,977 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1653)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2023-04-28 06:00:43,251 DEBUG net.NetworkTopology (NetworkTopology.java:chooseRandom(780)) - Choosing random from 0 available nodes on node /default-rack, scope=/default-rack, excludedScope=null, excludeNodes=[172.18.0.2:50010]
2023-04-28 06:00:43,251 DEBUG net.NetworkTopology (NetworkTopology.java:chooseRandom(796)) - chooseRandom returning null
2023-04-28 06:00:43,251 DEBUG blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseLocalRack(547)) - Failed to choose from local rack (location = /default-rack); the second replica is not found, retry choosing ramdomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:701)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:622)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:529)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:489)
 
Please help me troubleshoot this issue further.

Super Collaborator

@iamlazycoder Since you have only a single DataNode, the block placement policy won't allow the second replica to be placed on the same DataNode.

 

You can try putting the file with a single replica and check whether it succeeds:

 

hdfs dfs -Ddfs.replication=1 -put /path/to/local/file /path/to/hdfs/dir

 

Or you can change dfs.replication to 1 in hdfs-site.xml at the cluster level.
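
For example, the property in hdfs-site.xml would look like this (dfs.replication is only the default replication applied to newly created files; existing files keep their current replication factor):

<!-- hdfs-site.xml: default replication factor for newly created files -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>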


@rki_ I already have the dfs.replication property set to 1 in hdfs-site.xml. I can see from the logs that it first finds the data node 172.18.0.2:50010 and allocates a block for the write operation. So why does it try to find a second replica when dfs.replication is 1?

Super Collaborator

@iamlazycoder Have you tried putting the file with -Ddfs.replication=1?


@rki_ Yes, I also tried setting dfs.replication=1 while writing the file to HDFS.
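
Concretely, continuing the sketch from my first post, I mean something along these lines (illustrative, not my exact code; either setting the property on the Configuration or passing the replication factor to create() directly):

// Force a single replica from the client side.
Configuration conf = new Configuration();
conf.set("dfs.replication", "1");  // client-side default replication for new files

// ...or pass the replication factor explicitly when creating the file:
short replication = 1;
FSDataOutputStream out = fs.create(
        new Path("/home/maria_dev/read_write_hdfs_example.txt"),
        true,                                       // overwrite if the file exists
        conf.getInt("io.file.buffer.size", 4096),   // buffer size
        replication,                                // replication factor
        fs.getDefaultBlockSize(new Path("/")));     // block size
out.close();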


Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /home/maria_dev/read_write_hdfs_example.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

 

This is the exception from the client side, so it is clearly excluding the datanode because of some issue. @rki_