HDFS cluster with replication factor as 1, storing all blocks of a file in one data node. Is it a common behavior?

Contributor

Hi team,

I have a 3-node HDFS cluster with a replication factor of 1. I copied a 10GB file into HDFS with the "hdfs dfs -put" command, and it was split into 86 blocks of 128MB each. However, all 86 blocks were stored on a single data node. Is this common behaviour?

I expected the 86 blocks to be distributed across all 3 nodes.

Is there a configuration setting that enables this distribution?

1 REPLY

Expert Contributor

Hi,

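Yes, this is the default behavior if you write the file from a machine that is itself a data node: the first replica of every block is placed on the local node, and with a replication factor of 1 there are no further replicas to place elsewhere. To have the blocks distributed, issue the hadoop fs -put command from a client that isn't running a DataNode.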

According to the docs:

The replica placement strategy is that if the writer is on a datanode, the 1st replica is placed on the local machine, otherwise a random datanode. The 2nd replica is placed on a datanode that is on a different rack. The 3rd replica is placed on a datanode which is on the same rack as the first replica.
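
To illustrate why a replication factor of 1 leaves every block on one node, here is a minimal Python sketch of the first-replica rule quoted above. It is a toy model, not HDFS code: the host names (dn1, dn2, dn3, edge-client) and the functions place_first_replica and simulate_put are hypothetical, and the "random datanode" choice is modeled as a uniform pick, whereas a real NameNode may also weigh factors such as load and free space.

```python
import random
from collections import Counter


def place_first_replica(writer_host, datanodes, rng):
    """Toy version of the quoted rule: local node if the writer is a datanode, else random."""
    if writer_host in datanodes:
        return writer_host
    return rng.choice(datanodes)


def simulate_put(writer_host, datanodes, num_blocks, seed=0):
    """Count how many blocks land on each node when replication factor is 1."""
    rng = random.Random(seed)
    return Counter(
        place_first_replica(writer_host, datanodes, rng)
        for _ in range(num_blocks)
    )


# Hypothetical 3-node cluster and an 86-block file, as in the question.
datanodes = ["dn1", "dn2", "dn3"]

# Writer runs on a datanode: every block stays on that node.
local = simulate_put("dn1", datanodes, 86)

# Writer runs on a client that is not a datanode: blocks are spread out.
remote = simulate_put("edge-client", datanodes, 86)

print("written from dn1:        ", dict(local))
print("written from edge-client:", dict(remote))
```

On a real cluster, the actual placement of a file's blocks can be checked with hdfs fsck <path> -files -blocks -locations.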