HDFS cluster with replication factor as 1, storing all blocks of a file in one data node. Is it a common behavior?

Hi team,

I have a 3-node HDFS cluster with a replication factor of 1. I copied a 10GB file into HDFS with the "hdfs dfs -put" command, and it was split into 86 blocks of 128MB each. However, all 86 blocks were stored on a single data node. Is this common behaviour?

I expected the 86 blocks to be distributed across all 3 nodes.

Is there a configuration setting that makes HDFS distribute them this way?

Re: HDFS cluster with replication factor as 1, storing all blocks of a file in one data node. Is it a common behavior?

Hi,

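Yes, this is the default behavior when you write the file from a machine that is itself running a DataNode: the first replica of every block is placed on that local node, and with a replication factor of 1 there are no further replicas to spread out. To have the blocks distributed, issue the hadoop fs -put command from a client that isn't running a DataNode.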

According to docs:

"The replica placement strategy is that if the writer is on a datanode, the 1st replica is placed on the local machine, otherwise a random datanode. The 2nd replica is placed on a datanode that is on a different rack. The 3rd replica is placed on a datanode which is on the same rack as the first replica."
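
To make the effect of that rule concrete, here is a small Python simulation of the first-replica choice only, which is the only rule that matters with a replication factor of 1. It is a sketch, not HDFS code: the host names dn1, dn2, dn3 and edge-node are made up, and real HDFS has extra fallbacks (for example when the local node is full) that are ignored here.

```python
import random
from collections import Counter


def choose_first_replica(writer_host, datanodes, rng):
    """Pick the node for the first (here: only) replica of one block."""
    if writer_host in datanodes:
        # Writer runs on a DataNode: the block stays on the local machine.
        return writer_host
    # Writer is an outside client: a random DataNode is chosen per block.
    return rng.choice(datanodes)


def place_file(num_blocks, writer_host, datanodes, seed=0):
    """Count how many blocks of one file land on each DataNode."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return Counter(
        choose_first_replica(writer_host, datanodes, rng)
        for _ in range(num_blocks)
    )


# Hypothetical 3-node cluster and an 86-block file, as in the question.
datanodes = ["dn1", "dn2", "dn3"]
local = place_file(86, "dn1", datanodes)         # put issued on a DataNode
remote = place_file(86, "edge-node", datanodes)  # put issued from a non-DataNode client

print("written from dn1:      ", dict(local))
print("written from edge-node:", dict(remote))
```

In the first case every block is counted against dn1, which matches what the question describes. In the second case the blocks are spread over the nodes, although not necessarily evenly, since each block is placed independently.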