I have a 3-node HDFS cluster with a replication factor of 1. I copied a
10GB file into HDFS with the `hdfs dfs -put` command, and it was divided
into 86 blocks of 128MB each. But all 86 blocks were stored on a single data
node. Is this the expected behaviour?
I expected the 86 blocks to be distributed across all 3 nodes.
Is there any configuration that controls this distribution?
Yes, this is the default behavior when you write the file from a machine that is itself running a DataNode: the first replica of every block is placed on the local node, and with a replication factor of 1 that is the only replica. You can have the blocks distributed by issuing the `hadoop fs -put` command from a client that isn't running a DataNode.
According to the docs:

> The replica placement strategy is that if the writer is on a datanode, the 1st replica is placed on the local machine, otherwise a random datanode. The 2nd replica is placed on a datanode that is on a different rack. The 3rd replica is placed on a datanode which is on the same rack as the first replica.
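The quoted policy can be sketched as a toy simulation (node and rack names here are made up for illustration; this is not HDFS's actual code):

```python
import random

def place_replicas(datanodes, writer=None, replication=3):
    """Toy sketch of the default replica placement described above.

    datanodes: dict mapping node name -> rack name.
    writer: node name if the client runs on a datanode, else None.
    """
    placements = []
    # 1st replica: the local node if the writer is a datanode,
    # otherwise a random datanode.
    first = writer if writer in datanodes else random.choice(list(datanodes))
    placements.append(first)
    if replication >= 2:
        # 2nd replica: a node on a different rack than the first.
        remote = [n for n in datanodes if datanodes[n] != datanodes[first]]
        placements.append(random.choice(remote))
    if replication >= 3:
        # 3rd replica: another node on the same rack as the first replica
        # (per the docs quoted above).
        same_rack = [n for n in datanodes
                     if datanodes[n] == datanodes[first]
                     and n not in placements]
        placements.append(random.choice(same_rack))
    return placements
```

With a replication factor of 1, only the first rule ever fires, so a writer running on a DataNode puts every block on its own node, which is exactly what the question observes.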