I have 3 node HDFS cluster with replication factor as 1. I have copied 10GB file in to HDFS with “dfs hdfs -put” command then it was divided into 86 blocks of 128MB each. But all these 86 blocks were stored in one data node. Is it common behaviour?
I expected it to be distribute all 86 blocks across all 3 nodes?
Is there any configuration to do this distribution?
Yes this is default behavior (if you're placing the file from within a data node). You can have them distributed by issuing hadoop fs -put command from a client that isn't running DataNode.
According to docs:
* The replica placement strategy is that if the writer is on a datanode, * the 1st replica is placed on the local machine, * otherwise a random datanode. The 2nd replica is placed on a datanode * that is on a different rack. The 3rd replica is placed on a datanode * which is on the same rack as the first replica.