HDFS datanode.data.dir

Contributor

I have a cluster with 3 DataNodes, and I have configured multiple directories in dfs.datanode.data.dir, say /var/home and /home/test. My question: if my replication factor (dfs.replication) is 3, will HDFS write the file's replicas on 3 nodes, and will it also create a copy of the same data in each of the configured directories on every node?



Mentor
The replication factor is one replica per DataNode _instance_, not per DataNode _disk_.

Regardless of how many disks (directories in dfs.datanode.data.dir) you configure per DataNode, each DataNode host holds exactly one replica of a given block, and that replica may reside on any one of its disks.
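To illustrate the point above, here is a toy Python simulation (not HDFS source code) of a replication factor of 3 on a 3-DataNode cluster where every DataNode lists the two directories from the question in dfs.datanode.data.dir. It assumes HDFS's default round-robin choice among a DataNode's volumes (RoundRobinVolumeChoosingPolicy) and simply takes the first N DataNodes as targets, whereas real HDFS placement is rack-aware; the host names dn1 to dn3 and the block IDs are hypothetical.

```python
from itertools import cycle

# dfs.datanode.data.dir from the question: a comma-separated list of directories.
DATA_DIR_SETTING = "/var/home,/home/test"


class DataNode:
    """Toy model of one DataNode; it is NOT HDFS code."""

    def __init__(self, host, data_dirs):
        self.host = host
        self.volumes = [d.strip() for d in data_dirs.split(",") if d.strip()]
        self._next_volume = cycle(self.volumes)  # round-robin over volumes
        self.replicas = {v: [] for v in self.volumes}

    def store(self, block_id):
        """Write ONE replica of block_id to the next volume in turn."""
        volume = next(self._next_volume)
        self.replicas[volume].append(block_id)
        return volume


def place_block(block_id, datanodes, replication):
    """Give each of `replication` distinct DataNodes one replica of the block.

    Taking the first N nodes is a simplification of rack-aware placement.
    With fewer DataNodes than `replication`, the block is simply
    under-replicated: no DataNode ever receives a second replica.
    """
    targets = datanodes[:replication]
    return {dn.host: dn.store(block_id) for dn in targets}


cluster = [DataNode(host, DATA_DIR_SETTING) for host in ("dn1", "dn2", "dn3")]
for block_id in ("blk_1", "blk_2"):
    print(block_id, place_block(block_id, cluster, replication=3))
# blk_1 {'dn1': '/var/home', 'dn2': '/var/home', 'dn3': '/var/home'}
# blk_2 {'dn1': '/home/test', 'dn2': '/home/test', 'dn3': '/home/test'}
```

Each host ends up with one replica per block, alternating between its two directories; no block is copied into both directories of the same host.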

Contributor
Thanks, Harsh, for your help. I am planning to move the DataNode directory to a new location, because my current directory is almost full while the new one is nearly empty. Both are on the same disk. Can you tell me what I should do in this case?

Mentor (accepted solution)
If you are on CDH 5.9 or later, or can upgrade to it, add the new disk to the dfs.datanode.data.dir configuration and then use this feature to rebalance data across the DataNode's disks:
http://blog.cloudera.com/blog/2016/10/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache...
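The intra-DataNode disk balancer works within a single DataNode: it computes a plan that moves blocks from more-used volumes to less-used ones, then executes that plan. In Apache Hadoop it is driven by the `hdfs diskbalancer` command (`-plan`, `-execute`, `-query`) and must be enabled with `dfs.disk.balancer.enabled` in hdfs-site.xml on releases where it is off by default; check the documentation for your release. The planner below is only a simplified greedy illustration in Python, not Hadoop's actual planner, and the volume names, capacities, block sizes, and the 10% threshold are all hypothetical.

```python
def plan_moves(blocks, capacity, threshold=0.10):
    """Greedy sketch of an intra-DataNode balancing plan (hypothetical).

    blocks:    volume -> list of block sizes stored there (mutated in place)
    capacity:  volume -> capacity of that volume, same unit as block sizes
    threshold: stop once the fullest and emptiest volumes differ by at most
               this fraction of capacity
    Returns a list of (block_size, source_volume, destination_volume) moves.
    """

    def used_fraction(volume):
        return sum(blocks[volume]) / capacity[volume]

    moves = []
    while True:
        src = max(blocks, key=used_fraction)
        dst = min(blocks, key=used_fraction)
        gap = used_fraction(src) - used_fraction(dst)
        if gap <= threshold:
            break  # balanced within the threshold
        # Moving a block of size b keeps dst no fuller than src only if
        # b * (1/cap_src + 1/cap_dst) <= gap; this also guarantees termination.
        limit = 1 / capacity[src] + 1 / capacity[dst]
        candidates = [b for b in blocks[src] if b * limit <= gap]
        if not candidates:
            break  # no single block move narrows the gap without overshooting
        size = max(candidates)
        blocks[src].remove(size)
        blocks[dst].append(size)
        moves.append((size, src, dst))
    return moves


# Hypothetical DataNode: an old volume at 90% and a newly added, empty one.
blocks = {"/data/1": [10] * 9, "/data/2": []}
capacity = {"/data/1": 100, "/data/2": 100}
moves = plan_moves(blocks, capacity)
print(len(moves), moves[0])  # 4 (10, '/data/1', '/data/2')
print({v: sum(sizes) for v, sizes in blocks.items()})  # {'/data/1': 50, '/data/2': 40}
```

Picking the largest block that does not overshoot keeps the number of moves small, and the planner never copies data: each block ends up on exactly one volume, consistent with the one-replica-per-DataNode rule.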