Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

HDFS datanode.data.dir

Contributor

I have a cluster with 3 DataNodes. I have configured multiple directories for dfs.datanode.data.dir, say /var/home and /home/test. My question: if dfs.replication is 3, will HDFS write the file's replicas to 3 nodes, and will it also create a copy of the same data in each of the configured directories on every node?


3 REPLIES

Mentor
The replication factor counts one replica per DataNode _instance_, not per
DataNode _disk_.

Regardless of how many disks (directories listed in dfs.datanode.data.dir)
you configure per DataNode, each DataNode host holds only one replica of a
given block, and that replica may reside on any one of those disks.
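
To make the rule above concrete, here is a small toy model in Python. It is only a sketch, not HDFS code: the host names dn1, dn2, dn3 are made up, the directories are the two from the question, and it assumes each DataNode rotates through its directories round-robin. Real host selection also considers racks and free space, which this ignores.

```python
from itertools import cycle

REPLICATION = 3  # stands in for dfs.replication

# Hypothetical host names; both directories come from the question.
DATANODES = {
    "dn1": ["/var/home", "/home/test"],
    "dn2": ["/var/home", "/home/test"],
    "dn3": ["/var/home", "/home/test"],
}


def place_blocks(block_ids, datanodes, replication):
    """Return {block_id: [(host, directory), ...]} with one replica per host."""
    # Each DataNode cycles through its own directories independently
    # (assumption: round-robin choice between directories).
    next_dir = {host: cycle(dirs) for host, dirs in datanodes.items()}
    hosts = sorted(datanodes)
    placement = {}
    for block_id in block_ids:
        # A host never receives two replicas of the same block, so at most
        # `replication` distinct hosts are used for each block.
        chosen_hosts = hosts[:replication]
        placement[block_id] = [(h, next(next_dir[h])) for h in chosen_hosts]
    return placement


placement = place_blocks(["blk_1", "blk_2"], DATANODES, REPLICATION)
for block_id, replicas in placement.items():
    print(block_id, replicas)
```

In this model every block ends up on all three hosts, but in only one directory per host. The second directory receives other blocks; it does not receive a second copy of the same block.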

Contributor
Thanks, Harsh, for your help. I am planning to move the DataNode data directory to a new location, because my present directory is almost full and the new directory is nearly empty. Both are on the same disk. Can you tell me what I should do in this case?

Mentor (accepted solution)
If you use CDH 5.9+ or can upgrade to it, add the disk to the configuration
and use the intra-DataNode disk balancer:
http://blog.cloudera.com/blog/2016/10/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache...
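
For reference, the disk balancer is normally driven in three steps: plan, execute, query. The sketch below only builds and prints those command lines; it does not run them. The host name and plan file path are placeholders, and it assumes dfs.disk.balancer.enabled has been set to true in hdfs-site.xml, which may be required depending on your version.

```python
def diskbalancer_commands(datanode_host, plan_path):
    """Build the three `hdfs diskbalancer` steps as argv lists (nothing is run)."""
    return [
        # 1. Ask for a plan that evens out usage across the DataNode's disks.
        ["hdfs", "diskbalancer", "-plan", datanode_host],
        # 2. Submit the generated plan file to that DataNode.
        ["hdfs", "diskbalancer", "-execute", plan_path],
        # 3. Check how far the DataNode has got with the plan.
        ["hdfs", "diskbalancer", "-query", datanode_host],
    ]


# Placeholder values: use your own DataNode host, and the plan file
# location that the -plan step reports when it finishes.
commands = diskbalancer_commands("dn1.example.com", "dn1.example.com.plan.json")
for argv in commands:
    print(" ".join(argv))
```

Each printed line can be run by hand as the HDFS superuser once the new directory has been added to dfs.datanode.data.dir and the DataNode has been restarted.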