
Multiple DataNode directories write operation


I am using Apache Hadoop 2.7.1 and have configured the DataNode data directory to use multiple directories.
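The relevant hdfs-site.xml entry looks roughly like this (a sketch built from the two paths mentioned below; the exact snippet in my file may differ slightly):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
</property>
```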


According to my understanding of this configuration, file data should be written to both directories, /opt/hadoop/data_dir and file:///hdd/data_dir/, with the same block names and the same subdirectory names.

But in my cluster this is not happening: sometimes it writes data blocks to the local directory /opt/hadoop/data_dir, and sometimes it writes them to the external hard drive directory file:///hdd/data_dir.

What could be the possible reasons, and how can I control this behavior?



The parameter to specify more than one storage path in Hadoop is set in hdfs-site.xml.

Property: dfs.datanode.data.dir (please verify against your version's documentation). The value can be any set of directories available on the DataNode; it determines where on the local filesystem a DataNode stores its blocks.

It can be a comma-separated list of directories, typically on different disk partitions, such as '/user1/hadoop/data,/user2/hadoop/data', which is useful when you have multiple disk partitions to dedicate to HDFS. When multiple directories are configured, blocks are written across them in a round-robin fashion: each block replica lands in exactly one directory, not in all of them, so the alternating behavior you observed is expected rather than an error. If one directory's disk becomes full, round-robin writes continue on the remaining directories.
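To make the round-robin behavior concrete, here is a toy sketch of a volume-choosing loop (illustrative only; this is not Hadoop's actual RoundRobinVolumeChoosingPolicy code, and the directory names are just the ones from the question):

```python
class RoundRobinVolumes:
    """Toy model of round-robin block placement across DataNode data dirs."""

    def __init__(self, volumes):
        # e.g. the entries of dfs.datanode.data.dir
        self.volumes = list(volumes)
        self.next = 0

    def choose(self, free_space, block_size):
        """Pick ONE directory for the next block replica, skipping full disks."""
        for _ in range(len(self.volumes)):
            vol = self.volumes[self.next]
            self.next = (self.next + 1) % len(self.volumes)
            if free_space[vol] >= block_size:
                return vol  # each block lands in exactly one directory
        raise IOError("all configured data directories are full")


vols = RoundRobinVolumes(["/opt/hadoop/data_dir", "/hdd/data_dir"])
free = {"/opt/hadoop/data_dir": 100, "/hdd/data_dir": 100}
print(vols.choose(free, 10))  # /opt/hadoop/data_dir
print(vols.choose(free, 10))  # /hdd/data_dir -- blocks alternate, not duplicate
```

Note that the second call returns the other directory: successive blocks alternate between the configured paths, which matches what you are seeing in your cluster.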

You can also define storage types for the multiple locations in HDFS (heterogeneous storage). Please refer to the link below.
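For example, a storage-type-tagged configuration might look like this (a sketch using HDFS's standard [DISK] and [ARCHIVE] storage-type tags, applied to the paths from the question):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/opt/hadoop/data_dir,[ARCHIVE]/hdd/data_dir</value>
</property>
```

With tags like these, storage policies can then direct certain data to certain classes of storage, rather than relying on the default round-robin placement alone.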