Support Questions

multiple Datanode directories write operation

I am using Apache Hadoop 2.7.1 and have configured the DataNode to use multiple data directories.
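A minimal hdfs-site.xml sketch of what such a configuration typically looks like (the actual snippet is not shown in the question; the paths below are taken from the directories mentioned later and are assumptions about the poster's setup):

```xml
<!-- hdfs-site.xml: dfs.datanode.data.dir accepts a comma-separated
     list of directories. The two paths here are the ones referenced
     in this question. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
</property>
```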


According to this configuration, writing file data should happen on both directories, /opt/hadoop/data_dir and file:///hdd/data_dir/, with the same block names and the same subdirectory names.

But in my cluster this is not happening: sometimes it writes data blocks to the local directory /opt/hadoop/data_dir, and sometimes it writes data blocks to the external hard drive directory file:///hdd/data_dir.

What could be the possible reasons, and how can I control this behavior?


Re: multiple Datanode directories write operation

The parameter for specifying more than one storage path in Hadoop is set in hdfs-site.xml.

Property: dfs.datanode.data.dir. Its value can be any directory (or comma-separated list of directories) available on the DataNode. It determines where on the local filesystem the DataNode should store its blocks.

It can be a list of directories where disk partitions are mounted, like '/user1/hadoop/data, /user2/hadoop/data', which is useful if you have multiple disk partitions to be used for HDFS. When it has multiple values, blocks are written to the directories in a round-robin fashion: each block goes to one directory, not to all of them, so you should not expect identical block files in both locations. If the disk holding one of the directories is full, round-robin placement continues across the remaining directories.
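The round-robin placement described above can be sketched roughly as follows. This is an illustration in Python, not Hadoop's actual implementation (the real logic lives in the Java class `RoundRobinVolumeChoosingPolicy`); the directory names and free-space figures are made up:

```python
class RoundRobinVolumeChooser:
    """Pick data directories in round-robin order, skipping full ones.

    A simplified illustration of how a DataNode spreads block writes
    across its dfs.datanode.data.dir entries; not Hadoop's real code.
    """

    def __init__(self, volumes):
        # volumes: list of (path, free_bytes) pairs
        self.volumes = volumes
        self.next_index = 0

    def choose_volume(self, block_size):
        # Try each volume at most once, starting from next_index.
        for _ in range(len(self.volumes)):
            path, free = self.volumes[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.volumes)
            if free >= block_size:
                return path
        raise IOError("no volume has enough free space")


# Two directories, as in the question (free-space numbers invented):
chooser = RoundRobinVolumeChooser([
    ("/opt/hadoop/data_dir", 10 * 2**30),   # 10 GiB free
    ("/hdd/data_dir", 500 * 2**30),         # 500 GiB free
])

# Successive blocks alternate between the two directories:
print(chooser.choose_volume(128 * 2**20))  # /opt/hadoop/data_dir
print(chooser.choose_volume(128 * 2**20))  # /hdd/data_dir
```

This is why the question's observation (blocks landing sometimes in one directory, sometimes in the other) is the expected behavior rather than a misconfiguration.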

You can also assign storage types to the HDFS directories (heterogeneous storage) so that different locations are treated differently. Please refer to the link below.
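As a sketch of what that looks like, Hadoop's heterogeneous storage support lets each directory in dfs.datanode.data.dir be prefixed with a storage type (the paths below reuse the ones from the question and are assumptions):

```xml
<!-- hdfs-site.xml: each directory may carry a storage-type prefix
     such as [DISK], [SSD], [ARCHIVE], or [RAM_DISK]. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/opt/hadoop/data_dir,[ARCHIVE]file:///hdd/data_dir/</value>
</property>
```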
