Support Questions
multiple Datanode directories write operation

Contributor

I am using Apache Hadoop 2.7.1 and have configured the DataNode data directory to use multiple directories:

        <property>
                 <name>dfs.data.dir</name>
                 <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
                 <final>true</final>
        </property>

According to this configuration, file data should be written to both directories, /opt/hadoop/data_dir and file:///hdd/data_dir/, with the same block names and in the same sub-directory names.

But in my cluster this is not what happens: sometimes it writes data blocks to the local directory /opt/hadoop/data_dir, and sometimes it writes data blocks to the external hard drive directory file:///hdd/data_dir.

What could be the possible reasons, and how can I control this behavior?

1 REPLY

Contributor

The parameter for specifying more than one storage path in Hadoop is set in hdfs-site.xml.

Property: dfs.datanode.data.dir (please verify; dfs.data.dir is the deprecated name for it)

The value of dfs.datanode.data.dir can be any directory that is available on the DataNode. It determines where on the local filesystem the DataNode should store its blocks.
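For example, a minimal hdfs-site.xml sketch using the current property name, with the two mount points taken from the question above (adjust the paths to your own layout; using a file:// URI for both entries keeps the scheme consistent):

        <property>
                 <name>dfs.datanode.data.dir</name>
                 <value>file:///opt/hadoop/data_dir,file:///hdd/data_dir</value>
        </property>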

It can be a list of directories where disk partitions are mounted, such as '/user1/hadoop/data, /user2/hadoop/data', which is useful when you have multiple disk partitions to dedicate to HDFS. When the property has multiple values, data is written to the directories in a round-robin fashion; the directories are not mirrors of each other, which is why you see some blocks on one disk and some on the other. If the disk holding one of the directories becomes full, the round-robin writes continue on the remaining directories.
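As a rough illustration of that round-robin behavior, here is a hypothetical Python sketch (this is not Hadoop's actual volume-choosing code; the class name, parameters, and capacity tracking are all made up for the example):

```python
class RoundRobinVolumePicker:
    """Pick the next data directory in round-robin order, skipping full ones.

    data_dirs:  ordered list of directory paths (like dfs.datanode.data.dir)
    free_bytes: assumed-known free space per directory, in bytes
    """

    def __init__(self, data_dirs, free_bytes):
        self.data_dirs = list(data_dirs)
        self.free_bytes = dict(free_bytes)
        self.next_index = 0  # where the round-robin cursor currently points

    def choose_dir(self, block_size):
        # Try each directory at most once, starting from the cursor.
        for _ in range(len(self.data_dirs)):
            d = self.data_dirs[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.data_dirs)
            if self.free_bytes[d] >= block_size:
                # This directory has room: place the block here and
                # leave the cursor pointing at the next directory.
                self.free_bytes[d] -= block_size
                return d
            # Directory is full: skip it and continue round-robin.
        raise IOError("all data directories are full")


# Blocks alternate between the two directories until one fills up,
# after which all writes go to the remaining directory.
picker = RoundRobinVolumePicker(
    ["/opt/hadoop/data_dir", "/hdd/data_dir"],
    {"/opt/hadoop/data_dir": 300, "/hdd/data_dir": 100},
)
```

Note that with a policy like this, identical copies in every directory are never expected; redundancy in HDFS comes from block replication across DataNodes, not from the list of local data directories.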

You can also configure HDFS storage types for the different locations (heterogeneous/archival storage). Please refer to the link below.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/configuring_arc...