
multiple Datanode directories write operation

Rising Star

I am using Apache Hadoop 2.7.1 and have configured the DataNode to use multiple data directories:

        <property>
                 <name>dfs.data.dir</name>
                 <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
                 <final>true</final>
        </property>

According to my understanding of this configuration, file data should be written to both directories, /opt/hadoop/data_dir and file:///hdd/data_dir/, with the same block names under the same subdirectory layout.

But in my cluster that is not what happens: sometimes a data block is written to the local directory /opt/hadoop/data_dir, and sometimes it is written to the external drive file:///hdd/data_dir.

What could be the possible reasons, and how can I control this behavior?

1 REPLY

Contributor

The parameter that specifies more than one storage path for a DataNode is set in hdfs-site.xml.

Property: dfs.datanode.data.dir (in Hadoop 2.x the old name dfs.data.dir, which you are using, is deprecated and maps to this key; please verify)

dfs.datanode.data.dir can be set to any directories that are available on the DataNode. It determines where on the local filesystem the DataNode stores its blocks.
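For example, the configuration from the question expressed with the current property name might look like the following (paths taken from the question; verify against your own layout):

```xml
<!-- hdfs-site.xml: current property name; the deprecated dfs.data.dir
     still works in 2.7.1 but is internally mapped to this key -->
<property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
        <final>true</final>
</property>
```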

It can be a comma-separated list of directories, typically on different disk partitions, e.g. '/user1/hadoop/data,/user2/hadoop/data', for the case where you have multiple disks to use for HDFS. When multiple directories are given, blocks are distributed across them in a round-robin fashion: each block is written to exactly one of the directories, not duplicated into all of them. This is the behavior you are observing, and it is expected; redundancy in HDFS comes from block replication across DataNodes, not from multiple directories on one node. If the disk holding one directory fills up, the round-robin copy continues over the remaining directories.
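The round-robin placement with the full-disk skip can be sketched roughly as follows. This is an illustrative Python model, not Hadoop's actual Java implementation (which lives in the DataNode's RoundRobinVolumeChoosingPolicy); the class and method names here are hypothetical.

```python
# Illustrative sketch of round-robin volume selection with a full-disk skip.
# Each volume is modeled as a (path, free_bytes) pair; names are hypothetical.
class RoundRobinVolumePicker:
    def __init__(self, volumes):
        self.volumes = volumes  # list of (path, free_bytes) tuples
        self.i = 0              # index of the next volume to try

    def choose(self, block_size):
        """Return the path of the next volume with enough free space."""
        start = self.i
        while True:
            path, free = self.volumes[self.i]
            self.i = (self.i + 1) % len(self.volumes)
            if free >= block_size:
                return path          # each block lands on exactly ONE volume
            if self.i == start:
                raise IOError("out of space on all configured volumes")

# Successive blocks alternate between the two directories:
picker = RoundRobinVolumePicker([("/opt/hadoop/data_dir", 10**9),
                                 ("/hdd/data_dir", 10**9)])
first = picker.choose(128 * 1024 * 1024)   # -> "/opt/hadoop/data_dir"
second = picker.choose(128 * 1024 * 1024)  # -> "/hdd/data_dir"
```

This is why blocks appear sometimes in one directory and sometimes in the other: that alternation is the intended behavior, not a fault.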

You can also configure HDFS storage types and policies for the different locations; please refer to the link below.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/configuring_arc...