Created 08-15-2017 12:33 PM
I am using Apache Hadoop 2.7.1, and I have configured the DataNode data directory to span multiple directories:
<property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
    <final>true</final>
</property>
According to this configuration, I expected writes to put file data in both directories, /opt/hadoop/data_dir and file:///hdd/data_dir/, with the same block names in the same subdirectories.
But on my cluster that is not what happens: sometimes a data block is written to the local directory /opt/hadoop/data_dir, and sometimes it is written to the external hard drive file:///hdd/data_dir.
What could be the possible reasons, and how can I control this behavior?
Created 08-15-2017 03:51 PM
The parameter for specifying more than one storage path in Hadoop is set in hdfs-site.xml.
Property: dfs.datanode.data.dir (please verify — dfs.data.dir is the deprecated name for the same setting in Hadoop 2.x).
The dfs.datanode.data.dir value can be any directory (or comma-separated list of directories) available on the DataNode. It determines where on the local filesystem the DataNode stores its blocks.
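For example, a sketch of such a configuration using the non-deprecated property name (the paths are illustrative):

```xml
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/data_dir,/hdd/data_dir</value>
    <final>true</final>
</property>
```

Note that mixing a bare path and a file:// URI in one value, as in your original configuration, is accepted, but using one style consistently for all entries avoids confusion.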
It can list directories where separate disk partitions are mounted, e.g. '/user1/hadoop/data,/user2/hadoop/data', which is useful when you have multiple disks to dedicate to HDFS. When it has multiple values, blocks are written to the directories in a round-robin fashion: each block goes to exactly one of the directories, not to all of them. The directories are not mirrors of each other — redundancy in HDFS comes from block replication across DataNodes, not from the directory list. This is why you see some blocks on /opt/hadoop/data_dir and others on /hdd/data_dir; that is the expected behavior. If one directory's disk fills up, round-robin placement continues on the remaining directories.
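The round-robin placement can be sketched as follows. This is a minimal illustration of the idea, not the actual Java implementation inside the DataNode (which is the RoundRobinVolumeChoosingPolicy class):

```python
# Sketch of round-robin volume selection: each new block is assigned to
# exactly one configured data directory, cycling through the list.
def round_robin_writer(volumes):
    i = 0
    def next_volume():
        nonlocal i
        v = volumes[i % len(volumes)]
        i += 1
        return v
    return next_volume

volumes = ["/opt/hadoop/data_dir", "/hdd/data_dir"]
pick = round_robin_writer(volumes)

# Four consecutive blocks alternate between the two directories;
# no block is written to both.
placements = [pick() for _ in range(4)]
print(placements)
```

Running this prints the directories alternating, which matches the "sometimes one directory, sometimes the other" behavior you observed.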
You can also define storage types for multiple locations in HDFS; please refer to the link below.