I am using Apache Hadoop 2.7.1 and have configured the datanode directory to use multiple directories:
<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
  <final>true</final>
</property>
According to this configuration, I expected file data to be written to both directories (/opt/hadoop/data_dir and file:///hdd/data_dir/) with the same block names in the same subdirectories. But in my cluster this is not happening: sometimes it writes data blocks to the local directory /opt/hadoop/data_dir, and sometimes to the external hard drive.

What could be the possible reasons, and how can I control this behavior?
The parameter that specifies more than one storage path in Hadoop is set in hdfs-site.xml. In Hadoop 2.x the property is dfs.datanode.data.dir; dfs.data.dir is the deprecated Hadoop 1.x name (it is still honored, but you should use the new name).
The value of dfs.datanode.data.dir can be any comma-separated list of directories available on the datanode. It determines where on the local filesystem the datanode stores its blocks.
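A corrected configuration for Hadoop 2.7.1 would look like this (the paths are the ones from your question):

```xml
<!-- hdfs-site.xml: use the Hadoop 2.x property name -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
  <final>true</final>
</property>
```

The datanode must be restarted for this change to take effect.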
Each entry is typically a directory on a separate disk or partition, e.g. '/user1/hadoop/data, /user2/hadoop/data', which lets multiple disks be used for HDFS storage. When multiple directories are configured, the datanode writes blocks to them in a round-robin fashion: each block goes to exactly one of the directories, not to all of them. This is exactly the behavior you are seeing — it is load balancing across disks, not replication (redundancy in HDFS comes from block replication across datanodes, controlled by dfs.replication). If one directory's disk fills up, the round-robin continues across the remaining directories.
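The round-robin placement can be illustrated with a small sketch. This is illustrative only, not Hadoop's actual implementation, and the directory names are the ones from the question:

```python
# Minimal sketch of round-robin volume selection, mimicking the idea
# behind Hadoop's round-robin volume choosing policy (illustrative only).
class RoundRobinVolumeChooser:
    def __init__(self, volumes):
        self.volumes = list(volumes)
        self.next_index = 0

    def choose(self):
        # Each block is placed in exactly one directory; the pointer
        # then advances, so successive blocks alternate between volumes.
        vol = self.volumes[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.volumes)
        return vol

chooser = RoundRobinVolumeChooser(["/opt/hadoop/data_dir", "/hdd/data_dir"])
placements = [chooser.choose() for _ in range(4)]
# placements alternates: /opt/hadoop/data_dir, /hdd/data_dir, ...
```

If you want placement driven by free space instead of strict rotation, Hadoop 2.x also ships an AvailableSpaceVolumeChoosingPolicy that can be selected via dfs.datanode.fsdataset.volume.choosing.policy (please verify the exact class name against your distribution's documentation).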
You can also assign HDFS storage types to the individual locations. Please refer to the link below.
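For reference, storage types are assigned by prefixing each path in dfs.datanode.data.dir with a tag. A sketch, assuming heterogeneous storage support (available from around Hadoop 2.6); the /mnt/ssd mount point here is hypothetical:

```xml
<!-- hdfs-site.xml: tag each data directory with a storage type -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/opt/hadoop/data_dir,[SSD]/mnt/ssd/data_dir</value>
</property>
```

Untagged directories default to the DISK storage type.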