Multiple DataNode directories write operation
Labels: Apache Hadoop
Created 08-15-2017 12:33 PM
I am using Apache Hadoop 2.7.1 and have configured the DataNode to use multiple data directories:
<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
  <final>true</final>
</property>
According to my understanding of this configuration, file data should be written to both directories, /opt/hadoop/data_dir and file:///hdd/data_dir/, with the same block names and the same subdirectory names.
But in my cluster this is not what happens: sometimes it writes data blocks to the local directory /opt/hadoop/data_dir and sometimes to the external hard drive directory file:///hdd/data_dir.
What could be the possible reasons, and how can I control this behavior?
Created 08-15-2017 03:51 PM
The parameter that specifies more than one storage path in Hadoop lives in hdfs-site.xml. The current property name is dfs.datanode.data.dir; dfs.data.dir is its deprecated pre-2.x alias (it still works, but the daemon logs a deprecation warning). The value is a comma-separated list of directories available on the DataNode, and it determines where on the local filesystem the DataNode stores its blocks.
Each entry is typically a directory where a separate disk partition is mounted, e.g. /user1/hadoop/data,/user2/hadoop/data, which is useful when you have multiple disk partitions to dedicate to HDFS. The important point for your question is that multiple directories are not mirrors of each other: each block replica is written to exactly one of the configured directories, chosen in a round-robin fashion. That is why some blocks land under /opt/hadoop/data_dir and others under /hdd/data_dir; it is the expected behavior, not a fault. If the disk holding one directory fills up, round-robin placement simply continues on the remaining directories.
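As a minimal sketch, assuming the same two mount points from your question, the configuration with the current property name would look like this in hdfs-site.xml:

<property>
  <!-- current name for the deprecated dfs.data.dir -->
  <name>dfs.datanode.data.dir</name>
  <!-- each block replica is written to exactly ONE of these, round-robin -->
  <value>/opt/hadoop/data_dir,file:///hdd/data_dir</value>
  <final>true</final>
</property>

Restart the DataNode after editing hdfs-site.xml so the new value takes effect.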
You can also define the storage type for each location in HDFS (heterogeneous storage); please refer to the link below.
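If you actually want HDFS to treat the two locations differently, heterogeneous storage (Hadoop 2.6+) lets you tag each directory with a storage type. A sketch, assuming purely for illustration that /hdd is an SSD:

<property>
  <name>dfs.datanode.data.dir</name>
  <!-- untagged directories default to [DISK] -->
  <value>[DISK]/opt/hadoop/data_dir,[SSD]/hdd/data_dir</value>
</property>

Storage policies (hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>) then control which storage types a file's replicas prefer.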
