
Re: Data block storage directory can't work


Hi Darren,

 

Thanks for your reply.

 

My system is only for testing right now, so data loss is acceptable. Also, I am not currently using CM; I manually installed an HDFS cluster with one NameNode and two DataNodes. How do I make the new hdfs-site.xml take effect after I update the data directory property?

 

By default each data block has replicas on other DataNodes, but what I want is to make the whole data set redundant. That is why I want to add an additional data directory in hdfs-site.xml, so that the data is always written to two places. I also noticed that CM offers HDFS Replication. Does it replicate between two directories on each DataNode, or does it set up a separate HDFS cluster and synchronize between the two clusters? Could you point me to an expert on HDFS replication?

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.7.1/Cloudera-Backup-Disaster...

 

 

Thanks,

Jack Chen 


Data block storage directory can't work


Hi all,

 
I customized the data block storage directory in /etc/hadoop/conf.my_cluster/hdfs-site.xml, but it doesn't take effect. HDFS still stores data blocks in /tmp/hadoop-root/dfs/data/current/.
 
1: Do I need to add the config file path /etc/hadoop/conf.my_cluster to some configuration location?
2: Do I need to restart HDFS after changing hdfs-site.xml?
3: Can I add an additional data block storage directory in HDFS, so that every data block is stored in two places for disaster recovery?
 
The configuration for datanode in my hdfs-site.xml is as below.
<property>
      <name>dfs.datanode.data.dir</name>
      <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
 </property>
 
Thanks,
Jack Chen

Re: Data block storage directory can't work


Hi all,

 
Sorry, an update: I have copied hdfs-site.xml and core-site.xml from /etc/hadoop/conf.my_cluster to /etc/hadoop/conf, but the data blocks are still stored in the default directory /tmp/hadoop-root/dfs/data/current/ rather than in my customized directory. Could anyone tell me how to make the customized data block directory take effect? Thanks.
 
Config info in core-site.xml:
<configuration>
   <property>
     <name>fs.defaultFS</name>
     <value>hdfs://CDHNode1.cn.com:8020</value>
  </property>
 </configuration>
 
Config info in hdfs-site.xml:
<configuration>
   <!--
   <property>
      <name>dfs.name.dir</name>
      <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
   </property>
   -->
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
   </property>
   <property>
     <name>dfs.permissions.superusergroup</name>
     <value>root</value>
   </property>
 </configuration>
 
Thanks,
Jack Chen

Re: Data block storage directory can't work

Hi Jack,

 

When using CM, you should use the UI (or API) to modify configuration, and basically never manually edit a file. I suggest you read this to better understand CM: http://blog.cloudera.com/blog/2013/07/how-does-cloudera-manager-work/
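
If you prefer scripting over the UI, CM exposes a REST API on the same port. As a hedged sketch only: the host, cluster, and service names below ("cm-host", "Cluster1", "hdfs1") are hypothetical placeholders, and the exact endpoints depend on your CM version.

```shell
# List clusters, then inspect a service's configuration, via the CM REST API.
# "cm-host", "Cluster1", and "hdfs1" are made-up names - substitute your own.
curl -u admin:admin 'http://cm-host:7180/api/v1/clusters'
curl -u admin:admin 'http://cm-host:7180/api/v1/clusters/Cluster1/services/hdfs1/config?view=full'
```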

 

Since you are changing data dirs, you will need to be careful with your exact steps to make sure you don't lose data or cause unnecessary re-replication.

 

To edit this configuration, open your CM UI (usually listening on port 7180), log in, click on HDFS, then click on Configuration -> View and Edit. Search for "data dir" in the upper-left search box, then edit the appropriate configuration. Restart the HDFS DataNodes (or the whole service) for this to take effect. You may want to stop HDFS, move the data to the new data dirs, make the config change, and then start HDFS again. It's probably a bad idea to move data files while a DataNode is running.
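
For a manual (non-CM) install like yours, the same stop / move / reconfigure / start sequence can be sketched as below. This is only an outline: the init-script service names assume a CDH package install, and the paths are taken from your posts, so adjust both to match your actual setup.

```shell
# Run on each DataNode. Assumes CDH package init scripts (adjust if you run the
# daemons directly from a tarball install).
sudo service hadoop-hdfs-datanode stop

# Copy the existing block data into the new data dir, preserving ownership,
# permissions, and timestamps.
sudo mkdir -p /data/1/dfs/dn
sudo cp -a /tmp/hadoop-root/dfs/data/. /data/1/dfs/dn/
sudo chown -R hdfs:hdfs /data/1/dfs/dn

# Now update dfs.datanode.data.dir in hdfs-site.xml, then start the DataNode:
sudo service hadoop-hdfs-datanode start
```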

 

Note that setting extra data dirs is not going to help if your goal is backup / disaster recovery. By default, three copies of all blocks of data are stored on your cluster, on different machines. You can change the replication factor if this is not to your liking (in HDFS config).
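
For reference, the cluster-wide default is the dfs.replication property in hdfs-site.xml; it applies only to newly written files, while existing files keep their old factor until you change it with the setrep shell command. A sketch:

```shell
# hdfs-site.xml fragment setting the default replication factor for new files:
#   <property>
#     <name>dfs.replication</name>
#     <value>3</value>
#   </property>
#
# Existing files are unaffected by the config change; update them explicitly,
# e.g. recursively from the root, waiting (-w) for re-replication to finish:
hadoop fs -setrep -R -w 3 /
```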

 

The enterprise edition of CM can help you with true backup / disaster recovery, but basically you should set up a second cluster in a different physical location and keep them in sync. I'm not an expert on that subject though.

 

Thanks,

Darren

Re: Data block storage directory can't work

Hi Jack,

 

This forum is for Cloudera Manager questions. Since you are not using Cloudera Manager, you may have better luck on the forum for HDFS, where you will find HDFS experts:

http://community.cloudera.com/t5/Storage-Random-Access-HDFS/bd-p/StorageFormat

 

Why do you want to put two copies of the data on the same machine? Storing replicas on different machines will protect against both disk and machine failure, and improve concurrency, all better than if the copies were on the same machine.

 

HDFS Replication in CM will replicate your HDFS data from one cluster to another.

 

Thanks,

Darren