Thanks for your reply.
My system is only for testing right now, so data loss is acceptable. I am not currently using CM; I manually installed an HDFS cluster with one NameNode and two DataNodes. How do I make the new hdfs-site.xml take effect after I update the data directory property in hdfs-site.xml?
By default each data block has two replicas on other data nodes, but what I want is to make the whole data set redundant. So I want to add an additional data directory in hdfs-site.xml, so that the data is always written to two places. I also noticed there is HDFS Replication in CM. Does it replicate between two directories on each data node, or does it set up a separate HDFS cluster and synchronize between the two clusters? Could you point me to an expert on HDFS replication?
When using CM, you should use the UI (or API) to modify configuration, and basically never manually edit a file. I suggest you read this to better understand CM: http://blog.cloudera.com/blog/2013/07/how-does-cloudera-manager-work/
Since you are changing data dirs, you will need to be careful with your exact steps to make sure you don't lose data or cause unnecessary re-replication.
To edit this configuration, open up your CM UI (usually listening on port 7180), log in, click on HDFS, then click on Configuration -> View and Edit. Search for "data dir" in the upper left search box, then edit the appropriate configuration. Restart the HDFS data nodes (or the whole service) for this to take effect. You may want to stop HDFS, move the data to the new data dirs, make the config change, then start up HDFS again. It's probably a bad idea to move data files while the data node is running.
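For a manual (non-CM) install, the property you'd edit in hdfs-site.xml is dfs.datanode.data.dir, which takes a comma-separated list of directories; the DataNodes must be restarted for the change to take effect. A minimal sketch (the /data/... paths are example placeholders, not recommendations):

```xml
<!-- hdfs-site.xml sketch: two local storage directories for the DataNode.
     The paths below are placeholders; use your own mount points. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>
```

Note that the DataNode round-robins blocks across the listed directories; it does not mirror data between them, so adding a second directory adds capacity, not redundancy.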
Note that setting extra data dirs is not going to help if your goal is backup / disaster recovery. By default, three copies of all blocks of data are stored on your cluster, on different machines. You can change the replication factor if this is not to your liking (in HDFS config).
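For reference, the default replication factor is controlled by dfs.replication (this is the setting the CM search box will find). A sketch of the equivalent hdfs-site.xml entry, with 3 shown as the stock default:

```xml
<!-- hdfs-site.xml sketch: default number of block replicas for new files -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

Changing this only affects files written afterwards; for existing files you can adjust replication with `hdfs dfs -setrep`.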
The enterprise edition of CM can help you with true backup / disaster recovery, but basically you should set up a second cluster in a different physical location and keep them in sync. I'm not an expert on that subject though.
This forum is for Cloudera Manager questions. If you are not using Cloudera Manager, you may have better luck on the forum for HDFS, where you will find HDFS experts:
Why do you want to put two copies of the data on the same machine? Storing replicas on different machines will protect against both disk and machine failure, and improve concurrency, all better than if the copies were on the same machine.
HDFS Replication in CM will replicate your HDFS data from one cluster to another.