How can we change the replication factor when data is already stored in HDFS?

I have stored my data in HDFS, now how can I change the replication factor?

1 ACCEPTED SOLUTION

Master Mentor

@Shailna Patidar

Yes, you can. The commands below change the replication factor of a single file or of an entire directory. (hadoop dfs is deprecated; use hdfs dfs instead.)

To set the replication factor of an individual file to 4, where -w waits until replication has completed:

hdfs dfs -setrep -w 4 /path/to/file

To change the replication factor of every file under an HDFS directory, recursively, to 4:

hdfs dfs -setrep -R -w 4 /path/to/directory

Hope that helps
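
If you want to script the commands above, a minimal sketch of a wrapper could look like this. The function name set_replication and the HDFS_CMD variable are hypothetical helpers, not part of Hadoop. HDFS_CMD only exists so the function can be dry-run with echo instead of a real cluster.

```shell
# Hypothetical helper: set the replication factor of an HDFS path.
# HDFS_CMD is an assumed override for dry runs; it defaults to the real hdfs CLI.
set_replication() {
  factor=$1
  path=$2
  # Reject an empty, zero, or non-numeric factor before calling hdfs.
  case $factor in
    ''|0|*[!0-9]*)
      echo "replication factor must be a positive integer" >&2
      return 2
      ;;
  esac
  # Quote the path so names containing spaces are passed as one argument.
  # -w waits for replication to finish, which can take a long time on large data.
  "${HDFS_CMD:-hdfs}" dfs -setrep -w "$factor" "$path"
}

# Usage (dry run, prints the command arguments instead of running them):
#   HDFS_CMD=echo set_replication 4 /path/to/file
```

Quoting "$path" matters here because an unquoted path with spaces would be split into several arguments.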


5 REPLIES 5

New Contributor

If we change the replication factor from the Cloudera or Ambari console, will it also apply to existing blocks?

Master Mentor

@Shailna Patidar

Any feedback?

To check the new replication factor, use the command below. The second column of the listing is the replication factor; in this example, my_secret has a replication factor of 4 (directories show "-").

$ hdfs dfs -ls
Found 4 items
drwx------   - hive hdfs          0 2014-01-29 06:14 .staging
-rw-r--r--   4 hive hdfs       1943 2014-01-24 01:01 my_secret
drwxr-xr-x   - hive hdfs          0 2014-04-22 12:45 test
drwxr-xr-x   - hive hdfs          0 2014-04-22 12:45 payroll.csv
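
To pull just the replication factor out of a listing like this, here is a rough sketch. The function name replication_from_ls is made up for illustration, and it assumes the column layout shown above and file names without spaces.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print
# "<replication> <name>" for each regular file.
# Directory lines (starting with "d") and the "Found N items" line are skipped.
replication_from_ls() {
  awk '$1 ~ /^-/ { print $2, $NF }'
}

# Usage (assumes the hdfs CLI is on PATH):
#   hdfs dfs -ls | replication_from_ls
```

For a single path, hdfs dfs -stat %r <path> prints the replication factor directly.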

Hope that helped

The replication factor is set globally in hdfs-site.xml through the dfs.replication property; the default value is 3. The replication factor of data already stored in HDFS can be changed with the command below:

hadoop fs -setrep -R 5 /
Here the replication factor is changed to 5 using the -setrep command.

You can set the replication factor for a single file, a directory, or the entire file system by passing the corresponding path to the command above.
File:
hadoop fs -setrep -w 3 /my/file
Directory:
hadoop fs -setrep -w 3 -R /my/dir
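
To compare the cluster-wide default with what a particular file actually has, something along these lines may help. The function names and the HDFS_CMD variable are hypothetical (HDFS_CMD just lets you dry-run with echo); the underlying calls are hdfs getconf -confKey and hdfs dfs -stat %r. Keep in mind that dfs.replication only applies to files written after it is set; existing files keep their own factor until -setrep is run on them.

```shell
# Hypothetical helpers; HDFS_CMD is an assumed override, defaulting to hdfs.

# Print the configured default replication factor (dfs.replication).
default_replication() {
  "${HDFS_CMD:-hdfs}" getconf -confKey dfs.replication
}

# Print the replication factor currently recorded for one file.
file_replication() {
  "${HDFS_CMD:-hdfs}" dfs -stat %r "$1"
}

# Usage (against a real cluster):
#   default_replication
#   file_replication /my/file
```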

Master Mentor

@Shailna Patidar

Are you still encountering problems? Yes, the replication factor is set globally in hdfs-site.xml, and the default is 3. Your question was how to change the replication factor of an existing file, and I think that has been answered.

If you found this answer addressed your question, please take a moment to log in and click the "Accept" link on the answer.

That would help other Community users find the solution quickly for these kinds of questions.