How can we change the replication factor when data is already stored in HDFS?

I have stored my data in HDFS, now how can I change the replication factor?

1 ACCEPTED SOLUTION

Master Mentor

@Shailna Patidar

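Yes, you can. Here is how to do it for a single file or for an entire directory.

Use the command below to set the replication factor of an individual file to 4. The -w flag makes the command wait until replication has completed, which can take a long time for large files. (hadoop dfs is deprecated in favor of hdfs dfs, so the newer form is shown.)

hdfs dfs -setrep -w 4 /path/to/file

To change the replication factor of an entire directory to 4, pass the directory path. The change applies recursively to every file under it; in current Hadoop releases -setrep on a directory is already recursive, and -R is accepted only for backward compatibility.

hdfs dfs -setrep -R -w 4 /path/to/directory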

Hope that helps

5 REPLIES

New Contributor

If we change the replication factor from the Cloudera or Ambari console, will it apply to existing blocks as well?

Master Mentor

@Shailna Patidar

Any feedback?

To check the new replication factor, use the command below. The second column of the listing is the replication factor (directories show "-"). In this example, my_secret has a replication factor of 4:

$ hdfs dfs -ls
Found 4 items
drwx------   - hive hdfs          0 2014-01-29 06:14 .staging
-rw-r--r--   4 hive hdfs       1943 2014-01-24 01:01 my_secret
drwxr-xr-x   - hive hdfs          0 2014-04-22 12:45 test
drwxr-xr-x   - hive hdfs          0 2014-04-22 12:45 payroll.csv
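
If you want to script this check, here is a minimal sketch that pulls the replication column out of a listing like the one above. The function name replication_of is made up for this example, and it assumes entry names contain no whitespace; it reads the listing from standard input, so it can be tried on saved output without a cluster.

```shell
# replication_of NAME
# Hypothetical helper: reads `hdfs dfs -ls` style output on stdin and prints
# the second column (replication factor) of the entry whose last field is NAME.
# Assumption: entry names contain no whitespace.
replication_of() {
    awk -v name="$1" '$NF == name { print $2 }'
}
```

Typical use would be something like hdfs dfs -ls | replication_of my_secret. For a single path, hdfs dfs -stat %r /path/to/file should also print the replication factor directly.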

Hope that helped

avatar

The default replication factor is set globally by the dfs.replication property in hdfs-site.xml; its default value is 3. The replication factor of data already stored in HDFS can be changed with the command below:

hadoop fs -setrep -R 5 /
Here the replication factor is changed to 5 for everything under / using the -setrep command.

You can set the replication factor for a single file, a directory, or the entire file system by passing the corresponding path to the command above.
File:
hadoop fs -setrep -w 3 /my/file
Directory:
hadoop fs -setrep -w 3 -R /my/dir
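
If you run these commands from scripts, a small guard around -setrep can catch typos before they reach the cluster. The following is only a sketch: set_replication and the DRY_RUN variable are names invented for this example, and requiring an absolute path is a choice made here for safety, not an HDFS requirement. With DRY_RUN left at its default of 1, the function only prints the command it would run.

```shell
# set_replication FACTOR PATH
# Hypothetical wrapper around `hdfs dfs -setrep -w`.
# DRY_RUN=1 (the default) prints the command; DRY_RUN=0 executes it.
set_replication() {
    factor=$1
    target=$2
    # Accept only positive integers without a leading zero.
    case $factor in
        ''|*[!0-9]*|0*)
            echo "invalid replication factor: $factor" >&2
            return 1 ;;
    esac
    # This sketch insists on an absolute path to avoid ambiguity.
    case $target in
        /*) ;;
        *)
            echo "path must be absolute: $target" >&2
            return 1 ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "hdfs dfs -setrep -w $factor $target"
    else
        hdfs dfs -setrep -w "$factor" "$target"
    fi
}
```

For example, set_replication 3 /my/dir prints the command for review, and DRY_RUN=0 set_replication 3 /my/dir actually runs it.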

Master Mentor

@Shailna Patidar

Are you still encountering problems? Yes, the replication factor is set globally in hdfs-site.xml, and the default is 3. Your question, however, was how to change the replication factor of an existing file, and I think that has been answered.

If you found this answer addressed your question, please take a moment to log in and click the "Accept" link on the answer.

That would help other Community users find the solution quickly for these kinds of questions.