
Change HDFS replication on an existing large HDFS directory


I have a pre-existing HDFS directory that is 10 TB in size, and I want to change its replication factor from 3 to 1. What is the best way to achieve this?
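For plain replicated HDFS storage (no erasure coding), raw disk usage across the DataNodes is roughly the logical size multiplied by the replication factor. A minimal Python sketch of that arithmetic for this 10 TB directory, assuming block-level overhead can be ignored (the function names are illustrative only):

```python
def raw_usage_tb(logical_tb: float, replication: int) -> float:
    """Approximate raw disk consumed across DataNodes for replicated HDFS data."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_tb * replication


def freed_tb(logical_tb: float, old_rep: int, new_rep: int) -> float:
    """Raw capacity released (positive) when lowering the replication factor."""
    return raw_usage_tb(logical_tb, old_rep) - raw_usage_tb(logical_tb, new_rep)


if __name__ == "__main__":
    print(raw_usage_tb(10, 3))  # raw TB at replication 3
    print(raw_usage_tb(10, 1))  # raw TB at replication 1
    print(freed_tb(10, 3, 1))   # raw TB released by going from 3 to 1
```

Going from 3 to 1 on 10 TB of data therefore frees about 20 TB of raw capacity, once the excess replicas have actually been deleted.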






Single file
hadoop fs -setrep -w 2 /path/to/file


Entire directory
hadoop fs -setrep -w 2 -R /path/to/dir
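If you want to script these commands, a minimal Python sketch that builds the argument list is below. It assumes the `hadoop` binary is on `PATH`; `setrep_cmd` and `run_setrep` are hypothetical helper names. Per the Hadoop FileSystem Shell documentation, `setrep` on a directory already recurses into every file under it (`-R` is accepted only for backwards compatibility), and `-w` blocks until the change completes, which can take a very long time on a tree this size.

```python
import shlex
import subprocess
from typing import List


def setrep_cmd(path: str, replication: int, wait: bool = False) -> List[str]:
    """Build a `hadoop fs -setrep` argument list.

    No -R flag is added: setrep recurses into directories on its own.
    wait=True appends -w, which blocks until replication reaches the target.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    cmd = ["hadoop", "fs", "-setrep"]
    if wait:
        cmd.append("-w")
    cmd += [str(replication), path]
    return cmd


def run_setrep(path: str, replication: int, wait: bool = False) -> None:
    """Run the command, raising CalledProcessError if hadoop exits non-zero."""
    subprocess.run(setrep_cmd(path, replication, wait), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe without a cluster.
    print(shlex.join(setrep_cmd("/path/to/dir", 2, wait=True)))
```

Leaving `wait=False` returns as soon as the NameNode has recorded the new target, which is usually what you want for a 10 TB directory.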


Master Guru
A replication factor of 1 isn't really recommended. Even a single disk failure
anywhere in the cluster will lose every block whose only replica lived on
that disk.

In any case, the approach is to run the following command (if the directory
listing is too large for the CLI JVM to handle, raise its heap via
HADOOP_CLIENT_OPTS):

hadoop fs -setrep -R 1 /large/directory
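One way to apply the heap advice from a script is sketched below in Python. It assumes the `hadoop` launcher passes `HADOOP_CLIENT_OPTS` to the client JVM; the `4g` default is an arbitrary example value, and `client_env` / `setrep_large_dir` are hypothetical names.

```python
import os
import subprocess
from typing import Dict, Optional


def client_env(heap: str = "4g", base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with -Xmx<heap> added to HADOOP_CLIENT_OPTS.

    Any options already present in HADOOP_CLIENT_OPTS are kept, and the
    -Xmx flag is appended after them.
    """
    env = dict(os.environ if base is None else base)
    existing = env.get("HADOOP_CLIENT_OPTS", "").strip()
    env["HADOOP_CLIENT_OPTS"] = f"{existing} -Xmx{heap}".strip()
    return env


def setrep_large_dir(path: str, replication: int = 1, heap: str = "4g") -> None:
    """Run `hadoop fs -setrep` on a big directory with a raised client heap."""
    subprocess.run(
        ["hadoop", "fs", "-setrep", str(replication), path],
        env=client_env(heap),
        check=True,
    )


if __name__ == "__main__":
    # Show the resulting option string without touching a cluster.
    print(client_env("8g", base={})["HADOOP_CLIENT_OPTS"])
```

Building a fresh environment dict, rather than mutating `os.environ`, keeps the larger heap scoped to this one `hadoop` invocation.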

Then monitor the over-replicated (excess) blocks in Cloudera Manager with the
following chart tsquery:

SELECT excess_blocks WHERE roleType = NAMENODE

This should show a spike and then a slow but steady drop back to zero as the
excess replicas are deleted.
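Without Cloudera Manager, a rough alternative is to watch the "Over-replicated blocks" line in the `hdfs fsck` summary for the directory. A minimal Python sketch, assuming the summary line looks like `Over-replicated blocks:	12 (0.5 %)` (the layout can vary between Hadoop versions) and that `hdfs` is on `PATH`; the function names and the 10-minute default interval are illustrative choices. Note that fsck on a large tree puts load on the NameNode, so poll sparingly.

```python
import re
import subprocess
import time
from typing import Optional

# Assumed fsck summary format, e.g. " Over-replicated blocks:\t12 (0.5 %)".
_OVER_REPLICATED = re.compile(r"Over-replicated blocks:\s*(\d+)")


def over_replicated_blocks(fsck_output: str) -> Optional[int]:
    """Extract the over-replicated block count from fsck text, or None if absent."""
    match = _OVER_REPLICATED.search(fsck_output)
    return int(match.group(1)) if match else None


def wait_until_trimmed(path: str, interval_s: int = 600) -> None:
    """Poll `hdfs fsck <path>` until no over-replicated blocks remain."""
    while True:
        # fsck may exit non-zero on an unhealthy filesystem, so don't raise here.
        result = subprocess.run(
            ["hdfs", "fsck", path], capture_output=True, text=True, check=False
        )
        count = over_replicated_blocks(result.stdout)
        if count is None:
            raise RuntimeError("over-replicated count not found in fsck output")
        print(f"over-replicated blocks under {path}: {count}")
        if count == 0:
            return
        time.sleep(interval_s)
```

Parsing is kept separate from polling so the regex can be checked against a saved fsck report before it is pointed at a live cluster.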

Thanks for the solution, Harsha. This large directory is always backed up elsewhere, so a replication factor of 1 is acceptable here, but I will definitely reconsider your suggestion.