
Change HDFS replication on an existing large HDFS directory

Contributor

I have an existing HDFS directory that is 10 TB in size, and I want to change its replication factor from 3 to 1. What is the best way to achieve this?

3 REPLIES

Re: Change HDFS replication on an existing large HDFS directory

Champion

Try

Single file:
hadoop fs -setrep -w 2 /path/to/file

or

Entire directory:
hadoop fs -setrep -R -w 2 /path/to/dir
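To confirm the new factor has actually been applied, one quick check (the path below is just a placeholder) is to print a file's replication with the stat command:

hadoop fs -stat %r /path/to/file

Running this before and after the setrep shows whether the change has been picked up for that file.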

 

Re: Change HDFS replication on an existing large HDFS directory

Master Guru
A replication factor of 1 isn't really recommended. Even a simple disk loss
somewhere in the cluster will cause a loss of such data.

In any case, the approach is to run the command below (raise the client heap via
HADOOP_CLIENT_OPTS if the directory listing is too large for the CLI JVM to
handle):

hadoop fs -setrep -R 1 /large/directory
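For example, the client heap can be raised for just this invocation (the 4 GB value below is only an assumption; size it to your directory listing):

export HADOOP_CLIENT_OPTS="-Xmx4g"
hadoop fs -setrep -R 1 /large/directory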

Then monitor the over-replicated blocks in Cloudera Manager with the following
chart tsquery:

SELECT excess_blocks WHERE roleType = NAMENODE

This should show a spike, followed by a slow but steady drop back to zero as
the excess replicas are removed over time.
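A rough command-line equivalent, if you would rather not watch this in Cloudera Manager (this alternative is an assumption, not part of the original answer), is the fsck summary, which reports the same information:

hdfs fsck /large/directory | grep -i 'over-replicated'

The "Over-replicated blocks" count should fall toward zero as the NameNode deletes the excess replicas.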

Re: Change HDFS replication on an existing large HDFS directory

Contributor
Thanks for the solution, Harsha. There is always a backup for this large directory, so a replication factor of 1 is acceptable here, but I will definitely keep your suggestion in mind.