Contributor
Posts: 47
Registered: ‎12-28-2015

Change HDFS replication on an existing large HDFS directory

I have a pre-existing HDFS directory that is 10 TB in size, and I want to change its replication factor from 3 to 1. What is the best way to achieve this?

Posts: 642
Topics: 3
Kudos: 118
Solutions: 67
Registered: ‎08-16-2016

Re: Change HDFS replication on an existing large HDFS directory

Try:

Single file
hadoop fs -setrep -w 2 /path/to/file

or 

Entire directory
hadoop fs -setrep -R -w 2 /path/to/dir
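Note that -w makes the command wait until the re-replication actually completes, which can take a very long time on a large directory; you can drop it and let the NameNode work in the background. To confirm the change afterwards, the %r format of hadoop fs -stat prints the current replication factor of a path (reusing the example path above):

hadoop fs -stat %r /path/to/file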

Posts: 1,826
Kudos: 406
Solutions: 292
Registered: ‎07-31-2013

Re: Change HDFS replication on an existing large HDFS directory

A replication factor of 1 isn't really recommended. Even a single disk loss
somewhere in the cluster will cause permanent loss of any data whose only
replica lived on that disk.

In any case, the approach is to run the below (raise the client heap via
HADOOP_CLIENT_OPTS if the directory listing is too large for the default
CLI JVM heap):

hadoop fs -setrep -R 1 /large/directory
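For example, a rough sketch of raising the client heap before running the command (the 4g value is only an illustration, size it to your directory listing):

export HADOOP_CLIENT_OPTS="-Xmx4g"
hadoop fs -setrep -R 1 /large/directory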

Then monitor the over-replicated blocks in Cloudera Manager via the below
chart tsquery:

SELECT excess_blocks WHERE roleType = NAMENODE

This should show a spike and then a slow but steady drop back to zero as
the NameNode schedules the excess replicas for deletion.
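If you'd rather check from the shell instead of Cloudera Manager, the fsck summary reports the same figure; a rough equivalent:

hdfs fsck /large/directory | grep -i 'Over-replicated'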
Contributor
Posts: 47
Registered: ‎12-28-2015

Re: Change HDFS replication on an existing large HDFS directory

Thanks for the solution, Harsha. There is always a backup for this large directory, so a replication factor of 1 works for us, but I will definitely keep your suggestion in mind.