Community Articles

pardeep_kumar · ‎11-20-2015

To fix under-replicated blocks in HDFS, here is a quick set of commands to use:

#### Fix under-replicated blocks ####

su - <$hdfs_user>

-bash-4.1$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files

-bash-4.1$ while IFS= read -r hdfsfile; do echo "Fixing $hdfsfile :"; hadoop fs -setrep 3 "$hdfsfile"; done < /tmp/under_replicated_files
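To see what the grep/awk step extracts, here is a minimal sketch that runs the same filter over a few made-up lines instead of a live cluster. The function name `extract_under_replicated` and the sample lines are hypothetical, and the exact fsck line format can differ between Hadoop versions. The `sort -u` is an addition, on the assumption that a file with several under-replicated blocks is listed once per block.

```shell
#!/usr/bin/env bash
# extract_under_replicated (hypothetical helper): read fsck-style output on
# stdin and print each path that has at least one under-replicated block.
extract_under_replicated() {
  # Keep matching lines, take the text before the first ':', drop duplicates.
  grep 'Under replicated' | awk -F':' '{print $1}' | sort -u
}

# Illustrative sample only (made up, not captured from a real cluster).
sample='/data/a.txt:  Under replicated BP-1:blk_1_1. Target Replicas is 3 but found 1 live replica(s).
/data/a.txt:  Under replicated BP-1:blk_2_2. Target Replicas is 3 but found 1 live replica(s).
/data/b.txt: OK'

printf '%s\n' "$sample" | extract_under_replicated
# prints: /data/a.txt
```

Against a real cluster the same function would be fed by `hdfs fsck /`. Note that paths containing a colon would be cut short by the `-F':'` split.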

matthew_dailey1 · ‎01-26-2017

An improvement to this is to send multiple files to the setrep command at once.

-bash-4.1$ xargs -n 1000 hadoop fs -setrep 3 < /tmp/under_replicated_files

This sends 1000 paths to setrep at a time, which I found to be loads faster. You may also want to redirect the output, since presumably a great many files need their replication set.
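A quick way to check the batching before touching the cluster is a dry run that puts `echo` in front of the command, so xargs only prints what it would execute. This is a sketch; the function name `preview_batches` and the sample paths are hypothetical.

```shell
#!/usr/bin/env bash
# preview_batches (hypothetical helper): print the setrep command lines that
# xargs would run, batch_size paths per line, without executing them.
preview_batches() {
  local batch_size="$1" list_file="$2"
  xargs -n "$batch_size" echo hadoop fs -setrep 3 < "$list_file"
}

# Made-up list of five paths, batched two at a time.
list=$(mktemp)
printf '%s\n' /data/a /data/b /data/c /data/d /data/e > "$list"
preview_batches 2 "$list"
# prints:
#   hadoop fs -setrep 3 /data/a /data/b
#   hadoop fs -setrep 3 /data/c /data/d
#   hadoop fs -setrep 3 /data/e
rm -f "$list"
```

For the real run, drop the `echo` and, if the output is too noisy, append something like `> /tmp/setrep.log 2>&1` (the log path is only an example). Paths containing whitespace are split by xargs, so this assumes there are none in the list.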

jarnold · ‎04-16-2017

Potentially silly question: When you set the rep count, do you count the "original" data block as well? For example, I have 3 data nodes and I want one block on each of those nodes (3 blocks total). Is that 2 replicas or 3?

daleb · ‎05-23-2017

@Pardeep

This code in theory runs perfectly for me with the hdfs stdout showing:

Replication 3 set: /apps/hive/warehouse....

however once the script has finished, the blocks still remain under replicated.

Any idea as to what else I could do?

james_jones · ‎06-19-2018

Thanks, Pardeep.

To make it much faster, pass 500 files per call to the hadoop command, so the client starts once per batch instead of once per file. By changing the second line above, we can do this instead:

$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files


# Now using xargs -n 500 (or --max-args 500); set the replication factor (1 here) to match your cluster's target
$ cat /tmp/under_replicated_files | xargs -n 500 hdfs dfs -setrep 1

sathishkr · ‎12-12-2024

Although you can fix under-replicated blocks manually, HDFS has matured a lot, and the NameNode will re-replicate under-replicated blocks on its own. The drawback of the manual step is that it adds load to NameNode operations and may degrade the performance of running jobs. If you do plan to run it manually, do it during off-peak hours or over the weekend.
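If you leave it to the NameNode, one way to watch progress is to pull the under-replicated count from the fsck summary and re-run it from time to time. The sketch below assumes the summary contains a line like `Under-replicated blocks: 12 (1.5 %)`; that label, the helper name `under_replicated_count`, and the sample text are assumptions, so check them against your own Hadoop version's output.

```shell
#!/usr/bin/env bash
# under_replicated_count (hypothetical helper): read fsck output on stdin and
# print the number from the "Under-replicated blocks:" summary line.
under_replicated_count() {
  awk -F':' 'tolower($0) ~ /under[- ]replicated blocks:/ {
    split($2, f, " ")   # first whitespace-separated token after the colon
    print f[1]
    exit
  }'
}

# Illustrative summary excerpt (made up).
sample=' Total blocks (validated): 800 (avg. block size 1048576 B)
 Under-replicated blocks: 12 (1.5 %)
 Mis-replicated blocks: 0 (0.0 %)'

printf '%s\n' "$sample" | under_replicated_count
# prints: 12
```

On a cluster this would be `hdfs fsck / | under_replicated_count`. Keep in mind that fsck on `/` is itself a NameNode-heavy operation, so poll it sparingly.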

Cloudera Community

Fix Under-replicated blocks in HDFS manually

Apache Hadoop
