
To fix under-replicated blocks in HDFS, use the following quick procedure:

#### Fix under-replicated blocks ####

su - <$hdfs_user>

-bash-4.1$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' > /tmp/under_replicated_files

-bash-4.1$ for hdfsfile in $(cat /tmp/under_replicated_files); do echo "Fixing $hdfsfile:"; hadoop fs -setrep 3 "$hdfsfile"; done
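
The extraction step above can be wrapped in a small helper. This is a minimal sketch, not part of the original article: the function name `extract_under_replicated` is hypothetical, and it assumes fsck prints one `path: Under replicated ...` line per affected block, as the grep/awk pipeline above implies. Under that assumption a file with several under-replicated blocks would be listed more than once, so the sketch also de-duplicates the paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original article): read `hdfs fsck /`
# output on stdin and print each under-replicated file path exactly once.
extract_under_replicated() {
    # Keep only the per-file "Under replicated" lines, take the text before
    # the first colon (the path), then sort and drop duplicate paths.
    grep 'Under replicated' | awk -F':' '{print $1}' | sort -u
}

# Usage on a real cluster (run as the HDFS user):
#   hdfs fsck / | extract_under_replicated > /tmp/under_replicated_files
```

The resulting file can then be fed to the `setrep` loop exactly as before.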

An improvement to this is to send multiple files to the setrep command at once.

-bash-4.1$ xargs -n 1000 hadoop fs -setrep 3 < /tmp/under_replicated_files

This sends 1000 paths to setrep at a time, which I found to be much faster. You may also want to redirect the output to a file, since presumably a very large number of files need their replication set.
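
To see how `xargs -n` groups the paths before running it against a cluster, the sketch below takes the command to run as arguments, so `echo` can stand in for `hadoop fs -setrep 3`. The function name `run_in_batches` and the log file path are illustrative assumptions, not from the original post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per batch of paths.
#   $1   = file with one path per line
#   $2   = maximum number of paths per invocation
#   rest = command to run; each batch of paths is appended to it
run_in_batches() {
    local list_file="$1"
    local batch_size="$2"
    shift 2
    xargs -n "$batch_size" "$@" < "$list_file"
}

# Dry run: prints the commands that would be executed, one line per batch.
#   run_in_batches /tmp/under_replicated_files 1000 echo hadoop fs -setrep 3
# Real run, with the output redirected to a log file (example path):
#   run_in_batches /tmp/under_replicated_files 1000 hadoop fs -setrep 3 > /tmp/setrep.log 2>&1
```

Note that `xargs` splits its input on whitespace by default, so this sketch assumes the paths contain no spaces.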


Potentially silly question: When you set the rep count, do you count the "original" data block as well? For example, I have 3 data nodes and I want one block on each of those nodes (3 blocks total). Is that 2 replicas or 3?



This code appears to run perfectly for me, with the hdfs stdout showing:

Replication 3 set: /apps/hive/warehouse....

However, once the script has finished, the blocks still remain under-replicated.

Any idea as to what else I could do?
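
One thing that may be worth checking here: without the `-w` flag, `setrep` returns once the new replication factor has been recorded, and the actual block copies are made in the background, so fsck can keep reporting under-replicated blocks for some time afterwards. Blocks will also stay under-replicated if the cluster has fewer live DataNodes than the requested replication factor. A minimal sketch for tracking progress, assuming the same fsck line format as the pipeline in the article (the function name `count_under_replicated` and the example path are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Under replicated" lines in fsck output (stdin).
count_under_replicated() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so `|| true` keeps the function's exit status clean.
    grep -c 'Under replicated' || true
}

# Usage on a real cluster: re-run periodically and watch the count fall.
#   hdfs fsck / | count_under_replicated
# Alternatively, ask setrep to wait until replication completes (can be slow):
#   hadoop fs -setrep -w 3 /path/to/file
```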


Thanks, Pardeep.

To make it much faster, pass 500 files per call to the hadoop command, which cuts the number of command invocations (and JVM startups) by up to 500x. By changing the second line above, we can do this instead:

$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' > /tmp/under_replicated_files

# Now using xargs -n 500 (or --max-args 500)
$ xargs -n 500 hdfs dfs -setrep 3 < /tmp/under_replicated_files
Revision 1 of 1. Last update: 11-20-2015 06:05 PM