Created on 11-20-2015 06:05 PM
To fix under-replicated blocks in HDFS, here is a quick set of steps to follow:
#### Fix under-replicated blocks ####
su - <$hdfs_user>
-bash-4.1$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
-bash-4.1$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
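Once the loop has finished, one quick sanity check (just re-using the same grep pattern as above, not part of the original steps) is to re-run fsck and count how many 'Under replicated' lines remain:
-bash-4.1$ hdfs fsck / | grep -c 'Under replicated'
A count of 0 means fsck is no longer reporting any under-replicated files.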
Created on 01-26-2017 04:30 PM
An improvement to this is to send multiple files to the setrep command at once.
-bash-4.1$ xargs -n 1000 hadoop fs -setrep 3 < /tmp/under_replicated_files
This will send 1000 paths to setrep at a time, which I found to be loads faster. You may also want to redirect the output to a file, since presumably a very large number of files need their replication set.
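For example, to redirect it (the log path here is just an illustration, not part of the original steps):
-bash-4.1$ xargs -n 1000 hadoop fs -setrep 3 < /tmp/under_replicated_files > /tmp/setrep.log 2>&1
That way the terminal stays quiet and you still have a record of which paths were touched.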
Created on 04-16-2017 04:03 AM
Potentially silly question: When you set the rep count, do you count the "original" data block as well? For example, I have 3 data nodes and I want one block on each of those nodes (3 blocks total). Is that 2 replicas or 3?
Created on 05-23-2017 10:00 AM
@Pardeep
In theory this code runs perfectly for me, with the hdfs stdout showing:
Replication 3 set: /apps/hive/warehouse....
However, once the script has finished, the blocks still remain under-replicated.
Any idea as to what else I could do?
Created on 06-19-2018 08:57 PM
Thanks, Pardeep.
To make it 500x faster, do 500 files per call to the hadoop command. By changing the second line above, we can do this instead:
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
# Now using xargs -n 500 (or --max-args 500)
$ cat /tmp/under_replicated_files | xargs -n 500 hdfs dfs -setrep 3
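One thing worth noting (a suggestion on top of the above, not from the original reply): hdfs dfs -setrep also accepts a -w flag that makes the command wait until the replication has actually completed, which can help when fsck still shows blocks as under-replicated right after setrep returns. It can take a very long time on large batches, so it is more of a verification run than a default:
$ cat /tmp/under_replicated_files | xargs -n 500 hdfs dfs -setrep -w 3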