Created on 11-20-2015 06:05 PM
To fix under-replicated blocks in HDFS, here are some quick instructions:
####Fix under-replicated blocks###
su - <$hdfs_user>
-bash-4.1$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
-bash-4.1$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
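The for loop above splits the file list on whitespace, so a path containing a space would be passed to setrep in pieces. A minimal sketch of the same two steps that reads the list line by line is shown here. The function names and the SETREP_CMD override are hypothetical, introduced only for illustration, and the parsing assumes fsck prints lines of the form "<path>:  Under replicated ...". Paths that themselves contain a colon would still be cut short by the awk step.

```shell
#!/bin/sh
# Sketch only: function names and SETREP_CMD are illustrative, not part of Hadoop.

# Read `hdfs fsck /` output on stdin and print each under-replicated
# file path once. Assumes lines shaped like "<path>:  Under replicated ...".
extract_under_replicated() {
  grep 'Under replicated' | awk -F':' '{print $1}' | sort -u
}

# Read paths on stdin, one per line, and run setrep on each of them.
# SETREP_CMD can be set to `echo` for a dry run that touches nothing.
fix_replication() {
  rep="$1"
  while IFS= read -r path; do
    [ -n "$path" ] || continue          # skip blank lines
    echo "Fixing $path :"
    ${SETREP_CMD:-hadoop fs -setrep} "$rep" "$path"
  done
}

# Typical use, as the HDFS superuser:
#   hdfs fsck / | extract_under_replicated > /tmp/under_replicated_files
#   fix_replication 3 < /tmp/under_replicated_files
```

Writing the list with > instead of >> also avoids accumulating duplicate entries when the commands are run more than once.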
Created on 01-26-2017 04:30 PM
An improvement to this is to send multiple files to the setrep command at once.
-bash-4.1$ xargs -n 1000 hadoop fs -setrep 3 < /tmp/under_replicated_files
This will send 1000 paths to setrep at a time, which I found to be much faster. You may also want to redirect the output to a file, since presumably a very large number of files need their replication set.
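A rough sketch of the batching plus output redirection described above, wrapped in a function so it can be dry-run first. The name batch_setrep, its arguments, the log path and the SETREP_CMD override are all hypothetical. It assumes an xargs that supports -0 (GNU and BSD both do), which keeps paths containing spaces or quotes intact.

```shell
#!/bin/sh
# Sketch only: batch_setrep and SETREP_CMD are illustrative names.
# Usage: batch_setrep <replication> <batch_size> <list_file> <log_file>
batch_setrep() {
  rep="$1"; batch="$2"; list="$3"; log="$4"
  # Turn newlines into NULs so xargs -0 treats each line as exactly one path,
  # then append everything setrep prints (stdout and stderr) to the log file.
  tr '\n' '\0' < "$list" |
    xargs -0 -n "$batch" ${SETREP_CMD:-hadoop fs -setrep} "$rep" >> "$log" 2>&1
}

# Example (dry run first, then the real thing):
#   SETREP_CMD=echo batch_setrep 3 1000 /tmp/under_replicated_files /tmp/setrep.log
#   batch_setrep 3 1000 /tmp/under_replicated_files /tmp/setrep.log
```

Most of the gain likely comes from starting far fewer hadoop client JVMs: one per 1000 paths instead of one per path.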
Created on 04-16-2017 04:03 AM
Potentially silly question: When you set the rep count, do you count the "original" data block as well? For example, I have 3 data nodes and I want one block on each of those nodes (3 blocks total). Is that 2 replicas or 3?
Created on 05-23-2017 10:00 AM
@Pardeep
This code in theory runs perfectly for me with the hdfs stdout showing:
Replication 3 set: /apps/hive/warehouse....
However, once the script has finished, the blocks still remain under-replicated.
Any idea as to what else I could do?
Created on 06-19-2018 08:57 PM
Thanks, Pardeep.
To make it much faster, pass 500 files per call to the hadoop command. By changing the second line above, we can do this instead:
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
# Now using xargs -n 500 (or --max-args 500)
$ cat /tmp/under_replicated_files | xargs -n 500 hdfs dfs -setrep 1
Created on 12-12-2024 09:40 AM
Though one can intervene manually to fix under-replicated blocks, HDFS has matured a lot and the NameNode will take care of fixing under-replicated blocks on its own. The drawback of the manual step is that it may add load to NameNode operations and degrade the performance of running jobs. So if you plan to do it manually, do it during off-peak hours or over the weekend.
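If you leave the repair to the NameNode, one way to watch its progress is to poll the under-replicated block count. The following is a small sketch, assuming the output of hdfs dfsadmin -report contains a line of the form "Under replicated blocks: N"; the exact wording can differ between Hadoop versions, so check your own report first. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: under_replicated_count is an illustrative helper name.

# Print N from the first "Under replicated blocks: N" line read on stdin.
under_replicated_count() {
  awk -F': *' '/Under replicated blocks/ {print $2; exit}'
}

# Example: print the count once a minute until it reaches zero.
#   while :; do
#     n="$(hdfs dfsadmin -report | under_replicated_count)"
#     echo "$(date) under-replicated blocks: $n"
#     [ "$n" = "0" ] && break
#     sleep 60
#   done
```

A count that falls steadily on its own suggests that the manual setrep pass is not needed.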