To fix under-replicated blocks in HDFS, use the quick instructions below:

#### Fix under-replicated blocks ####

# Switch to the HDFS superuser
su - <$hdfs_user>

# Collect the paths of all under-replicated files reported by fsck
-bash-4.1$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files

# Set the replication factor back to 3 on each affected file
-bash-4.1$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :"; hadoop fs -setrep 3 "$hdfsfile"; done
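Note that -setrep only records the new target factor; the NameNode then re-replicates blocks in the background, so it can take a while before fsck reports a healthy filesystem. One minimal way to track progress is to re-run fsck and watch the summary line:

-bash-4.1$ hdfs fsck / | grep 'Under-replicated blocks'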
Comments
New Contributor

An improvement to this is to send multiple files to the setrep command at once.

-bash-4.1$ xargs -n 1000 hadoop fs -setrep 3 < /tmp/under_replicated_files

This will send 1000 paths to setrep at a time, which I found to be much faster. You may also want to redirect the output, since presumably a very large number of files need their replication set.
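For instance, the output can be sent to a log file instead of the terminal (a small sketch; the log path is arbitrary):

-bash-4.1$ xargs -n 1000 hadoop fs -setrep 3 < /tmp/under_replicated_files > /tmp/setrep.log 2>&1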

Contributor

Potentially silly question: When you set the rep count, do you count the "original" data block as well? For example, I have 3 data nodes and I want one block on each of those nodes (3 blocks total). Is that 2 replicas or 3?
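For what it's worth, a file's effective replication factor can be read back with hdfs dfs -stat in recent Hadoop versions, which makes it easy to see how HDFS counts copies (a small sketch; the path is hypothetical):

-bash-4.1$ hdfs dfs -stat 'replication: %r' /user/test/example.txt

In HDFS the replication factor is the total number of copies, so a factor of 3 means three blocks in total, one per node in the three-node example above.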

Expert Contributor

@Pardeep

This code seemingly runs perfectly for me, with the HDFS stdout showing:

Replication 3 set: /apps/hive/warehouse....

However, once the script has finished, the blocks still remain under-replicated.

Any idea as to what else I could do?
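One thing worth checking in this situation is whether enough DataNodes are alive to host the requested replicas, since under-replication persists whenever the cluster has fewer live nodes than the target factor (a hedged diagnostic sketch):

-bash-4.1$ hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes'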

Super Collaborator

Thanks, Pardeep.

To make this dramatically faster, pass 500 files per call to the hadoop command, which means roughly 500x fewer JVM launches. By changing the second line above, we can do this instead:

$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files


# Now using xargs -n 500 (or --max-args 500)
$ cat /tmp/under_replicated_files | xargs -n 500 hdfs dfs -setrep 3
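On a large backlog, GNU xargs can also run several batches in parallel via -P (a sketch assuming GNU xargs; tune the process count to what the NameNode can handle):

$ xargs -n 500 -P 4 hdfs dfs -setrep 3 < /tmp/under_replicated_files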