An improvement to this is to send multiple files to the setrep command at once. -bash-4.1$ xargs -n 1000 hadoop fs -setrep 3 < /tmp/under_replicated_files
This will send 1000 paths to setrep at a time, which I found to be loads faster. You may also want to redirect the output since the assumption is that very many files need their replication set.
... View more