In my HDFS status summary, I see the following messages about missing and under-replicated blocks:
2,114 missing blocks in the cluster. 5,114,551 total blocks in the cluster. Percentage missing blocks: 0.04%. Critical threshold: any.
Running the command: hdfs fsck -list-corruptfileblocks
gave the following output: The filesystem under path '/' has 2114 CORRUPT files
What is the best way to fix these corrupt files and also fix the under-replicated block problem?
You can try to recover some missing blocks by making sure that all your DataNodes, and all the disks on them, are healthy and running. If they are and you still have missing blocks, the only way out is to delete the files with missing blocks, either one by one or all at once using the "hdfs fsck <path> -delete" command.
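Before deleting anything, it may help to save the list of affected files so they can be copied back from their source afterwards. The following is only a sketch: the function names, the default output file, and the assumed line format of the fsck output ("blk_<id>", a tab, then the file path) are my assumptions, so check them against what your cluster actually prints.

```shell
# Sketch only; run as the HDFS superuser and review the list before deleting.

# Read `hdfs fsck <path> -list-corruptfileblocks` output on stdin and print
# the affected file paths, one per line, without duplicates.
# Assumption: each corrupt entry looks like "blk_<id><TAB><file path>".
corrupt_paths() {
  awk -F'\t' '/^blk_/ { print $2 }' | sort -u
}

# Show dead DataNodes, then save the affected paths to a file.
# Both arguments are hypothetical defaults chosen for this sketch.
save_corrupt_list() {
  root="${1:-/}"
  out_file="${2:-corrupt_files.txt}"
  hdfs dfsadmin -report -dead    # DataNodes worth bringing back first
  hdfs fsck "$root" -list-corruptfileblocks | corrupt_paths > "$out_file"
  echo "Saved affected paths to $out_file"
}
```

For example, run "save_corrupt_list / corrupt_files.txt", go through the file, and only then run "hdfs fsck / -delete" if the data really cannot be recovered.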
Regarding under-replicated blocks, HDFS is supposed to recover them automatically (by creating the missing copies to satisfy the replication factor). If it hasn't done so after a few days, you can trigger the recovery by running the balancer or, as mentioned in another answer, by running the "setrep" command.
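To see whether the automatic recovery is making progress, you can compare the under-replicated count from the fsck summary over time. This is a small sketch; the function name is made up, and it assumes the summary contains a line like "Under-replicated blocks:  12 (0.1 %)".

```shell
# Read the output of `hdfs fsck <path>` on stdin and print the number of
# under-replicated blocks from the summary.
# Assumption: the summary line looks like "Under-replicated blocks:  12 (0.1 %)".
under_replicated_count() {
  awk '/Under-replicated blocks:/ { print $3 }'
}
```

Usage: "hdfs fsck / | under_replicated_count", repeated a few hours apart. If the number keeps dropping, HDFS is catching up on its own.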
"The next step would be to determine the importance of the file, can it just be removed and copied back into place, or is there sensitive data that needs to be regenerated?
If it's easy enough just to replace the file, that's the route I would take."
@Pranshu Pranshu, you can use the "setrep" command to set the replication factor for files and directories:
Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the replication factor of a file. If path is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at path.
To set the replication factor of an individual file to 3, use the command below:
./bin/hadoop fs -setrep -w 3 /path/to/file
You can also do this recursively. To change the replication factor of the entire HDFS to 3, use the command below:
./bin/hadoop fs -setrep -R -w 3 /
Returns 0 on success and -1 on error.
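If you would rather touch only the files that fsck reports as under-replicated instead of the whole tree, something like the following could work. It is a sketch under assumptions: the function names are mine, and it assumes fsck prints such files as "<file path>:  Under replicated <block> ...".

```shell
# Read full `hdfs fsck <path>` output on stdin and print the files that are
# reported as under-replicated, without duplicates.
# Assumption: such lines look like "<file path>:  Under replicated <block> ...".
under_replicated_paths() {
  sed -n 's/:[[:space:]]*Under replicated.*//p' | sort -u
}

# Re-apply the target replication factor to each file path read from stdin.
# Without -w the command returns at once and replication runs in the background.
setrep_each() {
  target="${1:-3}"
  while IFS= read -r f; do
    # Redirect stdin so the command cannot consume the remaining paths.
    hdfs dfs -setrep "$target" "$f" < /dev/null
  done
}
```

Usage: "hdfs fsck / | under_replicated_paths | setrep_each 3".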
Hope this helps you solve the problem.
I have a similar problem: my filesystem/NameNode is in safe mode because of under-replicated blocks. The "hdfs dfs -setrep -w 3 /path/to/file" command fails because the filesystem is in safe mode. If I am in safe mode because of under-replicated blocks, and the command to fix that doesn't work in safe mode, what can I do?
I've tried the command to leave safe mode and it seems to work, but the NameNode goes back into safe mode within a VERY short time.
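Not a full answer, but one way to avoid retrying setrep blindly is to check the safe mode state first and look into why the NameNode keeps re-entering it. The sketch below uses made-up function names and assumes "hdfs dfsadmin -safemode get" prints "Safe mode is ON" or "Safe mode is OFF".

```shell
# Succeeds (exit status 0) if the NameNode reports that safe mode is on.
# Assumption: the command prints "Safe mode is ON" or "Safe mode is OFF".
safemode_is_on() {
  hdfs dfsadmin -safemode get | grep -q 'Safe mode is ON'
}

# Run setrep only when the filesystem accepts writes; otherwise say what to check.
# $1 is the replication factor, $2 is the HDFS path.
setrep_when_writable() {
  if safemode_is_on; then
    echo "NameNode is in safe mode; check DataNodes (hdfs dfsadmin -report) and the NameNode log first" >&2
    return 1
  fi
  hdfs dfs -setrep -w "$1" "$2"
}
```

Usage: "setrep_when_writable 3 /path/to/file". If it keeps refusing, "hdfs dfsadmin -safemode leave" alone is unlikely to stick until the underlying cause (for example DataNodes that have not reported in) is dealt with.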