
How to fix missing and under replicated blocks?

Contributor

In my HDFS status summary, I see the following messages about missing and under-replicated blocks:

2,114 missing blocks in the cluster. 5,114,551 total blocks in the cluster. Percentage missing blocks: 0.04%. Critical threshold: any.

On executing the command: hdfs fsck / -list-corruptfileblocks

I got the following output: The filesystem under path '/' has 2114 CORRUPT files

What is the best way to fix these corrupt files and also fix the under-replicated block problem?

1 ACCEPTED SOLUTION

Rising Star

Hi Pranshu,

You can follow the instructions in the link below:

https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html

Regards,

Karthik Gopal
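
In case that link goes stale: the approach it describes is, roughly, to list the files fsck reports as under-replicated and reset their replication factor one by one. A sketch (the target factor of 3 is only an example; use your cluster's dfs.replication value):

# List every file fsck reports as under-replicated.
hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' > /tmp/under_replicated_files

# Reset each file's replication factor (3 here is an example).
while read -r f; do
  hdfs dfs -setrep 3 "$f"
done < /tmp/under_replicated_files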




Master Guru

You can try to recover some missing blocks by making sure that all your DataNodes, and all disks on them, are healthy and running. If they are and you still have missing blocks, the only way out is to delete the files with missing blocks, either one by one or all of them at once using the "hdfs fsck <path> -delete" command.

Regarding under-replicated blocks, HDFS is supposed to recover them automatically (by creating missing copies to fulfill the replication factor). If it doesn't after a few days, you can trigger the recovery by running the balancer, or, as mentioned in another answer, by running the "setrep" command.
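
A minimal sequence for the checks described above (the path in step 3 is a placeholder, and -delete is irreversible, so run it only after confirming the DataNodes are healthy):

# 1. Review the report for dead DataNodes or failed volumes.
hdfs dfsadmin -report

# 2. List the files that still have corrupt/missing blocks.
hdfs fsck / -list-corruptfileblocks

# 3. Only if the data is truly unrecoverable: delete the affected files.
hdfs fsck /path/to/affected/dir -delete

# 4. Help re-replication of under-replicated blocks along.
hdfs balancer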

Super Collaborator
Some more steps:
  • Get the full details of the files causing the problem: hdfs fsck / -files -blocks -locations
  • If HDFS is not re-replicating them on its own, run the balancer.
  • If you are SURE these files are not needed and you just want to eliminate the error, you can automatically delete the corrupted files with hdfs fsck / -delete (a review-first alternative is sketched below this list).
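
If you would rather inspect the affected files before deleting anything, something like the following works. This is a sketch: it assumes each entry in the -list-corruptfileblocks output is a "blk_... <path>" pair on one line, and it will mishandle paths containing spaces.

# Collect the distinct file paths behind the corrupt blocks.
hdfs fsck / -list-corruptfileblocks | awk '/^blk_/ {print $2}' | sort -u > /tmp/corrupt_files.txt

# After reviewing /tmp/corrupt_files.txt, remove each file individually.
while read -r f; do
  hdfs dfs -rm -skipTrash "$f"
done < /tmp/corrupt_files.txt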


@Pranshu Pranshu, You have 2 options. From another link:

"The next step would be to determine the importance of the file, can it just be removed and copied back into place, or is there sensitive data that needs to be regenerated?

If it's easy enough just to replace the file, that's the route I would take."


@Pranshu Pranshu, If the original question is answered, then please accept the best answer.

Contributor

It seems like the replication factor is 1 in my case. How can I recover the data from the DR cluster?


@Pranshu Pranshu, You can use the "setrep" command to set the replication factor for files and directories:

setrep

Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>

Changes the replication factor of a file. If path is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at path.

Options:

  • The -w flag requests that the command wait for the replication to complete. This can potentially take a very long time.
  • The -R flag is accepted for backwards compatibility. It has no effect.

Example:

To set the replication of an individual file to 3, you can use the command below:

hdfs dfs -setrep -w 3 /path/to/file

This also works recursively. To change the replication of the entire HDFS to 3 (setrep recurses into directories by default, so -R is not needed):

hdfs dfs -setrep -w 3 /

Exit Code:

Returns 0 on success and -1 on error.
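
If many files are affected, a loop such as the following can raise them in bulk. This is a sketch: it assumes the second column of "hdfs dfs -ls" output is the file's replication factor (directories show "-" there), and it will mishandle paths containing spaces.

# Find files whose replication factor is 1.
hdfs dfs -ls -R / | awk '$1 !~ /^d/ && $2 == 1 {print $8}' > /tmp/repl1_files.txt

# Raise each one to 3 (pick the factor your cluster actually uses).
while read -r f; do
  hdfs dfs -setrep -w 3 "$f"
done < /tmp/repl1_files.txt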


Hope this helps you solve the problem.

Explorer

I have a similar problem: the NameNode is in safe mode because of under-replicated blocks. My problem is that "hdfs dfs -setrep -w 3 /path/to/file" fails because the filesystem is in safe mode. If I am in safe mode because of under-replicated blocks, and the command to fix that doesn't work while in safe mode, what can I do?

I've tried the command to leave safe mode and it seems to work, but the NameNode goes back into safe mode within a VERY short time.
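
For reference, the safe-mode commands being discussed here are the standard dfsadmin ones. If the NameNode keeps re-entering safe mode, that usually means the fraction of reported blocks is still below dfs.namenode.safemode.threshold-pct, so the missing-block problem has to be fixed first.

# Check the current safe-mode status.
hdfs dfsadmin -safemode get

# Force the NameNode out of safe mode (it may re-enter while blocks are still missing).
hdfs dfsadmin -safemode leave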