Best way of handling corrupt or missing blocks?

Hi,

What is the best way of handling corrupt or missing blocks?

Master Mentor

@Rushikesh Deshmukh Find out which files these blocks belong to using the fsck command; if they are not critical, just delete them.
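
For reference, a typical first pass with fsck looks like this (a sketch, assuming the whole namespace is scanned; substitute a narrower path if you know where the damage is):

hdfs fsck /                              # summary of block health for the namespace
hdfs fsck / -files -blocks -locations    # per-file detail: which blocks, on which DataNodes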

@Artem Ervits, thanks for your reply.

Master Mentor

[link to an article on fixing corrupt or missing blocks]

Rising Star

You can use the command 'hdfs fsck /' to list corrupt or missing blocks ('hdfs fsck / -delete' removes the affected files), and then follow the article above to fix them.
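
For reference, the relevant fsck options look like this (a sketch; the path / is just an example, and -move is the gentler alternative to -delete):

hdfs fsck / -list-corruptfileblocks   # print only the files that have corrupt or missing blocks
hdfs fsck / -move                     # move affected files to /lost+found instead of deleting them
hdfs fsck / -delete                   # delete affected files outright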

Is there any way to recover corrupt blocks, or do we just have to delete them?

Master Mentor

@Rushikesh Deshmukh You have two options; another link puts it this way:

"The next step would be to determine the importance of the file, can it just be removed and copied back into place, or is there sensitive data that needs to be regenerated?

If it's easy enough just to replace the file, that's the route I would take."
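
If replacing the file is the chosen route, a minimal sketch (the paths here are hypothetical placeholders):

hdfs fsck / -list-corruptfileblocks                # confirm which file is affected
hdfs dfs -rm /data/example/part-00000              # remove the damaged file
hdfs dfs -put /backup/part-00000 /data/example/    # copy a known-good copy back into place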

@Neeraj Sabharwal, thanks for the quick reply.

Master Mentor

@Rushikesh Deshmukh You're welcome! Please help me close the thread by accepting the best answer.

Contributor

To identify "corrupt" or "missing" blocks, the command 'hdfs fsck /path/to/file' can be used from the command line. Other tools also exist.

HDFS will attempt to recover the situation automatically. By default there are three replicas of any block in the cluster, so if HDFS detects that one replica of a block has become corrupt or damaged, it will create a new replica of that block from a known-good replica and mark the damaged one for deletion.
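
As an illustration (the path is a placeholder), the replication factor can be inspected and, if needed, raised so HDFS re-replicates from the good copies:

hdfs fsck /path/to/file -files -blocks   # shows each block's replication count
hdfs dfs -setrep -w 3 /path/to/file      # set replication to 3 and wait for it to complete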

The known-good state is determined by checksums which are recorded alongside the block by each DataNode.

The chances of two replicas of the same block becoming damaged are very small indeed. HDFS can - and does - recover from this situation because it has a third replica, with its checksum, from which further replicas can be created.

The chances of three replicas of the same block becoming damaged are so remote that it would suggest a significant failure elsewhere in the cluster. If this situation does occur and all three replicas are damaged, then 'hdfs fsck' will report that block as "corrupt" - i.e. HDFS cannot self-heal the block from any of its replicas.

Rebuilding the data behind a corrupt block is a lengthy process (like any data recovery process). If this situation should arise, deep investigation of the health of the cluster as a whole should also be undertaken.
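
As a starting point for that investigation (a sketch; what to examine will depend on the cluster):

hdfs fsck / -list-corruptfileblocks   # enumerate the blocks HDFS cannot self-heal
hdfs dfsadmin -report                 # live/dead DataNodes and cluster-wide capacity summary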