Created 02-18-2016 12:38 PM
Hi,
What is the best way of handling corrupt or missing blocks?
Created 02-18-2016 12:43 PM
See this thread
http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hadoop-hdfs
Wonderful explanation
Created 02-18-2016 12:40 PM
@Rushikesh Deshmukh find out which files these blocks belong to using the fsck command; if they are not critical, just delete them.
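For example, something like this should list the affected files:

hdfs fsck / -list-corruptfileblocks   # prints each corrupt block together with the file it belongs to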
Created 02-21-2016 06:39 AM
@Artem Ervits, thanks for your reply.
Created 02-18-2016 12:46 PM
You can use the command hdfs fsck / to list corrupt or missing blocks, and hdfs fsck / -delete to remove the affected files, then follow the article above to fix the same.
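For reference, a minimal sequence might look like this (the -move variant is an alternative to deleting outright):

hdfs fsck /            # overall health report; corrupt/missing blocks are summarised at the end
hdfs fsck / -move      # move files with corrupt blocks into /lost+found instead of deleting them
hdfs fsck / -delete    # delete the files whose blocks cannot be recovered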
Created 02-18-2016 12:56 PM
Is there any way to recover corrupt blocks, or do we just have to delete them?
Created 02-18-2016 01:05 PM
@Rushikesh Deshmukh You have 2 options... Another link:
"The next step would be to determine the importance of the file, can it just be removed and copied back into place, or is there sensitive data that needs to be regenerated?
If it's easy enough just to replace the file, that's the route I would take."
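If replacing the file is the route taken, a sketch of that would be (paths here are purely illustrative; the good copy has to come from wherever the data originally lived):

hdfs dfs -rm /data/reports/part-00000             # remove the damaged file from HDFS
hdfs dfs -put /backup/part-00000 /data/reports/   # copy a known-good copy back into place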
Created 02-18-2016 01:09 PM
Thanks for the quick reply.
Created 02-18-2016 01:25 PM
@Rushikesh Deshmukh Welcome! Help me to close the thread by accepting the best answer.
Created 05-03-2016 05:07 PM
To identify "corrupt" or "missing" blocks, the command 'hdfs fsck /path/to/file' can be used from the command line. Other tools also exist.
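For example, to see every block of one file and which DataNodes hold each replica (the path is just an example):

hdfs fsck /path/to/file -files -blocks -locations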
HDFS will attempt to recover the situation automatically. By default there are three replicas of any block in the cluster, so if HDFS detects that one replica of a block has become corrupt or damaged, it will create a new replica of that block from a known-good replica and mark the damaged one for deletion.
The known-good state is determined by checksums which are recorded alongside the block by each DataNode.
The chances of two replicas of the same block becoming damaged are very small indeed. HDFS can - and does - recover from this situation because it has a third replica, with its checksum, from which further replicas can be created.
The chances of three replicas of the same block becoming damaged are so remote that it would suggest a significant failure somewhere else in the cluster. If this situation does occur, and all three replicas are damaged, then 'hdfs fsck' will report that block as "corrupt" - i.e. HDFS cannot self-heal the block from any of its replicas.
Rebuilding the data behind a corrupt block is a lengthy process (like any data recovery process). If this situation should arise, deep investigation of the health of the cluster as a whole should also be undertaken.
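As a starting point for that investigation, the overall state of the cluster can be checked with something like:

hdfs dfsadmin -report   # per-DataNode capacity and status, plus missing/under-replicated block counts
hdfs fsck /             # namespace-wide health summary, including corrupt and under-replicated blocks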