Created 08-09-2016 09:23 PM
What are the steps to remove corrupted blocks from HDFS?
Created 08-09-2016 09:43 PM
Perform the actions below as the hdfs user:
The output of hdfs fsck / will be very verbose, but it will mention which blocks are corrupt. We can do some grepping of the fsck output so that we aren't "reading through a firehose":
hdfs fsck / | egrep -v '^\.+' | grep -v replica | grep -v Replica
or, pointing at a specific NameNode (use the RPC port, typically 8020; 50070 is the web UI port and will not work for hdfs:// URIs):
hdfs fsck hdfs://ip.or.host:8020/ | egrep -v '^\.+' | grep -v replica | grep -v Replica
This will list the affected files without the long runs of dots, and it also filters out lines about files that merely have under-replicated blocks (which isn't necessarily an issue). The output should include something like this for each affected file:
/path/to/filename.fileextension: CORRUPT blockpool BP-1016133662-10.29.100.41-1415825958975 block blk_1073904305
/path/to/filename.fileextension: MISSING 1 blocks of total size 15620361 B
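As an alternative to grepping, fsck can also print just the corrupt files directly, which may be simpler (flag availability depends on your Hadoop version):
hdfs fsck / -list-corruptfileblocks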
The next step is to determine the importance of the file: can it simply be removed and copied back into place, or does it hold sensitive data that needs to be regenerated?
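To help decide, you can point fsck at the specific file to see its blocks, sizes, and which DataNodes hold the surviving replicas (the path here is the placeholder from the sample output above):
hdfs fsck /path/to/filename.fileextension -files -blocks -locations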
If it's easy enough just to replace the file, that's the route I would take.
Either of these commands will move the corrupted file to the trash:
hdfs dfs -rm /path/to/filename.fileextension
hdfs dfs -rm hdfs://ip.or.hostname.of.namenode:8020/path/to/filename.fileextension
Or you can skip the trash to permanently delete it (which is probably what you want to do):
hdfs dfs -rm -skipTrash /path/to/filename.fileextension
hdfs dfs -rm -skipTrash hdfs://ip.or.hostname.of.namenode:8020/path/to/filename.fileextension
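Once the corrupt copy is removed, you can copy a good replacement back into place, assuming you still have the source data outside the cluster (the local path here is illustrative):
hdfs dfs -put /local/copy/of/filename.fileextension /path/to/filename.fileextension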
As the hdfs user:
If you would rather not delete each file individually as above, running the command below will delete every file with corrupt or missing blocks in one pass.
hdfs fsck / -delete
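If outright deletion feels too aggressive, fsck can instead move the corrupted files into /lost+found on HDFS so they can be inspected first:
hdfs fsck / -move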