How to remove corrupted blocks from HDFS

What are the steps to remove corrupted blocks from HDFS?

1 ACCEPTED SOLUTION

Re: How to remove corrupted blocks from HDFS

@Gulshad Ansari

Perform the steps below as the hdfs user. First, check the filesystem:

hdfs fsck /

The output of this fsck run is very verbose, but it reports which blocks are corrupt. Filtering it with grep means we aren't "reading through a firehose".

hdfs fsck / | egrep -v '^\.+$' | grep -v replica | grep -v Replica

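or, naming the NameNode explicitly (the port in an hdfs:// URI is the NameNode RPC port, commonly 8020; 50070 is the Hadoop 2 web UI port):

hdfs fsck hdfs://ip.or.host:8020/ | egrep -v '^\.+$' | grep -v replica | grep -v Replica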

This lists the affected files. The filters drop the progress dots and any files that merely have under-replicated blocks (which isn't necessarily an issue). The output should include something like this for each affected file:

/path/to/filename.fileextension: CORRUPT blockpool BP-1016133662-10.29.100.41-1415825958975 block blk_1073904305
/path/to/filename.fileextension: MISSING 1 blocks of total size 15620361 B
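The filtering step above can also be sketched as a small helper that reduces the fsck report to a unique list of affected paths. This is a minimal sketch, not part of the original answer: the function name `affected_files` is hypothetical, and it assumes the report lines look like the sample output above (a path starting with `/`, followed by `: CORRUPT ` or `: MISSING `).

```shell
# Hypothetical helper: reads `hdfs fsck` output on stdin and prints each
# affected path once. Assumes lines shaped like the sample output above.
affected_files() {
  # Keep only per-file CORRUPT/MISSING lines, strip everything after the
  # path, then de-duplicate.
  grep -E '^/.*: (CORRUPT|MISSING) ' |
    sed -E 's/: (CORRUPT|MISSING) .*$//' |
    sort -u
}
```

It would be used as `hdfs fsck / | affected_files`. If the report format on your Hadoop version differs, `hdfs fsck / -list-corruptfileblocks` is a built-in way to list the files with corrupt blocks.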

The next step is to determine how important the file is. Can it simply be removed and copied back into place, or does it hold sensitive data that would need to be regenerated?

If it's easy enough just to replace the file, that's the route I would take.

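Remove the corrupted file from your Hadoop cluster

This command moves the corrupted file to the trash (if trash is enabled). Use either the plain path or the full URI with the NameNode RPC port (commonly 8020, not the web UI port 50070):

hdfs dfs -rm /path/to/filename.fileextension
hdfs dfs -rm hdfs://ip.or.hostname.of.namenode:8020/path/to/filename.fileextension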

Or you can skip the trash and delete the file permanently (which is probably what you want to do). In the URI form, the port is the NameNode RPC port (commonly 8020), not the web UI port 50070:

hdfs dfs -rm -skipTrash /path/to/filename.fileextension
hdfs dfs -rm -skipTrash hdfs://ip.or.hostname.of.namenode:8020/path/to/filename.fileextension
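For the "remove and copy back into place" route, the two steps can be chained as in the sketch below. It assumes you hold a known-good local copy of the file. The function name `restore_from_local` and both of its arguments are hypothetical; `hdfs dfs -rm -skipTrash` and `hdfs dfs -put` are the standard commands.

```shell
# Hypothetical helper: delete a corrupted HDFS file and re-upload it from a
# known-good local copy. Usage: restore_from_local LOCAL_COPY HDFS_PATH
restore_from_local() {
  local_copy="$1"
  hdfs_path="$2"
  # Refuse to delete anything unless the replacement really exists locally.
  if [ ! -f "$local_copy" ]; then
    echo "local copy not found: $local_copy" >&2
    return 1
  fi
  # Upload only if the delete succeeded.
  hdfs dfs -rm -skipTrash "$hdfs_path" &&
    hdfs dfs -put "$local_copy" "$hdfs_path"
}
```

Checking for the local copy first is a deliberate choice: with `-skipTrash` there is no way to recover the HDFS file if the replacement turns out to be missing.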

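Alternatively, as the hdfs user, you can delete all corrupted files in one pass instead of removing them individually as above. Note that -delete removes files with corrupt or missing blocks; it does not delete files that are merely under-replicated.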

hdfs fsck / -delete
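Because `-delete` is irreversible, it can help to preview what fsck considers corrupt before deleting. The wrapper below is only a sketch: the name `fsck_delete_confirmed` and its `--yes` argument are hypothetical, while `-list-corruptfileblocks` and `-delete` are standard fsck options.

```shell
# Hypothetical wrapper: list corrupt files under a path, and delete them only
# when "--yes" is passed as the second argument.
fsck_delete_confirmed() {
  target="${1:-/}"
  confirm="${2:-}"
  # Show the files that fsck currently reports as having corrupt blocks.
  hdfs fsck "$target" -list-corruptfileblocks
  if [ "$confirm" != "--yes" ]; then
    echo "Preview only; re-run with --yes to delete corrupted files under $target" >&2
    return 1
  fi
  # Permanently deletes the corrupted files under the target path.
  hdfs fsck "$target" -delete
}
```

Running `fsck_delete_confirmed /` prints the list and stops; `fsck_delete_confirmed / --yes` prints the list and then deletes. If you would rather keep the damaged files for inspection, fsck also offers `-move`, which moves them to /lost+found instead of deleting them.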