Below is the procedure for removing corrupt blocks or files from HDFS.
Locate the files that have corrupt blocks:
$ hdfs fsck / | egrep -v '^\.+$'
$ hdfs fsck hdfs://ip.or.host:50070/ | egrep -v '^\.+$'
This lists the affected files. Instead of a screen full of dots, the output will include lines like the following for each affected file:
/path/to/filename.file_extension: CORRUPT blockpool BP-1016133662-10.29.100.41-1415825958975 block blk_1073904305
/path/to/filename.file_extension: MISSING 1 blocks of total size 15620361 B
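A quick way to pull just the affected paths out of that output is to filter for the CORRUPT/MISSING lines. This is a minimal sketch; the sample text below simply mirrors the two output lines shown above, and the variable names are illustrative:

```shell
# Sketch: extract the affected file paths from saved fsck output.
# The sample text mirrors the CORRUPT/MISSING lines shown above.
fsck_out='/path/to/filename.file_extension: CORRUPT blockpool BP-1016133662-10.29.100.41-1415825958975 block blk_1073904305
/path/to/filename.file_extension: MISSING 1 blocks of total size 15620361 B'

# Keep only lines reporting damage, take the path before the first colon,
# and de-duplicate (one file can have several bad blocks).
corrupt_files=$(printf '%s\n' "$fsck_out" | grep -E 'CORRUPT|MISSING' | cut -d: -f1 | sort -u)
echo "$corrupt_files"
```

On recent Hadoop versions you may also be able to ask fsck for this list directly with `hdfs fsck / -list-corruptfileblocks`, which prints one line per corrupt block.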
The next step is to determine how important the file is: can it simply be removed and copied back into place, or does it contain data that must be regenerated? With a replication factor of 1 there are no other replicas to recover from, so analyze carefully.
Remove the corrupted file(s)
This command moves the corrupted file to the trash, so if you later realize the file is important you still have the option of recovering it:
$ hdfs dfs -rm /path/to/filename.file_extension
Use -skipTrash to permanently delete the file, if you are sure you really don't need it:
$ hdfs dfs -rm -skipTrash /path/to/filename.file_extension
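If many files are affected, the two steps above can be combined into a loop. This is a hedged sketch, not a one-liner to run blindly: the function name is made up, it assumes paths contain no spaces, and it defaults to a dry run (printing what it would remove) until you explicitly set HDFS_RM to the real delete command:

```shell
# Sketch: delete every file that fsck reports as CORRUPT or MISSING.
# Reads fsck output on stdin. Destructive when HDFS_RM is set; by default
# it only echoes what it would remove (a dry run).
remove_corrupt_files() {
  grep -E 'CORRUPT|MISSING' | cut -d: -f1 | sort -u |
  while read -r f; do
    # Unquoted on purpose: HDFS_RM may be a multi-word command.
    ${HDFS_RM:-echo would remove} "$f"
  done
}
```

Typical use (bash): review the dry run first with `hdfs fsck / | remove_corrupt_files`, then rerun as `hdfs fsck / | HDFS_RM="hdfs dfs -rm" remove_corrupt_files` once you are sure.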
How do you repair a corrupted file that is not easy to replace?
$ hdfs fsck /path/to/filename.file_extension -locations -blocks -files
$ hdfs fsck hdfs://ip.or.hostname.of.namenode:50070/path/to/filename.file_extension -locations -blocks -files
From the block locations in that output you can track down the datanode holding the corrupt replica, then look through its logs to determine the cause of the corruption.
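Once you know which datanode to look at, the search itself is just a grep for the block id. A minimal sketch, with an assumed helper name and an example log directory (the real log path varies by distribution, e.g. /var/log/hadoop/hdfs):

```shell
# Sketch: list the log files in a datanode log directory that mention a
# given block id, e.g. find_block_refs /var/log/hadoop/hdfs blk_1073904305
find_block_refs() {
  logdir=$1
  blockid=$2
  # -r: search the whole directory, -l: print only matching file names
  grep -rl "$blockid" "$logdir"
}
```

On the datanode you can also locate the replica on local disk with something like `find /hadoop/hdfs/data -name 'blk_1073904305*'` (the data directory path is again distribution-specific).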