
Blocks with corrupted replicas

Super Collaborator

Hello

I've noticed in Ambari under HDFS metrics that we have 2 blocks with corrupt replicas.
Running " hdfs fsck / " shows no corrupt blocks and system is healthy.
Running "hdfs dfsadmin-report" shows 2 corrupt replicas (same as Ambari dashboard)

I've restarted Ambari Metrics and the Ambari Agents on all nodes, plus the Ambari Server, as suggested in one of the threads I came across, but the problem remains.

Ambari is 2.5.2

Any ideas on how to fix this issue?

Thanks

Adi

(Screenshots of the Ambari dashboard attached: snap-2018-06-03-at-183023.png, snap-2018-06-03-at-182322.png)

Master Mentor

@Adi Jabkowsky

Below is the procedure to remove the corrupt blocks or files

Locate the files that have corrupt blocks:

$ hdfs fsck / | egrep -v '^\.+$'

or

$ hdfs fsck hdfs://ip.or.host:50070/ | egrep -v '^\.+$'
This will list the affected files. Instead of a screen full of dots, the output will include lines like the following for each affected file.
Sample output
/path/to/filename.file_extension: CORRUPT blockpool BP-1016133662-10.29.100.41-1415825958975 block blk_1073904305
/path/to/filename.file_extension: MISSING 1 blocks of total size 15620361 B 
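If your Hadoop release supports it, a shorter alternative for listing only the files with corrupt or missing blocks is the dedicated fsck flag:
$ hdfs fsck / -list-corruptfileblocks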
The next step is to determine the importance of the file: can it simply be removed and copied back into place, or does it contain data that would need to be regenerated? You have a replication factor of 1, so analyse carefully.
Remove the corrupted file(s)
This command moves the corrupted file to the trash, so if you later realise the file is important you still have the option of recovering it:
$ hdfs dfs -rm /path/to/filename.file_extension 
Use -skipTrash to permanently delete the file if you are sure you really don't need it:
$ hdfs dfs -rm -skipTrash /path/to/filename.file_extension 
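If you have a clean copy of the file outside HDFS, a minimal replace-and-protect sketch could look like this (the local backup path is a hypothetical example, and the -setrep step is optional, to raise the file above replication factor 1):
$ hdfs dfs -put /local/backup/filename.file_extension /path/to/
$ hdfs dfs -setrep -w 3 /path/to/filename.file_extension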
What if the corrupted file is not easy to replace and needs to be repaired? List its blocks and their locations:
$ hdfs fsck /path/to/filename.file_extension -locations -blocks -files
or
$ hdfs fsck hdfs://ip.or.hostname.of.namenode:50070/path/to/filename.file_extension -locations -blocks -files
From that output you can track down the DataNode(s) holding the corrupt replica, look through their logs, and determine what the underlying issue is.
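As a sketch, assuming Hadoop 2.7+ (which ships the hdfs debug tool) and that /hadoop/hdfs/data is one of your dfs.datanode.data.dir directories, you can locate the physical block files on that DataNode and verify them against their stored checksum:
$ find /hadoop/hdfs/data -name 'blk_1073904305*'
$ hdfs debug verifyMeta -meta <meta file found above> -block <block file found above>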
Please revert.

Super Collaborator

Thank you @Geoffrey Shelton Okot

However, fsck shows no corrupt blocks. The problem is with corrupt replicas.
That said, the alert has since disappeared ...
Not sure whether to be glad or suspicious 🙂

Adi

Expert Contributor

This response is NOT about fixing "files with corrupt replicas" but about finding and fixing files that are completely corrupt, that is, files for which no good replicas remain to recover them from.

The warning about files with corrupt replicas means that a file has at least one corrupt replica, but the file can still be recovered from the remaining replicas.
In this case

hdfs fsck /path ...

 will not show these files because it considers them healthy.
These files and their corrupt replicas are only reported by the command

hdfs dfsadmin -report

 and as far as I know there is no direct command to fix this. The only way I have found is to wait for the Hadoop cluster to heal itself by re-replicating the bad replicas from the good ones.
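For what it's worth, one way to watch the self-healing progress is to poll the same counter Ambari displays, either from the dfsadmin summary or from the NameNode JMX endpoint (namenode-host and the default HTTP port 50070 are placeholders here):
$ hdfs dfsadmin -report | grep -i corrupt
$ curl -s 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -i CorruptBlocks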