Support Questions

rushikeshdeshmu · ‎02-18-2016

Hi,

What is the best way of handling corrupt or missing blocks?

nsabharwal · ‎02-18-2016

@Rushikesh Deshmukh

See this thread

http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hadoop-hdfs

Wonderful explanation.

christian_proko · ‎05-06-2016

Note: if you run your cluster in the cloud or on virtualized infrastructure, multiple VMs may end up on the same physical host. In that case a single physical failure can cause data loss, e.g. if all replicas of a block are stored on the same physical host. The likelihood of this depends on the cloud provider and may be high or remote. Be aware of this risk and prepare for disaster recovery (DR) by keeping copies on highly durable (object) storage such as S3.
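
A minimal sketch of such a DR copy, assuming the S3A connector and its credentials are already configured on the cluster. The function name backup_to_s3 and the bucket and paths in the usage line are hypothetical placeholders; hadoop distcp and its -update option are standard Hadoop tooling.

```shell
# Copy an HDFS directory to object storage with DistCp.
# -update only copies files that are missing or differ at the destination.
backup_to_s3() {
    src="$1"    # e.g. an hdfs:// path (placeholder)
    dest="$2"   # e.g. an s3a:// path (placeholder)
    hadoop distcp -update "$src" "$dest"
}
```

Usage, with placeholder paths: backup_to_s3 hdfs:///data/warehouse s3a://my-dr-bucket/warehouse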

pradeep_bhadani · ‎05-10-2016

Adding to the above answers: hadoop fsck might not give the latest report of corrupt blocks.

Hadoop detects corrupt blocks either through a periodic check or when a client tries to read a file.

For details, please refer to: https://issues.apache.org/jira/browse/HDFS-8126

christian_proko · ‎05-10-2016

Good point @Pradeep Bhadani. If you want to 'force' a check of specific blocks, you can read the corresponding files, e.g. via Hive or MR, and run the check command afterwards to see whether an error was found. The reason for targeting specific files is the expense of checking a whole filesystem that may span PBs across hundreds of nodes.
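
As a rough sketch of that read-then-check approach from the command line (instead of Hive or MR), assuming the path is a single HDFS file. The function name read_then_check is made up for illustration.

```shell
# Read a file end to end so the client verifies its blocks,
# then ask fsck whether anything under that path is reported corrupt.
read_then_check() {
    path="$1"
    # Discard the data; only the side effect of reading matters here.
    hdfs dfs -cat "$path" > /dev/null || echo "read of $path failed" >&2
    hdfs fsck "$path" -list-corruptfileblocks
}
```

Usage, with a placeholder path: read_then_check /data/events/part-00000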

pradeep_bhadani · ‎05-10-2016

@Christian Prokopp True.

jayanta_das · ‎09-25-2016

The best way to find the list of missing blocks:

Command:

[hdfs@sandbox ~]$ hdfs fsck -list-corruptfileblocks

Output:

Connecting to namenode via http://sandbox.hortonworks.com:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F

The filesystem under path '/' has 0 CORRUPT files
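
For scripting or monitoring, the count can be pulled out of that summary line. This is a minimal sketch that assumes the summary keeps the wording shown in the output above; corrupt_count is a hypothetical helper name.

```shell
# Read fsck output on stdin and print the number from the
# "... has N CORRUPT files" summary line (prints nothing if absent).
corrupt_count() {
    sed -n 's/.* has \([0-9][0-9]*\) CORRUPT files.*/\1/p'
}
```

Usage: hdfs fsck / -list-corruptfileblocks | corrupt_count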

Thanks

Jay

dpexecute · ‎06-04-2018

Thanks for this, this is great!

kuldeephawks · ‎04-20-2017

The command "hdfs fsck / -delete" worked for me.

madhavakumar_ch · ‎10-23-2017

Before deleting any corrupted blocks, please make sure that they have been replicated successfully.
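
One cautious way to follow this advice is to list what fsck considers corrupt and confirm before deleting anything. A minimal sketch; the function name delete_corrupt_if_confirmed is hypothetical, while -list-corruptfileblocks and -delete are the fsck options already mentioned in this thread.

```shell
# Show the corrupt files under a path, then delete them only after
# an explicit "y" on stdin. Deleted files cannot be recovered from HDFS.
delete_corrupt_if_confirmed() {
    path="$1"
    hdfs fsck "$path" -list-corruptfileblocks
    printf 'Delete corrupt files under %s? [y/N] ' "$path" >&2
    read -r answer
    if [ "$answer" = "y" ]; then
        hdfs fsck "$path" -delete
    else
        echo "Nothing deleted."
    fi
}
```

To inspect a single file's blocks and replica locations before deciding, hdfs fsck <file> -files -blocks -locations can be run first.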

shravan_sairi · ‎05-16-2018

"hdfs fsck / -delete" worked for me. Thanks.

LH · ‎01-03-2019

Hi, I'd like to share a situation we encountered where 99% of our HDFS blocks were reported missing and we were able to recover them.

We had a system with 2 namenodes with high availability enabled.

For some reason, under the data folders of the datanodes, i.e. /data0x/hadoop/hdfs/data/current, we had two block pool folders listed (an example of such a folder is BP-1722964902-1.10.237.104-1541520732855).

One folder contained the IP of namenode 1 and the other contained the IP of namenode 2.

All the data was under the block pool of namenode 1, but inside the VERSION files of the namenodes (/data0x/hadoop/hdfs/namenode/current/) the block pool ID and the namespace ID were those of namenode 2, so the namenode was looking for blocks in the wrong block pool folder.

I don't know how we got to the point of having two block pool folders, but we did.

To fix the problem and get HDFS healthy again, we just needed to update the VERSION file on all the namenode disks (on both NN machines) and on all the journal node disks (on all JN machines) to point to namenode 1.

We then restarted HDFS and made sure all the blocks were reported and there were no more missing blocks.
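
To spot this kind of mismatch, the block pool ID recorded in a VERSION file can be compared with the BP-* folders on the datanode disks. A minimal sketch, assuming the VERSION file is a plain key=value file with a blockpoolID entry; block_pool_id is a hypothetical helper name and the paths are the ones from this post.

```shell
# Print the blockpoolID value stored in a VERSION file.
block_pool_id() {
    sed -n 's/^blockpoolID=//p' "$1"
}
```

Usage: run block_pool_id /data0x/hadoop/hdfs/namenode/current/VERSION on a namenode, then ls -d /data0x/hadoop/hdfs/data/current/BP-* on a datanode, and check that the folder holding the data carries the same ID.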

Cloudera Community

Best way of handling corrupt or missing blocks?