<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Best way of handling corrupt or missing blocks? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147915#M110444</link>
    <description>&lt;P&gt;To identify "corrupt" or "missing" blocks, run 'hdfs fsck /path/to/file' from the command line. Other tools also exist.&lt;/P&gt;&lt;P&gt;HDFS will attempt to recover the situation automatically. By default there are three replicas of any block in the cluster, so if HDFS detects that one replica of a block has become corrupt or damaged, HDFS will create a new replica of that block from a known-good replica, and will mark the damaged one for deletion. &lt;/P&gt;&lt;P&gt;The known-good state is determined by checksums which are recorded alongside the block by each DataNode. &lt;/P&gt;&lt;P&gt;The chance of two replicas of the same block becoming damaged is very small indeed. HDFS can - and does - recover from this situation because it has a third replica, with its checksum, from which further replicas can be created.&lt;/P&gt;&lt;P&gt;The chance of all three replicas of the same block becoming damaged is so remote that it would suggest a significant failure somewhere else in the cluster. If this situation does occur, and all three replicas are damaged, then 'hdfs fsck' will report that block as "corrupt" - i.e. HDFS cannot self-heal the block from any of its replicas. &lt;/P&gt;&lt;P&gt;Rebuilding the data behind a corrupt block is a lengthy process (like any data recovery process). If this situation should arise, a deep investigation of the health of the cluster as a whole should also be undertaken. &lt;/P&gt;</description>
    <pubDate>Wed, 04 May 2016 00:07:05 GMT</pubDate>
    <dc:creator>Justin_Watkins</dc:creator>
    <dc:date>2016-05-04T00:07:05Z</dc:date>
    <item>
      <title>Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147906#M110435</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;What is best way of handling corrupt or missing blocks?&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 20:38:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147906#M110435</guid>
      <dc:creator>rushikeshdeshmu</dc:creator>
      <dc:date>2016-02-18T20:38:35Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147907#M110436</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2769/rushikeshdeshmukh007.html" nodeid="2769"&gt;@Rushikesh Deshmukh&lt;/A&gt; Find out which blocks are affected using the fsck command; if the files are not critical, just delete them.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 20:40:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147907#M110436</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-18T20:40:10Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147908#M110437</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2769/rushikeshdeshmukh007.html" nodeid="2769"&gt;@Rushikesh Deshmukh&lt;/A&gt;&lt;/P&gt;&lt;P&gt;See this thread&lt;/P&gt;&lt;P&gt;&lt;A href="http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hadoop-hdfs" target="_blank"&gt;http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hadoop-hdfs&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Wonderful explanation &lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 20:43:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147908#M110437</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-18T20:43:00Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147909#M110438</link>
      <description>&lt;P&gt;You can use the command hdfs fsck / to list corrupt or missing blocks and then follow the article above to fix them. Note that adding -delete, as in hdfs fsck / -delete, removes the corrupted files rather than just listing them.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 20:46:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147909#M110438</guid>
      <dc:creator>kgopal</dc:creator>
      <dc:date>2016-02-18T20:46:25Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147910#M110439</link>
      <description>&lt;P&gt;Is there any way to recover corrupt blocks, or do we just have to delete them?&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 20:56:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147910#M110439</guid>
      <dc:creator>rushikeshdeshmu</dc:creator>
      <dc:date>2016-02-18T20:56:33Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147911#M110440</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2769/rushikeshdeshmukh007.html" nodeid="2769"&gt;@Rushikesh Deshmukh&lt;/A&gt;  You have 2 options ...Another &lt;A target="_blank" href="http://centoshowtos.org/hadoop/fix-corrupt-blocks-on-hdfs/"&gt;link&lt;/A&gt; &lt;/P&gt;&lt;P&gt;"The next step would be to determine the importance of the file, can it just be removed and copied back into place, or is there sensitive data that needs to be regenerated?&lt;/P&gt;&lt;P&gt;If it's easy enough just to replace the file, that's the route I would take."&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 21:05:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147911#M110440</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-18T21:05:25Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147912#M110441</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/140/nsabharwal.html"&gt;Neeraj Sabharwal&lt;/A&gt;&lt;/P&gt;&lt;P&gt;thanks for quick reply.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 21:09:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147912#M110441</guid>
      <dc:creator>rushikeshdeshmu</dc:creator>
      <dc:date>2016-02-18T21:09:29Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147913#M110442</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2769/rushikeshdeshmukh007.html" nodeid="2769"&gt;@Rushikesh Deshmukh&lt;/A&gt;  Welcome!  Help me to close the thread by accepting the best answer.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 21:25:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147913#M110442</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-18T21:25:18Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147914#M110443</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/393/aervits.html"&gt;Artem Ervits&lt;/A&gt;, thanks for your reply.&lt;/P&gt;</description>
      <pubDate>Sun, 21 Feb 2016 14:39:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147914#M110443</guid>
      <dc:creator>rushikeshdeshmu</dc:creator>
      <dc:date>2016-02-21T14:39:23Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147915#M110444</link>
      <description>&lt;P&gt;To identify "corrupt" or "missing" blocks, run 'hdfs fsck /path/to/file' from the command line. Other tools also exist.&lt;/P&gt;&lt;P&gt;HDFS will attempt to recover the situation automatically. By default there are three replicas of any block in the cluster, so if HDFS detects that one replica of a block has become corrupt or damaged, HDFS will create a new replica of that block from a known-good replica, and will mark the damaged one for deletion. &lt;/P&gt;&lt;P&gt;The known-good state is determined by checksums which are recorded alongside the block by each DataNode. &lt;/P&gt;&lt;P&gt;The chance of two replicas of the same block becoming damaged is very small indeed. HDFS can - and does - recover from this situation because it has a third replica, with its checksum, from which further replicas can be created.&lt;/P&gt;&lt;P&gt;The chance of all three replicas of the same block becoming damaged is so remote that it would suggest a significant failure somewhere else in the cluster. If this situation does occur, and all three replicas are damaged, then 'hdfs fsck' will report that block as "corrupt" - i.e. HDFS cannot self-heal the block from any of its replicas. &lt;/P&gt;&lt;P&gt;Rebuilding the data behind a corrupt block is a lengthy process (like any data recovery process). If this situation should arise, a deep investigation of the health of the cluster as a whole should also be undertaken. &lt;/P&gt;</description>
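The self-healing behaviour described in this post can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how HDFS is implemented: the function name heal_block is hypothetical, each replica is modelled as a plain bytes object, and zlib.crc32 stands in for whatever checksum the DataNodes actually record.

```python
import zlib


def heal_block(replicas, expected_crc, target_count=3):
    """Toy model of checksum-based repair (hypothetical, not HDFS code).

    replicas: list of bytes objects, one per stored copy of the block.
    expected_crc: checksum recorded when the block was written.
    Returns the healed list of replicas, or None if no copy is intact.
    """
    # Keep only the replicas whose checksum matches the recorded value.
    good = [r for r in replicas if zlib.crc32(r) == expected_crc]
    if not good:
        # Every copy is damaged: the case fsck would report as "corrupt".
        return None
    # Damaged copies are dropped; new copies are made from a known-good one.
    missing = target_count - len(good)
    return good + [good[0]] * missing


block = b"hello hdfs"
recorded = zlib.crc32(block)
healed = heal_block([block, b"hellX hdfs", block], recorded)
print(healed == [block, block, block])
```

With one or two damaged copies the function rebuilds the full set from the surviving replica; only when all copies fail the checksum does it give up and return None, which mirrors the "corrupt" case above.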
      <pubDate>Wed, 04 May 2016 00:07:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147915#M110444</guid>
      <dc:creator>Justin_Watkins</dc:creator>
      <dc:date>2016-05-04T00:07:05Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147916#M110445</link>
      <description>&lt;P&gt;Note that if you run your cluster in the cloud or use virtualization, you may end up in a situation where multiple VMs run on the same physical host. In that case, a single physical failure can cause data loss, e.g. if all replicas are stored on the same physical host. The likelihood of this depends on the cloud provider and may be high or remote. Be aware of this risk and prepare by keeping copies on highly durable (object) storage like S3 for DR.&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 16:57:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147916#M110445</guid>
      <dc:creator>christian_proko</dc:creator>
      <dc:date>2016-05-06T16:57:49Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147917#M110446</link>
      <description>&lt;P&gt;Adding to the above answers, hadoop fsck might not give the latest corruption report. &lt;/P&gt;&lt;P&gt;Hadoop detects corrupt blocks during its periodic checks or when a client tries to read a file.&lt;/P&gt;&lt;P&gt;For details, please refer to: &lt;A href="https://issues.apache.org/jira/browse/HDFS-8126" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-8126&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 20:23:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147917#M110446</guid>
      <dc:creator>pradeep_bhadani</dc:creator>
      <dc:date>2016-05-10T20:23:40Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147918#M110447</link>
      <description>&lt;P&gt;Good point &lt;A rel="user" href="https://community.cloudera.com/users/10300/pradeepbhadani.html" nodeid="10300"&gt;@Pradeep Bhadani&lt;/A&gt;, if you want to 'force' a check of specific blocks you can read the corresponding files, e.g. via Hive or MR, and run the fsck command afterwards to see if an error was found. The reason checks are not run continuously is the cost of checking a whole filesystem that may be PBs across hundreds of nodes.&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 20:32:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147918#M110447</guid>
      <dc:creator>christian_proko</dc:creator>
      <dc:date>2016-05-10T20:32:45Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147919#M110448</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/9793/christianprokopp.html" nodeid="9793"&gt;@Christian Prokopp &lt;/A&gt;True.&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 20:36:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147919#M110448</guid>
      <dc:creator>pradeep_bhadani</dc:creator>
      <dc:date>2016-05-10T20:36:12Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147920#M110449</link>
      <description>&lt;P&gt;Best way to find the list of missing blocks&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Command :-&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;[hdfs@sandbox ~]$ hdfs fsck -list-corruptfileblocks &lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Output :-&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Connecting to namenode via &lt;A href="http://sandbox.hortonworks.com:50070/fsck?ugi=hdfs&amp;amp;listcorruptfileblocks=1&amp;amp;path=%2F" target="_blank"&gt;http://sandbox.hortonworks.com:50070/fsck?ugi=hdfs&amp;amp;listcorruptfileblocks=1&amp;amp;path=%2F&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The filesystem under path '/' has 0 CORRUPT files&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Jay&lt;/P&gt;</description>
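For scripted health checks, the summary line shown in this post can be parsed to get the number of corrupt files. The following is a minimal sketch, assuming the summary line is worded exactly as in the output quoted above; other Hadoop versions may phrase it differently. The function name corrupt_file_count is hypothetical, and the fsck output is passed in as a string so the sketch does not depend on how the command is invoked.

```python
import re


def corrupt_file_count(fsck_output):
    """Return the CORRUPT file count from fsck text, or None if not found.

    Assumes a summary line like the one quoted in the post:
    "The filesystem under path '/' has 0 CORRUPT files"
    """
    match = re.search(r"has (\d+) CORRUPT files", fsck_output)
    return int(match.group(1)) if match else None


sample = "The filesystem under path '/' has 0 CORRUPT files"
print(corrupt_file_count(sample))
```

Returning None when the line is absent lets a caller tell "no corruption" apart from "output not understood", which matters if the wording changes between versions.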
      <pubDate>Sun, 25 Sep 2016 12:33:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147920#M110449</guid>
      <dc:creator>jayanta_das</dc:creator>
      <dc:date>2016-09-25T12:33:23Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147921#M110450</link>
      <description>&lt;P&gt;command "hdfs fsck / -delete" worked for me.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Apr 2017 15:45:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147921#M110450</guid>
      <dc:creator>kuldeephawks</dc:creator>
      <dc:date>2017-04-20T15:45:11Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147922#M110451</link>
      <description>&lt;P&gt;Before deleting any corrupted blocks, please make sure that the affected data has been replicated successfully.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 23:14:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147922#M110451</guid>
      <dc:creator>madhavakumar_ch</dc:creator>
      <dc:date>2017-10-23T23:14:31Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147923#M110452</link>
      <description>&lt;P&gt;"hdfs fsck / -delete" worked for me. Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 17 May 2018 02:45:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147923#M110452</guid>
      <dc:creator>shravan_sairi</dc:creator>
      <dc:date>2018-05-17T02:45:56Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147924#M110453</link>
      <description>&lt;P&gt;Thanks for this, this is great!&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jun 2018 01:32:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147924#M110453</guid>
      <dc:creator>dpexecute</dc:creator>
      <dc:date>2018-06-05T01:32:46Z</dc:date>
    </item>
    <item>
      <title>Re: Best way of handling corrupt or missing blocks?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147925#M110454</link>
      <description>&lt;P&gt;Hi, I'd like to share a situation we encountered where 99% of our HDFS blocks were reported missing and we were able to recover them.&lt;/P&gt;&lt;P&gt;We had a system with 2 namenodes with high availability enabled.&lt;/P&gt;&lt;P&gt;For some reason, under the data folders of the datanodes, i.e. /data0x/hadoop/hdfs/data/current, we had 2 block pool folders listed (an example of such a folder is BP-1722964902-1.10.237.104-1541520732855).&lt;/P&gt;&lt;P&gt;One folder contained the IP of namenode 1 and the other contained the IP of namenode 2.&lt;/P&gt;&lt;P&gt;All the data was under the block pool of namenode 1, but inside the VERSION files of the namenodes (/data0x/hadoop/hdfs/namenode/current/) the block pool ID and the namespace ID were those of namenode 2, so the namenode was looking for blocks in the wrong block pool folder.&lt;/P&gt;&lt;P&gt;I don't know how we got to the point of having 2 block pool folders, but we did.&lt;/P&gt;&lt;P&gt;To fix the problem and get HDFS healthy again, we just needed to update the VERSION file on all the namenode disks (on both NN machines) and on all the journal node disks (on all JN machines) to point to namenode 1.&lt;/P&gt;&lt;P&gt;We then restarted HDFS and made sure all the blocks were reported and there were no more missing blocks.&lt;/P&gt;</description>
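A mismatch like the one described in this post could be spotted by comparing the block pool ID in a VERSION file with the block pool folder name on the datanodes. The sketch below is an illustration under assumptions: it treats VERSION as simple key=value lines, it assumes the key is named blockpoolID, and the function names and the sample namespaceID value are hypothetical. Only the BP-... folder name is taken from the post.

```python
def parse_version_file(text):
    """Parse key=value lines from VERSION text, skipping comments and blanks."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def block_pool_matches(version_text, block_pool_dir):
    """True if the blockpoolID in VERSION equals the block pool folder name."""
    return parse_version_file(version_text).get("blockpoolID") == block_pool_dir


# Hypothetical VERSION content; only the BP-... value comes from the post.
sample = (
    "#example VERSION content\n"
    "namespaceID=1\n"
    "blockpoolID=BP-1722964902-1.10.237.104-1541520732855\n"
)
print(block_pool_matches(sample, "BP-1722964902-1.10.237.104-1541520732855"))
```

Running such a check against each namenode and journal node VERSION file before editing anything gives a record of which files disagree with the folder that actually holds the data.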
      <pubDate>Thu, 03 Jan 2019 21:25:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-way-of-handling-corrupt-or-missing-blocks/m-p/147925#M110454</guid>
      <dc:creator>LH</dc:creator>
      <dc:date>2019-01-03T21:25:57Z</dc:date>
    </item>
  </channel>
</rss>

