<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Unable to start namenode after failover in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33499#M54331</link>
    <description>&lt;P&gt;We are running a CDH 5.4.7 cluster, and after an automatic failover both NameNodes now refuse to start.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Output:&lt;/P&gt;&lt;PRE&gt;Failed to start namenode.
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:119)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6339)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1149)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:677)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:663)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.&amp;lt;init&amp;gt;(NameNode.java:810)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.&amp;lt;init&amp;gt;(NameNode.java:794)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
2015-10-28 01:07:56,579 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It looks similar to &lt;A href="https://issues.apache.org/jira/browse/HDFS-8384" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-8384&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But we can see that it is supposed to be fixed in 5.3.8: &lt;A href="http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_rn_fixed_in_538.html" target="_blank"&gt;http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_rn_fixed_in_538.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;We are also unable to run hadoop namenode -recover; it fails with the same exception:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;15/10/28 01:33:39 INFO namenode.FSImage: Save namespace&lt;BR /&gt;15/10/28 01:33:43 ERROR namenode.FSImage: Unable to save image for /data/1/dfs/nn
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodesUnderConstruction(LeaseManager.java:447)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFilesUnderConstruction(FSNamesystem.java:7264)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeFilesUCSection(FSImageFormatPBINode.java:508)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:431)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:474)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:410)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:958)
        at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:1009)
        at java.lang.Thread.run(Thread.java:745)&lt;/PRE&gt;&lt;P&gt;Is there any workaround?&lt;/P&gt;</description>
    <pubDate>Wed, 28 Oct 2015 06:28:07 GMT</pubDate>
    <dc:creator>codingtony</dc:creator>
    <dc:date>2015-10-28T06:28:07Z</dc:date>
    <item>
      <title>Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33499#M54331</link>
      <description>&lt;P&gt;We are running a CDH 5.4.7 cluster, and after an automatic failover both NameNodes now refuse to start.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Output:&lt;/P&gt;&lt;PRE&gt;Failed to start namenode.
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:119)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6339)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1149)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:677)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:663)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.&amp;lt;init&amp;gt;(NameNode.java:810)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.&amp;lt;init&amp;gt;(NameNode.java:794)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
2015-10-28 01:07:56,579 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It looks similar to &lt;A href="https://issues.apache.org/jira/browse/HDFS-8384" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-8384&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But we can see that it is supposed to be fixed in 5.3.8: &lt;A href="http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_rn_fixed_in_538.html" target="_blank"&gt;http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_rn_fixed_in_538.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;We are also unable to run hadoop namenode -recover; it fails with the same exception:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;15/10/28 01:33:39 INFO namenode.FSImage: Save namespace&lt;BR /&gt;15/10/28 01:33:43 ERROR namenode.FSImage: Unable to save image for /data/1/dfs/nn
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodesUnderConstruction(LeaseManager.java:447)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFilesUnderConstruction(FSNamesystem.java:7264)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeFilesUCSection(FSImageFormatPBINode.java:508)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:431)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:474)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:410)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:958)
        at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:1009)
        at java.lang.Thread.run(Thread.java:745)&lt;/PRE&gt;&lt;P&gt;Is there any workaround?&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 06:28:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33499#M54331</guid>
      <dc:creator>codingtony</dc:creator>
      <dc:date>2015-10-28T06:28:07Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33500#M54332</link>
      <description>&lt;P&gt;I've looked at the code provided in hadoop-hdfs-2.6.0-cdh5.4.7.jar&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;  synchronized long getNumUnderConstructionBlocks() {
    assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't"
      + "acquired before counting under construction blocks";
    long numUCBlocks = 0;
    for (Lease lease : sortedLeases) {
      for (String path : lease.getPaths()) {
        final INodeFile cons;
        try {
          cons = this.fsnamesystem.getFSDirectory().getINode(path).asFile();
          Preconditions.checkState(cons.isUnderConstruction());
        } catch (UnresolvedLinkException e) {
          throw new AssertionError("Lease files should reside on this FS");
        }
        BlockInfo[] blocks = cons.getBlocks();
        if(blocks == null)
          continue;
        for(BlockInfo b : blocks) {
          if(!b.isComplete())
            numUCBlocks++;
        }
      }
    }
    LOG.info("Number of blocks under construction: " + numUCBlocks);
    return numUCBlocks;
  }&lt;/PRE&gt;&lt;P&gt;It looks like the patch from &lt;A href="https://issues.apache.org/jira/browse/HDFS-8384" target="_blank" rel="nofollow"&gt;HDFS-8384&lt;/A&gt; was not applied to CDH 5.4.7. The commit for the patch is here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/hadoop/commit/8928729c80af0a154524e06fb13ed9b191986a78" target="_blank"&gt;https://github.com/apache/hadoop/commit/8928729c80af0a154524e06fb13ed9b191986a78&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 05:52:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33500#M54332</guid>
      <dc:creator>codingtony</dc:creator>
      <dc:date>2015-10-28T05:52:36Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33572#M54333</link>
      <description>&lt;P&gt;We had to manually patch the jar to get the NameNode running again.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then we were able to remove the problematic file.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the chain of events:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- The secondary NameNode tried to do a checkpoint but failed in the under-construction INode check (getINodesUnderConstruction):&lt;/P&gt;&lt;PRE&gt;ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Unable to save image for /data/1/dfs/nn
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodesUnderConstruction(LeaseManager.java:447)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFilesUnderConstruction(FSNamesystem.java:7235)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeFilesUCSection(FSImageFormatPBINode.java:508)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:431)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:474)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:410)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:958)
        at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:1009)
        at java.lang.Thread.run(Thread.java:745)&lt;/PRE&gt;&lt;P&gt;- Cloudera Manager did warn us with an email, but we thought it was a system problem (disk related).&lt;/P&gt;&lt;P&gt;- A bit after that we did a failover, and then both NameNodes refused to start.&lt;/P&gt;&lt;P&gt;- After looking around, we found that it could be related to HDFS-8384.&lt;/P&gt;&lt;P&gt;- Since we thought, based on the release notes, that the HDFS-8384 patch had already been applied to CDH 5.4.7, we looked elsewhere for the cause of the problem.&lt;/P&gt;&lt;P&gt;- We decided to take a look at the source code of hadoop-hdfs-2.6.0-cdh5.4.7.jar and realized that the patch was &lt;STRONG&gt;not applied&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;- We manually compiled the patch (just the method that was causing the problem), repackaged the jar, and were able to restart the NameNode, find the faulty file, and get back on our feet.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Shall I open a JIRA to mention that HDFS-8384 is not applied to CDH 5.4.7?&lt;/P&gt;&lt;P&gt;What can cause an INode to be under construction?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2015 15:12:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33572#M54333</guid>
      <dc:creator>codingtony</dc:creator>
      <dc:date>2015-10-29T15:12:51Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33577#M54334</link>
      <description>&lt;P&gt;HDFS-8384 is fixed in CDH 5.3.8 per the &lt;A href="http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_rn_fixed_in_538.html" target="_self"&gt;release notes&lt;/A&gt; but is not in CDH 5.4.7. It should be available in CDH 5.4.8 when it is released.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2015 16:52:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33577#M54334</guid>
      <dc:creator>denloe</dc:creator>
      <dc:date>2015-10-29T16:52:17Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33579#M54335</link>
      <description>&lt;P&gt;Are patches not applied systematically between releases?&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2015 17:09:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33579#M54335</guid>
      <dc:creator>codingtony</dc:creator>
      <dc:date>2015-10-29T17:09:14Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33580#M54336</link>
      <description>&lt;P&gt;They are. 5.3.8 (Oct 20th) happened after 5.4.7 (Sep 18th). The next release of 5.4 after the 5.3.8 release will have the fix.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2015 17:12:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33580#M54336</guid>
      <dc:creator>busbey</dc:creator>
      <dc:date>2015-10-29T17:12:47Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33581#M54337</link>
      <description>&lt;P&gt;Thanks, that explains why the patch was not applied!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any explanation (or a link where I can find the info) of what can cause a file to be under construction?&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2015 17:17:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33581#M54337</guid>
      <dc:creator>codingtony</dc:creator>
      <dc:date>2015-10-29T17:17:36Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33582#M54338</link>
      <description>&lt;P&gt;I don't know of any such documentation.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2015 17:53:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33582#M54338</guid>
      <dc:creator>busbey</dc:creator>
      <dc:date>2015-10-29T17:53:52Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to start namenode after failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33633#M54339</link>
      <description>&lt;P&gt;5.4.8 has been released.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="http://community.cloudera.com/t5/Release-Announcements/Announcing-Cloudera-Enterprise-5-4-8/m-p/33614#U33614" target="_blank"&gt;http://community.cloudera.com/t5/Release-Announcements/Announcing-Cloudera-Enterprise-5-4-8/m-p/33614#U33614&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 16:58:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-start-namenode-after-failover/m-p/33633#M54339</guid>
      <dc:creator>denloe</dc:creator>
      <dc:date>2015-10-30T16:58:05Z</dc:date>
    </item>
  </channel>
</rss>

