<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61524#M55705</link>
    <description>&lt;P&gt;Based on this &lt;A href="https://community.cloudera.com/t5/CDH-Manual-Installation/Operation-category-READ-is-not-supported-in-state-standby/td-p/35008" target="_self"&gt;thread&lt;/A&gt;, it seems like the following command may be an option.&amp;nbsp; I will wait for further guidance, though.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;./hdfs haadmin -transitionToActive &amp;lt;nodename&amp;gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 03 Nov 2017 18:38:56 GMT</pubDate>
    <dc:creator>epowell</dc:creator>
    <dc:date>2017-11-03T18:38:56Z</dc:date>
    <item>
      <title>Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs/nn</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61468#M55695</link>
      <description>&lt;P&gt;After attempting a large "insert as select" operation, I returned this morning to find that the query had failed and that I could not issue any commands to my cluster (e.g. hdfs dfs -df -h).&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When logging into CM, I noticed that most nodes had a health issue related to "clock offset".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;At this point, I am only concerned about trying to recover the data on HDFS.&amp;nbsp; I am happy to build a new cluster (given that I am on CDH4, anyway) and migrate the data to that new cluster.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried to restart the cluster but the start-up step failed.&amp;nbsp; Specifically, it failed to start the HDFS service and reported this error in Log Details:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;Exception in namenode join&lt;BR /&gt;java.io.IOException: Cannot start an HA namenode with name dirs that need recovery. 
Dir: Storage Directory /data0/dfs/nn state: NOT_FORMATTED&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:295)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:207)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:741)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:531)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:445)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.NameNode.&amp;lt;init&amp;gt;(NameNode.java:621)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.NameNode.&amp;lt;init&amp;gt;(NameNode.java:606)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1177)&lt;BR /&gt;	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1241)&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below are some more details that I have gathered about the situation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;I am running CDH4&lt;/LI&gt;&lt;LI&gt;There are two namenodes in the cluster.&amp;nbsp; One reporting the errors above and another one which reports&lt;PRE&gt;Unable to trigger a roll of the active NN&lt;BR /&gt;java.net.ConnectException: Call From ip-10-0-0-154.ec2.internal/10.0.0.154 to ip-10-0-0-157.ec2.internal:8022 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused&lt;/PRE&gt;&lt;/LI&gt;&lt;LI&gt;If I log into the first name node, the one with the initial error, and try to look at the namenode directory, it is completely empty&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;ubuntu@ip-10-0-0-157:~$ sudo ls -a /data0/dfs/nn/
.  ..
ubuntu@ip-10-0-0-157:~$ sudo ls -a /data1/dfs/nn/
.  ..&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;If I log into the other name node, it has data in those directories&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;ubuntu@ip-10-0-0-154:~$ sudo ls -lah  /data0/dfs/nn/
total 12K
drwx------ 3 hdfs hadoop 4.0K Nov  2 22:20 .
drwxr-xr-x 3 root root   4.0K Jun  6  2015 ..
drwxr-xr-x 2 hdfs hdfs   4.0K Nov  2 09:49 current
ubuntu@ip-10-0-0-154:~$ sudo ls -lah  /data1/dfs/nn/
total 12K
drwx------ 3 hdfs hadoop 4.0K Nov  2 22:20 .
drwxr-xr-x 3 root root   4.0K Jun  6  2015 ..
drwxr-xr-x 2 hdfs hdfs   4.0K Nov  2 09:49 current
ubuntu@ip-10-0-0-154:~$ sudo ls -lah  /data0/dfs/nn/current
total 13M
drwxr-xr-x 2 hdfs hdfs   4.0K Nov  2 09:49 .
drwx------ 3 hdfs hadoop 4.0K Nov  2 22:20 ..
-rw-r--r-- 1 hdfs hdfs    697 Jun  6  2015 edits_0000000000000000001-0000000000000000013
-rw-r--r-- 1 hdfs hdfs   1.0M Jun  6  2015 edits_0000000000000000014-0000000000000000913
-rw-r--r-- 1 hdfs hdfs    549 Jun  6  2015 edits_0000000000000000914-0000000000000000923
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000000924-0000000000000000937
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000000938-0000000000000000951
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000000952-0000000000000000965
-rw-r--r-- 1 hdfs hdfs   1.8K Jun  6  2015 edits_0000000000000000966-0000000000000000987
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000000988-0000000000000001001
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001002-0000000000000001015
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001016-0000000000000001029
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001030-0000000000000001043
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001044-0000000000000001057
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001058-0000000000000001071
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001072-0000000000000001085
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001086-0000000000000001099
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001100-0000000000000001113
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001114-0000000000000001127
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001128-0000000000000001141
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001142-0000000000000001155
-rw-r--r-- 1 hdfs hdfs   1.3K Jun  6  2015 edits_0000000000000001156-0000000000000001169
-rw-r--r-- 1 hdfs hdfs   1.0M Jun  6  2015 edits_inprogress_0000000000000001170
-rw-r--r-- 1 hdfs hdfs   5.1M Nov  2 08:49 fsimage_0000000000024545561
-rw-r--r-- 1 hdfs hdfs     62 Nov  2 08:49 fsimage_0000000000024545561.md5
-rw-r--r-- 1 hdfs hdfs   5.1M Nov  2 09:49 fsimage_0000000000024545645
-rw-r--r-- 1 hdfs hdfs     62 Nov  2 09:49 fsimage_0000000000024545645.md5
-rw-r--r-- 1 hdfs hdfs      5 Jun  6  2015 seen_txid
-rw-r--r-- 1 hdfs hdfs    170 Nov  2 09:49 VERSION&lt;/PRE&gt;</description>
      <pubDate>Thu, 02 Nov 2017 22:17:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61468#M55695</guid>
      <dc:creator>epowell</dc:creator>
      <dc:date>2017-11-02T22:17:02Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61470#M55696</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/21662"&gt;@epowell&lt;/a&gt;, I moved this from Cloudera Manager to HDFS Community since the error is coming out of HDFS itself.&amp;nbsp; This should help you get better and quicker assistance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-Ben&lt;/P&gt;</description>
      <pubDate>Thu, 02 Nov 2017 16:57:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61470#M55696</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2017-11-02T16:57:22Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61494#M55697</link>
      <description>&lt;P&gt;First: save the namenode dir content.&lt;/P&gt;&lt;P&gt;Second: can you launch the second namenode only? Does it start?&lt;/P&gt;&lt;P&gt;If yes, you should be able to start the datanodes and get access to the data.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 13:32:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61494#M55697</guid>
      <dc:creator>mathieu.d</dc:creator>
      <dc:date>2017-11-03T13:32:39Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61501#M55698</link>
      <description>&lt;P&gt;Thank you for your response.&amp;nbsp; I think the other namenode may be started but things are such a mess that I can't be sure.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've attached a screenshot from CM.&amp;nbsp; This is after starting Zookeeper and HDFS.&amp;nbsp; (I didn't attempt to start the entire cluster this time but I'm pretty sure the result is the same.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2017-11-03 at 8.06.03 AM.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/3551i5ABD9434F5EB56E6/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2017-11-03 at 8.06.03 AM.png" alt="Screen Shot 2017-11-03 at 8.06.03 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The first line seems to show the other namenode as 'Started' (as well as the data nodes).&amp;nbsp; However, if I go to that node and attempt to run any `hdfs` commands, here is what I get:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;ubuntu@ip-10-0-0-154:~$ hdfs dfs -ls /
17/11/03 14:20:15 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 1126ms.
17/11/03 14:20:16 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1373ms.
17/11/03 14:20:17 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 4470ms.&lt;/PRE&gt;</description>
      <pubDate>Fri, 03 Nov 2017 14:10:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61501#M55698</guid>
      <dc:creator>epowell</dc:creator>
      <dc:date>2017-11-03T14:10:15Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61502#M55699</link>
      <description>&lt;P&gt;Just thought of two quick things to add to the discussion:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;First, I crossposted this issue on StackOverflow late yesterday prior to receiving any responses on this thread.&amp;nbsp; I will update both posts with the solution to prevent any duplicated effort.&amp;nbsp; (That thread has not received any responses so far.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Second, I noticed in the screenshot that the 'Federation and High Availability' section has an item that controls 'Automatic Failover', and in my case it says it is not enabled.&amp;nbsp; This sheds some light on why my cluster is still down despite all the documentation on HA mentioning the automatic failover feature.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Should I just try clicking 'Enable' for Automatic Failover?&amp;nbsp; (I have made sure to back up everything in the current/ dir of the other namenode.)&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 14:16:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61502#M55699</guid>
      <dc:creator>epowell</dc:creator>
      <dc:date>2017-11-03T14:16:55Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61507#M55700</link>
      <description>&lt;P&gt;Before fixing the situation, I would try to start only one namenode (the one with data in its directory).&lt;/P&gt;&lt;P&gt;It should be considered the active namenode if it is the only one running, as long as it can start successfully.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 15:39:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61507#M55700</guid>
      <dc:creator>mathieu.d</dc:creator>
      <dc:date>2017-11-03T15:39:05Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61510#M55701</link>
      <description>&lt;P&gt;Thanks a bunch for your help thus far, &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/11415"&gt;@mathieu.d&lt;/a&gt;!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Based on your recommendation of starting only the good namenode, I have done the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Stop the cluster - In CM, I went to Services -&amp;gt; All Services, Actions -&amp;gt; Stop.&amp;nbsp; This stopped the only two running services, Zookeeper and HDFS.&lt;/LI&gt;&lt;LI&gt;Start all HDFS instances except the troubled namenode - I went back to the list of instances in the HDFS page, selected all instances except the troubled namenode and started them (see screenshot)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2017-11-03 at 9.46.42 AM.png" style="width: 246px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/3552iB426B0C56A4B86BA/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2017-11-03 at 9.46.42 AM.png" alt="Screen Shot 2017-11-03 at 9.46.42 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Things are looking much, much better after that (screenshot showing most instances in Good Health).&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2017-11-03 at 9.48.55 AM.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/3553iE42DD6B11694829A/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2017-11-03 at 9.48.55 AM.png" alt="Screen Shot 2017-11-03 at 9.48.55 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, when I return to the namenode (or any node for that matter) and attempt to run an hdfs command, I still get the same 
error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls /
17/11/03 16:02:38 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 1248ms.
17/11/03 16:02:39 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1968ms.
17/11/03 16:02:41 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 2614ms.&lt;/PRE&gt;&lt;P&gt;Should I have tried to start all the services on the cluster (e.g. Zookeeper) as well as the HDFS service?&amp;nbsp; If so, I'm not sure in which order the services should be started, because I usually just use the overall cluster start action.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 16:01:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61510#M55701</guid>
      <dc:creator>epowell</dc:creator>
      <dc:date>2017-11-03T16:01:10Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61517#M55702</link>
      <description>&lt;P&gt;Thanks a bunch for your help thus far, &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/11415"&gt;@mathieu.d&lt;/a&gt;!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Based on your recommendation of starting only the namenode with data, I have done the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Stop the cluster - In CM, I went to Services -&amp;gt; All Services, Action -&amp;gt; Stop.&amp;nbsp; This stopped the only two running services, Zookeeper and HDFS.&lt;/LI&gt;&lt;LI&gt;Start all the HDFS instances except the namenode without data - In the HDFS instances screen, I selected everything except the troubled namenode and started them (see screenshot).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2017-11-03 at 9.46.42 AM.png" style="width: 246px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/3554i6A106D5A104C17C6/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2017-11-03 at 9.46.42 AM.png" alt="Screen Shot 2017-11-03 at 9.46.42 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After doing this, the situation seemed much better.&amp;nbsp; Most instances were now in Good Health.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2017-11-03 at 9.48.55 AM.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/3555i9E371C42A0BF97E5/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2017-11-03 at 9.48.55 AM.png" alt="Screen Shot 2017-11-03 at 9.48.55 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, hdfs commands still fail on the call to getFileInfo:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls /
17/11/03 17:19:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 595ms.
17/11/03 17:19:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1600ms.
17/11/03 17:19:52 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 4983ms.&lt;/PRE&gt;&lt;P&gt;I am also wondering whether I should have started all the services instead of just HDFS.&amp;nbsp; Normally, I do that through the overall cluster start action but that is starting the troubled namenode so I was trying to find a workaround.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 17:12:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61517#M55702</guid>
      <dc:creator>epowell</dc:creator>
      <dc:date>2017-11-03T17:12:05Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61518#M55703</link>
      <description>For the HDFS command, try explicitly targeting the active namenode:&lt;BR /&gt;&lt;BR /&gt;hdfs dfs -ls hdfs://host:8020/&lt;BR /&gt;</description>
      <pubDate>Fri, 03 Nov 2017 17:15:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61518#M55703</guid>
      <dc:creator>mathieu.d</dc:creator>
      <dc:date>2017-11-03T17:15:37Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61520#M55704</link>
      <description>&lt;P&gt;10.0.0.154 is the namenode with data that is Started according to CM.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From that node, I used 'localhost' as the host.&amp;nbsp; It returned connection refused.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://localhost:8020/
ls: Call From ip-10-0-0-154.ec2.internal/10.0.0.154 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;EDIT:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have just tried with the actual IP address and got a different error.&amp;nbsp; I also get this same error when running the command from other nodes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://10.0.0.154:8020/
ls: Operation category READ is not supported in state standby&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 17:23:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61520#M55704</guid>
      <dc:creator>epowell</dc:creator>
      <dc:date>2017-11-03T17:23:00Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61524#M55705</link>
      <description>&lt;P&gt;Based on this &lt;A href="https://community.cloudera.com/t5/CDH-Manual-Installation/Operation-category-READ-is-not-supported-in-state-standby/td-p/35008" target="_self"&gt;thread&lt;/A&gt;, it seems like the following command may be an option.&amp;nbsp; I will wait for further guidance, though.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;./hdfs haadmin -transitionToActive &amp;lt;nodename&amp;gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 18:38:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61524#M55705</guid>
      <dc:creator>epowell</dc:creator>
      <dc:date>2017-11-03T18:38:56Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data0/dfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61776#M55706</link>
      <description>&lt;P&gt;I continued the resolution of this issue in another &lt;A href="http://community.cloudera.com/t5/Storage-Random-Access-HDFS/ls-Operation-category-READ-is-not-supported-in-state-standby/m-p/61775#M3320" target="_self"&gt;thread&lt;/A&gt; specific to the error:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;ls: Operation category READ is not supported in state standby&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The solution is marked on that thread; in summary, I needed to add the Failover Controller role to a node in my cluster, enable Automatic Failover, and then restart the cluster for it all to kick in.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Nov 2017 19:31:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-start-an-HA-namenode-with-name-dirs-that-need/m-p/61776#M55706</guid>
      <dc:creator>epowell</dc:creator>
      <dc:date>2017-11-13T19:31:51Z</dc:date>
    </item>
  </channel>
</rss>

