
ls: Operation category READ is not supported in state standby

Expert Contributor

I currently have one namenode in a 'stopped' state due to a node failure.  I am unable to access any data or services on the cluster, as this was the main namenode.

 

However, there is a second namenode that I am hoping can be used to recover.  I have been working on the issue in this thread, and I currently have all HDFS instances started except for the bad namenode.  This seems to have improved the node health status, but I still can't access any data.

 

Here is the relevant command and error:

 

ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://10.0.0.154:8020/
ls: Operation category READ is not supported in state standby
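
For reference, here is how the HA state of each NameNode can be checked (a sketch; nn1 and nn2 below are placeholder IDs, and the nameservice is assumed to be nameservice1 — the real values come from dfs.nameservices and dfs.ha.namenodes.<nameservice> in hdfs-site.xml):

# list the configured NameNode IDs for the nameservice
hdfs getconf -confKey dfs.ha.namenodes.nameservice1
# a healthy HA pair should show one 'active' and one 'standby'
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2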

 

In the previous thread, I also pointed out that there is an option to enable automatic failover in CM.  I am wondering if that is the best course of action right now.  Any help is greatly appreciated.

1 ACCEPTED SOLUTION

Expert Contributor

As noted in the previous reply, I did not have any nodes with the Failover Controller role.  Importantly, I also had not enabled Automatic Failover despite running in an HA configuration.

 

I went ahead and added the Failover Controller role to both namenodes - the good one and the bad one.

 

After that, I attempted to enable Automatic Failover using the link shown in the screenshot from this post.  To do that, however, I first needed to start ZooKeeper.

 

At that point, if I recall correctly, the other namenode was still not active, but I then restarted the entire cluster and automatic failover kicked in, making the other namenode active and leaving the bad namenode in a stopped state.
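
For the record, the end state can also be verified from the command line. A rough sketch, assuming the nameservice is nameservice1 and the surviving NameNode's service ID is nn2 (both assumptions; check dfs.nameservices and dfs.ha.namenodes.* in hdfs-site.xml):

# should now report 'active'
hdfs haadmin -getServiceState nn2
# the originally failing command; should now succeed
hdfs dfs -ls hdfs://nameservice1/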


8 REPLIES

Champion

@epowell

 

The issue might be related to the JIRA below, which was opened long ago and is still open:

 

https://issues.apache.org/jira/browse/HDFS-3447

 

As an alternate way to connect to HDFS, look up dfs.nameservices in hdfs-site.xml and try connecting using the nameservice as follows; it may help you:

 

hdfs://<ClusterName>-ns/<hdfs_path>
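
If the nameservice URI still fails to resolve, it's worth confirming the client configuration has the standard HA entries. A quick sanity check (nameservice1 is an assumed name here; substitute whatever dfs.nameservices returns):

hdfs getconf -confKey dfs.nameservices
hdfs getconf -confKey dfs.ha.namenodes.nameservice1
hdfs getconf -confKey dfs.client.failover.proxy.provider.nameservice1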

 

Note: I didn't get a chance to explore this, and I'm also not sure how it will behave on an old CDH version.

 

Expert Contributor

 

Thank you for your response.

 

I followed your advice but I am getting the error below.  This is the same error as when I try a plain 'hdfs dfs -ls' command.

 

 

root@ip-10-0-0-154:/home/ubuntu/backup/data1# grep -B 1 -A 2 nameservices /var/run/cloudera-scm-agent/process/9908-hdfs-NAMENODE/hdfs-site.xml 
  <property>
    <name>dfs.nameservices</name>
    <value>nameservice1</value>
  </property>
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://nameservice1/
17/11/08 04:29:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 796ms.

 

Also, I should mention that when I go to CM, it shows that my one good namenode is in 'standby'.   Would it help to try a command like this?

 

./hdfs haadmin -transitionToActive <nodename>
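
From the Hadoop HA docs, it seems the argument should be the NameNode's service ID (from dfs.ha.namenodes.<nameservice>), not the hostname, and the transition has to be forced if automatic failover is configured:

# refuses to run under automatic failover unless forced
./hdfs haadmin -transitionToActive --forcemanual <nodename>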

 

A second thing is that CM shows Automatic Failover is not enabled but there is a link to 'Enable' (see screenshot).  Maybe this is another option to help the standby node get promoted to active?

 

Screenshot 2017-11-07 at 21.26.49.png

Mentor
Is the Failover Controller daemon running on the remaining NameNode? If not, start it up so it can elect its local NameNode into the ACTIVE state.

Expert Contributor

I do not know how to check whether the Failover Controller daemon is running on the remaining NameNode.

 

Can you please tell me how to check?

Mentor
If you're using Cloudera Manager, you can see the Failover Controller role instances and their states under the HDFS -> Instances tab.

If you're managing CDH without Cloudera Manager, then you can check on the NameNode host(s) with the below command:

$ sudo service hadoop-hdfs-zkfc status

Mentor
If you're instead using a tarball or otherwise unmanaged installation, the command to run the failover controller is:

$ hadoop-daemon.sh start zkfc

Or for a more interactive style:

$ hdfs zkfc
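
Note that if automatic failover was never initialized on the cluster, the failover controller first needs its state znode created in ZooKeeper. This is a one-time step; run it on one NameNode host only:

$ hdfs zkfc -formatZK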

Expert Contributor

It appears I do not have any nodes with the Failover Controller role.  The screenshot below shows the HDFS instances filtered by that role.

 

Screen Shot 2017-11-08 at 9.49.35 AM.png

 

 
