Member since: 04-13-2017
Posts: 46
Kudos Received: 4
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13756 | 01-11-2019 06:26 AM
 | 8832 | 11-13-2017 11:31 AM
 | 79855 | 11-13-2017 11:27 AM
11-13-2017
11:31 AM
I continued the resolution of this issue in another thread specific to the error "ls: Operation category READ is not supported in state standby". The solution is marked on that thread; a quick summary is that I needed to add the Failover Controller role to a node in my cluster, enable Automatic Failover, and then restart the cluster for it all to kick in.
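For anyone landing here from search: once the Failover Controller roles are in place and the cluster has restarted, a quick sanity check is to query the HA state of each NameNode with hdfs haadmin. The IDs nn1 and nn2 below are only placeholders; the real IDs are whatever dfs.ha.namenodes.nameservice1 lists in your configuration.
hdfs haadmin -getServiceState nn1   # should report: active
hdfs haadmin -getServiceState nn2   # should report: standby (or fail if that namenode is stopped)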
11-13-2017
11:27 AM
As noted in the previous reply, I did not have any nodes with the Failover Controller role. Importantly, I also had not enabled Automatic Failover despite running in an HA configuration. I went ahead and added the Failover Controller role to both namenodes, the good one and the bad one. After that, I attempted to enable Automatic Failover using the link shown in the screenshot from this post; to do that, however, I first needed to start ZooKeeper. At that point, if I recall correctly, the good namenode was still not active, but after I restarted the entire cluster the automatic failover kicked in, making the good namenode the active one and leaving the bad namenode in a stopped state.
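For reference, one way to confirm that a failover controller actually registered after the restart (assuming the default parent znode /hadoop-ha and the nameservice name nameservice1 used elsewhere in this thread) is to look for the election znodes from the ZooKeeper shell:
zookeeper-client
ls /hadoop-ha/nameservice1
# once a Failover Controller has won the election, this should list ActiveBreadcrumb and ActiveStandbyElectorLock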
11-08-2017
08:51 AM
It appears I do not have any nodes with the Failover Controller role. The screenshot below shows the hdfs instances filtered by that role.
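As a cross-check from the command line (in case the screenshot is ambiguous), the HA setting the clients see can be queried directly; while Automatic Failover is still disabled this should come back false or empty:
hdfs getconf -confKey dfs.ha.automatic-failover.enabled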
11-07-2017
08:56 PM
I do not know how to check whether the Failover Controller daemon is running on the remaining NameNode. Can you please tell me how to check?
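(For future readers: one way to check this on a given NameNode host is to look for the DFSZKFailoverController process; this assumes shell access to that host.)
sudo jps | grep DFSZKFailoverController
ps -ef | grep '[D]FSZKFailoverController'   # alternative if jps is not available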
11-07-2017
08:24 PM
Thank you for your response. I followed your advice but I am getting the error below, which is the same error as when I try a plain 'hdfs dfs -ls' command.
root@ip-10-0-0-154:/home/ubuntu/backup/data1# grep -B 1 -A 2 nameservices /var/run/cloudera-scm-agent/process/9908-hdfs-NAMENODE/hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>nameservice1</value>
</property>
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://nameservice1/
17/11/08 04:29:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 796ms.
Also, I should mention that when I go to CM, it shows that my one good namenode is in 'standby'. Would it help to try a command like this?
./hdfs haadmin -transitionToActive <nodename>
A second thing is that CM shows Automatic Failover is not enabled, but there is a link to 'Enable' (see screenshot). Maybe this is another option to help the standby node get promoted to active?
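One note on the transitionToActive idea: haadmin expects the logical NameNode ID from the HA configuration rather than a hostname. Assuming the nameservice1 name shown in the grep output above, the configured IDs can be listed with:
hdfs getconf -confKey dfs.ha.namenodes.nameservice1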
11-07-2017
08:37 AM
I currently have one namenode in a 'stopped' state due to a node failure. I am unable to access any data or services on the cluster, as this was the main namenode. However, there is a second namenode that I am hoping can be used to recover. I have been working on the issue in this thread, and I currently have all HDFS instances started except for the bad namenode. This seems to have improved the node health status, but I still can't access any data. Here is the relevant command and error:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://10.0.0.154:8020/
ls: Operation category READ is not supported in state standby
In the previous thread, I also pointed out that there is an option to enable Automatic Failover in CM. I am wondering if that is the best course of action right now. Any help is greatly appreciated.
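For completeness, the same listing can also be issued against the logical nameservice instead of a single NameNode address (I try this later in the thread); that form lets the client library attempt both NameNodes rather than pinning to one host:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://nameservice1/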
Labels: HDFS
11-03-2017
11:38 AM
Based on this thread, it seems like the following command may be an option. I will wait for further guidance, though.
./hdfs haadmin -transitionToActive <nodename>
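If I do try it, my understanding is that <nodename> must be the logical NameNode ID defined by dfs.ha.namenodes.nameservice1, not a hostname. A related haadmin subcommand that coordinates fencing between the two NameNodes is -failover; the IDs below are placeholders:
hdfs haadmin -transitionToActive <namenode-id>
hdfs haadmin -failover <from-namenode-id> <to-namenode-id>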
11-03-2017
10:19 AM
10.0.0.154 is the namenode with data that is Started according to CM. From that node, I used 'localhost' as the host, and it returned connection refused.
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://localhost:8020/
ls: Call From ip-10-0-0-154.ec2.internal/10.0.0.154 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
EDIT: I have just tried with the actual IP address and the error was different. I get that same error when trying the command on other nodes as well.
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://10.0.0.154:8020/
ls: Operation category READ is not supported in state standby
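The connection-refused result on localhost may just mean the NameNode RPC port is bound to the node's private address rather than the loopback interface. One way to check (assuming netstat is installed on the host) is:
sudo netstat -tlnp | grep 8020
# a line such as 'tcp ... 10.0.0.154:8020 ... LISTEN .../java' would confirm the binding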
11-03-2017
10:12 AM
Thanks a bunch for your help thus far, @mathieu.d! Based on your recommendation of starting only the namenode with data, I have done the following:
1. Stop the cluster - In CM, I went to Services -> All Services, Actions -> Stop. This stopped the only two running services, ZooKeeper and HDFS.
2. Start all the HDFS instances except the namenode without data - In the HDFS instances screen, I selected everything except the troubled namenode and started them (see screenshot).
After doing this, the situation seemed much better, and most instances were now in Good Health. However, HDFS commands still fail on the getFileInfo call:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls /
17/11/03 17:19:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 595ms.
17/11/03 17:19:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1600ms.
17/11/03 17:19:52 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 4983ms.
I am also wondering whether I should have started all the services instead of just HDFS. Normally, I do that through the overall cluster start action, but that would start the troubled namenode, so I was trying to find a workaround. Thanks in advance!
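It might also help to confirm what state the running namenode itself reports, independently of CM. Assuming the default NameNode web UI port 50070, its JMX endpoint exposes this directly:
curl -s 'http://10.0.0.154:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
# the State field in the JSON response should read active or standby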
11-03-2017
09:01 AM
Thanks a bunch for your help thus far, @mathieu.d! Based on your recommendation of starting only the good namenode, I have done the following:
1. Stop the cluster - In CM, I went to Services -> All Services, Actions -> Stop. This stopped the only two running services, ZooKeeper and HDFS.
2. Start all HDFS instances except the troubled namenode - I went back to the list of instances on the HDFS page, selected all instances except the troubled namenode, and started them (see screenshot).
Things are looking much, much better after that (screenshot showing most instances in Good Health). However, when I return to the namenode (or any node, for that matter) and attempt to run an HDFS command, I still get the same error:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls /
17/11/03 16:02:38 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 1248ms.
17/11/03 16:02:39 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1968ms.
17/11/03 16:02:41 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 2614ms.
Should I have tried to start all the services on the cluster (e.g. ZooKeeper) as well as the HDFS service? If so, I'm not sure in which order the services should be started, because I usually just use the overall cluster start action.
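Since ZooKeeper was stopped along with everything else, part of the answer may simply be whether a ZooKeeper server is up at all before worrying about start order. A quick probe (assuming the default client port 2181 on this node) is the four-letter ruok command:
echo ruok | nc localhost 2181   # a healthy server replies with imok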