Member since: 04-13-2017
Posts: 46
Kudos Received: 4
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13756 | 01-11-2019 06:26 AM
 | 8832 | 11-13-2017 11:31 AM
 | 79855 | 11-13-2017 11:27 AM
11-13-2017
11:31 AM
I continued the resolution of this issue in another thread specific to the error "ls: Operation category READ is not supported in state standby". The solution is marked on that thread; a quick summary is that I needed to add the Failover Controller role to a node in my cluster, enable Automatic Failover, and then restart the cluster for it all to kick in.
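For anyone landing here from search: once the Failover Controller roles are in place and the cluster has restarted, a quick sanity check is to query the HA state of each NameNode with hdfs haadmin. The IDs nn1 and nn2 below are only placeholders; the real IDs are whatever dfs.ha.namenodes.nameservice1 lists in your configuration.
hdfs haadmin -getServiceState nn1   # should report: active
hdfs haadmin -getServiceState nn2   # should report: standby (or fail if that namenode is stopped)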
11-13-2017
11:27 AM
As noted in the previous reply, I did not have any nodes with the Failover Controller role. Importantly, I also had not enabled Automatic Failover despite running in an HA configuration. I went ahead and added the Failover Controller role to both namenodes, the good one and the bad one. After that, I attempted to enable Automatic Failover using the link shown in the screenshot from this post; to do that, however, I first needed to start ZooKeeper. At that point, if I recall correctly, the good namenode was still not active, but after I restarted the entire cluster the automatic failover kicked in, making the good namenode the active one and leaving the bad namenode in a stopped state.
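For reference, one way to confirm that a failover controller actually registered after the restart (assuming the default parent znode /hadoop-ha and the nameservice name nameservice1 used elsewhere in this thread) is to look for the election znodes from the ZooKeeper shell:
zookeeper-client
ls /hadoop-ha/nameservice1
# once a Failover Controller has won the election, this should list ActiveBreadcrumb and ActiveStandbyElectorLock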
11-08-2017
08:51 AM
It appears I do not have any nodes with the Failover Controller role. The screenshot below shows the hdfs instances filtered by that role.
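As a cross-check from the command line (in case the screenshot is ambiguous), the HA setting the clients see can be queried directly; while Automatic Failover is still disabled this should come back false or empty:
hdfs getconf -confKey dfs.ha.automatic-failover.enabled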
11-07-2017
08:56 PM
I do not know how to check whether the Failover Controller daemon is running on the remaining NameNode. Can you please tell me how to check?
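(For future readers: one way to check this on a given NameNode host is to look for the DFSZKFailoverController process; this assumes shell access to that host.)
sudo jps | grep DFSZKFailoverController
ps -ef | grep '[D]FSZKFailoverController'   # alternative if jps is not available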
11-07-2017
08:24 PM
Thank you for your response. I followed your advice but I am getting the error below, which is the same error as when I try a plain 'hdfs dfs -ls' command.
root@ip-10-0-0-154:/home/ubuntu/backup/data1# grep -B 1 -A 2 nameservices /var/run/cloudera-scm-agent/process/9908-hdfs-NAMENODE/hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>nameservice1</value>
</property>
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://nameservice1/
17/11/08 04:29:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 796ms.
Also, I should mention that when I go to CM, it shows that my one good namenode is in 'standby'. Would it help to try a command like this?
./hdfs haadmin -transitionToActive <nodename>
A second thing is that CM shows Automatic Failover is not enabled, but there is a link to 'Enable' (see screenshot). Maybe this is another option to help the standby node get promoted to active?
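One note on the transitionToActive idea: haadmin expects the logical NameNode ID from the HA configuration rather than a hostname. Assuming the nameservice1 name shown in the grep output above, the configured IDs can be listed with:
hdfs getconf -confKey dfs.ha.namenodes.nameservice1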
11-07-2017
08:37 AM
I currently have one namenode in a 'stopped' state due to a node failure. I am unable to access any data or services on the cluster, as this was the main namenode. However, there is a second namenode that I am hoping can be used to recover. I have been working on the issue in this thread, and I currently have all HDFS instances started except for the bad namenode. This seems to have improved the node health status, but I still can't access any data. Here is the relevant command and error:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://10.0.0.154:8020/
ls: Operation category READ is not supported in state standby
In the previous thread, I also pointed out that there is an option to enable Automatic Failover in CM. I am wondering if that is the best course of action right now. Any help is greatly appreciated.
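For completeness, the same listing can also be issued against the logical nameservice instead of a single NameNode address (I try this later in the thread); that form lets the client library attempt both NameNodes rather than pinning to one host:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://nameservice1/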
Labels: HDFS
11-03-2017
11:38 AM
Based on this thread, it seems like the following command may be an option. I will wait for further guidance, though.
./hdfs haadmin -transitionToActive <nodename>
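If I do try it, my understanding is that <nodename> must be the logical NameNode ID defined by dfs.ha.namenodes.nameservice1, not a hostname. A related haadmin subcommand that coordinates fencing between the two NameNodes is -failover; the IDs below are placeholders:
hdfs haadmin -transitionToActive <namenode-id>
hdfs haadmin -failover <from-namenode-id> <to-namenode-id>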
11-03-2017
10:19 AM
10.0.0.154 is the namenode with data that is Started according to CM. From that node, I used 'localhost' as the host, and it returned connection refused.
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://localhost:8020/
ls: Call From ip-10-0-0-154.ec2.internal/10.0.0.154 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
EDIT: I have just tried with the actual IP address and the error was different. I get that same error when trying the command on other nodes as well.
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://10.0.0.154:8020/
ls: Operation category READ is not supported in state standby
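The connection-refused result on localhost may just mean the NameNode RPC port is bound to the node's private address rather than the loopback interface. One way to check (assuming netstat is installed on the host) is:
sudo netstat -tlnp | grep 8020
# a line such as 'tcp ... 10.0.0.154:8020 ... LISTEN .../java' would confirm the binding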
11-03-2017
10:12 AM
Thanks a bunch for your help thus far, @mathieu.d! Based on your recommendation of starting only the namenode with data, I have done the following:
1. Stop the cluster - In CM, I went to Services -> All Services, Actions -> Stop. This stopped the only two running services, ZooKeeper and HDFS.
2. Start all the HDFS instances except the namenode without data - In the HDFS instances screen, I selected everything except the troubled namenode and started them (see screenshot).
After doing this, the situation seemed much better, and most instances were now in Good Health. However, HDFS commands still fail on the getFileInfo call:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls /
17/11/03 17:19:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 595ms.
17/11/03 17:19:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1600ms.
17/11/03 17:19:52 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 4983ms.
I am also wondering whether I should have started all the services instead of just HDFS. Normally, I do that through the overall cluster start action, but that would start the troubled namenode, so I was trying to find a workaround. Thanks in advance!
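It might also help to confirm what state the running namenode itself reports, independently of CM. Assuming the default NameNode web UI port 50070, its JMX endpoint exposes this directly:
curl -s 'http://10.0.0.154:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
# the State field in the JSON response should read active or standby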
11-03-2017
09:01 AM
Thanks a bunch for your help thus far, @mathieu.d! Based on your recommendation of starting only the good namenode, I have done the following:
1. Stop the cluster - In CM, I went to Services -> All Services, Actions -> Stop. This stopped the only two running services, ZooKeeper and HDFS.
2. Start all HDFS instances except the troubled namenode - I went back to the list of instances on the HDFS page, selected all instances except the troubled namenode, and started them (see screenshot).
Things are looking much, much better after that (screenshot showing most instances in Good Health). However, when I return to the namenode (or any node, for that matter) and attempt to run an HDFS command, I still get the same error:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls /
17/11/03 16:02:38 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 1248ms.
17/11/03 16:02:39 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1968ms.
17/11/03 16:02:41 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 2614ms.
Should I have tried to start all the services on the cluster (e.g. ZooKeeper) as well as the HDFS service? If so, I'm not sure in which order the services should be started, because I usually just use the overall cluster start action.
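Since ZooKeeper was stopped along with everything else, part of the answer may simply be whether a ZooKeeper server is up at all before worrying about start order. A quick probe (assuming the default client port 2181 on this node) is the four-letter ruok command:
echo ruok | nc localhost 2181   # a healthy server replies with imok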