Member since
04-13-2017
11-13-2017
11:27 AM
As noted in the previous reply, I did not have any nodes with the Failover Controller role. Importantly, I also had not enabled Automatic Failover despite running in an HA configuration.
I went ahead and added the Failover Controller role to both namenodes - the good one and the bad one.
After that, I attempted to enable Automatic Failover. To do that, however, I first needed to start ZooKeeper.
At that point, if I recall correctly, the other namenode was still not active. I then restarted the entire cluster and automatic failover kicked in, making the other namenode the active one and leaving the bad namenode in a stopped state.
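For anyone who wants to confirm the same outcome from the command line, here is a minimal sketch that asks each NameNode for its HA state and prints the first one that reports active. It relies on `hdfs haadmin -getServiceState`. The function name `find_active_namenode` and the NameNode IDs in the usage comment are hypothetical; the real IDs would be whatever `dfs.ha.namenodes.nameservice1` lists in hdfs-site.xml.

```shell
# Print the first NameNode ID whose HA state is "active".
# Arguments: NameNode service IDs (hypothetical examples below).
# Returns non-zero if none of them reports active.
find_active_namenode() {
  for nn in "$@"; do
    # A stopped NameNode makes the command fail; treat that as "not active".
    state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$nn"
      return 0
    fi
  done
  return 1
}

# Usage (IDs are placeholders, look yours up first):
#   hdfs getconf -confKey dfs.ha.namenodes.nameservice1
#   find_active_namenode namenode1 namenode2
```

If the function prints nothing, no NameNode is active, which matches the situation I was in before the restart.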
11-08-2017
08:51 AM
It appears I do not have any nodes with the Failover Controller role. The screenshot below shows the hdfs instances filtered by that role.
11-07-2017
08:56 PM
I do not know how to check whether the "Failover Controller daemon [is] running on the remaining NameNode". Can you please tell me how to check?
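For reference, one way this could be checked on the NameNode host itself is to look for the ZKFC process, whose Java main class is DFSZKFailoverController. This is only a sketch: it assumes `pgrep` is installed and is run by a user who can see the process (root, or the hdfs user), and the function name `zkfc_running` is made up for illustration.

```shell
# Succeeds if a process whose command line mentions the ZKFC main class exists.
# pgrep -f matches against the full command line, not just the process name.
zkfc_running() {
  pgrep -f DFSZKFailoverController >/dev/null
}

# Usage:
#   if zkfc_running; then echo "Failover Controller is running"; \
#   else echo "Failover Controller is NOT running"; fi
```

Because the match is done on the whole command line, a wrapper script that itself contains the class name could produce a false positive.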
11-07-2017
08:24 PM
Thank you for your response. I followed your advice below but I am getting the error below. This is the same error as when I try a plain 'hdfs dfs -ls' command.
root@ip-10-0-0-154:/home/ubuntu/backup/data1# grep -B 1 -A 2 nameservices /var/run/cloudera-scm-agent/process/9908-hdfs-NAMENODE/hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>nameservice1</value>
</property>
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://nameservice1/
17/11/08 04:29:50 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 796ms.
Also, I should mention that when I go to CM, it shows that my one good namenode is in 'standby'. Would it help to try a command like this?
./hdfs haadmin -transitionToActive <nodename>
A second thing is that CM shows Automatic Failover is not enabled, but there is a link to 'Enable' (see screenshot). Maybe this is another option to help the standby node get promoted to active?
11-07-2017
08:37 AM
I currently have one namenode in a 'stopped' state due to a node failure. I am unable to access any data or services on the cluster, as this was the main namenode.
However, there is a second namenode that I am hoping can be used to recover. I have been working on the issue in this thread and currently I have all HDFS instances started except for the bad namenode. This seems to have improved the situation as far as node health status goes, but I still can't access any data.
Here is the relevant command and error:
ubuntu@ip-10-0-0-154:~/backup/data1$ hdfs dfs -ls hdfs://10.0.0.154:8020/
ls: Operation category READ is not supported in state standby
In the previous thread, I also pointed out that there is an option to enable automatic failover in CM. I am wondering if that is the best course of action right now. Any help is greatly appreciated.
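The error above is what a client sees when it addresses a standby namenode directly by host and port. As a rough sketch, a script could recognize this specific message and report the HA state problem instead of a generic failure. The function name `is_standby_error` is hypothetical, and the match string is taken from the error text shown above.

```shell
# Succeeds if the given text contains the standby rejection message
# ("Operation category ... is not supported in state standby").
is_standby_error() {
  case "$1" in
    *"is not supported in state standby"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs a working hdfs client; captures stderr only):
#   err=$(hdfs dfs -ls hdfs://10.0.0.154:8020/ 2>&1 >/dev/null)
#   if is_standby_error "$err"; then
#     echo "That NameNode is in standby; no active NameNode is serving reads"
#   fi
```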
Labels:
- Apache Zookeeper
- HDFS