Two NameNodes are standby after configuring HA
Labels: Apache Hadoop
Created ‎04-03-2017 01:01 PM
I have configured high availability in my cluster, which consists of three nodes:
hadoop-master (192.168.4.128) (NameNode)
hadoop-slave-1 (192.168.4.111) (the other NameNode)
hadoop-slave-2 (192.168.4.106) (DataNode)
without formatting the NameNode (i.e., converting a non-HA-enabled cluster to an HA-enabled one), as described here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...
But both NameNodes came up as standby, so I tried to transition one of them to active with the following command:
hdfs haadmin -transitionToActive mycluster --forcemanual
which produced the following output:
17/04/03 08:07:35 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at hadoop-master/192.168.4.128:8020
17/04/03 08:07:36 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at hadoop-slave-1/192.168.4.111:8020
Illegal argument: Unable to determine service address for namenode 'mycluster'
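(A note on the command above: hdfs haadmin -transitionToActive expects a NameNode ID from dfs.ha.namenodes.<nameservice> (here, hadoop-master or hadoop-slave-1), not the nameservice name itself, which is why no service address can be determined for 'mycluster'. A sketch of the intended invocation, assuming the configuration shown below:)

# transition one specific NameNode to active; --forcemanual bypasses
# the automatic-failover safety check
hdfs haadmin -transitionToActive hadoop-master --forcemanual

# verify the resulting states
hdfs haadmin -getServiceState hadoop-master
hdfs haadmin -getServiceState hadoop-slave-1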
My core-site.xml is:
<property>
<name>dfs.tmp.dir</name>
<value>/opt/hadoop/data15</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:8020</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/journal/node/local/data</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp</value>
</property>
My hdfs-site.xml is:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop/data16</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/data17</value>
<final>true</final>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-slave-1:50090</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<final>true</final>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>hadoop-master,hadoop-slave-1</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.hadoop-master</name>
<value>hadoop-master:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.hadoop-slave-1</name>
<value>hadoop-slave-1:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.hadoop-master</name>
<value>hadoop-master:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.hadoop-slave-1</name>
<value>hadoop-slave-1:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop-master:8485;hadoop-slave-2:8485;hadoop-slave-1:8485/mycluster</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop-master:2181,hadoop-slave-1:2181,hadoop-slave-2:2181</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>3000</value>
</property>
What should the service address value be? And what are the possible solutions I can apply in order to bring one of the two NameNodes into the active state?
Note: the ZooKeeper server on all three nodes is stopped.
Created ‎04-03-2017 01:10 PM
You need to start the ZooKeeper server in order to bring the ZKFailoverController up. The ZKFailoverController is the component that manages the active and standby states of the NameNodes.
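(A minimal sketch of the startup sequence on an Apache Hadoop 2.7.x tarball install; paths and script names assume the stock ZooKeeper and Hadoop distributions:)

# on each of the three ZooKeeper nodes
$ZOOKEEPER_HOME/bin/zkServer.sh start

# one-time step: initialize the HA state znode in ZooKeeper
hdfs zkfc -formatZK

# on each NameNode host, start the ZKFailoverController
$HADOOP_HOME/sbin/hadoop-daemon.sh start zkfc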
Created ‎04-03-2017 01:29 PM
Even though I started the ZooKeeper server, and zkServer.sh reports leader mode on one of the two NameNodes and follower mode on the other NameNode and the DataNode, I still have the same problem: both NameNodes are standby. Also, there are no log files under the log directory configured in zoo.cfg, so I can't see the ZooKeeper errors. But when zkServer.sh status reports a mode (follower or leader), doesn't that indicate that everything is all right with ZooKeeper?
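(A quick way to check each ZooKeeper server from the outside, independent of zkServer.sh status, is ZooKeeper's four-letter-word interface; a sketch, assuming nc is available on the hosts:)

# each healthy server answers with its mode (leader/follower) and stats
echo stat | nc hadoop-master 2181
echo stat | nc hadoop-slave-1 2181
echo stat | nc hadoop-slave-2 2181

# liveness probe: a healthy server replies with imok
echo ruok | nc hadoop-master 2181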
Created ‎04-03-2017 02:03 PM
Running ./zkCli.sh on the two NameNodes shows the same error:
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0] 2017-04-03 09:57:34,141 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-04-03 09:57:34,148 [myid:] - WARN [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
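(The trace above shows zkCli connecting to 127.0.0.1:2181 and being refused, i.e., no ZooKeeper server is listening locally. To rule out a purely local issue, zkCli can be pointed at a specific ensemble member; a sketch:)

# connect to an explicit server instead of localhost
./zkCli.sh -server hadoop-master:2181

# inside the shell, the HA parent znode should exist once the
# failover controllers have been formatted
ls /hadoop-ha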
Created ‎04-03-2017 06:23 PM
Are you using HDP, and did you enable NameNode HA using Ambari? If so, you should have automatic failover configured. Automatic failover requires the ZooKeeper service instances and the ZooKeeper FailoverControllers to be up and running.
If you set up HA manually, then you may need to transition one of the NameNodes to active manually, as described here:
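(For reference, a sketch of both paths, assuming a stock Apache Hadoop install and the NameNode IDs from the configuration earlier in the thread:)

# automatic failover: a DFSZKFailoverController should be running
# next to each NameNode
jps | grep DFSZKFailoverController

# manual setup (automatic failover disabled): transition one
# NameNode to active by its ID
hdfs haadmin -transitionToActive hadoop-master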
Created ‎04-04-2017 05:47 AM
I am using Apache Hadoop 2.7.1 and I have followed the link you supplied,
and finally tried to force one of the two NameNodes to become active manually by running
hdfs haadmin -transitionToActive hadoop-master
with the following response:
17/04/04 03:13:06 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at hadoop-slave-1/192.168.4.111:8020
17/04/04 03:13:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/04 03:13:07 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at hadoop-master/192.168.4.128:8020
Operation failed: End of File Exception between local host is: "hadoop-master/192.168.4.128"; destination host is: "hadoop-master":8020; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
What should I do with the two standby NameNodes? Should I run a NameNode format on one of them?
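(For reference: per the linked wiki page, an EOFException during an RPC call means the connection was closed by the remote end, typically because the NameNode process at hadoop-master:8020 is down or not yet fully started. A quick sketch for checking, assuming standard tools:)

# is a NameNode JVM running on this host?
jps | grep -i NameNode

# is anything listening on the RPC port? (-p may require root)
netstat -tlnp | grep 8020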
Created ‎04-05-2017 01:41 PM
OK, it looks like you have automatic failover enabled. I am not sure why you get the EOFException.
Look through your NameNode logs to see if there are any errors.
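(On a tarball install, the NameNode log usually lives under $HADOOP_HOME/logs, with the exact file name depending on the user and hostname; a sketch:)

# follow the NameNode log while retrying the transition
tail -f $HADOOP_HOME/logs/hadoop-*-namenode-*.log

# scan for recent errors
grep -iE 'error|exception' $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -n 20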
Created ‎03-02-2021 08:32 AM
Hi All,
I had a similar issue while building a new cluster and enabling HA. Both NameNodes were in standby, with error {1} in the NameNode log.
The fix was in CM: we needed to "Initialize High Availability State in ZooKeeper" under "Federation and High Availability", and then restart the cluster.
{1} Caused by: java.net.ConnectException: Call From <NN1> to <NN2>:8022 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
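(For reference: outside Cloudera Manager, the same "Initialize High Availability State in ZooKeeper" step can be performed from the command line; a minimal sketch, assuming the HDFS client configuration for the nameservice is in place:)

# create the HA state znode for the nameservice in ZooKeeper
hdfs zkfc -formatZK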
Created ‎03-02-2021 09:09 AM
Hello @kolli_sandeep, it seems the failover controllers are down in the cluster. Please follow the steps here [1] and start the Failover Controller roles, which will transition the NameNodes to the active/standby states.
You need to follow the steps below (see the sketch after these steps):
- Stop the FailoverController roles under the HDFS > Instances page.
- Remove the HA state from ZooKeeper: on a ZooKeeper server host, run zookeeper-client.
- Execute the following to remove the configured nameservice. This example assumes the nameservice is named nameservice1; you can identify the nameservice from the Federation and High Availability section on the HDFS Instances tab:
rmr /hadoop-ha/nameservice1
(If you don't see any /hadoop-ha znode in the ZooKeeper znode list, skip this step.)
- After removing the HA znode from ZooKeeper, go to CM and click HDFS > Instances > Federation and High Availability > Actions.
- Under the Actions menu, select Actions > Initialize High Availability State in ZooKeeper.
- Then start the Failover Controller roles (CM > Instances > select the FailoverControllers > Actions for Selected > Start).
- Verify the NameNode states; if you don't see the active/standby states of the NNs, or on any failure, just restart the HDFS service.
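A condensed sketch of the ZooKeeper portion of the steps above, assuming the nameservice is named nameservice1:

# on a ZooKeeper server host, open the ZooKeeper shell
zookeeper-client

# inside the shell: check whether the HA znode exists, then remove it
ls /hadoop-ha
rmr /hadoop-ha/nameservice1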
[1] https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_hag_hdfs_ha_enabling.html
