Created 04-03-2017 01:01 PM
i have configured high availability in my cluster which consists of three nodes
hadoop-master(192.168.4.128)(name node)
hadoop-slave-1(192.168.4.111) (another name node )
hadoop-slave-2 (192.168.4.106) (data node)
without formatting name node ( converting a non-HA-enabled cluster to be HA-enabled) as described here https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...
but i got two name nodes working as standby so i tried to move the transition of one of these two nodes to active by applying the following command
hdfs haadmin -transitionToActive mycluster --forcemanual
with the following out put
17/04/03 08:07:35 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hadoop-master/192.168.4.128:8020
17/04/03 08:07:36 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hadoop-slave-1/192.168.4.111:8020
Illegal argument: Unable to determine service address for namenode 'mycluster'
my core-site is
<property>
<name>dfs.tmp.dir</name>
<value>/opt/hadoop/data15</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:8020</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/journal/node/local/data</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp</value>
</property>
my hdfs-site.xml is
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop/data16</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/data17</value>
<final>true</final>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-slave-1:50090</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<final>true</final>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>hadoop-master,hadoop-slave-1</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.hadoop-master</name>
<value>hadoop-master:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.hadoop-slave-1</name>
<value>hadoop-slave-1:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.hadoop-master</name>
<value>hadoop-master:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.hadoop-slave-1</name>
<value>hadoop-slave-1:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop-master:8485;hadoop-slave-2:8485;hadoop-slave-1:8485/mycluster</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop-master:2181,hadoop-slave-1:2181,hadoop-slave-2:2181</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>3000</value>
</property>
what should the service address value be ? and what are possible solutions i can apply in order to turn on one name node of the two nodes to active state ?
note the zookeeper server on all three nodes is stopped
Created 04-03-2017 01:10 PM
You need to start zookeeper server in order to make ZKFailover controller up. ZKFailover controller is the one who manages the active and standby state of namenode.
Created 04-03-2017 01:29 PM
even though i started zookeper server and i get a leader mode in one of two namenodes and follower mode in the other name node and data node, i still get same problem that both of two name nodes are stand by ,also there are no log files under log directory that is configured in zoo.cfg ,so i can't know zoo keeper errors but i think when .zkServer.sh status gives a status(followe or leader) it indicates that every thing with zookeeper is all right isn't it ?
Created 04-03-2017 02:03 PM
running ./zkCli on two name nodes shows the same error
Welcome to ZooKeeper! JLine support is enabled [zk: localhost:2181(CONNECTING) 0] 2017-04-03 09:57:34,141 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2017-04-03 09:57:34,148 [myid:] - WARN [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
Created 04-03-2017 02:04 PM
running ./zkCli on both namenodes shows same error
Welcome to ZooKeeper! JLine support is enabled [zk: localhost:2181(CONNECTING) 0] 2017-04-03 09:57:34,141 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2017-04-03 09:57:34,148 [myid:] - WARN [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
Created 04-03-2017 06:23 PM
Are you using HDP and did you enable NameNode HA using Ambari? If so then you should have automatic failover configured. Automatic Failover requires the ZooKeeper service instances and ZooKeeper FailoverControllers to be up and running.
If you setup HA manually, then you may need to transition one of the NNs to active status manually as described here:
Created 04-04-2017 05:47 AM
iam using hadoop apache 2.7.1 and i have followed the link you applied
and finally tried to force one of the two name nodes to be active manually by applying
hdfs haadmin -transitionToActive hadoop-master
with the following response
what should i do with two stand by name nodes should i apply name node format on one of these two name nodes
Created 04-05-2017 01:41 PM
Ok looks like you have automatic failover enabled. I am not sure why you get the EOFException.
Look through your NameNode logs to see if there are any errors.