Both NameNodes are in standby after configuring HA

I have configured high availability (HA) on a cluster of three nodes:

hadoop-master (192.168.4.128): NameNode

hadoop-slave-1 (192.168.4.111): second NameNode

hadoop-slave-2 (192.168.4.106): DataNode

I did this without formatting the NameNode, that is, by converting a non-HA cluster to an HA cluster as described here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...

However, both NameNodes came up in standby state, so I tried to transition one of them to active with the following command:

 hdfs haadmin -transitionToActive mycluster --forcemanual

which produced the following output:

17/04/03 08:07:35 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hadoop-master/192.168.4.128:8020
17/04/03 08:07:36 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hadoop-slave-1/192.168.4.111:8020
Illegal argument: Unable to determine service address for namenode 'mycluster'

My core-site.xml is:

<property>
    <name>dfs.tmp.dir</name>
    <value>/opt/hadoop/data15</value>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:8020</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/journal/node/local/data</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp</value>
</property>

My hdfs-site.xml is:

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/data16</value>
    <final>true</final>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/data17</value>
    <final>true</final>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-slave-1:50090</value>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <final>true</final>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>hadoop-master,hadoop-slave-1</value>
    <final>true</final>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.hadoop-master</name>
    <value>hadoop-master:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.hadoop-slave-1</name>
    <value>hadoop-slave-1:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.hadoop-master</name>
    <value>hadoop-master:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.hadoop-slave-1</name>
    <value>hadoop-slave-1:50070</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop-master:8485;hadoop-slave-2:8485;hadoop-slave-1:8485/mycluster</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop-master:2181,hadoop-slave-1:2181,hadoop-slave-2:2181</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>root/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>3000</value>
</property>

What should the service address value be? And what can I do to bring one of the two NameNodes into the active state?

Note: the ZooKeeper server is currently stopped on all three nodes.

Re: Both NameNodes are in standby after configuring HA

You need to start the ZooKeeper server so that the ZKFailoverController (ZKFC) can come up. The ZKFC is the component that manages the active and standby state of the NameNodes.
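
For reference, these are the two settings that tie automatic failover to ZooKeeper. This is a sketch based on the Apache HA guide, with the host names taken from the configuration posted in the question. The guide places `ha.zookeeper.quorum` in core-site.xml, whereas the posted configuration has it in hdfs-site.xml; whether that matters in this setup is not confirmed.

```xml
<!-- hdfs-site.xml: turn on automatic failover for the nameservice -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>

<!-- core-site.xml: the ZooKeeper ensemble the ZKFC daemons connect to -->
<property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop-master:2181,hadoop-slave-1:2181,hadoop-slave-2:2181</value>
</property>
```

Once the ZooKeeper ensemble is reachable, the guide's next steps are to initialize the HA state in ZooKeeper by running `hdfs zkfc -formatZK` once from one of the NameNode hosts, and then to start a ZKFC daemon on each NameNode host. With automatic failover enabled and no ZKFC running, nothing elects an active NameNode, which would match the symptom described here.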

Re: Both NameNodes are in standby after configuring HA

Even though I started the ZooKeeper server, and it reports leader mode on one of the two NameNode hosts and follower mode on the other NameNode host and on the DataNode host, I still have the same problem: both NameNodes are in standby. There are also no log files under the log directory configured in zoo.cfg, so I cannot see any ZooKeeper errors. I assume that when ./zkServer.sh status reports a mode (follower or leader), everything is fine with ZooKeeper. Is that right?

Re: Both NameNodes are in standby after configuring HA

Running ./zkCli.sh on both NameNode hosts shows the same error:

Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0] 2017-04-03 09:57:34,141 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-04-03 09:57:34,148 [myid:] - WARN [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)

Re: Both NameNodes are in standby after configuring HA

Are you using HDP, and did you enable NameNode HA using Ambari? If so, you should have automatic failover configured. Automatic failover requires the ZooKeeper service instances and the ZKFailoverControllers to be up and running.

If you set up HA manually, you may need to transition one of the NameNodes to active manually, as described here:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...
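
One detail that is easy to miss in the guide: the argument that `hdfs haadmin` expects is a NameNode ID, not the nameservice name. The IDs are the comma-separated values of `dfs.ha.namenodes.<nameservice>`. In the configuration posted in the question, the relevant properties look like this (trimmed, with comments added for illustration):

```xml
<!-- the nameservice (logical cluster name); not a valid haadmin argument -->
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>

<!-- the NameNode IDs; each one is a valid serviceId for haadmin -->
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>hadoop-master,hadoop-slave-1</value>
</property>
```

So `hdfs haadmin -transitionToActive mycluster` fails with "Unable to determine service address" because `mycluster` is not one of those IDs, while `hadoop-master` or `hadoop-slave-1` would be accepted. `hdfs haadmin -getServiceState hadoop-master` can be used to check the current state of a NameNode.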

Re: Both NameNodes are in standby after configuring HA

I am using Apache Hadoop 2.7.1, and I followed the link you supplied:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...

Finally, I tried to force one of the two NameNodes to become active manually by running:

hdfs haadmin -transitionToActive hadoop-master

which gave the following response:

17/04/04 03:13:06 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hadoop-slave-1/192.168.4.111:8020
17/04/04 03:13:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/04 03:13:07 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hadoop-master/192.168.4.128:8020
Operation failed: End of File Exception between local host is: "hadoop-master/192.168.4.128"; destination host is: "hadoop-master":8020; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException

What should I do with two standby NameNodes? Should I format one of them?

Re: Both NameNodes are in standby after configuring HA

OK, it looks like you have automatic failover enabled. I am not sure why you get the EOFException.

Look through your NameNode logs to see if there are any errors.
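
While going through the logs, it may also be worth comparing two details of the posted hdfs-site.xml against the HA guide. This is only a sketch of what the guide's examples look like, not a confirmed fix: the absolute key path is an assumption about where root's private key lives, and the proxy provider property may already exist in a part of the configuration that was not posted.

```xml
<!-- sshfence key: the posted value "root/.ssh/id_rsa" has no leading slash;
     an absolute path is assumed here -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>

<!-- lets HDFS clients resolve hdfs://mycluster to the active NameNode -->
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

Whether either of these relates to the EOFException is unclear, so the NameNode logs remain the first place to look.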