Two NameNodes are in standby after configuring HA

Rising Star

I have configured high availability in my cluster, which consists of three nodes:

hadoop-master (192.168.4.128) (NameNode)

hadoop-slave-1 (192.168.4.111) (another NameNode)

hadoop-slave-2 (192.168.4.106) (DataNode)

I did this without formatting the NameNode (i.e. converting a non-HA-enabled cluster into an HA-enabled one), as described here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...

However, both NameNodes came up in standby state, so I tried to transition one of them to active by running the following command

 hdfs haadmin -transitionToActive mycluster --forcemanual

which produced the following output:

17/04/03 08:07:35 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hadoop-master/192.168.4.128:8020
17/04/03 08:07:36 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hadoop-slave-1/192.168.4.111:8020
Illegal argument: Unable to determine service address for namenode 'mycluster'
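
For reference, -transitionToActive expects one of the NameNode IDs defined in dfs.ha.namenodes.<nameservice>, not the nameservice name itself. A minimal sketch, assuming the IDs from the hdfs-site.xml shown further down:

 # check the current state of each NameNode by its ID
 hdfs haadmin -getServiceState hadoop-master
 hdfs haadmin -getServiceState hadoop-slave-1

 # manually promote one of them; --forcemanual is needed while automatic failover is enabled
 hdfs haadmin -transitionToActive --forcemanual hadoop-master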

My core-site.xml is:

<property>
    <name>dfs.tmp.dir</name>
    <value>/opt/hadoop/data15</value>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:8020</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/journal/node/local/data</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp</value>
</property>

My hdfs-site.xml is:

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/data16</value>
    <final>true</final>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/data17</value>
    <final>true</final>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-slave-1:50090</value>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <final>true</final>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>hadoop-master,hadoop-slave-1</value>
    <final>true</final>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.hadoop-master</name>
    <value>hadoop-master:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.hadoop-slave-1</name>
    <value>hadoop-slave-1:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.hadoop-master</name>
    <value>hadoop-master:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.hadoop-slave-1</name>
    <value>hadoop-slave-1:50070</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop-master:8485;hadoop-slave-2:8485;hadoop-slave-1:8485/mycluster</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop-master:2181,hadoop-slave-1:2181,hadoop-slave-2:2181</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>root/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>3000</value>
</property>

What should the service address value be? And what solutions can I apply to bring one of the two NameNodes into the active state?

Note: the ZooKeeper server is stopped on all three nodes.
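
With dfs.ha.automatic-failover.enabled set to true, neither NameNode will elect itself active while ZooKeeper and the ZKFC daemons are down. A rough sequence, assuming a plain Apache Hadoop/ZooKeeper installation and the hostnames above (script paths may differ on your setup):

 # on all three nodes: start ZooKeeper
 zkServer.sh start

 # once, from one NameNode host: initialize the HA state znode in ZooKeeper
 hdfs zkfc -formatZK

 # on each NameNode host (hadoop-master and hadoop-slave-1): start the failover controller
 hadoop-daemon.sh start zkfc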

10 Replies

New Contributor

Please use the ZooKeeper client to log in and remove the stale HA znode. I hope this helps:

zookeeper-client -server <zookeeper-server-host>:2181
(use sudo, or log in as the HDFS user, if you hit a permission issue)
ls / or ls /hadoop-ha
(if you don't see a /hadoop-ha znode in the listing, skip the step below)
rmr /hadoop-ha/nameservice1
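
Note that in this cluster the nameservice is mycluster rather than nameservice1, so, assuming a stale HA znode actually exists, the last command would be:

rmr /hadoop-ha/mycluster

After removing it, re-run hdfs zkfc -formatZK and restart the ZKFC daemons so the znode is recreated from the current configuration.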