Automatic failover doesn't work

New Contributor

I'm trying to build a 3-node cluster (two NameNodes, nn1 and nn2, and one DataNode, dn1). In the NameNode web UI I can see that nn1 is active and nn2 is standby. However, when I kill the active nn1, the standby nn2 does not become active. Please help me understand what I am doing wrong or what needs to be modified.
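
For reference, a minimal sketch of the checks I run (assuming the stock hdfs CLI and the nn1/nn2 IDs defined in dfs.ha.namenodes.ha-cluster below):

    # ask each NameNode which HA state it reports
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

    # confirm a ZKFC process is running on each NameNode host
    jps | grep DFSZKFailoverController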

nn1 /etc/hosts

127.0.0.1 localhost
192.168.10.153 nn1
192.168.10.154 dn1
192.168.10.155 nn2

nn2 /etc/hosts

127.0.0.1       localhost nn2
127.0.1.1       ubuntu

    # The following lines are desirable for IPv6 capable hosts
    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters

core-site.xml (nn1,nn2)

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.10.153:8020</value>
</property>

<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/hadoop/hdfs/data/jn</value>
</property>
 <property>
 <name>ha.zookeeper.quorum</name>
 <value>192.168.10.153:2181,192.168.10.155:2181,192.168.10.154:2181</value>
 </property>

</configuration>

hdfs-site.xml(nn1,nn2,dn1)

<configuration>
<property>
 <name>dfs.replication</name>
 <value>1</value>
 </property>
 <property>
 <name>dfs.permissions</name>
 <value>false</value>
 </property>
 <property>
 <name>dfs.nameservices</name>
 <value>ha-cluster</value>
 </property>
 <property>
 <name>dfs.ha.namenodes.ha-cluster</name>
 <value>nn1,nn2</value>
 </property>
 <property>
 <name>dfs.namenode.rpc-address.ha-cluster.nn1</name>
 <value>192.168.10.153:9000</value>
 </property>
 <property>
 <name>dfs.namenode.rpc-address.ha-cluster.nn2</name>
 <value>192.168.10.155:9000</value>
 </property>
 <property>
 <name>dfs.namenode.http-address.ha-cluster.nn1</name>
 <value>192.168.10.153:50070</value>
 </property>
 <property>
 <name>dfs.namenode.http-address.ha-cluster.nn2</name>
 <value>192.168.10.155:50070</value>
 </property>
 <property>
 <name>dfs.namenode.shared.edits.dir</name>
 <value>qjournal://192.168.10.153:8485;192.168.10.155:8485;192.168.10.154:8485/ha-cluster</value>
 </property>
 <property>
 <name>dfs.client.failover.proxy.provider.ha-cluster</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 </property>
 <property>
 <name>dfs.ha.automatic-failover.enabled</name>
 <value>true</value>
 </property>
 <property>
 <name>ha.zookeeper.quorum</name>
 <value>192.168.10.153:2181,192.168.10.155:2181,192.168.10.154:2181</value>
 </property>

<property>
 <name>dfs.ha.fencing.methods</name>
 <value>sshfence</value>
 </property>
 <property>
 <name>dfs.ha.fencing.ssh.private-key-files</name>
 <value>/home/ci/.ssh/id_rsa</value>
 </property>
</configuration>
Logs (ZKFC and NameNode on nn1 and nn2, captured while stopping nn1, the active node): https://pastebin.com/bWvfnanQ

Re: Automatic failover doesn't work

Hi @Raaj M, your fs.defaultFS should point to the nameservice. Since your nameservice is `ha-cluster`, fs.defaultFS should be:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://ha-cluster</value>
</property>
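
As a quick sanity check (a sketch, assuming the stock hdfs CLI), you can confirm the values the daemons actually resolve on each node:

    hdfs getconf -confKey fs.defaultFS
    hdfs getconf -confKey dfs.ha.namenodes.ha-cluster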

After fixing this, try stopping all services and reformatting the HA znode in ZooKeeper, as described here:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.h...
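
For reference, a minimal sketch of that sequence (assuming the standard Hadoop sbin scripts are on the PATH and the ZooKeeper quorum is already running):

    # stop the HDFS daemons (NameNodes, DataNodes, JournalNodes, ZKFCs)
    stop-dfs.sh

    # recreate the HA znode in ZooKeeper; run this once, on one of the NameNodes
    hdfs zkfc -formatZK

    # start HDFS again; the ZKFCs should then elect an active NameNode
    start-dfs.sh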