Created on 09-05-2018 09:01 PM - edited 09-16-2022 06:40 AM
Hi everyone,
My system has 2 DataNodes, 2 NameNodes, 3 JournalNodes, and 3 ZooKeeper services.
I configured the NameNode HA cluster OK: when browsing the admin page at namenode:50070, I see one NameNode with status active and one NameNode with status standby. => OK
When I stop the active NameNode, the standby one becomes active. => OK
But the problem is: how do I start the NameNode that I stopped again?
I do the following:
sudo -u hdfs hdfs namenode -bootstrapStandby -force
/etc/init.d/hadoop-hdfs-namenode start
With the above process, sometimes the NameNode starts OK in standby mode, but sometimes it starts in active mode and then I have 2 active nodes (split brain!!)
So what have I done wrong, and what is the right process to start a NameNode that was stopped?
Thank you
Created on 09-05-2018 09:13 PM - edited 09-06-2018 03:16 AM
An HA HDFS installation requires you to run Failover Controllers on each of
the NameNodes, along with a ZooKeeper service. These controllers take care
of transitioning the NameNodes such that only one is active and the other
becomes standby.
It appears that you're using a CDH package-based (non-CM) installation
here, so please follow the guide starting at
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_hag_hdfs_ha_intro.html#topic_2_1...,
using the instructions under the 'Command-Line' sections instead of the
Cloudera Manager ones.
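As a rough sketch of the command-line steps in that guide (assuming CDH 5 packages and that the ZooKeeper quorum is already running), automatic failover needs the ZKFC daemon installed on both NameNode hosts, a one-time format of its ZooKeeper znode, and the daemons started:
# On each NameNode host (package name assumes a CDH 5 package install):
sudo yum install hadoop-hdfs-zkfc

# One time only, from one NameNode host, with ZooKeeper running:
sudo -u hdfs hdfs zkfc -formatZK

# Then start the failover controller on both NameNode hosts:
sudo service hadoop-hdfs-zkfc start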
@phaothu wrote: But the problem is: how do I start the NameNode that I stopped again?
I do the following:
sudo -u hdfs hdfs namenode -bootstrapStandby -force
/etc/init.d/hadoop-hdfs-namenode start
With the above process, sometimes the NameNode starts OK in standby mode, but sometimes it starts in active mode and then I have 2 active nodes (split brain!!)
So what have I done wrong, and what is the right process to start a NameNode that was stopped?
Just simply start it up. The bootstrap command must only be run for a fresh new NameNode, not on every restart of a previously running NameNode.
It's worth noting that Standby and Active are just states of the very same NameNode. The Standby NameNode is not a special daemon; it's just a state of the NameNode.
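In other words, a previously bootstrapped NameNode only needs its service started again. A minimal sketch (the service name matches the init script used above; the node1/node2 IDs match the dfs.ha.namenodes values shown later in this thread):
# Start the previously stopped NameNode; do NOT run -bootstrapStandby again:
sudo service hadoop-hdfs-namenode start

# Confirm which HA state each NameNode came up in:
sudo -u hdfs hdfs haadmin -getServiceState node1
sudo -u hdfs hdfs haadmin -getServiceState node2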
Created 09-06-2018 02:39 AM
To do this via CM:
Login as admin to CM -> HDFS -> Instances -> 'Federation and high availability' button -> Action -> Manual Failover
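For a package-based (non-CM) install, a manual failover can instead be requested from the command line. A sketch, assuming the mycluster/node1/node2 names used elsewhere in this thread, that node1 is currently active, and that your Hadoop version supports graceful failover through the ZKFCs:
# Ask the failover controllers to make node2 active and node1 standby:
sudo -u hdfs hdfs haadmin -failover node1 node2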
Created 09-08-2018 02:44 AM
Thanks @saranvisa, but I am not using CM, just an installation done from the command line.
Created 09-08-2018 05:29 AM
Created 09-10-2018 02:29 AM
Dear @Harsh J,
Does the 'Automatic Failover Configuration' also require the 'Fencing Configuration'? Are these two independent sections, or do I need both to configure automatic failover?
Because I get this error:
You must configure a fencing method before using automatic failover.
org.apache.hadoop.ha.BadFencingConfigurationException: No fencer configured for NameNode at node1/x.x.x.x:8020
        at org.apache.hadoop.hdfs.tools.NNHAServiceTarget.checkFencingConfigured(NNHAServiceTarget.java:132)
        at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:225)
        at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
        at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
        at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:167)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
        at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:167)
        at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:192)
2018-09-10 15:56:53,262 INFO org.apache.zookeeper.ZooKeeper: Session: 0x365c2b22a1e0000 closed
2018-09-10 15:56:53,262 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
If I need both of them, then for this config:
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>
what is /path/to/my/script.sh? The content of this script is not clear to me; please explain and maybe give me an example.
Thank you
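(For illustration only: one hypothetical shape for such a script, under the assumption that the last argument passed by the shell(...) value above is the target NameNode as host:port. None of this comes from the Cloudera guide; as the reply below shows, a trivial /bin/true fencer is what ultimately worked here.)
#!/bin/bash
# Hypothetical fencing script (illustration only).
# Assumes the last argument is the target NameNode as host:port, e.g. node1:8020.
TARGET="${@: -1}"
TARGET_HOST="${TARGET%%:*}"

# Try to stop the NameNode on the node being fenced; exit 0 only on success
# so the ZKFC treats fencing as complete.
if ssh "hdfs@${TARGET_HOST}" "sudo service hadoop-hdfs-namenode stop"; then
  exit 0
fi
exit 1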
Created 09-10-2018 08:30 PM
Created 09-10-2018 09:36 PM
@Harsh J yep, with
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
it is working perfectly now.
Thank you very much
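(A note for later readers: shell(/bin/true) simply satisfies the ZKFC's requirement that some fencer be configured by always reporting success; with quorum-journal shared edits the JournalNodes already prevent two NameNodes from writing the edit log. A commonly documented alternative is sshfence, sketched here assuming the hdfs user can SSH between the NameNode hosts and that its key lives at the path shown, which is an assumption, not something from this thread:)
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence(hdfs)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
</property>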
Created 09-07-2018 04:10 AM
Yep, as @Harsh J said, I am using a CDH package-based (non-CM) installation. I will show more of my config.
I have 3 nodes: node1, node2, node3
ZooKeeper on all 3 nodes, all with the same config:
maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
dataLogDir=/var/lib/zookeeper
hdfs-site.xml on all 3 nodes:
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>node1,node2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.node1</name>
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.node2</name>
  <value>node2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.node1</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.node2</name>
  <value>node2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/namenode/dfs/jn</value>
</property>
core-site.xml on all 3 nodes:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>node1:2181,node2:2181,node3:2181</value>
</property>
The above is the config related to the NameNode cluster. I have doubts about the ZooKeeper config; is it enough?
Services installed on each node:
Node 1: hadoop-hdfs-journalnode hadoop-hdfs-namenode hadoop-hdfs-zkfc zookeeper-server
Node 2: hadoop-hdfs-journalnode hadoop-hdfs-namenode hadoop-hdfs-zkfc zookeeper-server
Node 3: hadoop-hdfs-journalnode zookeeper-server
The first initialization is OK:
Node 1 active, Node 2 standby
Stop the NameNode service on Node 1 => Node 2 becomes active => OK
But when I start the NameNode service on Node 1 again:
Node 1 is active and Node 2 is active too => fail
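(For anyone comparing against this config: the hdfs-site.xml above sets dfs.ha.automatic-failover.enabled to true but defines no dfs.ha.fencing.methods, which is exactly what the ZKFC error earlier in this thread complains about. A minimal sketch of the missing property, matching the /bin/true fencer that was confirmed to work above:)
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>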