Process to Start a Standby NameNode
Labels: HDFS, Manual Installation
Created on 09-05-2018 09:01 PM - edited 09-16-2022 06:40 AM
Hi everyone,
My system has 2 DataNodes, 2 NameNodes, 3 JournalNodes, and 3 ZooKeeper services.
I have configured the NameNode HA cluster OK; when browsing the admin page at namenode:50070, I see one NameNode with status (active) and one with status (standby). => OK
When I stop the active NameNode, the standby one becomes active. => OK
But the problem is: how do I start the NameNode that I stopped again?
I do the following:
sudo -u hdfs hdfs namenode -bootstrapStandby -force
/etc/init.d/hadoop-hdfs-namenode start
With the above process, sometimes the NameNode starts OK in standby mode, but sometimes it starts in active mode and then I have 2 active nodes (split brain!!).
So what have I done wrong, and what is the right process to start a NameNode that has been stopped?
Thank you
Created on 09-05-2018 09:13 PM - edited 09-06-2018 03:16 AM
An HA HDFS installation requires you to run Failover Controllers on each of the NameNodes, along with a ZooKeeper service. These controllers take care of transitioning the NameNodes such that only one is active and the other becomes standby.
It appears that you're using a CDH package-based (non-CM) installation here, so please follow the guide starting at
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_hag_hdfs_ha_intro.html#topic_2_1...,
following the instructions that are under the 'Command-Line' parts instead of the Cloudera Manager ones.
@phaothu wrote: But the problem is: how do I start the NameNode that I stopped again?
I do the following:
sudo -u hdfs hdfs namenode -bootstrapStandby -force
/etc/init.d/hadoop-hdfs-namenode start
With the above process, sometimes the NameNode starts OK in standby mode, but sometimes it starts in active mode and then I have 2 active nodes (split brain!!).
So what have I done wrong, and what is the right process to start a NameNode that has been stopped?
Just simply start it up. The bootstrap command must only be run for a fresh new NameNode, not on every restart of a previously running NameNode.
It's worth noting that Standby and Active are just states of the very same NameNode. The Standby NameNode is not a special daemon; it's just a state of the NameNode.
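A minimal sketch of that restart, assuming a package-based install and the NameNode IDs node1/node2 used elsewhere in this thread (paths and IDs may differ on your cluster):
# Start the previously stopped NameNode again; no bootstrap is needed
sudo /etc/init.d/hadoop-hdfs-namenode start
# Ask each NameNode which HA state it is in (IDs come from dfs.ha.namenodes.<nameservice>)
sudo -u hdfs hdfs haadmin -getServiceState node1
sudo -u hdfs hdfs haadmin -getServiceState node2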
Created 09-06-2018 02:39 AM
To do this via CM:
Log in as admin to CM -> HDFS -> Instances -> 'Federation and high availability' button -> Action -> Manual Failover
Created 09-08-2018 02:44 AM
Thanks @saranvisa, but I am not using CM; I installed everything from the command line.
Created 09-08-2018 05:29 AM
> My system has 2 DataNodes, 2 NameNodes, 3 JournalNodes, and 3 ZooKeeper services
To repeat, you need to run the ZKFailoverController daemons in addition to this setup. Please see the guide linked in my previous post and follow it entirely for the command-line setup.
Running just ZK will not grant you an HDFS HA solution - you are missing a crucial daemon that interfaces between ZK and HDFS.
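A rough sketch of adding that missing piece on a package-based install, assuming the hadoop-hdfs-zkfc package/service listed later in this thread (exact package names and paths may vary; follow the linked guide for the authoritative steps):
# One-time only: create the HA state znode in ZooKeeper (run on one NameNode host)
sudo -u hdfs hdfs zkfc -formatZK
# On each NameNode host: start the ZKFailoverController daemon
sudo /etc/init.d/hadoop-hdfs-zkfc start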
Created 09-10-2018 02:29 AM
Dear @Harsh J ,
Does 'Automatic Failover Configuration' also require the 'Fencing Configuration'? Are these two independent sections, or do I need both to configure automatic failover?
I ask because I encountered this error:
You must configure a fencing method before using automatic failover.
org.apache.hadoop.ha.BadFencingConfigurationException: No fencer configured for NameNode at node1/x.x.x.x:8020
    at org.apache.hadoop.hdfs.tools.NNHAServiceTarget.checkFencingConfigured(NNHAServiceTarget.java:132)
    at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:225)
    at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:167)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
    at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:167)
    at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:192)
2018-09-10 15:56:53,262 INFO org.apache.zookeeper.ZooKeeper: Session: 0x365c2b22a1e0000 closed
2018-09-10 15:56:53,262 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
If I need both of them, then:
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>
What should /path/to/my/script.sh be? I am not clear about the contents of this script. Please explain, and maybe give me an example.
Thank you
Created 09-10-2018 08:30 PM
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
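A brief note on why this works: when dfs.ha.automatic-failover.enabled is true, the ZKFC refuses to start unless some fencing method is configured (hence the BadFencingConfigurationException above), and shell(/bin/true) satisfies that check with a command that always succeeds; with Quorum Journal Manager the JournalNodes only ever accept edits from one NameNode at a time, which is why a no-op fencer is workable here. If passwordless SSH between the NameNode hosts is available, the Hadoop HA docs also describe an sshfence method, roughly as below (the private-key path is only an illustrative assumption):
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>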
Created 09-10-2018 09:36 PM
@Harsh J yeap, with
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
it is working perfectly now.
Thank you very much
Created 09-07-2018 04:10 AM
Yeap, as @Harsh J said, I am using a CDH package-based (non-CM) installation. I will show more about my config.
I have 3 nodes: node1, node2, node3
zookeeper on all 3 nodes (same config on each):
maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
dataLogDir=/var/lib/zookeeper
hdfs-site.xml on all 3 nodes:
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>node1,node2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.node1</name>
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.node2</name>
  <value>node2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.node1</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.node2</name>
  <value>node2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/namenode/dfs/jn</value>
</property>
core-site.xml on all 3 nodes:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>node1:2181,node2:2181,node3:2181</value>
</property>
The above is the config related to the NameNode cluster. I have doubts about the ZooKeeper config; is it enough?
Services installed on each node:
Node 1: hadoop-hdfs-journalnode, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, zookeeper-server
Node 2: hadoop-hdfs-journalnode, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, zookeeper-server
Node 3: hadoop-hdfs-journalnode, zookeeper-server
The first time, initialization is OK:
Node 1 active, Node 2 standby
Stop the NameNode service on Node 1 => Node 2 becomes active => OK
But when I start the NameNode service on Node 1 again:
Node 1 is active AND Node 2 is active too => fail
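A small troubleshooting sketch, assuming the service names and NameNode IDs above and the usual CDH package log directory (both are assumptions; adjust for your install), to check whether the ZKFC daemons are actually running and why they might not be:
# On node1 and node2: check that the ZKFailoverController daemon is up
sudo /etc/init.d/hadoop-hdfs-zkfc status
# If it is not running, its log usually explains why
# (e.g. the BadFencingConfigurationException shown earlier in this thread)
tail -n 50 /var/log/hadoop-hdfs/*zkfc*.log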
