Created on 09-05-2018 09:01 PM - edited 09-16-2022 06:40 AM
Hi everyone,
My system has 2 DataNodes, 2 NameNodes, 3 JournalNodes, and 3 ZooKeeper services.
I configured the NameNode HA cluster OK: when browsing the admin page at namenode:50070, I see one NameNode with status active and one NameNode with status standby. => OK
When I stop the active NameNode, the standby one becomes active. => OK
But the problem is: how do I start the NameNode that I stopped again?
I do the following:
sudo -u hdfs hdfs namenode -bootstrapStandby -force
/etc/init.d/hadoop-hdfs-namenode start
With the above process, sometimes the NameNode starts OK in standby mode, but sometimes it starts in active mode and then I have 2 active nodes (split brain!!)
So what have I done wrong, and what is the right process to start a NameNode that was stopped?
Thank you
Created on 09-05-2018 09:13 PM - edited 09-06-2018 03:16 AM
An HA HDFS installation requires you to run Failover Controllers on each of
the NameNodes, along with a ZooKeeper service. These controllers take care
of transitioning the NameNodes such that only one is active and the other
becomes standby.
It appears that you're using a CDH package-based (non-CM) installation
here, so please follow the guide starting at
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_hag_hdfs_ha_intro.html#topic_2_1...,
using the instructions under the 'Command-Line' sections instead of the
Cloudera Manager ones.
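As a rough sketch of the command-line steps in that guide (assuming CDH 5 packages and that the ZooKeeper quorum is already running), automatic failover needs the ZKFC daemon installed on both NameNode hosts, a one-time format of its ZooKeeper znode, and the daemons started:
# On each NameNode host (package name assumes a CDH 5 package install):
sudo yum install hadoop-hdfs-zkfc

# One time only, from one NameNode host, with ZooKeeper running:
sudo -u hdfs hdfs zkfc -formatZK

# Then start the failover controller on both NameNode hosts:
sudo service hadoop-hdfs-zkfc start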
@phaothu wrote: But the problem is: how do I start the NameNode that I stopped again?
I do the following:
sudo -u hdfs hdfs namenode -bootstrapStandby -force
/etc/init.d/hadoop-hdfs-namenode start
With the above process, sometimes the NameNode starts OK in standby mode, but sometimes it starts in active mode and then I have 2 active nodes (split brain!!)
So what have I done wrong, and what is the right process to start a NameNode that was stopped?
Just simply start it up. The bootstrap command must only be run for a fresh new NameNode, not on every restart of a previously running NameNode.
It's worth noting that Standby and Active are just states of the very same NameNode. The Standby NameNode is not a special daemon; it's just a state of the NameNode.
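In other words, a previously bootstrapped NameNode only needs its service started again. A minimal sketch (the service name matches the init script used above; the node1/node2 IDs match the dfs.ha.namenodes values shown later in this thread):
# Start the previously stopped NameNode; do NOT run -bootstrapStandby again:
sudo service hadoop-hdfs-namenode start

# Confirm which HA state each NameNode came up in:
sudo -u hdfs hdfs haadmin -getServiceState node1
sudo -u hdfs hdfs haadmin -getServiceState node2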
Created 09-06-2018 02:39 AM
To do this via CM:
Login as admin to CM -> HDFS -> Instances -> 'Federation and high availability' button -> Action -> Manual Failover
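For a package-based (non-CM) install, a manual failover can instead be requested from the command line. A sketch, assuming the mycluster/node1/node2 names used elsewhere in this thread, that node1 is currently active, and that your Hadoop version supports graceful failover through the ZKFCs:
# Ask the failover controllers to make node2 active and node1 standby:
sudo -u hdfs hdfs haadmin -failover node1 node2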
Created 09-08-2018 02:44 AM
Thanks @saranvisa, but I am not using CM, just an installation done from the command line.
Created 09-08-2018 05:29 AM
Created 09-10-2018 02:29 AM
Dear @Harsh J,
Does the 'Automatic Failover Configuration' also require the 'Fencing Configuration'? Are these two independent sections, or do I need both to configure automatic failover?
Because I get this error:
You must configure a fencing method before using automatic failover.
org.apache.hadoop.ha.BadFencingConfigurationException: No fencer configured for NameNode at node1/x.x.x.x:8020
        at org.apache.hadoop.hdfs.tools.NNHAServiceTarget.checkFencingConfigured(NNHAServiceTarget.java:132)
        at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:225)
        at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
        at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
        at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:167)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
        at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:167)
        at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:192)
2018-09-10 15:56:53,262 INFO org.apache.zookeeper.ZooKeeper: Session: 0x365c2b22a1e0000 closed
2018-09-10 15:56:53,262 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
If I need both of them, then for this config:
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>
what is /path/to/my/script.sh? The content of this script is not clear to me; please explain and maybe give me an example.
Thank you
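(For illustration only: one hypothetical shape for such a script, under the assumption that the last argument passed by the shell(...) value above is the target NameNode as host:port. None of this comes from the Cloudera guide; as the reply below shows, a trivial /bin/true fencer is what ultimately worked here.)
#!/bin/bash
# Hypothetical fencing script (illustration only).
# Assumes the last argument is the target NameNode as host:port, e.g. node1:8020.
TARGET="${@: -1}"
TARGET_HOST="${TARGET%%:*}"

# Try to stop the NameNode on the node being fenced; exit 0 only on success
# so the ZKFC treats fencing as complete.
if ssh "hdfs@${TARGET_HOST}" "sudo service hadoop-hdfs-namenode stop"; then
  exit 0
fi
exit 1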
Created 09-10-2018 08:30 PM
Created 09-10-2018 09:36 PM
@Harsh J yep, with
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
it is working perfectly now.
Thank you very much
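(A note for later readers: shell(/bin/true) simply satisfies the ZKFC's requirement that some fencer be configured by always reporting success; with quorum-journal shared edits the JournalNodes already prevent two NameNodes from writing the edit log. A commonly documented alternative is sshfence, sketched here assuming the hdfs user can SSH between the NameNode hosts and that its key lives at the path shown, which is an assumption, not something from this thread:)
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence(hdfs)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
</property>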
Created 09-07-2018 04:10 AM
Yep, as @Harsh J said, I am using a CDH package-based (non-CM) installation. I will show more of my config.
I have 3 nodes: node1, node2, node3
ZooKeeper on all 3 nodes, all with the same config:
maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
dataLogDir=/var/lib/zookeeper
hdfs-site.xml on all 3 nodes:
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>node1,node2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.node1</name>
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.node2</name>
  <value>node2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.node1</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.node2</name>
  <value>node2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/namenode/dfs/jn</value>
</property>
core-site.xml on all 3 nodes:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>node1:2181,node2:2181,node3:2181</value>
</property>
The above is the config related to the NameNode cluster. I have doubts about the ZooKeeper config; is it enough?
Services installed on each node:
Node 1: hadoop-hdfs-journalnode hadoop-hdfs-namenode hadoop-hdfs-zkfc zookeeper-server
Node 2: hadoop-hdfs-journalnode hadoop-hdfs-namenode hadoop-hdfs-zkfc zookeeper-server
Node 3: hadoop-hdfs-journalnode zookeeper-server
The first initialization is OK:
Node 1 active, Node 2 standby
Stop the NameNode service on Node 1 => Node 2 becomes active => OK
But when I start the NameNode service on Node 1 again:
Node 1 is active and Node 2 is active too => fail
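(For anyone comparing against this config: the hdfs-site.xml above sets dfs.ha.automatic-failover.enabled to true but defines no dfs.ha.fencing.methods, which is exactly what the ZKFC error earlier in this thread complains about. A minimal sketch of the missing property, matching the /bin/true fencer that was confirmed to work above:)
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>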