
Question on hdfs automatic failover

Master Guru

Hey guys,

Consider the scenario below:

1. I have NameNode HA configured on my cluster.

2. I have configured SSH fencing.

3. My active NameNode went down and automatic failover did not work.

4. I had to fail over manually using the -forcemanual flag.

Which fencing method can we use so that failover happens automatically in case of a power failure, physical server crash, or OS reboot? Is that possible?

1 ACCEPTED SOLUTION

Super Guru

In this case you need to configure two fencing methods, and the last method should always return success so that automatic failover can complete.

Please refer to the link below.

https://www.packtpub.com/books/content/setting-namenode-ha
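A minimal sketch of that idea in hdfs-site.xml, assuming you keep sshfence as the primary method: Hadoop tries the newline-separated methods in order, and shell(/bin/true) always succeeds, so failover can proceed even when the crashed host is unreachable over SSH.

```xml
<!-- hdfs-site.xml: fencing chain ending in a method that always succeeds -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
```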


7 REPLIES

Master Mentor
@Kuldeep Kulkarni

https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...

You can try the shell fencing method:

shell - run an arbitrary shell command to fence the Active NameNode

The shell fencing method runs an arbitrary shell command. It may be configured like so:

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
    </property>

The string between ‘(’ and ‘)’ is passed directly to a bash shell and may not include any closing parentheses.

The shell command will be run with an environment set up to contain all of the current Hadoop configuration variables, with the ‘_’ character replacing any ‘.’ characters in the configuration keys. The configuration used has already had any namenode-specific configurations promoted to their generic forms – for example dfs_namenode_rpc-address will contain the RPC address of the target node, even though the configuration may specify that variable as dfs.namenode.rpc-address.ns1.nn1.
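As a rough sketch of such a custom script (the fence_target name and the IPMI call are assumptions for illustration; $target_host is one of the variables Hadoop exports to the fencing method's environment):

```shell
#!/bin/bash
# Hypothetical shell fencing method: try to power-fence the old active
# NameNode, but always report success so failover can proceed.
fence_target() {
  local host="${target_host:-unknown-host}"   # exported by Hadoop
  echo "fencing ${host}"
  # Power the node off via its management interface -- an assumption
  # about the environment, e.g.:
  #   ipmitool -H "${host}-mgmt" -U admin -P secret chassis power off
  return 0   # never block the failover
}

fence_target
```

Configured as shell(/path/to/script.sh), the exit status is what matters: a non-zero exit would make the ZKFC treat fencing as failed.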

You can write your own custom scripts. You can also check your OS vendor's fencing options, for example: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Suite_Overview/s2-fencing-overview-CSO.html

Master Guru

@Neeraj Sabharwal - Thank you!


Master Guru

One little extra comment: you do not need any fencing method for the failover itself. The QJM and ZooKeeper quorums make sure only the active NameNode can write to the shared edit log. However, it is possible that a zombie active NameNode might still serve outdated read-only responses to connected clients.

That's where fencing comes in. However, if fencing is configured, the failover will wait for a fencing method to return success. So you need to be sure that your methods do not block (by configuring a timeout on your SSH action, for example) and that in the end the chain returns success, i.e. either use a script that returns success in any case, or chain multiple non-blocking methods that end with one that always returns true.
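That combination can be sketched in hdfs-site.xml roughly like this (the 30-second timeout value is an illustrative assumption): sshfence gives up after the connect timeout instead of blocking, and the final shell(/bin/true) guarantees the chain ends in success.

```xml
<!-- Non-blocking sshfence followed by a method that always succeeds -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value> <!-- milliseconds -->
</property>
```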

Master Guru

@Benjamin Leonhardi - Thank you!