Question on hdfs automatic failover


Super Guru

Hey Guys,

Consider below scenario:

1. I have NN HA configured on my cluster

2. I have configured ssh fencing

3. My active NN went down and automated failover did not work

4. I had to fail over manually using the -forcemanual flag.

What fencing method can we use so that failover still happens automatically in case of a power failure, physical server crash, or OS reboot? Is that possible?

1 ACCEPTED SOLUTION


Re: Question on hdfs automatic failover

In this case you need to configure two fencing methods, and the last method should always return success, so that automatic failover can complete even when the failed host is unreachable.
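As a sketch, that advice could look like the following in hdfs-site.xml: try sshfence first, and fall back to a shell command that always succeeds. Multiple fencing methods are listed newline-separated in a single value; the private-key path below is illustrative.

```xml
<!-- Try ssh fencing first; if the old active host is down and ssh fails,
     fall back to a command that always returns success so the failover
     can still proceed. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <!-- illustrative path to the key used by the ZKFC for sshfence -->
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
```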

Please refer to the link below.

https://www.packtpub.com/books/content/setting-namenode-ha


7 REPLIES

Re: Question on hdfs automatic failover

@Kuldeep Kulkarni

https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...

You can try this

shell - run an arbitrary shell command to fence the Active NameNode

The shell fencing method runs an arbitrary shell command. It may be configured like so:

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
    </property>

The string between ‘(’ and ‘)’ is passed directly to a bash shell and may not include any closing parentheses.

The shell command will be run with an environment set up to contain all of the current Hadoop configuration variables, with the '_' character replacing any '.' characters in the configuration keys. The configuration used has already had any namenode-specific configurations promoted to their generic forms -- for example dfs_namenode_rpc-address will contain the RPC address of the target node, even though the configuration may specify that variable as dfs.namenode.rpc-address.ns1.nn1.
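A minimal sketch of such a fencing script, assuming the target_host variable that Hadoop's shell fencing method exports for the node being fenced (the script name and the power-cycle action are illustrative, not part of the original thread):

```shell
#!/usr/bin/env bash
# Hypothetical fencing script sketch. Hadoop's shell fencing method exports
# target_* variables (e.g. target_host, target_port) describing the node
# to be fenced.

fence_target() {
    # Fall back to a placeholder so the function also runs outside Hadoop.
    local host="${target_host:-unknown-host}"

    # A real script might power-cycle the node here, e.g. via IPMI or a
    # managed PDU (commented out: hardware-specific and destructive):
    #   ipmitool -H "${host}-mgmt" -U admin -P secret chassis power off
    echo "fencing NameNode on ${host}"

    # Always return success so automatic failover can proceed even when
    # the old active host is unreachable (powered off, crashed).
    return 0
}

fence_target
```

Returning success unconditionally is what makes this usable as the final method in the fencing chain.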

You can write your own custom scripts. You can also check your OS vendor's fencing documentation, e.g. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Suite_Overview/s2-fencing-overview-CSO.html


Re: Question on hdfs automatic failover

Super Guru

@Neeraj Sabharwal - Thank you!



Re: Question on hdfs automatic failover

One little extra comment: you do not need any fencing method for the failover itself. The QJM and ZooKeeper quorums make sure only the active NameNode can write to the shared edit log. However, a zombie active NameNode might still serve outdated read-only responses to connected clients.

That's where fencing comes in. However, if fencing is configured, the standby will wait for the fencing method to return success before becoming active. So you need to be sure that your method does not block (for example by configuring a timeout for your ssh action) and that it eventually returns success. I.e. either use a script that returns success in any case, or chain multiple non-blocking methods ending with one that always returns true.
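The ssh timeout mentioned above can be bounded with the dfs.ha.fencing.ssh.connect-timeout property (in milliseconds); a sketch, with an illustrative value:

```xml
<!-- Bound the sshfence connection attempt so a dead host cannot block
     failover indefinitely; 30000 ms is the documented default. -->
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
```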


Re: Question on hdfs automatic failover

Super Guru

@Benjamin Leonhardi - Thank you!
