Support Questions


How to do a cluster failover from active namenode to standby namenode from Ambari console?

Rising Star

Hi,

How to do a cluster failover from active namenode to standby namenode from Ambari console?

Could someone please help with the exact steps to be done?

1 ACCEPTED SOLUTION


If you have set up automatic failover with ZooKeeper Failover Controllers, the ZKFC processes will automatically transition the standby NameNode to active status if the current active NameNode becomes unresponsive. The decision about which NameNode should be made active is taken by the ZKFC instances (coordinating via ZooKeeper); Ambari does not decide which NameNode should be active.

If you wish to perform a manual failover, you can use the hdfs haadmin command as @Sagar Shimpi suggested.

Both alternatives are described in the HDP documentation:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_hadoop-ha/content/ha-nn-deploy-nn-cluste...

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_hadoop-ha/content/nn-ha-auto-failover.ht...

If you want to better understand the internals of automatic NN failover (recommended if you are administering a Hadoop cluster with HA), I recommend reading the Apache docs, specifically the section on Automatic Failover.

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...
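As a sketch of what the manual failover looks like on the command line: the service IDs nn1 and nn2 below are placeholders for whatever NameNode IDs are defined under dfs.ha.namenodes.<nameservice> in your hdfs-site.xml, so substitute your own values.

```shell
# Check the current HA state of each NameNode (nn1/nn2 are example service IDs)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Gracefully fail over from nn1 (current active) to nn2 (current standby)
hdfs haadmin -failover nn1 nn2
```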


5 REPLIES

Expert Contributor

@Muthukumar S With NameNode HA enabled, cluster failover happens automatically (the ZooKeeper Failover Controllers handle it, not Ambari itself). If you want to make your standby NameNode the active one, you can stop the current active NameNode (after confirming the current standby is alive), and the standby will take over as the active NameNode.

See also : https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.ht...

To manually change it check : https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.ht...

Rising Star

@sbhat Thank you, I will try it and let you know. Once I bring down the active NameNode service, the standby will become the active NameNode. After that, if I start the NameNode service on the node where it was stopped, will that one become the standby? Also, do I need to do any manual steps, such as putting the active NameNode into safe mode and running saveNamespace before stopping the service? Or will putting the active NameNode into maintenance mode take care of this?

Expert Contributor

@Muthukumar S Yes, that is correct, and you will not have to do any of that. HDFS is built to be fault tolerant, so the failover should work seamlessly. That said, I would still prefer the second method if it is a production box. Do update your findings and upvote the answer if it works. Thanks!

Super Guru
@Muthukumar S

In addition to @sbhat's answer: yes, the failover is taken care of automatically, but if you intend to perform the failover manually from the CLI, you can use the command below:

[hdfs@test ~]$ hdfs haadmin
Usage: haadmin
    [-transitionToActive [--forceactive] <serviceId>]
    [-transitionToStandby <serviceId>]
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]
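A minimal sketch of the transition subcommands, assuming two NameNodes with service IDs nn1 and nn2 (placeholders; use the IDs from your dfs.ha.namenodes.<nameservice> setting). Note that when automatic failover is enabled, haadmin refuses the transitionTo* commands and you should use -failover instead, which coordinates the switch safely.

```shell
# Manually demote the current active and promote the standby.
# Only appropriate when automatic failover (ZKFC) is NOT enabled;
# with ZKFC running, prefer: hdfs haadmin -failover nn1 nn2
hdfs haadmin -transitionToStandby nn1
hdfs haadmin -transitionToActive nn2

# Verify the result
hdfs haadmin -getServiceState nn2
```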
