Why all regionservers shutdown when active namenode failed?

I have an HBase cluster running on top of Hadoop, with two NameNodes (active and standby). When I shut down the active NameNode, all RegionServers start to shut down automatically. In theory the standby NameNode should take over, so why doesn't it? Does anyone know why this happens?


Re: Why all regionservers shutdown when active namenode failed?

Expert Contributor

Check whether the standby NameNode becomes active when you stop or restart the currently active NameNode.

Can you also share the RegionServer logs?

Re: Why all regionservers shutdown when active namenode failed?

Mentor

@karthik nedunchezhiyan

To ensure a successful NameNode switch from Standby to Active, the automatic failover setup adds the following components to an HDFS deployment:

  • ZooKeeper quorum
  • ZKFailoverController process (abbreviated as ZKFC).

The ZKFailoverController (ZKFC) is a ZooKeeper client that monitors and manages the state of the NameNode. Each of the machines which run NameNode service also runs a ZKFC.

NameNode HA switching is based on a ZooKeeper election. When the active NameNode is shut down (for example, to simulate a failure), the ZKFC on the standby NameNode will try to acquire the lock. If that ZKFC succeeds, it has "won the election" and is responsible for running a failover to make its local NameNode active.
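The election step can be pictured with a toy, in-memory model. This is a sketch under stated assumptions, not Hadoop's ZKFC code: the class names (`ToyZooKeeper`, `ToyZKFC`) and the `fence_old_active` callback are hypothetical. It shows the ordering that matters here: the winner of the lock must first fence the previous active NameNode (in real HDFS, via the methods listed in `dfs.ha.fencing.methods`, such as `sshfence`), and if fencing fails it does not go active.

```python
"""Toy model of ZKFC election with fencing (hypothetical names, not Hadoop code)."""
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyZooKeeper:
    # Current holder of the election lock (stands in for an ephemeral znode).
    lock_holder: Optional[str] = None

    def try_acquire(self, node: str) -> bool:
        # Succeeds only if nobody holds the lock, or the caller already does.
        if self.lock_holder is None:
            self.lock_holder = node
            return True
        return self.lock_holder == node

    def session_expired(self, node: str) -> None:
        # An ephemeral lock disappears when its owner's session ends.
        if self.lock_holder == node:
            self.lock_holder = None


@dataclass
class ToyZKFC:
    name: str
    zk: ToyZooKeeper
    # Hypothetical fencing hook (think: SSH into the old active and kill it).
    fence_old_active: Callable[[], bool]
    state: str = "standby"

    def run_election(self) -> str:
        if not self.zk.try_acquire(self.name):
            return self.state  # another ZKFC holds the lock
        if not self.fence_old_active():
            # Fencing failed: give up the lock and stay standby to avoid split-brain.
            self.zk.session_expired(self.name)
            return self.state
        self.state = "active"
        return self.state


if __name__ == "__main__":
    zk = ToyZooKeeper(lock_holder="nn1")   # nn1 is the active NameNode
    zk.session_expired("nn1")              # nn1 goes down
    print(ToyZKFC("nn2", zk, fence_old_active=lambda: False).run_election())
    print(ToyZKFC("nn2", zk, fence_old_active=lambda: True).run_election())
```

In this model the first election returns `standby` (fencing failed) and the second returns `active`, which mirrors how a broken fencing setup can leave the cluster with no active NameNode.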

Having said that, ensure that you have at least 3 ZooKeeper servers and that all of them are running. Do the NameNodes switch roles (Active/Standby) despite the HBase failure?
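The "at least 3" advice follows from ZooKeeper's majority rule: an ensemble keeps serving only while a strict majority of its servers is up. The small sketch below (function names are illustrative) computes the quorum size and how many server failures an ensemble of a given size can tolerate.

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble with `ensemble` servers."""
    if ensemble < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """How many servers can be down while a majority is still available."""
    return ensemble - quorum_size(ensemble)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Three servers is the smallest ensemble that survives one failure, and adding a fourth does not raise the tolerance, which is why odd ensemble sizes are the usual choice.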

HTH

Re: Why all regionservers shutdown when active namenode failed?

@Geoffrey Shelton Okot @schhabra

The issue was that the standby NameNode was unable to become active after the active NameNode failed. It was solved by configuring passwordless SSH access between both NameNodes.
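Passwordless SSH matters when `sshfence` is the fencing method. A quick way to spot a missing piece is to inspect `hdfs-site.xml`. The sketch below is a hypothetical helper, not a Hadoop tool. It reads Hadoop's standard `<configuration>/<property>` XML layout and checks three real properties: `dfs.ha.automatic-failover.enabled`, `dfs.ha.fencing.methods` (newline-separated list) and `dfs.ha.fencing.ssh.private-key-files`. It only checks configuration. It cannot confirm that the key actually lets one NameNode SSH into the other.

```python
import xml.etree.ElementTree as ET


def load_hadoop_conf(xml_text: str) -> dict:
    """Parse Hadoop-style <configuration><property><name/><value/></property> XML."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def fencing_problems(conf: dict) -> list:
    """Return human-readable problems with the HA/fencing settings (hypothetical checks)."""
    problems = []
    if conf.get("dfs.ha.automatic-failover.enabled", "false").lower() != "true":
        problems.append("dfs.ha.automatic-failover.enabled is not true")
    methods = conf.get("dfs.ha.fencing.methods", "")
    if not methods:
        problems.append("dfs.ha.fencing.methods is not set")
    # sshfence may appear as "sshfence" or "sshfence(user:port)".
    uses_sshfence = any(m.strip().startswith("sshfence") for m in methods.splitlines())
    if uses_sshfence and not conf.get("dfs.ha.fencing.ssh.private-key-files"):
        problems.append("sshfence is used but dfs.ha.fencing.ssh.private-key-files is empty")
    return problems


if __name__ == "__main__":
    # Hypothetical sample: automatic failover on, sshfence configured, no SSH key.
    sample = """<configuration>
      <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
      <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
    </configuration>"""
    print(fencing_problems(load_hadoop_conf(sample)))
```

For the sample above the helper reports only the missing private-key setting, the kind of gap that passwordless SSH between the NameNodes is meant to close.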

Re: Why all regionservers shutdown when active namenode failed?

Mentor

@karthik nedunchezhiyan

Always stick to the Hortonworks documentation when preparing the environment; people tend to go off-script and run into problems.
Our assumption is that the Hortonworks documentation was followed and that any issues are related to software integration. But when, as now, we learn that the passwordless SSH configuration was skipped, we have to wonder what else is under the hood.