Is Performing a Manual Failover Ever Disruptive?

Contributor

Is performing a:

hdfs haadmin -failover nn1 nn2

ever a disruptive procedure? Suppose all services are up and running (ZooKeeper, ZKFC, JournalNodes). My understanding is that this should be a safe procedure that wouldn't cause jobs to fail, but I wanted to know under what circumstances it could be problematic.

1 ACCEPTED SOLUTION

Re: Is Performing a Manual Failover Ever Disruptive?

Explorer

No, it is a non-disruptive procedure, provided that the cluster is healthy and not under heavy load. One reason to do a manual failover is to upgrade a NameNode, either its software or its hardware.

During a NameNode failover, jobs and client applications are redirected from the old active NameNode to the new active NameNode. They have to wait until the new active NameNode is ready, so they are slowed down for that period. For this reason, it is better to perform the failover when the cluster is idle or under light load.

Hope it helps.
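
For what it's worth, here is a rough sketch of how the failover from the question could be wrapped so that the role swap is confirmed afterwards. The function name failover_and_verify is made up for illustration; nn1 and nn2 are the NameNode IDs from the question. It only uses hdfs haadmin -failover and hdfs haadmin -getServiceState, and assumes the hdfs command is on the PATH of a user allowed to run HA admin commands.

```shell
# Sketch only: fail over from NameNode $1 to NameNode $2, then confirm
# that the roles actually swapped. failover_and_verify is a hypothetical
# helper name, not part of Hadoop.
failover_and_verify() {
  # Ask HDFS to move the active role from $1 to $2.
  hdfs haadmin -failover "$1" "$2" || return 1

  # The target should now report "active" and the source "standby".
  [ "$(hdfs haadmin -getServiceState "$2")" = "active" ] &&
    [ "$(hdfs haadmin -getServiceState "$1")" = "standby" ]
}
```

You would call it as failover_and_verify nn1 nn2 and check the exit status. A non-zero status means either the failover command failed or the states are not what was expected.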

Re: Is Performing a Manual Failover Ever Disruptive?

New Contributor

Before performing a failover, it is also a good idea to check the following:

  • The other NameNode is alive and not in safe mode.
  • The other NameNode shows all expected DataNodes as alive.
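
The checks above could be scripted roughly like this before running the failover. This is only a sketch: the function names are made up, the NameNode URI and expected DataNode count are values you would supply, and it assumes that hdfs dfsadmin accepts the generic -fs option pointed at the other NameNode and that the -report output contains a "Live datanodes (N):" line, which may differ between Hadoop versions.

```shell
# Sketch only: pre-failover checks. All function names are hypothetical.

# Succeeds if NameNode ID $1 (e.g. nn2) reports the "standby" HA state.
is_standby() {
  [ "$(hdfs haadmin -getServiceState "$1" 2>/dev/null)" = "standby" ]
}

# Succeeds if the NameNode at URI $1 reports safe mode OFF and never ON.
safemode_off() {
  out=$(hdfs dfsadmin -fs "$1" -safemode get 2>/dev/null) || return 1
  echo "$out" | grep -q "Safe mode is OFF" &&
    ! echo "$out" | grep -q "Safe mode is ON"
}

# Prints the live DataNode count parsed from the report of the NameNode
# at URI $1 (assumes a "Live datanodes (N):" line in the output).
live_datanodes() {
  hdfs dfsadmin -fs "$1" -report 2>/dev/null |
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1
}

# usage: preflight <standby-id> <standby-uri> <expected-datanodes>
preflight() {
  is_standby "$1" || { echo "$1 is not in standby state" >&2; return 1; }
  safemode_off "$2" || { echo "$2 is in safe mode or unreachable" >&2; return 1; }
  live=$(live_datanodes "$2")
  [ "${live:-0}" -ge "$3" ] || {
    echo "only ${live:-0} live DataNodes, expected $3" >&2
    return 1
  }
}
```

For example, preflight nn2 hdfs://nn2-host:8020 12 (the host name and count are placeholders) returns 0 only if all three checks pass, so it can gate the hdfs haadmin -failover nn1 nn2 command.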