How does HDFS checkpointing work in an HA cluster?

Explorer

Hello,

There is something I do not fully understand about HDFS, and I would be glad if someone here could clarify it for me.

In a regular setup, with a NameNode and a Secondary NameNode, the Secondary NameNode is responsible for checkpoints (merging the edits file into the fsimage).

In a high-availability setup, with an active NameNode and a standby NameNode, the standby NameNode does the checkpointing.

But what happens in a high-availability setup when the active NameNode is down or destroyed? The standby NameNode is promoted to active, but it is alone now. There is no standby/secondary NN.

And still, the cluster should continue functioning as long as the remaining NN is up.

Who does the checkpointing in this case? Is it the surviving NameNode? Or does checkpointing halt until someone brings the second NameNode up? How does it work?

Thank you,

Guy

1 ACCEPTED SOLUTION

Explorer

No checkpointing takes place. Periodic checkpointing is suspended in an HA setup when the Standby NameNode is down.


5 REPLIES

Explorer

Hello @ni4ni

As far as I understand, if only one NameNode is running, no checkpoint can be taken unless you restart that NameNode.

To perform checkpoints you need either HA enabled or a Secondary NameNode (SNN) configured in the cluster.

If a single NameNode were able to perform checkpoints by itself, HDFS would have had only one NameNode and no Secondary NameNode from the start. The original role of the Secondary NameNode was precisely to keep the NameNode's fsimage up to date.

Now that clusters keep growing in worker nodes to meet our needs, the Standby NameNode comes into the picture: it performs the checkpoints and can also take over as the active NameNode and serve incoming requests when needed.

I hope this gives you some more clarity.
Thanks.

Master Mentor

@ni4ni @Masood

Checkpointing is a process that takes an fsimage and edit log and compacts them into a new fsimage. This way, instead of replaying a potentially unbounded edit log, the NameNode can load the final in-memory state directly from the fsimage. This is a far more efficient operation and reduces NameNode startup time.
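To make the idea concrete, here is a minimal toy sketch in Python. It is not how HDFS actually stores or replays its metadata: `Namespace`, `apply_edit`, and `checkpoint` are hypothetical names invented for illustration, and the `(txid, op, path)` edit format is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy stand-in for an fsimage: known paths plus the last applied transaction ID."""
    paths: set = field(default_factory=set)
    last_txid: int = 0


def apply_edit(ns, edit):
    """Apply one (txid, op, path) edit-log entry to the namespace in place."""
    txid, op, path = edit
    if op == "create":
        ns.paths.add(path)
    elif op == "delete":
        ns.paths.discard(path)
    else:
        raise ValueError(f"unknown op: {op}")
    ns.last_txid = txid


def checkpoint(fsimage, edits):
    """Return a new image = old image + every edit newer than the old image."""
    new_image = Namespace(set(fsimage.paths), fsimage.last_txid)
    for edit in edits:
        # Edits already contained in the image are skipped.
        if edit[0] > new_image.last_txid:
            apply_edit(new_image, edit)
    return new_image


old_image = Namespace({"/a"}, last_txid=1)
edit_log = [(2, "create", "/b"), (3, "delete", "/a")]
new_image = checkpoint(old_image, edit_log)
print(sorted(new_image.paths), new_image.last_txid)
```

After the checkpoint, a restart only has to load `new_image` and replay edits with a transaction ID above `new_image.last_txid`, instead of the whole log.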

Checkpointing is one of the most important activities of the Standby or Secondary NameNode in a cluster.
In an HA cluster, all connections and cluster activity are managed by the active NameNode, while the Standby NameNode takes responsibility for compacting the edit logs and the fsimage. Because the Standby also performs checkpoints of the namespace state, it is not necessary to run a Secondary NameNode.
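As a rough sketch of when a periodic checkpoint is triggered: as far as I know, it is governed by `dfs.namenode.checkpoint.period` (default 3600 seconds) and `dfs.namenode.checkpoint.txns` (default 1000000 transactions), whichever limit is reached first. The function name `should_checkpoint` and its parameters are made up for illustration; please check `hdfs-default.xml` for your Hadoop version before relying on the defaults.

```python
# Assumed defaults, taken from hdfs-default.xml; verify for your version.
CHECKPOINT_PERIOD_SECONDS = 3600   # dfs.namenode.checkpoint.period
CHECKPOINT_TXNS = 1_000_000        # dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      txns=CHECKPOINT_TXNS):
    """Return True when either the time limit or the transaction limit is reached."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns


# One hour has elapsed with few edits: the time limit triggers a checkpoint.
print(should_checkpoint(3600, 42))
# Only a minute has elapsed, but many edits have piled up: the txn limit triggers it.
print(should_checkpoint(60, 2_000_000))
# Neither limit is reached: no checkpoint yet.
print(should_checkpoint(60, 42))
```

With the Standby down, nothing evaluates this condition on a second node, so the edit log on the surviving NameNode simply keeps growing until a Standby is back.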

 

Hope that helps

Master Mentor

@ni4ni 
@Masood 

Unfortunately, I have to dispute @Masood's response, with reference to the hadoop.apache.org documentation (see the link below).

In an HA setup, the standby does in fact do the checkpointing. To keep this thread correct as a community reference, please un-accept the answer.

www.hadoop.org

Here is a quote extracted from the above website:

"Note that, in an HA cluster, the Standby NameNodes also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode."

Happy hadooping

Explorer

@ni4ni mentions:

"There is no standby/secondary NN."