How does HDFS checkpointing work in a HA cluster ?




There is something I do not fully understand about HDFS and I would be glad if someone here can clarify it for me.

In a regular setup, with a NameNode and a Secondary NameNode, the Secondary NameNode is responsible for checkpoints (merging the edits file into the fsimage).


In a high availability setup, with an active NameNode and a standby NameNode, the standby NameNode does the checkpointing.

But what happens in a high availability setup when the active NameNode is down or destroyed? The standby NameNode is promoted to active, but it is now alone. There is no standby/secondary NN.

And still, the cluster should continue functioning as long as the remaining NN is up.


Who does the checkpointing in this case? Is it the surviving NameNode? Or does checkpointing halt until someone brings the second NameNode back up? How does it work?


Thank you




Re: How does HDFS checkpointing work in a HA cluster ?


Hello @ni4ni

As far as I understand, when only one NameNode is running, no checkpoint takes place until that NameNode is restarted (at startup it replays the edits on top of the fsimage).
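
To make this concrete, here is a minimal sketch in Python of what a checkpoint does. It is a toy model, not HDFS code: the class ToyNameNode and its methods are hypothetical names I made up for illustration, and the namespace is reduced to a simple path-to-size mapping.

```python
# Toy model of an HDFS-style checkpoint; NOT real HDFS code.
# All class and method names here are hypothetical.

class ToyNameNode:
    def __init__(self):
        self.fsimage = {}   # last checkpointed namespace: path -> size
        self.edits = []     # operations logged since the last checkpoint

    def create(self, path, size):
        self.edits.append(("create", path, size))

    def delete(self, path):
        self.edits.append(("delete", path))

    def replay(self):
        """Return the namespace obtained by applying the edits to the fsimage."""
        namespace = dict(self.fsimage)
        for op in self.edits:
            if op[0] == "create":
                namespace[op[1]] = op[2]
            elif op[0] == "delete":
                namespace.pop(op[1], None)
        return namespace

    def checkpoint(self):
        """Merge the edits into a new fsimage and start a fresh edit log."""
        self.fsimage = self.replay()
        self.edits = []


nn = ToyNameNode()
nn.create("/a", 10)
nn.create("/b", 20)
nn.delete("/a")

# With no checkpointer running, the fsimage stays stale and the edits pile up.
backlog_before = len(nn.edits)
namespace_before = nn.replay()

# A checkpoint folds the pending edits into the fsimage.
nn.checkpoint()
print(backlog_before, len(nn.edits), nn.fsimage)
```

In this model the namespace is always recoverable by replaying the edits, so nothing is lost while checkpoints are paused. The cost is a growing edit log, which means a longer replay the next time the NameNode starts.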


To perform checkpoints, you need either HA enabled (with a running standby NameNode) or a Secondary NameNode (SNN) configured in the cluster.
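
Whichever node does the checkpointing decides when to do it from two settings in hdfs-site.xml: dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns. A rough sketch of that decision in Python follows. The function should_checkpoint is a hypothetical helper, not an HDFS API, and the default values shown (3600 seconds and 1,000,000 transactions) are the ones I remember, so please verify them against your own configuration.

```python
# Hypothetical helper for illustration; not part of any HDFS API.
# Defaults below are assumed values of the corresponding hdfs-site.xml properties.
CHECKPOINT_PERIOD_SECS = 3600   # dfs.namenode.checkpoint.period
CHECKPOINT_TXNS = 1_000_000     # dfs.namenode.checkpoint.txns


def should_checkpoint(secs_since_last, txns_since_last,
                      period=CHECKPOINT_PERIOD_SECS, txns=CHECKPOINT_TXNS):
    """Return True when either the time or the transaction threshold is reached."""
    return secs_since_last >= period or txns_since_last >= txns


# One hour elapsed with few edits: the time threshold triggers a checkpoint.
print(should_checkpoint(3600, 42))
# Ten seconds elapsed but a very busy namespace: the transaction threshold triggers it.
print(should_checkpoint(10, 1_000_000))
# Neither threshold reached: no checkpoint yet.
print(should_checkpoint(10, 42))
```

If no standby NameNode or SNN is running, nothing evaluates this condition, which is why the checkpoints stop.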


If a single NameNode could perform the checkpoint on its own, HDFS would have had just one NameNode and no Secondary NameNode from the start. The original role of the Secondary NameNode is to keep the NameNode's fsimage up to date.

Now that clusters keep growing to meet our needs, the standby NameNode comes into the picture: it performs the checkpointing and can also take over as the active NameNode and serve incoming requests when needed.


I hope this gives you some more clarity.