There is something I do not fully understand about HDFS and I would be glad if someone here can clarify it for me.
In a regular setup, with a NameNode and a Secondary NameNode, the Secondary NameNode is responsible for checkpoints (merging the edits file into the fsimage).
In a high-availability setup, with an active NameNode and a standby NameNode, the standby NameNode does the checkpointing.
But what happens in a high-availability setup when the active NameNode is down or destroyed? The standby NameNode is promoted to active, but it is now alone. There is no standby or Secondary NameNode.
And still the cluster should continue functioning as long as the remaining NN is up.
Who does the checkpointing in this case? Is it the surviving NameNode? Or does checkpointing halt until someone brings the second NameNode up? How does it work?
As per my understanding, if only one NameNode is running, checkpointing can't be achieved unless you restart the NameNode machine.
To perform checkpoints, you need either HA enabled or a Secondary NameNode (SNN) configured in the cluster.
If a single NameNode were able to perform the checkpoint process on its own, clusters would have had only one NameNode and no Secondary NameNode from the start. The original role of the Secondary NameNode was to keep the NameNode updated.
Now that we keep increasing the number of worker nodes to meet our needs, the standby NameNode comes into the picture: it can perform the checkpoint process and can also take over as the active NameNode and serve incoming requests when needed.
I hope this will give you some more clarity
Checkpointing is a process that takes an fsimage and edit log and compacts them into a new fsimage. This way, instead of replaying a potentially unbounded edit log, the NameNode can load the final in-memory state directly from the fsimage. This is a far more efficient operation and reduces NameNode startup time.
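To make the compaction idea concrete, here is a minimal toy sketch in Python. It is not the real fsimage or edit log format: the namespace is modeled as a set of paths, and the operation names (`create`, `delete`, `rename`) and helper functions are assumptions made up for illustration only.

```python
# Toy model of an HDFS-style checkpoint. The data structures here are
# illustrative assumptions, not the real fsimage or edit log formats.

def apply_edit(namespace, edit):
    """Apply one edit-log record to an in-memory namespace (a set of paths)."""
    op, *args = edit
    if op == "create":
        namespace.add(args[0])
    elif op == "delete":
        namespace.discard(args[0])
    elif op == "rename":
        src, dst = args
        if src in namespace:
            namespace.remove(src)
            namespace.add(dst)
    else:
        raise ValueError(f"unknown edit op: {op}")


def load_namespace(fsimage, edits):
    """Startup path: load the image, then replay every outstanding edit."""
    namespace = set(fsimage)
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


def checkpoint(fsimage, edits):
    """Fold the edits into a new image; the folded edits need no replay."""
    new_fsimage = frozenset(load_namespace(fsimage, edits))
    return new_fsimage, []


old_image = frozenset({"/user", "/tmp"})
edit_log = [
    ("create", "/user/alice"),
    ("create", "/tmp/job1"),
    ("rename", "/tmp/job1", "/user/alice/job1"),
    ("delete", "/tmp"),
]

new_image, remaining_edits = checkpoint(old_image, edit_log)
```

Loading `new_image` with the empty `remaining_edits` gives the same namespace as loading `old_image` and replaying the whole `edit_log`, which is why a recent checkpoint shortens startup.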
Checkpointing is one of the most important activities of the standby or Secondary NameNode in a cluster.
In an HA cluster, all connections and cluster activity are managed by the active NameNode, while the standby NameNode takes responsibility for compacting the edit logs and fsimage. Because the standby also performs checkpoints of the namespace state, it is not necessary to run a Secondary NameNode.
Hope that helps
Unfortunately, I have to dispute @Masood's response, with a reference to the hadoop.apache.org documentation (see link below).
In an HA setup, the standby does effectively do the checkpointing. To keep this thread correct as a community reference, please un-accept the answer.
Here is a quote extracted from the above website:
"Note that, in an HA cluster, the Standby NameNodes also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode."
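As a rough illustration of when the checkpointing NameNode decides to checkpoint, the sketch below models the two usual triggers in Python. The property names `dfs.namenode.checkpoint.period` and `dfs.namenode.checkpoint.txns` are real hdfs-site.xml settings, but the default values shown are from memory and should be verified against hdfs-default.xml for your Hadoop version. The function `should_checkpoint` is a hypothetical helper, not Hadoop code.

```python
# Assumed defaults; check hdfs-default.xml for your Hadoop version.
CHECKPOINT_PERIOD_SECS = 3600   # dfs.namenode.checkpoint.period
CHECKPOINT_TXNS = 1_000_000     # dfs.namenode.checkpoint.txns


def should_checkpoint(secs_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECS, txns=CHECKPOINT_TXNS):
    """Hypothetical helper: checkpoint when either threshold is reached."""
    return secs_since_last >= period or uncheckpointed_txns >= txns


# Ten minutes and few edits since the last checkpoint: nothing to do yet.
quiet = should_checkpoint(secs_since_last=600, uncheckpointed_txns=5_000)

# A burst of edits reaches the transaction threshold before the period ends.
busy = should_checkpoint(secs_since_last=600, uncheckpointed_txns=1_000_000)
```

If no node is left to run this check, the edit log on the surviving NameNode simply keeps growing until a checkpointing node is available again.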