When the Standby Namenode is down and only the Active Namenode is up, who performs the checkpoint operation on the fsimage and edits logs? Can you please explain how it happens?
Which version of Hadoop are you using? In an HA cluster, there is a Quorum Journal Manager between the Active Namenode and the Standby Namenode (usually three journal nodes, one disk each). Assume everything is up to date. When a namespace change occurs, the Active Namenode writes that change to the journal nodes. The Standby Namenode also watches the journal nodes and promptly applies the changes to its own copy of the namespace.
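To make the journal write concrete, here is a minimal sketch of a majority-acknowledged write. It is a toy model, not Hadoop code: `JournalNode` and `quorum_write` are hypothetical names made up for illustration, and I am assuming an edit counts as durable once a strict majority of journal nodes have stored it.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical stand-in for one journal node."""
    up: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid, op):
        # A node that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.edits.append((txid, op))
        return True


def quorum_write(journal_nodes, txid, op):
    """Return True if a strict majority of journal nodes stored the edit."""
    acks = sum(1 for jn in journal_nodes if jn.append(txid, op))
    return acks > len(journal_nodes) // 2


# Three journal nodes, one of them down: 2 of 3 still form a majority.
nodes = [JournalNode(), JournalNode(), JournalNode(up=False)]
committed = quorum_write(nodes, 1, "mkdir /data")
```

With three journal nodes the write survives the loss of one node. If a second node goes down, the same call returns False.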
To ensure fast failover, datanodes are configured with the locations of both the Active and the Standby Namenode, and they send block information and heartbeats to both (although only one is active and the other is standby).
To further guard against data corruption, administrators use a fencing mechanism to prevent what is called a "split-brain" scenario. One way to achieve this is to have the journal nodes accept writes from only one namenode at a time. When your active namenode goes down, your standby becomes the writer to the journal nodes.
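The single-writer rule can be sketched roughly as follows. This is only an illustration under my own assumptions: `FencedJournalNode`, `new_epoch` and `journal` are hypothetical names, and I am assuming each writer is identified by an increasing epoch number, with a journal node rejecting writes from any epoch older than the newest one it has accepted.

```python
class FencedJournalNode:
    """Hypothetical journal node that accepts writes from one writer at a time."""

    def __init__(self):
        self.promised_epoch = 0  # highest writer epoch accepted so far
        self.edits = []

    def new_epoch(self, epoch):
        # A namenode taking over as writer must present a strictly newer epoch.
        if epoch <= self.promised_epoch:
            raise ValueError("epoch is not newer than the promised epoch")
        self.promised_epoch = epoch

    def journal(self, epoch, txid, op):
        # Writes from an older epoch are rejected: that writer has been fenced.
        if epoch < self.promised_epoch:
            return False
        self.edits.append((txid, op))
        return True


jn = FencedJournalNode()
jn.new_epoch(1)                              # original active namenode
first = jn.journal(1, 100, "mkdir /a")       # accepted
jn.new_epoch(2)                              # standby takes over as writer
stale = jn.journal(1, 101, "mkdir /b")       # old active is rejected
fresh = jn.journal(2, 101, "mkdir /c")       # new active is accepted
```

The point of the sketch is that the old active cannot keep writing after failover, even if it is still running and believes it is active.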
Now, if the standby is down, then when it comes back up it reads from the journal nodes to bring itself up to date.
Thanks, mqureshi, for the response. I have an HDP 2.4 cluster with Active and Standby Namenodes. The Standby Namenode has been down for a few days due to a hardware issue. I understand that once the Standby Namenode is back up, it updates itself with the edit logs from the journal nodes. In addition, I believe the Standby Namenode also performs the checkpoint operation (merging the fsimage and edits) and uploads the result to the Active Namenode, based on the property dfs.namenode.checkpoint.period (in a non-HA environment, the Secondary Namenode does this). In this case, while the Standby Namenode is down, does checkpointing happen or not, and is there a solution for it? I appreciate your help.
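For reference, the checkpoint trigger mentioned above can be sketched as a simple condition. This is a simplified model, not the actual namenode code: `should_checkpoint` is a hypothetical helper, and the defaults shown (3600 seconds for dfs.namenode.checkpoint.period and 1000000 transactions for dfs.namenode.checkpoint.txns) are the values I recall from hdfs-default.xml, so please verify them for your version. The sketch only shows when a checkpoint becomes due; it says nothing about which node runs it.

```python
def should_checkpoint(secs_since_last, uncheckpointed_txns,
                      period_secs=3600, txn_threshold=1_000_000):
    """Return True when either the time or the transaction threshold is reached.

    period_secs   models dfs.namenode.checkpoint.period (assumed default)
    txn_threshold models dfs.namenode.checkpoint.txns   (assumed default)
    """
    return (secs_since_last >= period_secs
            or uncheckpointed_txns >= txn_threshold)


# Ten minutes and few edits since the last checkpoint: nothing is due yet.
quiet = should_checkpoint(600, 5_000)
# Same ten minutes but a burst of edits: due because of the txn threshold.
busy = should_checkpoint(600, 2_000_000)
# More than an hour since the last checkpoint: due because of the period.
overdue = should_checkpoint(4_000, 5_000)
```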