How does the Secondary NameNode architecture work internally?

In the Cloudera blog we read the following:

"The SecondaryNameNode periodically compacts the EditLog into a “checkpoint;” the EditLog is then cleared. A restart of the NameNode then involves loading the most recent checkpoint and a shorter EditLog containing only events since the checkpoint"
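To make the quote concrete, here is a minimal sketch of what it describes, as a toy model in Python. This is not HDFS code: the names `apply_edit`, `replay`, and `checkpoint`, and the `(op, path)` edit tuples, are made up for illustration, and the namespace is reduced to a plain set of paths.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown op: {op}")


def replay(image, edits):
    """Rebuild the namespace from a checkpoint image plus the edits logged after it."""
    namespace = set(image)
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


def checkpoint(image, edits):
    """Fold the edits into a new image; the edit log can then start empty."""
    return frozenset(replay(image, edits)), []


# Edits logged since the last (empty) checkpoint.
old_image = frozenset()
full_log = [("create", "/a"), ("create", "/b"), ("delete", "/a")]

# Compact the log into a checkpoint and clear it.
new_image, edit_log = checkpoint(old_image, full_log)

# A new edit arrives after the checkpoint; only this one is replayed on restart.
edit_log.append(("create", "/c"))
state_after_restart = replay(new_image, edit_log)
```

In this model, restarting from `new_image` plus the short `edit_log` gives the same namespace as replaying every edit from the beginning, which is how I read the "shorter EditLog" in the quote.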

And on another web page:

"when checkpoint will be created, the secondary node sends fsimage and edit files to name node"

I have some questions about this. Maybe you can also point me to resources or books that cover these details:

  • Question 1: I don't understand why the Secondary NameNode sends edit files back to the NameNode. The NameNode only needs the checkpoint during startup, right? It already has the same edit files (the checkpoint is created from them). Is sending them back required in order to clean up the edit files? Is that the "shorter EditLog" mentioned in the documentation?


  • Question 2: How does the NameNode clean up edit files? Are the original edit files replaced by new ones retrieved from the Secondary NameNode? Or is there another process that cleans up edit files based on the checkpoint content?

  • Question 3: After the edit files are sent from the NameNode to the Secondary NameNode, can they still be modified on the NameNode? Or does the NameNode have to start new edit files once the existing ones have been sent to the Secondary NameNode for backup?

  • Question 4: I also read that the Secondary NameNode requires as much RAM as the NameNode. That seems strange. Why? The main task of the Secondary NameNode is to create checkpoints, which is mostly I/O, isn't it?