09-15-2015 09:47 AM
09-15-2015 12:23 PM
You are correct about the Primary NameNode's responsibilities. When a client performs a write operation (such as creating or moving a file), the operation is first recorded in the edit log. The NameNode also keeps an in-memory representation of the file system metadata, which is updated after the edit log has been modified and is used to serve read requests.
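To make that ordering concrete, here is a minimal Python sketch of the write path. It is a toy model, not Hadoop code: the class `ToyNameNode`, its methods, and the tuple-based log format are all hypothetical names made up for illustration, and a plain list stands in for the on-disk edit log.

```python
class ToyNameNode:
    """Toy model of the write path: log the edit first, then update memory."""

    def __init__(self):
        self.edit_log = []    # stands in for the on-disk edit log (assumption)
        self.namespace = {}   # in-memory metadata: path -> file info

    def create(self, path, info):
        # 1. Record the operation in the edit log.
        self.edit_log.append(("create", path, info))
        # 2. Only then update the in-memory metadata.
        self.namespace[path] = info

    def rename(self, src, dst):
        if src not in self.namespace:
            raise FileNotFoundError(src)
        self.edit_log.append(("rename", src, dst))
        self.namespace[dst] = self.namespace.pop(src)

    def stat(self, path):
        # Reads are served from memory only; the edit log is not consulted.
        return self.namespace.get(path)


nn = ToyNameNode()
nn.create("/a.txt", {"replication": 3})
nn.rename("/a.txt", "/b.txt")
```

After these two calls the edit log holds both operations in order, and `nn.stat("/b.txt")` is answered from the in-memory dictionary alone.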
The edit log is flushed and synced after every write. The fsimage file is a checkpoint of the filesystem metadata. It is not updated for every filesystem write operation because, given its size, rewriting it each time would be slow and resource-intensive. If the NameNode fails or restarts, the latest state of its metadata can be reconstructed by loading the fsimage from disk into memory and then applying each of the operations in the edit log. However, the edit log files can grow very large, and it would take the NameNode a long time to replay a large number of edits. Meanwhile, the file system would be offline. The Secondary NameNode alleviates this by helping the primary produce fsimage checkpoints of the primary's in-memory filesystem metadata, so that fewer edits have to be replayed at startup. Below is a diagram for further clarification:
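In the same spirit, the checkpoint-and-replay idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the fsimage is modelled as a plain dictionary, the edit log as a list of tuples, and the function names `apply_edit`, `recover`, and `checkpoint` are hypothetical, not part of any Hadoop API.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace dict (path -> info)."""
    op, *args = edit
    if op == "create":
        path, info = args
        namespace[path] = info
    elif op == "rename":
        src, dst = args
        namespace[dst] = namespace.pop(src)
    elif op == "delete":
        (path,) = args
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def recover(fsimage, edits):
    """Rebuild the latest metadata: load the checkpoint, then replay the edits."""
    namespace = copy.deepcopy(fsimage)  # leave the stored checkpoint untouched
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


def checkpoint(fsimage, edits):
    """Merge the edits into a new fsimage; the merged edits can then be dropped."""
    return recover(fsimage, edits), []


fsimage = {"/old.txt": {"replication": 3}}
edits = [
    ("create", "/a.txt", {"replication": 2}),
    ("rename", "/a.txt", "/b.txt"),
    ("delete", "/old.txt"),
]

state_before = recover(fsimage, edits)              # replay three edits
new_fsimage, remaining_edits = checkpoint(fsimage, edits)
state_after = recover(new_fsimage, remaining_edits)  # nothing left to replay
```

Both recoveries arrive at the same metadata, but the second one has no edits to replay. That is the benefit of checkpointing: startup cost depends on the number of edits since the last checkpoint, not on the full history.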
09-15-2015 01:59 PM
Thanks for the detailed explanation and the diagram.
So, going by your description and the diagram, the Cloudera training document's statement, "When a client performs a write operation, The NameNode’s in memory representation of the file system metadata is also updated", is either incorrect or misleading. It makes it sound as if the in-memory representation of the metadata is updated right away.
10-03-2018 09:26 AM
I think the Secondary NameNode does not have the capability to push a newly created fsimage to the primary NameNode. The Checkpoint Node has that capability. With a Secondary NameNode, it is the primary NameNode's responsibility to pull the updated fsimage at startup.