Question on Secondary name node and edits log

Explorer
When an HDFS client performs a write operation, it is recorded in the Primary NameNode's edit log (the "edits" file), and the NameNode's in-memory representation of the file system metadata is also updated. Then why do we need a separate edit log, and why does it need to be combined with the fsimage snapshot later? Wouldn't the in-memory representation, which already includes the new edits, be the new fsimage? Or am I missing some important fact here? Thanks!
Accepted solution

Cloudera Employee

You are correct about the Primary NameNode responsibilities. During an operation (such as creating or moving a file), the change is recorded in the edit log. After the edit log has been updated, the NameNode also updates its in-memory representation of the file system metadata, which is used to serve read requests. Because that in-memory copy is lost if the NameNode process stops, the edit log on disk is what makes each change durable.
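The ordering described above (record the operation in the log first, then apply it in memory) is the write-ahead logging pattern. Here is a minimal Python sketch of it, not HDFS code: the class `ToyNameNode`, its methods, and the JSON-string log entries are hypothetical stand-ins, whereas the real edit log is a binary file that is synced to disk before the client's call returns.

```python
import json


class ToyNameNode:
    """Hypothetical, simplified model of the NameNode write path.

    Not Hadoop's API: the namespace is a dict of path -> metadata, and the
    edit log is a list of JSON strings standing in for the durable edits file.
    """

    def __init__(self):
        self.namespace = {}  # in-memory metadata, used to serve reads
        self.edit_log = []   # stand-in for the on-disk edits file

    def _log(self, op, **args):
        # Write-ahead: record the operation before changing memory,
        # so it could be replayed after a crash.
        self.edit_log.append(json.dumps({"op": op, **args}))

    def create(self, path):
        self._log("create", path=path)
        self.namespace[path] = {"length": 0}

    def rename(self, src, dst):
        if src not in self.namespace:
            raise FileNotFoundError(src)  # validate before logging
        self._log("rename", src=src, dst=dst)
        self.namespace[dst] = self.namespace.pop(src)

    def get_file_info(self, path):
        # Reads are answered from memory only; no log or fsimage access.
        return self.namespace.get(path)
```

For example, after `nn.create("/data/a.txt")` and `nn.rename("/data/a.txt", "/data/b.txt")`, the namespace holds only `/data/b.txt` and the edit log holds two entries, one per operation.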

 

The edit log is not rolled after every write; each write is appended to the current edit log, and the log is rolled (a new edits file is started) when a checkpoint begins. The fsimage file is a checkpoint of the filesystem metadata. It is not updated for every filesystem write operation because it is large, and rewriting it each time would be slow and resource-intensive. If the NameNode fails or restarts, the latest state of its metadata is reconstructed by loading the fsimage from disk into memory and then applying each operation recorded in the edit log. However, the edit log files can grow very large, and replaying a large number of edits takes a long time, during which the file system is offline. The Secondary NameNode alleviates this issue by helping the primary produce fsimage checkpoints of the primary's in-memory filesystem metadata: it periodically merges the fsimage with the accumulated edits into a new fsimage, so the edit log that must be replayed stays short. Below is a diagram for further clarification:

[Diagram: Secondary NameNode Process]
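To make the restart and checkpoint steps concrete, here is a minimal Python sketch under simplifying assumptions: the fsimage is a plain dict of path to metadata, and each edit-log entry is a hypothetical JSON string. The real fsimage and edits files use Hadoop's own binary formats, and the function names `apply_edit`, `restart`, and `checkpoint` are illustrative only.

```python
import copy
import json


def apply_edit(namespace, entry):
    """Apply one logged operation (a hypothetical JSON string) to a namespace dict."""
    edit = json.loads(entry)
    if edit["op"] == "create":
        namespace[edit["path"]] = {"length": 0}
    elif edit["op"] == "rename":
        namespace[edit["dst"]] = namespace.pop(edit["src"])
    elif edit["op"] == "delete":
        namespace.pop(edit["path"], None)
    else:
        raise ValueError(f"unknown op: {edit['op']}")


def restart(fsimage, edit_log):
    """Rebuild in-memory metadata: load the fsimage, then replay every edit.

    The work grows with len(edit_log), so a long, unmerged edit log
    means a long restart while the file system is offline.
    """
    namespace = copy.deepcopy(fsimage)  # leave the on-"disk" image untouched
    for entry in edit_log:
        apply_edit(namespace, entry)
    return namespace


def checkpoint(fsimage, edit_log):
    """Merge the edits into a new fsimage (the Secondary NameNode's job).

    In this toy model the merged edits are simply dropped, so the
    returned log is empty.
    """
    return restart(fsimage, edit_log), []
```

Because `checkpoint` folds the edits into the image, `restart(new_image, [])` produces the same namespace as `restart(fsimage, edits)` without replaying anything, which is why regular checkpoints keep NameNode restarts short.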

 

Explorer

Thanks for the detailed explanation and the diagram.

So, from your description and the diagram, the Cloudera training document's statement, "When a client performs a write operation, the NameNode's in-memory representation of the file system metadata is also updated," is either incorrect or misleading. It sounds as if the in-memory representation of the metadata is updated right away.

 

 


I think the Secondary NameNode here does not have the capability to push the newly created fsimage to the primary NameNode; the Checkpoint Node has that capability. In the case of the Secondary NameNode, it is the primary NameNode's responsibility to pull the updated fsimage at startup.