checkpoint node vs Secondary namenode

Expert Contributor

How is the Checkpoint Node different from the Secondary NameNode? What role does the Checkpoint Node play?

Accepted Solutions

Re: checkpoint node vs Secondary namenode

@Viswa, the Apache documentation provides a good description.

https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Secondary_N...

I have never seen the Checkpoint or Backup node used in practice, and they should be considered deprecated. I recommend using the Secondary NameNode. Ideally, you should use NameNode HA, which eliminates the single point of failure.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...

If you use Ambari to install HDP, SecondaryNameNode is enabled by default and NameNode HA can be enabled using a wizard.

Re: checkpoint node vs Secondary namenode

Contributor

@Sreeviswa Athikala

The NameNode merges the fsimage and the edit logs only when it starts up; it does not merge them while running. On a brand-new cluster, the NameNode starts with an empty fsimage and a single edit log file. Every change is appended to the edit log, so if the NameNode keeps running like this the edit log grows very large, and any restart takes longer because all of those changes must be replayed on top of the last saved state of the metadata.
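To make the restart cost concrete, here is a toy Python model, not HDFS code: the namespace is a plain dict, each edit-log entry is a hypothetical (op, path) tuple, and startup loads the fsimage and then replays every edit. The names NameNodeStorage, apply_edit and start_namenode are invented for illustration; the only point is that startup work grows with the number of edits that have not yet been merged.

```python
from dataclasses import dataclass, field

# Toy model only: real fsimage and edit-log formats are binary and far richer.


@dataclass
class NameNodeStorage:
    fsimage: dict = field(default_factory=dict)  # namespace as of the last merge
    edits: list = field(default_factory=list)    # (op, path) changes since then


def apply_edit(namespace: dict, edit: tuple) -> None:
    """Apply one hypothetical (op, path) edit-log entry to the namespace."""
    op, path = edit
    if op == "mkdir":
        namespace[path] = "dir"
    elif op == "create":
        namespace[path] = "file"
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def start_namenode(storage: NameNodeStorage) -> tuple:
    """Load the fsimage, replay every edit, and report how many were replayed."""
    namespace = dict(storage.fsimage)
    for edit in storage.edits:
        apply_edit(namespace, edit)
    return namespace, len(storage.edits)


# A fresh cluster: empty fsimage, and every change lands in the edit log.
storage = NameNodeStorage()
for i in range(100_000):
    storage.edits.append(("create", f"/data/file{i}"))

namespace, replayed = start_namenode(storage)
print(len(namespace), replayed)  # 100000 100000
```

Without a checkpoint, every one of those 100,000 edits has to be replayed on each restart, and the count only keeps growing.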

To avoid this, a checkpoint process runs periodically. It is controlled by two configuration properties: dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode that will force an urgent checkpoint, even if the checkpoint period has not been reached.
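The trigger rule amounts to an OR of two thresholds. A minimal sketch, assuming inclusive comparisons; the function name checkpoint_due is hypothetical, and the real daemons check these conditions on a polling interval rather than continuously.

```python
# Defaults quoted above: dfs.namenode.checkpoint.period = 1 hour (3600 s),
# dfs.namenode.checkpoint.txns = 1,000,000 transactions.
CHECKPOINT_PERIOD_S = 3600
CHECKPOINT_TXNS = 1_000_000


def checkpoint_due(seconds_since_last: float,
                   uncheckpointed_txns: int,
                   period_s: float = CHECKPOINT_PERIOD_S,
                   txn_limit: int = CHECKPOINT_TXNS) -> bool:
    """Hypothetical helper: True if either threshold has been reached."""
    period_elapsed = seconds_since_last >= period_s
    too_many_txns = uncheckpointed_txns >= txn_limit
    return period_elapsed or too_many_txns


print(checkpoint_due(600, 1_200_000))  # True: transaction limit reached early
print(checkpoint_due(3700, 10))        # True: period elapsed
print(checkpoint_due(600, 10))         # False: neither threshold reached
```

On a busy cluster the transaction limit usually fires first; on a quiet one the hourly period guarantees a checkpoint anyway.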

A checkpoint starts as soon as either of these conditions is met. Checkpointing simply means merging the edit log files into the fsimage to produce a new fsimage file, which is then uploaded to the NameNode.
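A rough sketch of one checkpoint cycle, again as a toy model rather than the HDFS implementation: the NameNode switches to a fresh edit log, the checkpointer takes the current fsimage and the finished edits, merges them locally, and uploads the result. ToyNameNode, roll_edits, upload_image and run_checkpoint are hypothetical names; real checkpoints transfer files over HTTP and track transaction IDs, which this omits.

```python
# Toy model of one checkpoint cycle; not the HDFS wire protocol.


def apply_edit(namespace: dict, edit: tuple) -> None:
    """Apply one hypothetical (op, path) edit to a namespace dict."""
    op, path = edit
    if op == "mkdir":
        namespace[path] = "dir"
    elif op == "create":
        namespace[path] = "file"
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


class ToyNameNode:
    def __init__(self) -> None:
        self.fsimage: dict = {}  # last checkpointed namespace
        self.edits: list = []    # in-progress edit log segment

    def log(self, op: str, path: str) -> None:
        self.edits.append((op, path))

    def roll_edits(self) -> list:
        """Close the current segment, start an empty one, return the closed one."""
        finished, self.edits = self.edits, []
        return finished

    def upload_image(self, image: dict) -> None:
        self.fsimage = image


def run_checkpoint(nn: ToyNameNode) -> None:
    segment = nn.roll_edits()   # NameNode keeps logging into a fresh segment
    image = dict(nn.fsimage)    # "download" the current fsimage
    for edit in segment:        # merge the finished edits locally
        apply_edit(image, edit)
    nn.upload_image(image)      # send the merged fsimage back


nn = ToyNameNode()
nn.log("mkdir", "/data")
nn.log("create", "/data/a")
nn.log("delete", "/data/a")
run_checkpoint(nn)
nn.log("create", "/data/b")     # arrives after the checkpoint
print(nn.fsimage, nn.edits)     # {'/data': 'dir'} [('create', '/data/b')]
```

Edits that arrive after the roll stay in the NameNode's new edit log and are folded in by the next checkpoint, so a restart only has to replay that short tail.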

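The Secondary NameNode does the same job as the Checkpoint Node: both download the fsimage and edit logs from the NameNode, merge them locally, and upload the new fsimage back to the NameNode, and both are controlled by the two settings above. The differences are operational: the Checkpoint Node is started with bin/hdfs namenode -checkpoint, its address is configured through dfs.namenode.backup.address and dfs.namenode.backup.http-address, and multiple Checkpoint Nodes may be specified in the cluster configuration.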
