Is a Secondary NameNode mandatory for any distributed Hadoop cluster?

1 ACCEPTED SOLUTION

If you are running an HA NameNode using Quorum Journal Manager, then running the SecondaryNameNode is not required. Actually, it would be incorrect to deploy a SecondaryNameNode alongside an HA NameNode pair.

Before HA with Quorum Journal Manager was implemented, the function of the SecondaryNameNode was to create a periodic checkpoint (a new fsimage file) of the NameNode metadata and upload it back to the NameNode. Without checkpointing, the NameNode's edit log would grow continuously. A very large edit log is problematic because it slows down NameNode restarts: replaying a large edit log is much more time-consuming than loading a recent metadata checkpoint and applying a small edit log on top of it.
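To make the restart cost concrete, here is a toy model in Python. It is only a sketch of the idea and not how HDFS is implemented: the `ToyNameNode` class, the `edits_to_replay` helper, and the operation names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model of NameNode metadata: a checkpoint (fsimage) plus an edit log."""
    fsimage: dict = field(default_factory=dict)  # last checkpointed namespace
    edits: list = field(default_factory=list)    # operations logged since then

    def log(self, op, path):
        # Every namespace change is appended to the edit log.
        self.edits.append((op, path))

    def checkpoint(self):
        # Merge the edit log into a new fsimage, then start a fresh edit log.
        self.fsimage = self.restart()
        self.edits = []

    def restart(self):
        # On restart: load the fsimage, then replay every logged edit on top.
        namespace = dict(self.fsimage)
        for op, path in self.edits:
            if op == "create":
                namespace[path] = True
            elif op == "delete":
                namespace.pop(path, None)
        return namespace


def edits_to_replay(total_ops, checkpoint_every=None):
    """Return how many edits a restart must replay after total_ops changes."""
    nn = ToyNameNode()
    for i in range(total_ops):
        nn.log("create", f"/data/file{i}")
        if checkpoint_every and (i + 1) % checkpoint_every == 0:
            nn.checkpoint()
    assert len(nn.restart()) == total_ops  # same namespace either way
    return len(nn.edits)


print(edits_to_replay(10_000))                          # prints 10000
print(edits_to_replay(10_000, checkpoint_every=3_000))  # prints 1000
```

Both runs rebuild the same namespace, but with periodic checkpoints the restart only replays the edits logged since the last checkpoint.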

With an HA deployment, the standby NameNode in the pair takes over the responsibility of periodic checkpointing previously performed by the SecondaryNameNode. Therefore, it is unnecessary (and invalid) to run a SecondaryNameNode. If you choose not to deploy with HA for some reason, then the SecondaryNameNode is recommended so that you get periodic checkpoints.
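If you want to check which case applies to a cluster, a rough sketch like the one below can read an hdfs-site.xml and report which daemon should own checkpointing. It assumes HA is configured the way the Apache QJM guide describes, with `dfs.nameservices` and `dfs.ha.namenodes.<nameservice>` listing at least two NameNode IDs. The helper names and the sample values (`mycluster`, `nn1`, `nn2`) are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the nameservice and NameNode IDs are placeholders.
HA_SAMPLE = """
<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def is_ha_enabled(props):
    """True if some nameservice lists two or more NameNode IDs."""
    for ns in props.get("dfs.nameservices", "").split(","):
        ns = ns.strip()
        if not ns:
            continue
        ids = [i for i in props.get(f"dfs.ha.namenodes.{ns}", "").split(",") if i.strip()]
        if len(ids) >= 2:
            return True
    return False


def checkpointing_role(props):
    # Hypothetical helper: names the daemon that should perform checkpoints.
    return "standby NameNode" if is_ha_enabled(props) else "SecondaryNameNode"


print(checkpointing_role(load_properties(HA_SAMPLE)))  # prints standby NameNode
```

An empty or non-HA configuration falls through to "SecondaryNameNode", matching the recommendation above for clusters deployed without HA.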

There is more discussion of this in the Apache documentation on NameNode HA using Quorum Journal Manager, particularly the section on hardware selection.

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.htm...

5 REPLIES

An HA NameNode is not mandatory, but it is highly encouraged. You cannot perform rolling upgrades without an HA NameNode configured.

Secondary NameNode is mandatory in every Hadoop installation. HA NameNode is highly suggested, but it is something different from the Secondary NameNode. Since the Hadoop v2 release, you can easily deploy an HA configuration of the NameNode with an Active and a Standby NameNode (requires ZooKeeper). If you have a really small lab cluster, you can skip the HA NameNode configuration.

For really small test environments, you can disable the SecondaryNameNode. Our sandbox does not have a SecondaryNameNode running.

Thank you all for your answers.