
Planning hardware for NameNode/Active/Secondary NameNode for NameNode HA


Hi Team,

My Hadoop NameNode servers have no HBA, but their disks are configured as RAID 10.

Given that the cluster also has an active NameNode, do I still need an NFS mount point so that the NameNode metadata files (edits, etc.) are saved to an NFS location as well?

Also, since my hardware has no HBA storage and uses RAID 10, can I connect to an NFS mount point from such hardware?

Basically, what are the recommendations for NameNode HA?

1 ACCEPTED SOLUTION


Hi @ripunjay godhani, we no longer recommend setting up NameNode HA with NFS. Instead please use the Quorum Journal Manager setup. The Apache HA with QJM documentation is a good start: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.ht...

NameNode image files will be stored on two nodes (the active and standby NNs) in this setup. The latest edit logs will be on the active NameNode and on at least two JournalNodes (usually all three, unless one JournalNode has an extended downtime). The NameNodes can optionally be configured to write their edit logs to separate NFS shares if you really want, but it is not necessary.

You don't need RAID 10. HDFS HA with QJM provides good durability and availability with commodity hardware.
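
To make the "at least two journal nodes" point concrete, here is a minimal Python sketch of the majority-quorum arithmetic behind QJM, plus a helper that assembles a `qjournal://` value for `dfs.namenode.shared.edits.dir`. The host names and the `mycluster` nameservice ID are placeholders I made up for illustration; 8485 is the default JournalNode RPC port. Treat it as a planning aid rather than anything Hadoop ships.

```python
def quorum_size(journal_nodes: int) -> int:
    """Smallest majority of JournalNodes that must acknowledge an edit."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """JournalNodes that can be down while a majority is still reachable."""
    return journal_nodes - quorum_size(journal_nodes)


def shared_edits_uri(hosts, nameservice, port=8485):
    """Build a qjournal:// value for dfs.namenode.shared.edits.dir."""
    authority = ";".join(f"{host}:{port}" for host in hosts)
    return f"qjournal://{authority}/{nameservice}"


if __name__ == "__main__":
    # Hypothetical JournalNode hosts and nameservice ID.
    hosts = ["jn1.example.com", "jn2.example.com", "jn3.example.com"]
    print(quorum_size(len(hosts)))         # 2
    print(tolerated_failures(len(hosts)))  # 1
    print(shared_edits_uri(hosts, "mycluster"))
```

With three JournalNodes, every edit must reach two of them, so one can be down without blocking the active NameNode. Going to four JournalNodes still tolerates only one failure, which is why odd counts are the usual choice.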


2 REPLIES


As always, your specific situation, hardware, risk profile, etc. are unique to you, but let's revisit a couple of things first. HDFS metadata is physically persisted in "image" and "edits" files. In an HA configuration, the two NN processes write out the image files, and the JournalNode (JN) processes persist the edits files. Even without any kind of HBA/RAID configuration (not a bad place to be, though those setups aren't necessarily bad or wrong either), this information is already recorded in multiple places: two copies of the image files and three or more copies of the edits files.

The historical rule of thumb (especially before HA) was to have the NN write its data to two local disks and one soft-mounted NFS directory, as we simply never wanted to have a "bunker scene". Again, this is my strong personal belief: I would still suggest that you record to at least two local disks (maybe you decide your RAID approach satisfies this) as well as an NFS directory. I'd even follow this up with periodic backups of both the image and edits data files. My thinking on this topic is captured in a blog posting at https://martin.atlassian.net/wiki/x/EoC3Ag. Good luck!
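
As a rough illustration of the "two local disks plus NFS, with periodic backups" idea, here is a small Python sketch. It joins several directories into the comma-separated form that `dfs.namenode.name.dir` accepts (the NameNode keeps a copy of its metadata in each listed directory), and it builds an `hdfs dfsadmin -fetchImage` command for a dated backup folder. All paths are hypothetical, and I am assuming the backup directory already exists. Note that `-fetchImage` only pulls the latest image file, so the edits files would need a separate copy step that this sketch does not cover.

```python
from datetime import date
from pathlib import PurePosixPath


def name_dir_value(dirs):
    """Join metadata directories into a dfs.namenode.name.dir value."""
    if len(set(dirs)) != len(dirs):
        raise ValueError("duplicate directory in list")
    return ",".join(dirs)


def fetch_image_command(backup_root, day):
    """Build the argv for downloading the latest fsimage into a dated folder."""
    target = PurePosixPath(backup_root) / day.isoformat()
    return ["hdfs", "dfsadmin", "-fetchImage", str(target)]


if __name__ == "__main__":
    # Hypothetical layout: two local disks and one soft-mounted NFS share.
    print(name_dir_value(["/data1/nn", "/data2/nn", "/mnt/nfs/nn"]))
    # The command is only printed here; pass the list to subprocess.run()
    # on a host with the hdfs client installed to actually execute it.
    print(fetch_image_command("/backup/nn", date.today()))
```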
