Possible scenarios for 'online' backups of HDFS metadata?

Contributor

I'm looking into the possibility of performing an 'online' backup of HDFS metadata without having to take down HDFS or the NameNodes, and I wanted to find out whether the following plan is workable.

General assumptions:

  • Regardless of the solution, we'll never have a full, up-to-date, continuous backup of the namespace; we'll always lose some of the most recent data. This isn't an OLTP system, and most of the data can easily be recreated (by re-running ETL or processing jobs).
  • Normal NN failures are handled by the Standby NN. The goal here is to have a procedure in place for a very unlikely case where both master nodes fail.
  • If both NameNodes fail, the NN service can be started up with the most recent namespace image we have.

My understanding of how the NameNodes maintain the namespace, in short, is:

  • The Standby NN keeps a namespace image in memory, based on the edits available from the JournalNode ensemble.
  • Based on preconditions (number of transactions or time elapsed), the Standby NN makes a namespace checkpoint and saves an “fsimage_*” file to disk (the trigger settings are shown in the sketch after this list).
  • The Standby NN transfers the fsimage to the active NN over HTTP.
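The checkpoint preconditions mentioned above are governed by the dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns settings (by default 3600 seconds and 1,000,000 transactions). As a minimal sketch, assuming the hdfs client is on the PATH and pointed at the cluster configuration, the current values can be read like this:

```python
# Minimal sketch: read the checkpoint trigger settings from the client
# configuration. Assumes the 'hdfs' command is on PATH and uses the
# standard Hadoop 2.x key names.
import subprocess

def get_conf(key):
    """Return the value of an HDFS configuration key via 'hdfs getconf'."""
    out = subprocess.check_output(["hdfs", "getconf", "-confKey", key])
    return out.decode().strip()

if __name__ == "__main__":
    period = get_conf("dfs.namenode.checkpoint.period")  # seconds between checkpoints
    txns = get_conf("dfs.namenode.checkpoint.txns")      # txn count that forces a checkpoint
    print("Checkpoint every %s seconds or %s transactions, whichever comes first"
          % (period, txns))
```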

My understanding is that both NNs write fsimages to disk in the following sequence:

  • NN writes the namespace to a file “fsimage.ckpt_*” on disk
  • NN creates a “fsimage_*.md5” file
  • NN moves the file “fsimage.ckpt_*” to “fsimage_*”

The above means that:

  • The most recent namespace image on disk, in an “fsimage_*” file, is on the Standby NN.
  • Any “fsimage_*” file on disk is finalized and won’t receive more updates (see the sketch after this list).
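To illustrate the second point, here is a minimal sketch that picks the newest finalized fsimage while ignoring in-progress “fsimage.ckpt_*” files. The metadata directory path is only an example; use the value of dfs.namenode.name.dir on the Standby NN.

```python
# Minimal sketch: locate the newest finalized fsimage under the NameNode
# metadata directory. Only 'fsimage_<txid>' files match the pattern, so
# in-progress 'fsimage.ckpt_*' files are never selected.
import os
import re

FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")

def newest_fsimage(name_dir="/hadoop/hdfs/namenode"):  # illustrative path
    current = os.path.join(name_dir, "current")
    images = []
    for name in os.listdir(current):
        m = FSIMAGE_RE.match(name)
        if m:
            images.append((int(m.group(1)), os.path.join(current, name)))
    if not images:
        raise RuntimeError("no finalized fsimage found in %s" % current)
    txid, path = max(images)  # highest transaction id = most recent checkpoint
    return txid, path, path + ".md5"

if __name__ == "__main__":
    txid, image, md5 = newest_fsimage()
    print("Latest checkpoint at txid %d: %s (plus %s)" % (txid, image, md5))
```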

Based on the above, a proposed simple procedure that won’t affect the availability of the NNs is as follows:

  • Make sure the Standby NN checkpoints the namespace to “fsimage_*” once per hour.
  • Back up the most recent “fsimage_*” and “fsimage_*.md5” from the Standby NN periodically. We can try to keep the latest version of the files on another machine in the cluster (a copy-and-verify sketch follows this list).
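As a sketch of the copy step under those assumptions (paths are illustrative, and the “fsimage_*.md5” sidecar is assumed to begin with the hex digest), the backup can be verified immediately after it is taken:

```python
# Minimal sketch of the periodic backup step: copy a finalized fsimage and
# its .md5 companion to a backup directory, then verify the checksum of the
# copy against the digest recorded in the sidecar file.
import hashlib
import os
import shutil

def md5sum(path, bufsize=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def backup_fsimage(image_path, backup_dir):
    md5_path = image_path + ".md5"
    os.makedirs(backup_dir, exist_ok=True)
    copied_image = shutil.copy2(image_path, backup_dir)
    copied_md5 = shutil.copy2(md5_path, backup_dir)
    # The sidecar is assumed to start with the hex digest of the image.
    with open(copied_md5) as f:
        expected = f.read().split()[0]
    if md5sum(copied_image) != expected:
        raise RuntimeError("MD5 mismatch after copying %s" % image_path)
    return copied_image

if __name__ == "__main__":
    # e.g. the newest image located with the earlier sketch
    backup_fsimage("/hadoop/hdfs/namenode/current/fsimage_0000000000000012345",
                   "/backups/nn")
```

Another option worth considering is "hdfs dfsadmin -fetchImage <local directory>", which downloads the most recent fsimage from the NameNode over HTTP and avoids touching the metadata directory on the NN host at all.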

Are there any issues or potential pitfalls with this approach that anyone can see?

1 ACCEPTED SOLUTION


The proposal looks basically sound. Here are a few other factors to consider.

  1. In addition to the files you mentioned, there is also a file named VERSION. This file is important, because it uniquely identifies the cluster and declares the version of the metadata format stored on disk. Without this file, it is impossible to restart the NameNode, so plan on including it in your backup strategy.
  2. Deploy monitoring on both NameNodes to confirm that checkpoints are triggering regularly. This helps reduce the number of missing transactions if you ever need to restore from a backup that contains only fsimage files and no subsequent edit logs. It's good practice to monitor this anyway, because a large backlog of uncheckpointed edit logs can cause long delays after a NameNode restart while those transactions are replayed (a minimal example follows this list).
  3. For some additional background, here is a blog post I wrote a while ago explaining the HDFS metadata directories in more detail. http://hortonworks.com/blog/hdfs-metadata-directories-explained/
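On point 2, a minimal freshness check against the NameNode's JMX servlet could look like the sketch below. The port (50070) and the FSNamesystem metric names (LastCheckpointTime, TransactionsSinceLastCheckpoint) are the usual Hadoop 2.x ones, so adjust them for your version and for HTTPS if it is enabled.

```python
# Minimal sketch: query the NameNode JMX endpoint and warn if the last
# checkpoint looks stale. LastCheckpointTime is assumed to be a millisecond
# epoch timestamp, as exposed by the FSNamesystem bean in Hadoop 2.x.
import json
import time
import urllib.request

def checkpoint_status(namenode_host, port=50070):
    url = ("http://%s:%d/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"
           % (namenode_host, port))
    with urllib.request.urlopen(url) as resp:
        bean = json.loads(resp.read().decode("utf-8"))["beans"][0]
    return bean["LastCheckpointTime"], bean["TransactionsSinceLastCheckpoint"]

if __name__ == "__main__":
    last_ms, pending = checkpoint_status("standby-nn.example.com")  # hypothetical host
    age_hours = (time.time() * 1000 - last_ms) / 3600000.0
    print("Last checkpoint %.1f hours ago, %d uncheckpointed transactions"
          % (age_hours, pending))
    if age_hours > 2:
        print("WARNING: checkpoints do not appear to be running")
```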


4 REPLIES

Expert Contributor

You do realize that what you are trying to do is a poor man's HA. I understand that you may have business continuity requirements and cannot bring down the NameNode, but I just wanted to flag it.

Rising Star

@Kent Baxley: We have incorporated this and related information into a new chapter in the HDFS Administration Guide, called Backing Up HDFS Metadata. See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_hdfs-administration/content/back_up_hdfs_...


Hi @Kent Baxley, Looks like the doc is missing the plan for backing up the VERSION file.