Support Questions

Could not determine the age of the last HDFS checkpoint while shutting down services using Ambari

Contributor

Could not determine the age of the last HDFS checkpoint. Please ensure that you have a recent checkpoint. Otherwise, the NameNode(s) can take a very long time to start up.

1 ACCEPTED SOLUTION

Master Guru

I have seen this happen on a relatively idle cluster. You can create a checkpoint manually; I believe the instructions are also shown in the dialog with the warning, but here they are. Log in to the active NameNode and run:

su - hdfs                        # become the hdfs superuser
hdfs dfsadmin -safemode enter    # make HDFS read-only so the namespace is stable
hdfs dfsadmin -saveNamespace     # write a fresh fsimage checkpoint to disk
hdfs dfsadmin -safemode leave    # return to normal read-write operation
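
Note that HDFS rejects writes while it is in safe mode, so on a busy cluster it is worth running these commands during a short maintenance window.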

3 REPLIES

HDFS metadata consists of two parts (a sample listing of both appears after this list):

  • The base filesystem table, stored in a file called fsimage.
  • The edit logs, which list changes made to the base table, stored in files called edits.

Checkpointing is the process of merging edits into fsimage to produce a new version of fsimage. Two benefits arise from this:

  • A more recent fsimage.
  • Truncated edit logs.
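
For illustration, you can see both parts in the NameNode's metadata directory. The path below is only an assumption; check dfs.namenode.name.dir in your hdfs-site.xml for the actual location on your cluster:

ls /hadoop/hdfs/namenode/current                   # path taken from dfs.namenode.name.dir
# fsimage_0000000000000042000                      <- base table (latest checkpoint)
# fsimage_0000000000000042000.md5                  <- its checksum
# edits_0000000000000042001-0000000000000042100    <- a finalized edits segment
# edits_inprogress_0000000000000042101             <- segment currently being written
# seen_txid  VERSION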

The following properties control how often checkpointing happens (an example hdfs-site.xml snippet follows the list):

  • dfs.namenode.checkpoint.period - The number of seconds between two periodic checkpoints; when it elapses, fsimage is updated and the edit log truncated. Checkpointing is not cheap, so there is a balance between running it too often and letting the edit log grow too large. Set this parameter to strike a good balance for typical filesystem use in your cluster.
  • dfs.namenode.checkpoint.edits.dir - Determines where on the local filesystem the DFS Secondary NameNode should store the temporary edits to merge. If this is a comma-delimited list of directories, the edits are replicated in all of the directories for redundancy. The default value is the same as dfs.namenode.checkpoint.dir.
  • dfs.namenode.checkpoint.txns - The Secondary NameNode or CheckpointNode will create a checkpoint of the namespace every 'dfs.namenode.checkpoint.txns' transactions, regardless of whether 'dfs.namenode.checkpoint.period' has expired.
  • dfs.ha.standby.checkpoints - If true, a NameNode in Standby state periodically takes a checkpoint of the namespace, saves it to its local storage, and then uploads it to the remote (active) NameNode.
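
As a minimal sketch, these are ordinary hdfs-site.xml properties. The values below are illustrative rather than recommendations, and on an Ambari-managed cluster you should change them through Ambari instead of editing the file directly:

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>21600</value>    <!-- seconds: checkpoint at least every 6 hours -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>  <!-- ...or every 1,000,000 transactions, whichever comes first -->
</property>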

Also, if you would like to checkpoint manually, you can follow:

https://community.hortonworks.com/content/supportkb/49438/how-to-manually-checkpoint.html

Super Collaborator
@Sachin Ambardekar

From the HDFS perspective, in some rare circumstances the secondary (or standby) NameNode has been seen to fail to consume the edit log. This leads to more complicated situations if the active NameNode is restarted in the meantime, because the unconsumed edit logs then have to be ignored. The simpler way to handle such a scenario gracefully is to always make sure that fsimage is up to date before stopping the NameNode.

So, as a precautionary measure, work was done in Ambari to check and warn the user when they try to stop a NameNode whose last checkpoint is older than 12 hours. [1]
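
As a rough sketch of what such a check looks like, the NameNode's FSNamesystem JMX bean exposes the time of the last checkpoint, and you can compute its age from there. The hostname placeholder, the default Hadoop 2.x HTTP port 50070, and the jq-based parsing below are assumptions; adjust them for your cluster and security setup:

# Fetch the last checkpoint time (epoch milliseconds) from the NameNode JMX endpoint
LAST_MS=$(curl -s 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
          | jq '.beans[0].LastCheckpointTime')
NOW_MS=$(( $(date +%s) * 1000 ))
AGE_H=$(( (NOW_MS - LAST_MS) / 3600000 ))
echo "Last checkpoint is ${AGE_H} hour(s) old"
# Same 12-hour threshold that the Ambari warning uses
[ "$AGE_H" -ge 12 ] && echo "WARNING: checkpoint is older than 12 hours"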

HDFS 3.0.0 has implemented this check natively, and going forward Ambari might skip this warning. [2]

The following JIRAs and their descriptions were used as references for this answer:

[1] https://issues.apache.org/jira/browse/AMBARI-12951

[2] https://issues.apache.org/jira/browse/HDFS-6353