secondary NN failing to checkpoint

Explorer

The filesystem used by the secondary namenode filled up. I cleaned up the filesystem, but since then the secondary namenode has failed to complete any checkpoints. It looks like it's retrying over and over and failing each time, but I don't know why. Here are the logs:

2017-07-31 16:23:34,711 ERROR namenode.SecondaryNameNode (SecondaryNameNode.java:doWork(392)) - Exception in doCheckpoint
java.io.IOException: Unable to download to any storage directory
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.receiveFile(TransferFsImage.java:472)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.doGetUrl(TransferFsImage.java:398)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:362)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.downloadEditsToStorage(TransferFsImage.java:166)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:458)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:437)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:436)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:532)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:388)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:354)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:350)
    at java.lang.Thread.run(Thread.java:744)

2 REPLIES

Mentor

@Alex Eifler

You can manually force a checkpoint (see the docs). Could this be linked to the secondary namenode's local storage problem? Can you check the value of dfs.namenode.checkpoint.dir and see whether that directory has any issues, such as a read-only mount, full storage, or a bad disk?

To force a checkpoint, run the following as the hdfs user:

hadoop secondarynamenode -checkpoint force
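
The directory checks suggested above (missing directory, full storage, read-only mount) can be sketched in a few lines of Python. This is only a rough diagnostic under some assumptions: the function name check_checkpoint_dir and the 1 GiB free-space threshold are made up for illustration, and the path you pass in must be the local path taken from your own dfs.namenode.checkpoint.dir setting. It cannot detect every kind of disk fault; it only reports what a real write attempt and a free-space query reveal.

```python
import os
import shutil
import tempfile


def check_checkpoint_dir(path, min_free_bytes=1024 ** 3):
    """Return a list of problems found for one checkpoint directory.

    The name and the default 1 GiB threshold are illustrative assumptions,
    not Hadoop settings.
    """
    if not os.path.isdir(path):
        return [f"{path}: missing or not a directory"]

    problems = []

    # Free space on the filesystem that holds the directory.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path}: only {free} bytes free")

    # Attempt a real write. A read-only mount, wrong ownership, or a
    # failing disk usually surfaces here as an OSError.
    try:
        with tempfile.NamedTemporaryFile(dir=path) as probe:
            probe.write(b"probe")
            probe.flush()
            os.fsync(probe.fileno())
    except OSError as exc:
        problems.append(f"{path}: write test failed ({exc})")

    return problems
```

Run it as the hdfs user, since writability depends on who is writing, for example check_checkpoint_dir("/hadoop/hdfs/namesecondary"), where that path is a placeholder. The configured value can be printed with hdfs getconf -confKey dfs.namenode.checkpoint.dir; it may be a comma-separated list and may carry a file:// prefix, so strip the prefix and check each directory separately. An empty list means none of these checks found a problem.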

Explorer

Check the status of the mounted disk and confirm that it can actually be written to.
