Namenode exception in doCheckpoint - NullPointerException and IOException

Explorer

We have been seeing errors consistently in the NN logs related to checkpointing.  Our NNs are not able to automatically perform a checkpoint - the only way is for us to put them in Safe Mode and manually run a Save Namespace command.  We see these errors over and over in the logs:

Exception in doCheckpoint
java.io.IOException: Exception during image upload: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:221)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:353)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$700(StandbyCheckpointer.java:260)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:280)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:276)
Caused by: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
	at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
	at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:222)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:207)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:204)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Exception in doCheckpoint
java.lang.NullPointerException
	at org.apache.hadoop.io.Text.encode(Text.java:450)
	at org.apache.hadoop.io.Text.encode(Text.java:431)
	at org.apache.hadoop.io.Text.writeString(Text.java:491)
	at org.apache.hadoop.fs.permission.PermissionStatus.write(PermissionStatus.java:117)
	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.writePermissionStatus(FSImageSerialization.java:99)
	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.writeINodeFileAttributes(FSImageSerialization.java:216)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.write(FileDiff.java:81)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotFSImageFormat.saveINodeDiffs(SnapshotFSImageFormat.java:89)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotFSImageFormat.saveFileDiffList(SnapshotFSImageFormat.java:102)
	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.writeINodeFile(FSImageSerialization.java:196)
	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.saveINode2Image(FSImageSerialization.java:332)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.saveINode2Image(FSImageFormat.java:1433)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.saveChildren(FSImageFormat.java:1335)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.saveImage(FSImageFormat.java:1393)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.saveImage(FSImageFormat.java:1408)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.saveImage(FSImageFormat.java:1408)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.saveImage(FSImageFormat.java:1408)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.saveImage(FSImageFormat.java:1408)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.saveImage(FSImageFormat.java:1408)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:1279)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.saveLegacyOIVImage(FSImage.java:973)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:193)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:353)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$700(StandbyCheckpointer.java:260)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:280)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:276)

Has anyone seen this or found a solution for it?

We are running CM 5.4.7 and CDH 5.4.0

Re: Namenode exception in doCheckpoint - NullPointerException and IOException

Community Manager
Are you getting a timeout in Cloudera Manager when running the Save Namespace command? It may be that the command takes longer to complete than the Cloudera Manager timeout allows.

The manual workaround is as follows:

1) Shut down the standby NameNode and the failover controller on the standby node.
2) Run the following from the command line:

sudo -u hdfs hdfs dfsadmin -safemode enter
sudo -u hdfs hdfs dfsadmin -saveNamespace
sudo -u hdfs hdfs dfsadmin -safemode leave

3) Start the standby NameNode and the failover controller.
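Step 2 can be wrapped in a small script so that safe mode is always left again, even if saveNamespace fails partway. This is a minimal sketch, assuming bash and a host with an HDFS client configured for the cluster; the function name save_namespace and the HDFS_CMD override variable are hypothetical and not part of Hadoop or Cloudera Manager.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for step 2 of the workaround. Run it only after the
# standby NameNode and its failover controller have been stopped (step 1).
save_namespace() {
  # HDFS_CMD is a hypothetical override (handy for dry runs); by default the
  # hdfs CLI is run as the hdfs user, as in the commands above.
  local -a hdfs_cmd
  read -r -a hdfs_cmd <<< "${HDFS_CMD:-sudo -u hdfs hdfs}"

  # Put the namespace into safe mode; stop here if that fails.
  "${hdfs_cmd[@]}" dfsadmin -safemode enter || return 1

  # Write a fresh fsimage, remembering a failure instead of aborting.
  local rc=0
  "${hdfs_cmd[@]}" dfsadmin -saveNamespace || rc=$?

  # Leave safe mode whether or not saveNamespace succeeded.
  "${hdfs_cmd[@]}" dfsadmin -safemode leave || rc=$?
  return "$rc"
}
```

For a dry run that only prints the three dfsadmin invocations, call it as `HDFS_CMD=echo save_namespace`. The function returns non-zero if any step failed, so it can be chained with `&&` in a larger maintenance script.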




David Wilder, Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Community Guidelines

How to use the forum

Re: Namenode exception in doCheckpoint - NullPointerException and IOException

Community Manager

Tyler,

During normal operation, the Standby NameNode performs a checkpoint every hour (by default) and then uploads the new fsimage to the Active NameNode with an HTTP (or HTTPS) PUT request. That upload is the TransferFsImage.uploadImage call that appears in your first stack trace.

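From your stack trace, it appears this upload is failing: the PUT request is rejected with "SignerException: Invalid signature", which points to an authentication problem in this communication flow.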
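The "every hour" above is the default checkpoint period. As far as we know, the Standby's checkpointer also triggers a checkpoint early once enough uncheckpointed transactions accumulate; the two thresholds are dfs.namenode.checkpoint.period (default 3600 seconds) and dfs.namenode.checkpoint.txns (default 1000000). The following is a simplified sketch of that decision, not Hadoop's actual code; the function name checkpoint_due and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified sketch (not Hadoop code) of how a standby decides a checkpoint
# is due. Defaults mirror dfs.namenode.checkpoint.period (3600 s) and
# dfs.namenode.checkpoint.txns (1000000); checkpoint_due is hypothetical.
# Args: seconds since last checkpoint, uncheckpointed transaction count,
#       optional period in seconds, optional transaction threshold.
checkpoint_due() {
  local secs_since_last=$1 uncheckpointed_txns=$2
  local period=${3:-3600} txn_threshold=${4:-1000000}
  if (( uncheckpointed_txns >= txn_threshold )); then
    # Enough edits have piled up: checkpoint without waiting for the period.
    echo "due: ${uncheckpointed_txns} uncheckpointed transactions"
  elif (( secs_since_last >= period )); then
    # The period has elapsed since the last checkpoint.
    echo "due: ${secs_since_last}s since last checkpoint"
  else
    echo "not due"
    return 1
  fi
}
```

For example, `checkpoint_due 3600 10` prints "due: 3600s since last checkpoint". Whenever a checkpoint is due, the Standby saves a new image and attempts the upload, so a persistent upload failure like the one in this thread recurs every time the thresholds are crossed.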



David Wilder, Community Manager


Re: Namenode exception in doCheckpoint - NullPointerException and IOException

Explorer

Thanks, David.

It turns out the fix for the error we were seeing isn't included in the version of CDH we are running. Once we upgrade to a CDH version that includes it, we should no longer see this issue.

Re: Namenode exception in doCheckpoint - NullPointerException and IOException

New Contributor

Hi Tyler,

Could you please tell me which CDH version you upgraded to in order to fix this issue?

Thanks and regards,

Javier.

Re: Namenode exception in doCheckpoint - NullPointerException and IOException

Explorer

@Javier - I don't know the exact version the fix was released in, but I think the JIRA we were hitting was HDFS-7798.