NN stopped and cannot recover with error "There appears to be a gap in the edit log"

Hi there,

 

I deployed a single node with CDH and CM for testing. However, after I added some services (such as Hub), the NameNode stopped and I cannot start it again; it fails with the error "There appears to be a gap in the edit log".

 

2013-11-14 15:00:01,431 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2013-11-14 15:00:01,432 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log.  We expected txid 8364, but got txid 27381.
	at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:158)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:92)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:744)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:660)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:349)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:261)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:639)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:476)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:613)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:598)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
2013-11-14 15:00:01,445 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-11-14 15:00:01,448 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubcdh/10.0.0.4
************************************************************/
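
For reference, the error says it expected txid 8364 but found 27381, so any missing transactions should be visible as a gap in the edit-log file names on disk. This is just a sketch of how to check, assuming dfs.namenode.name.dir is /dfs/nn (the Cloudera Manager default; it may be different here):

# List the fsimage and edit-log segments; edits file names encode their txid ranges
# (edits_<starttxid>-<endtxid>), so a missing range shows up in the names themselves
ls -l /dfs/nn/current | grep -E 'fsimage|edits'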

 

I tried to run "./bin/hadoop namenode -recover", but it returned another error:

 

13/11/14 14:52:17 INFO hdfs.StateChange: STATE* Safe mode is ON.
Use "hdfs dfsadmin -safemode leave" to turn safe mode off.

However, the command to leave safe mode also failed, with this error:

 

safemode: Call From ubcdh/10.0.0.4 to ubcdh:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
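
As I understand it, "Connection refused" on ubcdh:8020 usually just means nothing is listening on the NameNode RPC port, which would be expected while the NameNode itself is down. A quick sanity check (a sketch; 8020 is simply the port taken from the error message):

# Is a NameNode process actually running, and is anything listening on the RPC port?
ps -ef | grep -i '[n]amenode'
sudo netstat -tlnp | grep 8020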

 
All my other services are running properly and only one node is configured, so I don't think this is a real network connection failure.

 

How can I get this issue fixed?
