The cluster includes two NameNode machines (one active, the other standby) and 65 DataNode machines.
We have a problem with the standby NameNode: it does not start, and in the NameNode logs we can see the following:
2021-01-01 15:19:43,269 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
For now the active NameNode is up, but the standby NameNode is down with:
java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.
What is the preferred solution to fix this problem?
@mike_bronson7 There is a solution that can help. Run the following command on the standby NameNode:

# su hdfs -l -c 'hdfs namenode -recover'
The following message will be shown:

You have selected Metadata Recovery mode. This mode is intended to recover lost metadata on a corrupt filesystem. Metadata recovery mode often permanently deletes data from your HDFS filesystem. Please back up your edit log and fsimage before trying this! Are you ready to proceed? (Y/N)

To proceed, answer Y. The recovery process will then read as much of the edit log as possible; whenever it hits an error or an ambiguity, it will prompt you how to proceed, offering the options Continue, Stop, Quit, and Always.
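Before answering Y, take the backup the prompt asks for. A minimal sketch of that backup step, assuming the metadata lives under the directory configured in dfs.namenode.name.dir (the paths in the usage comments below are placeholders, not your actual layout):

```shell
#!/bin/sh
# Copy the NameNode metadata (fsimage + edits) aside before running recovery.
# The current/ subdirectory of dfs.namenode.name.dir holds both.
backup_nn_meta() {
    nn_dir="$1"      # value of dfs.namenode.name.dir
    backup_dir="$2"  # a destination with enough free space
    if [ ! -d "$nn_dir/current" ]; then
        echo "no current/ directory under $nn_dir" >&2
        return 1
    fi
    mkdir -p "$backup_dir"
    cp -a "$nn_dir/current" "$backup_dir/"
    echo "backed up $nn_dir/current to $backup_dir"
}

# Example (hypothetical paths -- substitute your own):
# backup_nn_meta /hadoop/hdfs/namenode /var/backups/nn-meta-$(date +%F)
# ...and only after the backup succeeds:
# su hdfs -l -c 'hdfs namenode -recover'
```

Keeping the backup means that if recovery discards more transactions than expected, you can restore current/ and try another approach instead.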
Data loss (due to skipped or missing transactions) is likely with this method. It should therefore not be used if data or transaction loss has to be avoided.