The cluster has two namenodes (one active, one standby) and 65 datanode machines.
We have a problem with the standby namenode: it does not start, and in the namenode logs we can see the following:
2021-01-01 15:19:43,269 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode. java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.
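To get a feel for how large the gap is, the two txids in the message can be subtracted. The snippet below is only a rough sketch: the function name `edit_log_gap` is made up for illustration, and it assumes the message keeps the "expected txid N, but got txid M" wording shown in the log line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): pull the two txids out of the
# "gap in the edit log" message and print how many transactions are missing.
edit_log_gap() {
  local line="$1" expected got
  # sed -n ... p prints only the captured number, or nothing if no match
  expected=$(printf '%s\n' "$line" | sed -n 's/.*expected txid \([0-9][0-9]*\).*/\1/p')
  got=$(printf '%s\n' "$line" | sed -n 's/.*got txid \([0-9][0-9]*\).*/\1/p')
  # Fail if either number could not be extracted
  [ -n "$expected" ] && [ -n "$got" ] || return 1
  echo $(( got - expected ))
}

msg='There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.'
edit_log_gap "$msg"   # prints 376297
```

So the standby is missing 376297 transactions (txid 90247527115 up to 90247903411) between what it has and the first edit it was able to read.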
In Ambari we can see that, for now, the active namenode is up but the standby namenode is down. The root cause of this issue is that the **namenode metadata is damaged/corrupted.**
So we have two possible solutions, A or B:
Solution A: run the following recovery on the standby namenode
su - hdfs -c "hadoop namenode -recover"
Solution B: put the active NN in safemode
su - hdfs -c "hdfs dfsadmin -safemode enter"
Run a saveNamespace operation on the active NN
su - hdfs -c "hdfs dfsadmin -saveNamespace"
su - hdfs -c "hdfs dfsadmin -safemode leave"
Log in to the standby NN
Run the command below on the standby namenode to fetch the latest fsimage that we saved in the steps above.
su - hdfs -c "hdfs namenode -bootstrapStandby -force"
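For reference, the steps of solution B could be collected into one small script. This is only a sketch under assumptions: the function names and the `DRY_RUN` switch are made up for illustration, the `hdfs` service user is taken from the commands above, and by default the script only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of solution B. DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually execute it as the hdfs user.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: run (or just print) one command as the hdfs user.
run_as_hdfs() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "su - hdfs -c \"$*\""
  else
    su - hdfs -c "$*"
  fi
}

# Steps for the active NN host: enter safemode, save the namespace, leave safemode.
# The && chain stops at the first command that fails.
active_nn_steps() {
  run_as_hdfs "hdfs dfsadmin -safemode enter" &&
  run_as_hdfs "hdfs dfsadmin -saveNamespace" &&
  run_as_hdfs "hdfs dfsadmin -safemode leave"
}

# Step for the standby NN host: fetch the freshly saved fsimage.
standby_nn_steps() {
  run_as_hdfs "hdfs namenode -bootstrapStandby -force"
}

# Choose the part that matches the host the script is run on.
case "${1:-}" in
  active)  active_nn_steps ;;
  standby) standby_nn_steps ;;
esac
```

If this were saved as, say, `solution_b.sh` (a hypothetical file name), it would be run with the argument `active` on the active NN first, then with `standby` on the standby NN, each time with `DRY_RUN=0` once the printed commands look right.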
Which is the preferred solution (A or B) for our problem?