Support Questions

mike_bronson7 · 01-20-2021

We have an Ambari cluster running HDP 2.6.5.

The cluster has two NameNodes (one active, the other standby) and 65 DataNode machines.

We have a problem with the standby NameNode: it does not start, and in the NameNode logs we can see the following:

2021-01-01 15:19:43,269 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.
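
To gauge how large the gap is, the two txids in the error can be compared. The following is a minimal Python sketch, assuming the exact message wording shown above; `parse_edit_log_gap` is a hypothetical helper written for illustration, not part of Hadoop or Ambari.

```python
import re

# Matches the wording of the error above. Assumption: other Hadoop versions
# may phrase this message differently.
GAP_RE = re.compile(r"We expected txid (\d+), but got txid (\d+)")


def parse_edit_log_gap(log_line: str):
    """Return (expected_txid, got_txid, missing_count), or None if no gap message."""
    match = GAP_RE.search(log_line)
    if match is None:
        return None
    expected, got = int(match.group(1)), int(match.group(2))
    # txids expected .. got-1 are absent, so got - expected transactions are missing.
    return expected, got, got - expected


if __name__ == "__main__":
    line = ("java.io.IOException: There appears to be a gap in the edit log. "
            "We expected txid 90247527115, but got txid 90247903412.")
    print(parse_edit_log_gap(line))
```

For the log line above this reports 376297 missing transactions (txids 90247527115 through 90247903411).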

From Ambari we can see that the active NameNode is up but the standby NameNode is down. The root cause of this issue is that the **NameNode metadata is damaged/corrupted.**

So we have two possible solutions, A or B:

A)

Run the following recovery command on the standby NameNode:

su hdfs
hdfs namenode -recover

B)

Put the active NN in safemode:

su hdfs
hdfs dfsadmin -safemode enter

Run a saveNamespace operation on the active NN:

su hdfs
hdfs dfsadmin -saveNamespace

Leave safemode:

su hdfs
hdfs dfsadmin -safemode leave

Log in to the standby NN.

Run the following command on the standby NameNode to fetch the latest fsimage saved in the steps above:

su hdfs
hdfs namenode -bootstrapStandby -force
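
The Solution B steps above can be sketched as a small Python wrapper, mainly to make the order and the host for each step explicit. This is a hedged sketch, not a tested tool: it assumes it runs as the hdfs user (the equivalent of `su hdfs`) with the `hdfs` client on PATH, and `SOLUTION_B_STEPS` and `run_solution_b` are hypothetical names. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Solution B as ordered (host role, command) pairs, copied from the steps above.
# Assumption: executed as the hdfs user with the hdfs client on PATH.
SOLUTION_B_STEPS = [
    ("active", ["hdfs", "dfsadmin", "-safemode", "enter"]),
    ("active", ["hdfs", "dfsadmin", "-saveNamespace"]),
    ("active", ["hdfs", "dfsadmin", "-safemode", "leave"]),
    ("standby", ["hdfs", "namenode", "-bootstrapStandby", "-force"]),
]


def run_solution_b(role: str, dry_run: bool = True) -> list:
    """Run (or, with dry_run, only print) the Solution B steps for one host role.

    role is "active" (first three steps) or "standby" (bootstrap step).
    Returns the list of commands selected for that role.
    """
    if role not in ("active", "standby"):
        raise ValueError(f"unknown role: {role!r}")
    selected = [cmd for step_role, cmd in SOLUTION_B_STEPS if step_role == role]
    for cmd in selected:
        print(("[dry-run] " if dry_run else "") + " ".join(cmd))
        if not dry_run:
            # check=True raises on a non-zero exit, which stops the loop; e.g. if
            # saveNamespace fails, safemode is NOT left automatically afterwards.
            subprocess.run(cmd, check=True)
    return selected
```

Usage: call `run_solution_b("active", dry_run=False)` on the active NameNode host first, then `run_solution_b("standby", dry_run=False)` on the standby host. Stopping at the first failure is a deliberate choice so the operator can inspect a failed saveNamespace before taking the cluster out of safemode manually.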

What is the preferred solution (A or B) for our problem?

Michael-Bronson

balajip · 01-22-2021

@mike_bronson7 Please refer to the KB article below for this issue:

https://my.cloudera.com/knowledge/StandBy-NameNode-fails-to-start-Error-shows--?id=271605

Cloudera Community

What is the preferred solution for corrupted namenode metadata