Unable to restart standby NameNode

Expert Contributor

Both NameNodes (active and standby) crashed. I restarted the active one and it is serving requests, but the standby NN will not restart. I tried restarting it manually and that failed as well. How do I recover and restart the standby NameNode?

Version: HDP 2.2

2016-05-20 18:53:57,954 INFO namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://usw2stdpma01.glassdoor.local:8480/getJournal?jid=dfs-nameservices&segmentTxId=14726901&storageInfo=-60%3A761966699%3A0%3ACID-d16e0895-7c12-404e-9223-952d1b19ace0' to transaction ID 13013207
2016-05-20 18:53:58,216 WARN namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(750)) - Encountered exception loading fsimage
java.io.IOException: There appears to be a gap in the edit log. We expected txid 13013207, but got txid 14726901.
at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:212)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:140)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:684)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1032)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:538)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:597)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:764)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1441)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1507)



2016-05-20 18:53:58,322 FATAL namenode.NameNode (NameNode.java:main(1512)) - Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log. We expected txid 13013207, but got txid 14726901.
at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:212)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:140)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:684)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1032)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:538)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:597)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:764)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1441)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1507)
2016-05-20 18:53:58,324 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-05-20 18:53:58,325 INFO namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG
1 ACCEPTED SOLUTION

Master Guru

@Anandha L Ranganathan

Please run the commands below as the root user.

1. Put the active NN in safe mode

sudo -u hdfs hdfs dfsadmin -safemode enter

2. Run a saveNamespace operation on the active NN

sudo -u hdfs hdfs dfsadmin -saveNamespace

3. Leave safe mode

sudo -u hdfs hdfs dfsadmin -safemode leave

4. Log in to the standby NN

5. Run the command below on the standby NameNode to fetch the latest fsimage that was saved in the steps above.

sudo -u hdfs hdfs namenode -bootstrapStandby -force
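
Steps 1 to 3 can be wrapped in a small shell function so that safe mode is not left on by accident if saveNamespace fails. This is only a sketch, not part of the original answer: `HDFS_RUN` and `checkpoint_active_nn` are hypothetical names, and attempting to leave safe mode after a failed save is an assumption about what you want on your cluster.

```shell
# Sketch of steps 1-3, to be run on the active NN host.
# HDFS_RUN is a hypothetical hook: it defaults to the same "sudo -u hdfs hdfs"
# prefix used in the steps above and can be set to "echo" for a dry run.
HDFS_RUN=${HDFS_RUN:-"sudo -u hdfs hdfs"}

checkpoint_active_nn() {
    local rc=0
    # Step 1: make the namespace read-only; stop here if that fails.
    $HDFS_RUN dfsadmin -safemode enter || return 1
    # Step 2: write a fresh fsimage; remember a failure but keep going.
    $HDFS_RUN dfsadmin -saveNamespace || rc=$?
    # Step 3: always try to leave safe mode, even if the save failed.
    $HDFS_RUN dfsadmin -safemode leave || rc=$?
    return $rc
}
```

Running `HDFS_RUN=echo` followed by `checkpoint_active_nn` prints the three dfsadmin invocations without touching the cluster. Step 5 still has to be run separately on the standby host.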


10 REPLIES

New Contributor

We just ran into this problem. @Jeff Arnold above is correct that, because the standby NameNode is down, the dfsadmin commands will fail. Instead of making the /etc/hosts change he recommends, you can manually override -fs in the commands, as suggested here: https://issues-test.apache.org/jira/browse/HDFS-8277?focusedCommentId=14517247&page=com.atlassian.ji....

The dfsadmin commands then look like this, for example:

sudo -u hdfs hdfs dfsadmin -fs hdfs://<active_namenode>:<rpc_port> -safemode enter
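
To avoid retyping the -fs override for each of the three dfsadmin steps, a thin wrapper along these lines may help. It is a sketch only: `dfsadmin_active` and `HDFS_RUN` are hypothetical names, and the host and port in the usage note are placeholders you need to replace with your active NameNode's RPC address.

```shell
# Hypothetical wrapper: run any dfsadmin subcommand against one explicit
# NameNode instead of the HA nameservice from the client config.
# HDFS_RUN defaults to the "sudo -u hdfs hdfs" prefix used in this thread.
HDFS_RUN=${HDFS_RUN:-"sudo -u hdfs hdfs"}

dfsadmin_active() {
    # $1 = active NameNode host, $2 = its RPC port, rest = dfsadmin arguments
    local host="$1" port="$2"
    shift 2
    $HDFS_RUN dfsadmin -fs "hdfs://${host}:${port}" "$@"
}
```

For example, `dfsadmin_active nn1.example.com 8020 -safemode enter`, then the same call with `-saveNamespace` and `-safemode leave` (nn1.example.com and 8020 are illustrative values, not taken from this cluster).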

Also, if you are using Cloudera Manager, the config used by the "namenode -bootstrapStandby" command does not include the JournalNode settings needed for shared edits. You will need to copy the live config from the running active NameNode; it will be under something like /run/cloudera-scm-agent/process/5134-hdfs-NAMENODE. Copy that directory to the standby NameNode and point the bootstrap command at it.

sudo -i -u hdfs
HADOOP_CONF_DIR=<your_copied_config> hdfs namenode -bootstrapStandby -force
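
If there are several NAMENODE directories under the process root on the active host, something like the following can pick one to copy. This is a rough sketch: `latest_nn_conf_dir` is a hypothetical helper, and it assumes that the directory with the highest numeric prefix belongs to the currently running NameNode, which you should verify before copying.

```shell
# Hypothetical helper: print the *-hdfs-NAMENODE directory with the highest
# numeric prefix under the Cloudera Manager agent process root.
# Assumption: the highest prefix is the most recently started process.
latest_nn_conf_dir() {
    local root="${1:-/run/cloudera-scm-agent/process}"
    local best=-1 best_dir="" d n
    for d in "$root"/*-hdfs-NAMENODE; do
        [ -d "$d" ] || continue          # also skips an unmatched glob
        n=${d##*/}                       # basename, e.g. 5134-hdfs-NAMENODE
        n=${n%%-*}                       # numeric prefix, e.g. 5134
        case $n in ''|*[!0-9]*) continue ;; esac
        if [ "$n" -gt "$best" ]; then
            best=$n
            best_dir=$d
        fi
    done
    # Print the result; return non-zero if nothing matched.
    [ -n "$best_dir" ] && printf '%s\n' "$best_dir"
}
```

Run `latest_nn_conf_dir` on the active NameNode, copy the printed directory to the standby host, and use the copied path as `<your_copied_config>` in the command above.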