
On startup, both NameNodes go to standby mode, then NN1 and NN2 both go down automatically


New Contributor

I have a four-node Hadoop cluster. When I start the active and standby NameNodes, both of them start, then both switch to standby mode automatically, and then both NN1 and NN2 go down.

namenode.png


Re: On start up both the Name nodes going to Stand by mode then both the NN1 and NN2 are going down automatically.

New Contributor
@Jay SenSharma

Can you please help me out again. Thanks in advance.

Re: On start up both the Name nodes going to Stand by mode then both the NN1 and NN2 are going down automatically.

Super Mentor

@kotesh banoth

Can you please check the NameNode log to see if there is any issue (Error/Exception)?

Please share the logs:

# ls -l /var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxxxxx.log
# ls -l /var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxxxxx.out


Also, please check whether those hosts have enough memory:

# free -m
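
To narrow the logs down before sharing them, something like the following might help. It is only a sketch: the function name `scan_nn_log` is made up for illustration, and the log path has to be replaced with the actual file name on your host.

```shell
# Hypothetical helper: print the most recent ERROR/FATAL/Exception lines
# from a log file.
# Usage: scan_nn_log <log-file> [max-lines]
scan_nn_log() {
    log_file="$1"
    max_lines="${2:-20}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read: $log_file" >&2
        return 1
    fi
    # -E enables extended regex; tail keeps only the last matching lines.
    grep -E 'ERROR|FATAL|Exception' "$log_file" | tail -n "$max_lines"
}
```

For example, `scan_nn_log /var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxxxxx.log 50` would show the last 50 matching lines of that log.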



Re: On start up both the Name nodes going to Stand by mode then both the NN1 and NN2 are going down automatically.

Super Mentor

@kotesh banoth

Maybe you can try the following: on the JournalNodes, move the "edits_inprogress" files to a backup directory and then try starting the JournalNodes again.

Example Location:

# mv /hadoop/hdfs/journal/$NAME_SERVICE/current/edits_inprogress_0000000000000306232 /BackupDir
# mv /hadoop/hdfs/journal/$NAME_SERVICE/current/edits_inprogress_0000000000000307748 /BackupDir


Then start the Journal Nodes and then NameNodes.
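
If there are several in-progress files, the move could be scripted roughly as below. This is a sketch under assumptions: `backup_inprogress_edits` is a hypothetical name, and the JournalNode should be stopped before running it.

```shell
# Hypothetical helper: move every edits_inprogress_* file out of a JournalNode
# "current" directory into a backup directory. Finalized edits are left alone.
# Usage: backup_inprogress_edits <journal-current-dir> <backup-dir>
backup_inprogress_edits() {
    src_dir="$1"
    backup_dir="$2"
    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    for f in "$src_dir"/edits_inprogress_*; do
        # If the glob matched nothing, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        echo "moving $f to $backup_dir"
        mv "$f" "$backup_dir"/ || return 1
    done
}
```

With the example location above, the call would be `backup_inprogress_edits /hadoop/hdfs/journal/$NAME_SERVICE/current /BackupDir`.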

Re: On start up both the Name nodes going to Stand by mode then both the NN1 and NN2 are going down automatically.

Super Mentor

@kotesh banoth

We see the error:

2017-08-25 16:05:09,392 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [172.16.0.76:8485, 172.16.0.77:8485, 172.16.0.75:8485], stream=null)) java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
  at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
  at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:182)
  at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:436)
  at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)


Usually this happens when there is a corrupt edits file on one of the JournalNodes.

In such cases we can check the JournalNode logs to find out which node has the corrupt data, and then copy the journal edits directory from a good JournalNode to the other nodes.


(Please make sure that you keep a backup of the dirs for safety)
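
The backup step could look something like this. It is a sketch, not a tested procedure for your cluster: `backup_journal_dir` is a hypothetical name, and the directory paths are whatever your JournalNodes actually use.

```shell
# Hypothetical helper: take a timestamped copy of a JournalNode directory
# before replacing its contents. Prints the path of the backup on success.
# Usage: backup_journal_dir <journal-dir> <backup-root>
backup_journal_dir() {
    journal_dir="$1"
    backup_root="$2"
    [ -d "$journal_dir" ] || { echo "no such directory: $journal_dir" >&2; return 1; }
    dest="$backup_root/$(basename "$journal_dir")-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_root" || return 1
    # -R: recursive, -p: preserve ownership, mode and timestamps where possible.
    cp -Rp "$journal_dir" "$dest" || return 1
    echo "$dest"
}
```

Run it on each JournalNode host while the JournalNode is stopped, for example `backup_journal_dir /hadoop/hdfs/journal/$NAME_SERVICE /BackupDir`, and only then copy the directory over from the good node.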


Re: On start up both the Name nodes going to Stand by mode then both the NN1 and NN2 are going down automatically.

New Contributor

@Jay SenSharma I have three JournalNodes. How can I identify which one is healthy, and what should I do if all of the JournalNodes are corrupted?

server-jn6.txt server-jn7.txt server-jn8.txt

Re: On start up both the Name nodes going to Stand by mode then both the NN1 and NN2 are going down automatically.

New Contributor

Thanks so much @Jay SenSharma. After moving the edits_inprogress files to the backup directory...

Then I started as mentioned.
