
On startup, both NameNodes go into standby mode, and then both NN1 and NN2 shut down automatically.

I have a four-node Hadoop cluster. When I start the NameNodes (active and standby), both of them start, then automatically go into standby mode, and then both NN1 and NN2 shut down automatically.



@Jay SenSharma

Can you please help me out again? Thanks in advance.

Super Mentor

@kotesh banoth

Can you please check the NameNode log to see if there is any issue (Error/Exception)?

Please share the logs:

# ls -l /var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxxxxx.log
# ls -l /var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxxxxx.out
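NameNode logs can be large, so a minimal sketch like the following may help pull out only the relevant lines before sharing them. It assumes the default HDP log directory shown above; the function name `scan_log` is hypothetical, and the patterns it matches (`FATAL`, `ERROR`, `Exception`) are only a starting point.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last N lines of a log that mention
# FATAL, ERROR, or an Exception, each prefixed with its line number.
scan_log() {
  local log_file="$1"
  local max_lines="${2:-50}"   # how many matching lines to show (default 50)

  if [[ ! -r "$log_file" ]]; then
    echo "cannot read: $log_file" >&2
    return 1
  fi

  # -n adds line numbers; -E enables the alternation in the pattern.
  grep -nE 'FATAL|ERROR|Exception' "$log_file" | tail -n "$max_lines"
}

# Usage (assumes the default HDP log location; the glob stands in for the host name):
# for f in /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log; do scan_log "$f" 20; done
```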


Also, please check whether those hosts have enough memory:

# free -m
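If you want a yes/no answer rather than reading the `free -m` table by eye, a minimal sketch is to read `MemAvailable` from `/proc/meminfo` (on Linux kernels that report that field) and compare it with a threshold. The function names are hypothetical, and the threshold you pass is your own assumption, for example the NameNode heap size you configured.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: read MemAvailable from /proc/meminfo and compare it
# with a required amount in MiB.
mem_available_mb() {
  local meminfo="${1:-/proc/meminfo}"
  # MemAvailable is reported in kB; convert to MiB.
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "$meminfo"
}

check_free_mem() {
  local required_mb="$1"
  local meminfo="${2:-/proc/meminfo}"
  local avail
  avail=$(mem_available_mb "$meminfo")
  if [[ -z "$avail" ]]; then
    echo "MemAvailable not found in $meminfo" >&2
    return 2
  fi
  if (( avail >= required_mb )); then
    echo "OK: ${avail} MiB available (need ${required_mb} MiB)"
  else
    echo "LOW: ${avail} MiB available (need ${required_mb} MiB)"
    return 1
  fi
}

# Usage: 4096 is only an example value; use the heap size your NameNode needs.
# check_free_mem 4096
```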


@Jay SenSharma, please find the logs and a screenshot of free -m attached.





Super Mentor

@kotesh banoth

Maybe you can try the following: on the JournalNodes, move the "edits_inprogress" files to a backup directory, and then try starting the JournalNodes again.

Example locations (create the backup directory first; otherwise mv renames the file to /BackupDir instead of moving it into a directory):

# mkdir -p /BackupDir
# mv /hadoop/hdfs/journal/$NAME_SERVICE/current/edits_inprogress_0000000000000306232 /BackupDir/
# mv /hadoop/hdfs/journal/$NAME_SERVICE/current/edits_inprogress_0000000000000307748 /BackupDir/
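If a JournalNode has several in-progress segments, or you would rather not copy transaction IDs by hand, a minimal sketch like this moves every `edits_inprogress_*` file from one JournalNode's `current` directory into a backup directory. It assumes the JournalNode on that host is stopped and that you pass the same journal path as above; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move all edits_inprogress_* segments out of a
# JournalNode "current" directory into a backup directory.
# Run it only while the JournalNode on this host is stopped.
backup_inprogress() {
  local current_dir="$1"   # e.g. /hadoop/hdfs/journal/$NAME_SERVICE/current
  local backup_dir="$2"    # e.g. /BackupDir
  local moved=0 f

  mkdir -p "$backup_dir" || return 1
  for f in "$current_dir"/edits_inprogress_*; do
    [[ -e "$f" ]] || continue          # the glob matched nothing
    mv -- "$f" "$backup_dir"/ || return 1
    echo "moved $(basename -- "$f")"
    moved=$((moved + 1))
  done
  echo "$moved file(s) moved to $backup_dir"
}

# Usage (on each JournalNode host):
# backup_inprogress "/hadoop/hdfs/journal/$NAME_SERVICE/current" /BackupDir
```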


Then start the JournalNodes first, and then the NameNodes.
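Because the NameNode needs a quorum (a majority, e.g. 2 of 3) of JournalNodes to respond, it can help to confirm that enough JournalNodes are listening before starting the NameNodes. A minimal sketch, assuming the default JournalNode RPC port 8485 (`dfs.journalnode.rpc-address`) and bash's `/dev/tcp` support; the function names and host names are hypothetical. It only checks that the port accepts connections, not that the journal data is healthy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: check that a majority of JournalNodes accept TCP
# connections on their RPC port before starting the NameNodes.

# Majority quorum for n nodes: floor(n/2) + 1 (2 of 3, 3 of 5).
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Return 0 if host:port accepts a TCP connection within 3 seconds.
port_open() {
  timeout 3 bash -c ": > /dev/tcp/$1/$2" 2>/dev/null
}

# Usage: jn_quorum_up PORT HOST [HOST...]
jn_quorum_up() {
  local port="$1"; shift
  local total=$# up=0 host need
  need=$(quorum_size "$total")
  for host in "$@"; do
    if port_open "$host" "$port"; then
      echo "up:   $host:$port"
      up=$((up + 1))
    else
      echo "down: $host:$port"
    fi
  done
  echo "$up of $total JournalNodes reachable (quorum needs $need)"
  (( up >= need ))
}

# Usage (host names are placeholders):
# jn_quorum_up 8485 jn1.example.com jn2.example.com jn3.example.com && echo "OK to start NameNodes"
```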

Super Mentor

@kotesh banoth

We see the error:

2017-08-25 16:05:09,392 FATAL namenode.FSEditLog ( - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [,,], stream=null)) Timed out waiting 120000ms for a quorum of nodes to respond.
  at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(
  at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(
  at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(
  at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(

This usually happens when the edits files on one of the JournalNodes are corrupt.

In such cases, we can check the JournalNode logs to find out which one has corrupt data, and then copy the journal edits directory from a good JournalNode to the other nodes.
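Besides reading each JournalNode's log, one rough way to compare the nodes is the highest transaction ID among the finalized segments (files named `edits_<firstTxId>-<lastTxId>`) in each JournalNode's `current` directory; a node that lags well behind the others is a likely suspect. This is a heuristic under that assumption, not an official health check, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the highest last-txid among finalized edit
# segments (edits_<first>-<last>) in a JournalNode "current" directory.
# In-progress segments and other files are ignored. Prints 0 if none found.
last_finalized_txid() {
  local current_dir="$1" f name last max=0

  for f in "$current_dir"/edits_*; do
    name=$(basename -- "$f")
    # Only finalized segments match this pattern; capture the last txid.
    [[ $name =~ ^edits_[0-9]+-([0-9]+)$ ]] || continue
    last=$((10#${BASH_REMATCH[1]}))   # force base 10, drop leading zeros
    if (( last > max )); then
      max=$last
    fi
  done
  echo "$max"
}

# Usage: run on each JournalNode and compare the numbers.
# last_finalized_txid "/hadoop/hdfs/journal/$NAME_SERVICE/current"
```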

(Please make sure you keep a backup of the directories for safety.)
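A minimal sketch of the backup-then-copy step on a bad JournalNode, assuming the good node's journal directory has already been transferred to that host (for example with scp or rsync) into a staging path, and that the JournalNode on the bad host is stopped. All paths, the optional owner (e.g. `hdfs:hadoop`), and the function name are assumptions; adjust them to your cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace a JournalNode's journal directory with a copy
# taken from a healthy JournalNode, keeping a timestamped backup of the old one.
# Run only while the JournalNode on this host is stopped.
replace_journal_dir() {
  local good_copy="$1"    # staged copy of the good node's journal dir
  local target="$2"       # e.g. /hadoop/hdfs/journal/$NAME_SERVICE
  local backup_root="$3"  # e.g. /BackupDir
  local owner="${4:-}"    # optional, e.g. hdfs:hadoop
  local backup

  [[ -d "$good_copy/current" ]] || { echo "no current/ in $good_copy" >&2; return 1; }

  backup="$backup_root/$(basename -- "$target").$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup_root" || return 1
  if [[ -e "$target" ]]; then
    mv -- "$target" "$backup" || return 1
    echo "old directory saved to $backup"
  fi

  # -a preserves permissions and timestamps (and ownership when run as root).
  cp -a -- "$good_copy" "$target" || return 1
  if [[ -n "$owner" ]]; then
    chown -R -- "$owner" "$target" || return 1
  fi
  echo "copied $good_copy to $target"
}
```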


@Jay SenSharma, I have three JournalNodes. How can I identify which one is healthy, and what should I do if all of them are corrupted?

server-jn6.txt server-jn7.txt server-jn8.txt

Thanks so much, @Jay SenSharma. After moving the edits_inprogress files to the backup directory...

Then I started the services as you mentioned.