Created 08-25-2017 10:53 AM
I have a four-node Hadoop cluster. When I start the NameNode (Active) and NameNode (Standby), both NameNodes start, then both automatically switch to standby mode, and then both NN1 and NN2 go down automatically.
Created 08-25-2017 10:56 AM
Can you please help me out again? Thanks in advance.
Created 08-25-2017 10:59 AM
Can you please check the NameNode log to see if there is any issue (Error/Exception)?
Please share the logs
# ls -l /var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxxxxx.log
# ls -l /var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxxxxx.out
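Once you have located the log, the quickest way to spot the failure is to filter it for FATAL/ERROR lines. A minimal sketch (the demo log below is fabricated for illustration; on a real host point LOG at your actual hadoop-hdfs-namenode-*.log):

```shell
# Demo log file; on a real host set LOG to the actual NameNode log,
# e.g. /var/log/hadoop/hdfs/hadoop-hdfs-namenode-<hostname>.log
LOG=$(mktemp)
printf '%s\n' \
  '2017-08-25 16:05:08,100 INFO  namenode.NameNode - startup progress' \
  '2017-08-25 16:05:09,392 FATAL namenode.FSEditLog - recoverUnfinalizedSegments failed' \
  > "$LOG"

# Show only the lines that indicate a real problem
grep -E 'FATAL|ERROR|Exception' "$LOG"
```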
Also, please check whether those hosts have enough free memory:
# free -m
Created on 08-25-2017 11:44 AM - edited 08-17-2019 06:09 PM
Created 08-25-2017 12:39 PM
Maybe you can try the following: on the Journal Nodes, move the "edits_inprogress" logs to a backup directory, then try starting the Journal Nodes again.
Example Location:
# mv /hadoop/hdfs/journal/$NAME_SERVICE/current/edits_inprogress_0000000000000306232 /BackupDir
# mv /hadoop/hdfs/journal/$NAME_SERVICE/current/edits_inprogress_0000000000000307748 /BackupDir
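The per-file moves above can be generalized to a loop over every in-progress segment. This is only a sketch: the journal path and /BackupDir are placeholders, and the demo below runs against temporary sandbox directories so it can be tried safely outside a cluster.

```shell
# Sandbox stand-ins; on a real Journal Node these would be
# /hadoop/hdfs/journal/$NAME_SERVICE/current and /BackupDir
JOURNAL_CURRENT=$(mktemp -d)
BACKUP_DIR=$(mktemp -d)
touch "$JOURNAL_CURRENT/edits_inprogress_0000000000000306232"
touch "$JOURNAL_CURRENT/edits_inprogress_0000000000000307748"

# Move every in-progress edits segment out of the journal dir
for f in "$JOURNAL_CURRENT"/edits_inprogress_*; do
  [ -e "$f" ] && mv "$f" "$BACKUP_DIR"/
done
ls "$BACKUP_DIR"
```

Run this on each Journal Node (with the real paths) while the Journal Node process is stopped, then start the Journal Nodes again.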
Created 08-25-2017 11:52 AM
We see the error:
2017-08-25 16:05:09,392 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [172.16.0.76:8485, 172.16.0.77:8485, 172.16.0.75:8485], stream=null))
java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:182)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:436)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
Usually this happens when there is a corrupt edits file on one of the Journal Nodes.
In such cases, check the Journal Node logs to find out which node has the corrupt data, then copy the journal edits directory from a good Journal Node to the other nodes.
(Please make sure that you keep a backup of the dirs for safety)
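Copying the edits directory from a healthy Journal Node can be sketched as below. The paths are assumptions, and the demo uses local temp directories so the commands can be tried without a cluster; on a real cluster you would pull from the good node instead, e.g. `rsync -a good-jn:/hadoop/hdfs/journal/$NAME_SERVICE/current/ /hadoop/hdfs/journal/$NAME_SERVICE/current/`.

```shell
# Local stand-ins for the healthy and suspect Journal Node edits dirs;
# the real path would be /hadoop/hdfs/journal/$NAME_SERVICE/current
GOOD=$(mktemp -d)
BAD=$(mktemp -d)
echo 'good edits' > "$GOOD/edits_0000000000000306000-0000000000000306231"
echo 'corrupt'    > "$BAD/edits_inprogress_0000000000000306232"

# 1) Keep a backup of the suspect dir before touching anything
cp -a "$BAD" "${BAD}.bak"

# 2) Replace its contents with the healthy node's copy
rm -rf "$BAD"
cp -a "$GOOD" "$BAD"
ls "$BAD"
```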
Created 08-25-2017 12:11 PM
@Jay SenSharma I have three JNs. How can I identify which one is healthy, and what should I do if all the JNs are corrupted?
Created 08-25-2017 01:17 PM
Thanks so much @Jay SenSharma! After moving the edits_inprogress files to the backup dir...
Then I started them as mentioned.