Journal node edit log issue
Labels: Apache Hadoop
Created 10-19-2016 12:49 PM
The JournalNode is logging the WARN below, and Ambari is alerting that the JournalNode web UI is not accessible. Any idea how to recover from this?
```
2016-10-19 12:36:20,353 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/stanleyhotel/current/edits_inprogress_0000000000064985103 while determining its valid length. Position was 888832
java.io.IOException: Can't scan a pre-transactional edit log.
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4959)
        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
        at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:192)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:152)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.heartbeat(JournalNodeRpcServer.java:158)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.heartbeat(QJournalProtocolServerSideTranslatorPB.java:172)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25423)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2016-10-19 12:36:20,353 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 888832
```
Created 10-19-2016 10:22 PM
Assuming this is happening on a single JournalNode, you can try the following:
- As a precaution, stop HDFS. This will shut down all JournalNodes as well.
- On the node in question, move the JournalNode edits directory (/hadoop/hdfs/journal/stanleyhotel/current) to an alternate location.
- Copy the edits directory (/hadoop/hdfs/journal/stanleyhotel/current) from a functioning JournalNode to this node.
- Start HDFS.
This should bring this JournalNode back in line with the others and return you to a properly functioning HA state.
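A minimal command-level sketch of those steps, assuming HDFS has already been stopped via Ambari, the journal directory is /hadoop/hdfs/journal/stanleyhotel/current, and "jn-healthy" is a placeholder hostname for a JournalNode that is in good shape:
```bash
# Sketch only -- run on the affected JournalNode host after stopping HDFS in Ambari.
JOURNAL_DIR=/hadoop/hdfs/journal/stanleyhotel/current

# 1. Move the existing edits directory aside so nothing is lost.
mv "$JOURNAL_DIR" "${JOURNAL_DIR}.bad.$(date +%Y%m%d)"

# 2. Copy the edits directory from a healthy JournalNode ("jn-healthy" is a placeholder).
scp -r jn-healthy:"$JOURNAL_DIR" "$JOURNAL_DIR"

# 3. Make sure ownership matches what the JournalNode process expects (hdfs:hadoop is typical).
chown -R hdfs:hadoop "$JOURNAL_DIR"

# 4. Start HDFS again from Ambari.
```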
Created 10-25-2016 04:49 AM
@Brandon Wilson Thanks, it resolved the problem.
Created 02-02-2017 08:59 AM
Your solution works perfectly, but only if the "edits_inprogress_" file has the same name on both JournalNodes (JNs).
On my dev cluster, I did not get to the problem for two months. During that time, the healthy JN created a new "edits_inprogress_" file, but the sick JN still asks for the old "edits_inprogress_" file. I followed all four steps of your procedure, but the sick JN asks for the old file again. The contents of /hadoop/hdfs/journal/devcluster/current are now identical on both nodes.
What should I do?
Log of the healthy JN (edits_inprogress_0000000000016172345):
```
2017-02-02 10:15:12,513 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000016172345 -> /hadoop/hdfs/journal/devcluster/current/edits_0000000000016172345-0000000000016172394
```
Log of the sick JN (edits_inprogress_0000000000011766543):
```
2017-02-02 10:15:57,744 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(350)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000011766543 while determining its valid length. Position was 1036288
java.io.IOException: Can't scan a pre-transactional edit log.
```
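A quick way to compare the in-progress edit segments on the two nodes is a sketch like the one below; "jn-healthy" and "jn-sick" are placeholder hostnames, and SSH access to both is assumed:
```bash
# List the edits_inprogress_* segments on both JournalNodes for comparison.
for host in jn-healthy jn-sick; do
  echo "== $host =="
  ssh "$host" 'ls -l /hadoop/hdfs/journal/devcluster/current | grep edits_inprogress'
done
```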
Created 02-03-2017 07:14 AM
Solved it! The sick JN didn't stop when I stopped it in Ambari, or even when I stopped HDFS in Ambari. I killed the JN process manually, replaced the data from the healthy JN, and started HDFS. Now it works! 🙂
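For anyone hitting the same thing, a rough sketch of checking for and stopping a leftover JournalNode process after an Ambari stop (assumes jps is on the PATH and you run it as root or the hdfs user):
```bash
# Find any JournalNode JVM that is still running after the Ambari stop.
PID=$(jps -l | awk '/JournalNode/ {print $1}')

if [ -n "$PID" ]; then
  kill "$PID"                                    # try a graceful stop first
  sleep 10
  kill -0 "$PID" 2>/dev/null && kill -9 "$PID"   # force-kill only if it is still alive
fi
```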
Created 06-21-2018 11:16 AM
Thanks @Brandon Wilson, it worked for me too.
