
JournalNode edit log issue


The JournalNode is logging the WARN below, and Ambari is alerting that the JournalNode web UI is not accessible. Any idea how to recover from this?

2016-10-19 12:36:20,353 WARN  namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/stanleyhotel/current/edits_inprogress_0000000000064985103 while determining its valid length. Position was 888832
java.io.IOException: Can't scan a pre-transactional edit log.
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4959)
	at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
	at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
	at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:192)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:152)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.heartbeat(JournalNodeRpcServer.java:158)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.heartbeat(QJournalProtocolServerSideTranslatorPB.java:172)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25423)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

2016-10-19 12:36:20,353 WARN  namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 888832
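
(For reference, one quick way to confirm the "web UI not accessible" alert from the command line; a sketch, where 8480 is the default dfs.journalnode.http-address port and <jn-host> is a placeholder for the affected JournalNode host:)

# Prints the HTTP status code of the JournalNode web UI; anything other than
# 200, or a connection failure, backs up the Ambari alert.
curl -s -o /dev/null -w "%{http_code}\n" http://<jn-host>:8480/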
1 ACCEPTED SOLUTION


Hi @Santhosh B Gowda,

Assuming that this is happening on a single JournalNode, you can try the following (a shell sketch of steps 2 and 3 follows below):

  1. As a precaution, stop HDFS. This will shut down all JournalNodes as well.
  2. On the node in question, move the edits directory (/hadoop/hdfs/journal/stanleyhotel/current) to an alternate location.
  3. Copy the edits directory (/hadoop/hdfs/journal/stanleyhotel/current) from a functioning JournalNode to this node.
  4. Start HDFS.

This should get this JournalNode back in line with the others and return you to a properly functioning HA state.
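
For reference, a minimal shell sketch of steps 2 and 3 (run on the broken node with HDFS stopped; the journal path matches the log in this thread, "healthy-jn.example.com" is a placeholder for a working JournalNode, and the hdfs:hadoop ownership is only the usual default, so adjust all of these to your cluster):

JOURNAL_DIR=/hadoop/hdfs/journal/stanleyhotel/current

# Step 2: move the broken edits directory aside as a backup.
mv "$JOURNAL_DIR" "${JOURNAL_DIR}.broken.$(date +%Y%m%d)"

# Step 3: copy the edits directory from a functioning JournalNode.
scp -r healthy-jn.example.com:"$JOURNAL_DIR" "$JOURNAL_DIR"

# Make sure the copied files are owned by the HDFS service user.
chown -R hdfs:hadoop "$JOURNAL_DIR"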


5 REPLIES



@Brandon Wilson Thanks, that resolved the problem.

Expert Contributor

Hi @Brandon Wilson,

Your solution works perfectly, but only if the "edits_inprogress_" file has the same name on both JournalNodes (JNs).

In the case of my devcluster, I didn't deal with the problem for two months. During this time the healthy JN has created a new "edits_inprogress_" file, but the sick JN still asks for the old "edits_inprogress_" file. I did all 4 steps of your procedure, but the sick JN asks for the old file again. The content of /hadoop/hdfs/journal/devcluster/current is the same on both nodes.

What should I do?

Log of healthy JN (edits_inprogress_0000000000016172345)

2017-02-02 10:15:12,513 INFO  namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000016172345 -> /hadoop/hdfs/journal/devcluster/current/edits_0000000000016172345-0000000000016172394

Log of sick JN (edits_inprogress_0000000000011766543)

2017-02-02 10:15:57,744 WARN  namenode.FSImage (EditLogFileInputStream.java:scanEditLog(350)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000011766543 while determining its valid length. Position was 1036288
java.io.IOException: Can't scan a pre-transactional edit log.

Expert Contributor

Solved it! The sick JN didn't stop when I stopped it in Ambari, or even when I stopped HDFS in Ambari. I killed the JN process manually, replaced the data from the healthy JN, and started HDFS. Now it works! 🙂
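
For anyone hitting the same thing, a minimal sketch of checking for a stale JournalNode process before replacing the edits directory (it assumes the JDK jps tool is on the PATH and that the daemon runs as the hdfs user):

# A leftover "JournalNode" entry here means the daemon never actually stopped.
sudo -u hdfs jps | grep JournalNode

# If it is still running, try a graceful stop first and force-kill only as a last resort.
JN_PID=$(sudo -u hdfs jps | awk '/JournalNode/ {print $1}')
if [ -n "$JN_PID" ]; then
    sudo kill "$JN_PID"                                    # SIGTERM first
    sleep 10
    ps -p "$JN_PID" > /dev/null && sudo kill -9 "$JN_PID"  # still alive: force-kill
fi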

Expert Contributor

Thanks @Brandon Wilson, it worked for me too.