Created 10-19-2016 12:49 PM
The JournalNode is logging the WARN below, and Ambari is alerting that the JournalNode web UI is not accessible. Any idea how to recover from this?
2016-10-19 12:36:20,353 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/stanleyhotel/current/edits_inprogress_0000000000064985103 while determining its valid length. Position was 888832
java.io.IOException: Can't scan a pre-transactional edit log.
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4959)
	at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
	at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
	at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:192)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:152)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.heartbeat(JournalNodeRpcServer.java:158)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.heartbeat(QJournalProtocolServerSideTranslatorPB.java:172)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25423)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2016-10-19 12:36:20,353 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 888832
Created 10-19-2016 10:22 PM
Assuming that this is happening on a single JournalNode, you can try the following: stop the affected JournalNode, back up its current edits directory, copy the journal data over from a healthy JournalNode, and then start the JournalNode again (see the sketch below).
This should get this JournalNode back in line with the others and get you back to a properly functioning HA state.
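For reference, a rough sketch of that resync, run on the affected JournalNode after it has been stopped. It assumes the journal directory shown in the log above (/hadoop/hdfs/journal/stanleyhotel), a hypothetical healthy JournalNode host named healthy-jn.example.com, and the usual hdfs:hadoop ownership; adjust paths, ownership, and hostnames for your cluster.

# 1. Back up the existing edits directory rather than deleting it outright.
mv /hadoop/hdfs/journal/stanleyhotel/current /hadoop/hdfs/journal/stanleyhotel/current.bad

# 2. Copy the journal data from a healthy JournalNode (hostname is hypothetical).
scp -rp healthy-jn.example.com:/hadoop/hdfs/journal/stanleyhotel/current /hadoop/hdfs/journal/stanleyhotel/

# 3. Make sure the copied files are owned by the HDFS service user.
chown -R hdfs:hadoop /hadoop/hdfs/journal/stanleyhotel/current

# 4. Start the JournalNode again, via Ambari or with "hadoop-daemon.sh start journalnode".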
Created 10-25-2016 04:49 AM
@Brandon Wilson Thanks, it resolved the problem.
Created 02-02-2017 08:59 AM
Your solution works perfectly, but only if the "edits_inprogress_" file has the same name on both JournalNodes (JN).
In the case of my dev cluster, I did not deal with the problem for two months. During that time, the healthy JN created a new "edits_inprogress_" file, but the sick JN still asks for the old "edits_inprogress_" file. I followed all 4 steps of your procedure, but the sick JN still asks for the old file. The content of /hadoop/hdfs/journal/devcluster/current is the same on both nodes.
What should I do?
Log of healthy JN (edits_inprogress_0000000000016172345)
2017-02-02 10:15:12,513 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000016172345 -> /hadoop/hdfs/journal/devcluster/current/edits_0000000000016172345-0000000000016172394
Log of sick JN (edits_inprogress_0000000000011766543)
2017-02-02 10:15:57,744 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(350)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000011766543 while determining its valid length. Position was 1036288
java.io.IOException: Can't scan a pre-transactional edit log.
Created 02-03-2017 07:14 AM
Solved it! The sick JN didn't stop when I stopped it in Ambari, and not even when I stopped HDFS in Ambari. I killed the JN process manually, replaced the data from the healthy JN, and started HDFS. Now it works! 🙂
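For completeness, a minimal sketch of the kill-and-resync described above, assuming the devcluster journal path from the logs and a hypothetical healthy JournalNode host healthy-jn.example.com:

# Find the JournalNode JVM that Ambari failed to stop, note its PID, and kill it.
ps aux | grep -i journalnode
kill <PID>

# Replace the journal data with a copy from the healthy JournalNode.
mv /hadoop/hdfs/journal/devcluster/current /hadoop/hdfs/journal/devcluster/current.bad
scp -rp healthy-jn.example.com:/hadoop/hdfs/journal/devcluster/current /hadoop/hdfs/journal/devcluster/
chown -R hdfs:hadoop /hadoop/hdfs/journal/devcluster/current

# Then start HDFS again from Ambari.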
Created 06-21-2018 11:16 AM
Thanks @Brandon Wilson, it worked for me too.