Support Questions

BaiTing · 04-14-2016

This log keeps repeating:

4/14, 16:53:28.739 WARN org.apache.hadoop.hdfs.server.namenode.FSImage
Caught exception after scanning through 0 ops from /data01/dfs/jn/pt-nameservice/current/edits_inprogress_0000000000182289435 while determining its valid length. Position was 987136
java.io.IOException: Can't scan a pre-transactional edit log.
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4592)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
    at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:193)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:153)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:93)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:102)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.newEpoch(JournalNodeRpcServer.java:136)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.newEpoch(QJournalProtocolServerSideTranslatorPB.java:133)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25417)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

ben.hemphill · 04-14-2016

This check appears to have been added in HDFS-8965: https://issues.apache.org/jira/browse/HDFS-8965

The file /data01/dfs/jn/pt-nameservice/current/edits_inprogress_0000000000182289435 is either corrupted, or possibly a log written before an upgrade.
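To get a hint about which of the two cases applies, one option is to look at the first bytes of the file. The sketch below assumes that an edit log file begins with a 4-byte big-endian signed layout version (the format Java's `DataOutputStream.writeInt` produces) and that valid HDFS layout versions are negative numbers; the script name `check_edits_header.py` is hypothetical. A negative value that differs from the one in a healthy JournalNode's files would be consistent with an older format, while zero or a positive value points to a damaged header.

```python
import struct
import sys


def read_layout_version(path: str) -> int:
    """Read the first 4 bytes of an edit log file as a big-endian signed int.

    Assumption: the edit log header starts with the layout version written
    by Java's DataOutputStream.writeInt (big-endian, 32-bit, signed).
    """
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        raise ValueError(f"{path}: shorter than 4 bytes, header is missing")
    return struct.unpack(">i", header)[0]


def describe(version: int) -> str:
    """Classify a header value, assuming HDFS layout versions are negative."""
    if version < 0:
        # Plausible header; compare it with a healthy JournalNode's file.
        return f"layout version {version}"
    # Zero or positive is not expected for an HDFS layout version.
    return f"header value {version} is not a valid layout version (likely corrupted)"


if __name__ == "__main__":
    # Usage: python check_edits_header.py <edits file> [<edits file> ...]
    for p in sys.argv[1:]:
        print(f"{p}: {describe(read_layout_version(p))}")
```

Running it against the suspect `edits_inprogress_...` file and against a segment from a working JournalNode lets you compare the two header values side by side.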

Either way, to recover it, I would suggest backing up the /data01/dfs/jn/pt-nameservice/current/ directory to another location and then copying the JournalNode data from one of your other JournalNodes into that location.
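The backup-and-copy step above can be sketched in Python as follows. This is a minimal sketch under assumptions that are not stated in the thread: the JournalNode on the affected host is stopped, `healthy_copy` is a local copy of `current/` that you have already fetched from a working JournalNode (by whatever transfer method you normally use), and `backup_root` lies outside the JournalNode's edits directory. The function names and the `<name>.bak.<timestamp>` backup naming are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_path(backup_root: Path, name: str, timestamp: int) -> Path:
    """Hypothetical naming scheme for the backup: <root>/<name>.bak.<timestamp>."""
    return backup_root / f"{name}.bak.{timestamp}"


def restore_journal_dir(jn_current: Path, healthy_copy: Path, backup_root: Path) -> Path:
    """Move the broken current/ aside, then copy in a healthy JournalNode's data.

    Assumptions: the local JournalNode is stopped, healthy_copy is a local
    copy of a working JournalNode's current/, and backup_root is outside
    the JournalNode's edits directory.
    """
    if not healthy_copy.is_dir():
        raise FileNotFoundError(f"healthy copy not found: {healthy_copy}")
    backup_root.mkdir(parents=True, exist_ok=True)
    backup = backup_path(backup_root, jn_current.name, int(time.time()))
    # Keep the old files rather than deleting them, as the answer suggests.
    shutil.move(str(jn_current), str(backup))
    # copytree recreates jn_current. Its default copy function (copy2) keeps
    # timestamps and permission bits but not ownership, so the files may need
    # to be chowned to the JournalNode's service user afterwards.
    shutil.copytree(healthy_copy, jn_current)
    return backup
```

For the path in this thread, a call might look like `restore_journal_dir(Path("/data01/dfs/jn/pt-nameservice/current"), Path("/tmp/jn-from-healthy-node"), Path("/root/jn-backups"))`, where the last two paths are placeholders. Moving the directory instead of deleting it keeps the corrupted segment available for later inspection.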

dpetrovan · 04-18-2016

Hi Ben,

I have configured an HDFS HA cluster using the Quorum Journal Manager with 5 JournalNodes. On one of them I receive the same error as BaiTing. Should copying the journal files from a working JournalNode resolve the issue?

Thanks in advance!

ben.hemphill · 04-18-2016

Yes. Moving out the old, corrupted files and copying in the files from a working JournalNode should allow you to start the JournalNode.

dpetrovan · 04-18-2016

It works, thanks!

ussama · 08-08-2016

I am also facing the same issue, but I have a question: do we need to stop the HDFS service or put it into maintenance/read-only mode before copying the data? Otherwise, data may still be written to the healthy JournalNode we are copying from.

ben.hemphill · 08-11-2016

You shouldn't have to. When the JournalNode starts up, it should recognize that it is behind and then sync up with the other JournalNodes (similar to what happens when you stop a JournalNode and restart it later).
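A rough, outside-in way to check that a restarted JournalNode is keeping up is to compare the highest transaction ID visible in each node's segment file names. The sketch below assumes the segment naming seen in this thread, `edits_<startTxId>-<endTxId>` for finalized segments and `edits_inprogress_<startTxId>` for the open one; it is a filename heuristic only and does not reflect how the JournalNode itself synchronizes. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumed segment naming in a JournalNode's current/ directory:
#   finalized segment: edits_<startTxId>-<endTxId>
#   open segment:      edits_inprogress_<startTxId>
# Names with extra suffixes, and other files such as VERSION, are ignored.
FINALIZED = re.compile(r"edits_(\d+)-(\d+)")
IN_PROGRESS = re.compile(r"edits_inprogress_(\d+)")


def latest_txid_hint(names: Iterable[str]) -> Optional[int]:
    """Highest transaction ID visible in segment file names, or None.

    For an in-progress segment only the start txid is known, so the result
    is a lower bound on what that JournalNode actually holds.
    """
    best: Optional[int] = None
    for name in names:
        m = FINALIZED.fullmatch(name)
        if m:
            txid = int(m.group(2))
        else:
            m = IN_PROGRESS.fullmatch(name)
            if m is None:
                continue
            txid = int(m.group(1))
        best = txid if best is None else max(best, txid)
    return best


def latest_txid_in_dir(current_dir: Path) -> Optional[int]:
    """Apply latest_txid_hint to every entry in a JournalNode's current/ dir."""
    return latest_txid_hint(p.name for p in current_dir.iterdir())
```

Running `latest_txid_in_dir` on each JournalNode's `current/` directory gives one number per node; if the restored node is keeping up, its value should advance along with the others as new edits arrive.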

rafaelpecin · 02-23-2017

You don't have to stop any instance or service on the cluster.
The error messages stop after 30 to 60 seconds.

banga · 11-24-2020

It works, thanks.

BaiTing · 04-18-2016

Hi Ben,

It does work for me, thanks!

Cloudera Community

HDFS JournalNode fails and cannot start