hdfs journalnode fail, can not start


New Contributor

This log message keeps repeating:

4/14, 16:53:28.739 WARN org.apache.hadoop.hdfs.server.namenode.FSImage
Caught exception after scanning through 0 ops from /data01/dfs/jn/pt-nameservice/current/edits_inprogress_0000000000182289435 while determining its valid length. Position was 987136
java.io.IOException: Can't scan a pre-transactional edit log.
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4592)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
    at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:193)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:153)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:93)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:102)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.newEpoch(JournalNodeRpcServer.java:136)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.newEpoch(QJournalProtocolServerSideTranslatorPB.java:133)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25417)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

Accepted Solution

Re: hdfs journalnode fail, can not start

Expert Contributor

This check seems to have been added in https://issues.apache.org/jira/browse/HDFS-8965

/data01/dfs/jn/pt-nameservice/current/edits_inprogress_0000000000182289435 is either corrupted, or perhaps it is a log from before an upgrade?

Either way, to get it back I would suggest backing up the /data01/dfs/jn/pt-nameservice/current/ directory somewhere else and then copying the journalnode data from one of your other journalnodes to that location.
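
The backup-and-replace step can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name replace_journal_dir and its parameters are hypothetical, the failing journalnode process is assumed to be stopped, and the healthy node's current/ directory is assumed to have already been copied onto this host (for example into a staging directory). File ownership is not handled here and would need to match the user the journalnode runs as.

```python
import shutil
import time
from pathlib import Path


def replace_journal_dir(broken_current: Path, healthy_copy: Path, backup_root: Path) -> Path:
    """Move the broken 'current' directory aside and put a healthy copy in its place.

    Returns the path of the backup so it can be inspected or restored later.
    """
    if not healthy_copy.is_dir():
        raise FileNotFoundError(f"healthy copy not found: {healthy_copy}")

    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = backup_root / f"{broken_current.name}.bak-{stamp}"

    # Move rather than delete, so the old files stay available for inspection.
    shutil.move(str(broken_current), str(backup))

    # copytree copies file modes and timestamps, but not ownership.
    shutil.copytree(healthy_copy, broken_current)
    return backup
```

With the paths from this thread, broken_current would be /data01/dfs/jn/pt-nameservice/current, while healthy_copy and backup_root are whatever staging and backup locations you choose.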


Re: hdfs journalnode fail, can not start

New Contributor

Hi Ben,


I have configured an HDFS HA cluster using the Quorum Journal Manager with 5 Journal Nodes. On one of them I receive the same error as BaiTing. If I copy the journal files from a working Journal Node, should that resolve the issue?

Thanks in advance!

Re: hdfs journalnode fail, can not start

Expert Contributor

Yes, correct, moving out the old, corrupted files and copying in the files from a working Journal Node should allow you to start the journalnode.

Re: hdfs journalnode fail, can not start

New Contributor

It works, thanks!

Re: hdfs journalnode fail, can not start

New Contributor

I am also facing the same issue but have a question: do we need to stop the HDFS service or put it into maintenance/read-only mode before copying the data? Otherwise, data may still be written to the healthy journalnode (the one the data will be copied from) during the copy.

Re: hdfs journalnode fail, can not start

Expert Contributor

You shouldn't have to. When the journalnode starts up, it should recognize that it is behind and then sync up with the other journalnodes. (Something similar happens when you stop a journalnode and restart it later.)
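
One rough way to see how far a restarted journalnode is behind is to compare the highest transaction ID visible in the edit segment file names on each node. The sketch below is only an illustration: highest_known_txid is a hypothetical helper, it relies on the usual naming of finalized segments (edits_<first>-<last>) and the open segment (edits_inprogress_<first>), and for an in-progress segment it only sees the starting transaction ID, so the result is a lower bound.

```python
import re
from pathlib import Path
from typing import Optional

# Finalized segments: edits_<first>-<last>; open segment: edits_inprogress_<first>.
_FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")
_INPROGRESS = re.compile(r"^edits_inprogress_(\d+)$")


def highest_known_txid(current_dir: Path) -> Optional[int]:
    """Return the largest transaction ID visible in segment file names, or None."""
    highest = None
    for entry in current_dir.iterdir():
        match = _FINALIZED.match(entry.name)
        if match:
            txid = int(match.group(2))  # last transaction in a finalized segment
        else:
            match = _INPROGRESS.match(entry.name)
            if not match:
                continue  # not an edit segment (e.g. VERSION)
            txid = int(match.group(1))  # first transaction of the open segment
        if highest is None or txid > highest:
            highest = txid
    return highest
```

Running it against the current/ directory on the repaired node and on a healthy node, a few minutes apart, gives an indication of whether the gap is closing.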

Re: hdfs journalnode fail, can not start

New Contributor
You don't have to stop any instance or service on the cluster.
The error messages stop after 30 to 60 seconds.

Re: hdfs journalnode fail, can not start

New Contributor

Hi Ben,

It does work for me, thanks!

Re: hdfs journalnode fail, can not start

Expert Contributor

Glad to hear it, BaiTing!