- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
hdfs journalnode fail, can not start
- Labels:
-
HDFS
Created ‎04-14-2016 03:14 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
repeat this log: 4/14, 16:53:28.739 WARN org.apache.hadoop.hdfs.server.namenode.FSImage Caught exception after scanning through 0 ops from /data01/dfs/jn/pt-nameservice/current/edits_inprogress_0000000000182289435 while determining its valid length. Position was 987136 java.io.IOException: Can't scan a pre-transactional edit log. at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4592) at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245) at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355) at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551) at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:193) at org.apache.hadoop.hdfs.qjournal.server.Journal.(Journal.java:153) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:93) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:102) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.newEpoch(JournalNodeRpcServer.java:136) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.newEpoch(QJournalProtocolServerSideTranslatorPB.java:133) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25417) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
Created ‎04-14-2016 09:24 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This check seems to be added from https://issues.apache.org/jira/browse/HDFS-8965
/data01/dfs/jn/pt-nameservice/current/edits_inprogress_0000000000182289435 is either corrupted, or perhaps It is a log from before an upgrade?
Either way, To get it back I would suggest backing up the /data01/dfs/jn/pt-nameservice/current/ directory somewhere else and copy the journalnode data from one of your other journalnodes to that location.
Created ‎04-14-2016 09:24 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This check seems to be added from https://issues.apache.org/jira/browse/HDFS-8965
/data01/dfs/jn/pt-nameservice/current/edits_inprogress_0000000000182289435 is either corrupted, or perhaps It is a log from before an upgrade?
Either way, To get it back I would suggest backing up the /data01/dfs/jn/pt-nameservice/current/ directory somewhere else and copy the journalnode data from one of your other journalnodes to that location.
Created ‎04-18-2016 08:50 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ben,

I have configure a cluster HDFS HA using the Quorum Journal Manager with 5 Journal Node, on one of them I receive the same error as BaiTing, If I copy the journal files from a working Journal Node should resolve the issue?
Thanks in advance!
Created ‎04-18-2016 09:12 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, correct, moving out the old, corrupted files and copying in the files from a working Journal Node should allow you to start the journalnode.
Created ‎04-18-2016 10:43 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It works thanks!
Created ‎08-08-2016 03:48 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am also facing the same issue but have a question: Do we need to stop HDFS ervice or put it to maintenance/read-only mode before copying the data? Otherwise, there may be data being written to the health journalnode (from which data will be copied).
Created ‎08-11-2016 12:00 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
you shouldn't have to, as when the journalnode starts up, it should recognize that it is behind and then sync up with the other journalnodes. (similar things happen when you stop a journalnode and then restart it later)
Created ‎02-23-2017 12:23 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The error messages stop after 30~60 seconds.
Created ‎11-24-2020 10:08 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
it works thanks
Created ‎04-18-2016 09:08 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,ben
it does work for me,thanks!
