Created 06-25-2022 05:52 AM
Cluster has three journal nodes all three were in bad health and name nodes went down.
Since then I am trying to bring up the name nodes.
I am getting this error-
2022-06-25 08:32:24,365 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1165ms
GC pool 'G1 Young Generation' had collection(s): count=2 time=1574ms
GC pool 'G1 Old Generation' had collection(s): count=1 time=218271ms
2022-06-25 08:32:24,474 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 2434229/2578837 transactions completed. (94%)
2022-06-25 08:32:24,764 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Error replaying edit log at offset 554705629. Expected transaction ID was 60366342312
Recent opcode offsets: 554704754 554705115 554705361 554705629
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:4488)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOp(FSEditLogOp.java:4420)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$Reader.readOp(FSEditLogOp.java:4293)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOpImpl(EditLogFileInputStream.java:203)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOp(EditLogFileInputStream.java:250)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:188)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:188)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:188)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1150)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:797)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
2022-06-25 08:32:24,767 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 5741014 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1181)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:797)
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 5741014
2022-06-25 08:32:24,767 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 554705629. Expected transaction ID was 60366342312
Recent opcode offsets: 554704754 554705115 554705361 554705629
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:201)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1150)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:797)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
Caused by: java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:4488)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOp(FSEditLogOp.java:4420)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$Reader.readOp(FSEditLogOp.java:4293)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOpImpl(EditLogFileInputStream.java:203)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOp(EditLogFileInputStream.java:250)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:188)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:188)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:188)
... 12 more
2022-06-25 08:32:24,771 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@qtmhdnn1.atl.careerbuilder.com:50070
2022-06-25 08:32:24,871 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2022-06-25 08:32:24,872 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2022-06-25 08:32:24,872 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2022-06-25 08:32:24,872 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 554705629. Expected transaction ID was 60366342312
Recent opcode offsets: 554704754 554705115 554705361 554705629
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:201)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1150)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:797)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
Caused by: java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:4488)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOp(FSEditLogOp.java:4420)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$Reader.readOp(FSEditLogOp.java:4293)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOpImpl(EditLogFileInputStream.java:203)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOp(EditLogFileInputStream.java:250)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:188)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:188)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:188)
... 12 more
2022-06-25 08:32:24,874 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
Please Help !!!
Created 06-27-2022 07:46 AM
Hi @ds_explorer, it seems because the edit log is too big and cannot be read by NameNode completely on the default/configured timeout.
2022-06-25 08:32:24,872 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 554705629. Expected transaction ID was 60366342312
Recent opcode offsets: 554704754 554705115 554705361 554705629
.....
Caused by: java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:4488)
To fix this, can you add the below parameter and value (if you already have then kindly increase the value)
HDFS > Configuration > JournalNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml
hadoop.http.idle_timeout.ms=180000
And then restart the required services.