Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

After server crash, HA standby NameNode fails with "Premature EOF from inputStream"; JournalNode out of sync

Expert Contributor

CDH 5.1.3 installed with Parcel, HDFS HA enabled.

After a crash of the server that runs the standby NameNode and a JournalNode, errors occur when the roles are restarted.

In NameNode log:

Failed to load image from FSImageFile(file=/data1/dfs/nn/current/fsimage_0000000000004637167, cpktTxId=0000000000004637167)
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:221)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:720)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:704)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1354)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1420)

In JournalNode log:

IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.getEditLogManifest from 192.168.88.37:53375 Call#2 Retry#0: output error

How can I recover the node?

Thanks

1 ACCEPTED SOLUTION

Expert Contributor

I am not sure whether all of this is correct, but I managed to get every instance back to "Green" with a few manual operations.

JournalNode:

  • I stopped another (good) JournalNode and copied its 'jn/[nameservice]/current' directory to the bad JournalNode. I first tried copying while the good JournalNode was still running, but the bad JournalNode then failed on startup with the same error as before.
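
For anyone who wants to script the swap on the bad JournalNode, here is a rough Python sketch of that step. It is not a Cloudera-provided tool. It assumes the good 'current' directory has already been copied onto the bad host (for example with scp, taken while the good JournalNode was stopped) and that the bad JournalNode is also stopped. The function name and both path arguments are hypothetical. It moves the old directory aside instead of deleting it, so the change can be undone.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def replace_journal_dir(good_copy: Path, bad_current: Path) -> Optional[Path]:
    """Swap in a known-good 'current' directory, keeping the old one as a backup.

    good_copy   -- copy of 'current' taken from a STOPPED healthy JournalNode
                   (hypothetical staging location on the bad host)
    bad_current -- the 'jn/[nameservice]/current' directory of the bad JournalNode
    Returns the backup path, or None if there was nothing to back up.
    """
    if not good_copy.is_dir():
        raise FileNotFoundError(f"no such directory: {good_copy}")

    backup = None
    if bad_current.exists():
        # Move the damaged directory aside rather than deleting it.
        backup = bad_current.with_name(f"{bad_current.name}.bak-{int(time.time())}")
        bad_current.rename(backup)

    # Copy the good directory into place under the original name.
    shutil.copytree(good_copy, bad_current)
    return backup
```

After the copy, check that the owner and permissions of the new directory match what the JournalNode process expects (compare with the backup) before starting the role again.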

NameNode:

  • I copied the fsimage file named in the log from a good NameNode.
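
Since "Premature EOF" suggests a truncated file, it may be worth checking the copied fsimage before restarting the NameNode. The sketch below is only an illustration and rests on an assumption: that each fsimage has a sidecar file with the same name plus '.md5' in the same directory, and that the first whitespace-separated token in it is the hex MD5 digest. Check your own 'current' directory to confirm this layout. The function name is hypothetical.

```python
import hashlib
from pathlib import Path


def fsimage_matches_md5(fsimage: Path) -> bool:
    """Compare an fsimage file's MD5 with the digest in its '.md5' sidecar.

    Assumes the sidecar sits next to the image and starts with the hex digest.
    """
    sidecar = fsimage.with_name(fsimage.name + ".md5")
    expected = sidecar.read_text().split()[0].lower()

    digest = hashlib.md5()
    with fsimage.open("rb") as f:
        # Read in 1 MiB chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

If you use it, copy the '.md5' file from the good NameNode together with the fsimage, then call the function on the copied fsimage path. A False result would mean the copy (or the sidecar) does not match and should be transferred again.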
