After server crash, HA Standby NameNode "Premature EOF from inputStream" ; JournalNode out of sync

CDH 5.1.3 installed with Parcel, HDFS HA enabled.

 

After a server crash (the node ran the standby NameNode and a JournalNode), both roles report errors on restart.

In NameNode log:

Failed to load image from FSImageFile(file=/data1/dfs/nn/current/fsimage_0000000000004637167, cpktTxId=0000000000004637167)
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:221)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:720)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:704)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1354)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1420)

 

In JournalNode log:

IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.getEditLogManifest from 192.168.88.37:53375 Call#2 Retry#0: output error

 

How can I recover the node?

Thanks

1 ACCEPTED SOLUTION

I'm not sure all of this is correct, but I managed to get every instance back to "Green" with some manual operations.

JournalNode:

  • I stopped another (good) JournalNode and copied its 'jn/[nameservice]/current' directory to the bad JournalNode. I had first tried copying while the good JournalNode was still running, but the bad JournalNode then failed to start with the same error as before.
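
A rough sketch of that copy step, in case it helps someone. It only uses the Python standard library, the paths you pass in are up to you (nothing here knows about CDH), and the function names `tree_digest` and `copy_and_verify` are made up for this post. It assumes the good JournalNode is already stopped, so the files do not change mid-copy, and it moves the damaged directory aside instead of deleting it.

```python
import hashlib
import shutil
from pathlib import Path


def tree_digest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = str(path.relative_to(root))
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def copy_and_verify(src, dst):
    """Copy the src directory to dst, then check both trees are identical.

    Assumption: src is the 'current' directory of a stopped, healthy
    JournalNode and dst is the same directory on the damaged one.
    """
    src, dst = Path(src), Path(dst)
    if dst.exists():
        # Keep the damaged directory for inspection rather than deleting it.
        # This rename fails if a previous '.bad' backup is still in place.
        dst.rename(dst.with_name(dst.name + ".bad"))
    shutil.copytree(src, dst)
    return tree_digest(src) == tree_digest(dst)
```

If `copy_and_verify` returns False, the two directories differ and the copy should not be trusted. Note that `shutil.copytree` does not carry over file ownership, so the owner and group of the copied files still need to be checked by hand before starting the JournalNode.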

NameNode:

  • I copied the fsimage file named in the log from a good NameNode.
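
Since "Premature EOF" suggests a truncated file, it may be worth checking the copied fsimage before starting the NameNode. The sketch below is an assumption-heavy helper, not an official tool: it assumes a sidecar file with the same name plus '.md5' sits next to the fsimage, and that its first whitespace-separated token is the hex MD5 of the image. The function names are made up for this post.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fsimage_matches_sidecar(fsimage_path):
    """Compare an fsimage file against the digest in its '.md5' sidecar.

    Assumption: the sidecar's first token is the hex MD5 of the image.
    """
    fsimage_path = Path(fsimage_path)
    sidecar = fsimage_path.with_name(fsimage_path.name + ".md5")
    expected = sidecar.read_text().split()[0].lower()
    return md5_of(fsimage_path) == expected
```

Running `fsimage_matches_sidecar` on both the good NameNode's copy and the copied file gives a quick sanity check that the transfer did not truncate anything. If the sidecar is missing, comparing `md5_of` results from the two hosts does the same job.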

 
