
Error: flush failed for required journal


Hi Community,

 

Our development VM cluster runs CDH 5.9 with HDFS HA enabled. Because the cluster has only five nodes, two of the JournalNodes run on the same hosts as the NameNodes and the third runs on a DataNode. Last night we hit the error below, which took down both NameNodes.

 

FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [xxx.xx.xx.xx:8485, xxx.xx.xx.xx:8485, xxx.xx.xx.xx:8485], stream=QuorumOutputStream starting at txid 9861544))

 

Since this is a dev cluster, no one uses it on nights or weekends. After extensive searching, the most likely cause appears to be a NameNode garbage collection pause.

 

Is there a good approach to debugging and tuning the heap settings? Currently the NameNode heap is 4 GB on both hosts. The default timeout is 20 seconds (dfs.qjournal.select-input-streams.timeout.ms).

 

Any help would be really appreciated.

 

Thanks,

Silaphet 

 

 

2 REPLIES

Re: Error: flush failed for required journal

Explorer

Same issue in my environment.

Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.196.64.44:8485, 10.196.64.68:8485, 10.196.64.86:8485], stream=QuorumOutputStream starting at txid 434560443))

We are running Cloudera on virtual machines (cloud).

Re: Error: flush failed for required journal

Cloudera Employee

Hi,

 

When the NameNode flushes edits to the JournalNodes, it waits up to 20 seconds for a quorum (a majority) of them to acknowledge the write. You are seeing this error because the NameNode took more than 20 seconds to get the edits acknowledged by a quorum. There are several possible causes, e.g. a NameNode GC or JVM pause, JournalNodes sharing disks with other roles, network communication issues, slow group lookups, etc.
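
To make the quorum condition concrete, here is a minimal sketch in Python. It is only a simplified model of the rule described above, not the actual QJM code; the function names are made up for illustration. As far as I know the write timeout is controlled by dfs.qjournal.write-txns.timeout.ms (default 20000 ms), but please verify that against your CDH version.

```python
def quorum_ack_time_ms(latencies_ms):
    """Time at which a majority of JournalNodes have acknowledged.

    latencies_ms holds one entry per JournalNode, as observed by the
    NameNode (so a NameNode-side pause inflates every entry).
    None means that JournalNode never responded.
    """
    majority = len(latencies_ms) // 2 + 1
    responded = sorted(t for t in latencies_ms if t is not None)
    if len(responded) < majority:
        return None  # a quorum is never reached
    # The flush completes when the majority-th fastest response arrives.
    return responded[majority - 1]


def flush_succeeds(latencies_ms, timeout_ms=20_000):
    """True if a quorum acknowledged within the timeout (assumed 20 s)."""
    ack_time = quorum_ack_time_ms(latencies_ms)
    return ack_time is not None and ack_time <= timeout_ms
```

With three JournalNodes, one slow node is tolerated (`flush_succeeds([150, 300, 25_000])` is true), but two slow or unreachable nodes are not. This is why JournalNodes co-located with a paused NameNode or a busy disk matter so much in a small cluster.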

 

A good starting point is the NameNode log: look for WARN messages in the period just before the FATAL error.
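
If the log is large, a small script can pull out those warnings for you. The following is a rough sketch in Python, assuming the default log4j line layout (`yyyy-MM-dd HH:mm:ss,SSS LEVEL logger: message`); the function name and the 120-second window are arbitrary choices for illustration, so adjust them to your setup.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: "2017-01-01 02:00:31,000 FATAL org.apache...: message"
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (.*)$"
)
FATAL_MARKER = "flush failed for required journal"


def warnings_before_fatal(lines, window_seconds=120):
    """Return (timestamp, message) for WARN lines shortly before the FATAL.

    Only the first matching FATAL line is considered. Lines that do not
    start with a timestamp (e.g. stack traces) are skipped.
    """
    parsed = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            ts = datetime.strptime(match.group(1), TS_FORMAT)
            parsed.append((ts, match.group(2), match.group(3)))

    for fatal_ts, level, msg in parsed:
        if level == "FATAL" and FATAL_MARKER in msg:
            start = fatal_ts - timedelta(seconds=window_seconds)
            return [
                (ts, warn_msg)
                for ts, warn_level, warn_msg in parsed
                if warn_level == "WARN" and start <= ts <= fatal_ts
            ]
    return []  # no matching FATAL line found
```

You would call it with the open log file, e.g. `warnings_before_fatal(open(path))`, where `path` is wherever your NameNode log lives. In the output, entries from JvmPauseMonitor point to a GC or JVM pause, while QuorumJournalManager warnings about waiting for a response point more toward the JournalNodes or the network.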