03-27-2017 02:06 PM
Our development CDH 5.9 cluster (running on VMs) has HDFS HA enabled. Because there are only five nodes, two JournalNodes run on the same hosts as the NameNodes and the third runs on a DataNode. However, last night we ran into the issue below, which took down both NNs.
FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [xxx.xx.xx.xx:8485, xxx.xx.xx.xx:8485, xxx.xx.xx.xx:8485], stream=QuorumOutputStream starting at txid 9861544))
Since this is a Dev cluster, no one was using it during the weekend or at night. After an extensive search, the likely cause appears to be a NameNode garbage collection pause.
Is there a good approach to debugging and tuning the heap settings? Currently, the NN heap is set to 4 GB on both NameNodes. The default timeout is 20 seconds (dfs.qjournal.select-input-streams.timeout.ms).
Any help would be really appreciated.
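One way to check the GC-pause theory is to look for the messages that Hadoop's JvmPauseMonitor writes to the NameNode log ("Detected pause in JVM or host machine (eg GC): pause of approximately ...ms") and compare the pause lengths with the 20-second timeout. The Python sketch below is only an illustration: the function name find_pauses and the threshold parameter are made up for this example, and it assumes the log uses that message format unchanged. As far as I know, the timeout that applies to the failed flush is dfs.qjournal.write-txns.timeout.ms rather than the select-input-streams one, so please verify that against your version before changing anything.

```python
import re

# Message format written by org.apache.hadoop.util.JvmPauseMonitor.
# Assumption: the NameNode log contains these lines unmodified.
PAUSE_RE = re.compile(
    r"Detected pause in JVM or host machine \(eg GC\): "
    r"pause of approximately (\d+)ms"
)


def find_pauses(log_lines, threshold_ms=0):
    """Return (line_number, pause_ms) for each pause >= threshold_ms.

    find_pauses and threshold_ms are hypothetical names for this sketch.
    log_lines can be any iterable of strings, e.g. an open file object.
    """
    pauses = []
    for lineno, line in enumerate(log_lines, start=1):
        match = PAUSE_RE.search(line)
        if match is None:
            continue
        pause_ms = int(match.group(1))
        if pause_ms >= threshold_ms:
            pauses.append((lineno, pause_ms))
    return pauses
```

To use it, open the NameNode log file and pass the file object in, for example find_pauses(open(path), threshold_ms=20000), where path is wherever your NameNode log lives. Any pause close to or above 20000 ms around the time of the FATAL error would support the GC theory. If no pauses show up at all, the problem is more likely slow disk or network on the JournalNode side than the NameNode heap.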
01-09-2019 03:51 AM
Same issue in my environment.
Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.196.64.44:8485, 10.196.64.68:8485, 10.196.64.86:8485], stream=QuorumOutputStream starting at txid 434560443))
We are running Cloudera on virtual machines (cloud).