Name Node instability: flush failed for required journal

Hello,

I have an HA cluster (HDP-2.4.0.0-169) in a dev environment, with 4 nodes virtualized with vSphere.

  • 1 Master (Active Namenode, Active Resource Manager)
  • 1 Master/Slave (Standby Namenode, Standby Resource Manager, Data Node, Node Manager, Journal Node)
  • 2 Slaves (Data Node, Node Manager, Journal Node)

The Name Nodes of my cluster shut down regularly, even when there is no load on them.

Name Node log:

2016-06-02 02:23:18,682 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.49:8485, 192.168.1.47:8485, 192.168.1.46:8485], stream=QuorumOutputStream starting at txid 2549780))
java.io.IOException: Timed out waiting 60000ms for a quorum of nodes to respond.

It seems to come from the Journal Nodes, but I can't find a way to fix the issue. Any idea where to look?

Thanks!

Here are the detailed Name Node logs from when it shut down:

2016-06-02 02:23:12,610 INFO  BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2016-06-02 02:23:12,719 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 54038 ms (timeout=60000 ms) for a response for sendEdits. No responses yet.
2016-06-02 02:23:13,720 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 55039 ms (timeout=60000 ms) for a response for sendEdits. No responses yet.
2016-06-02 02:23:14,721 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 56040 ms (timeout=60000 ms) for a response for sendEdits. No responses yet.
2016-06-02 02:23:15,611 INFO  BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - chooseUnderReplicatedBlocks selected  Total=0 Reset bookmarks? true
2016-06-02 02:23:15,611 INFO  BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2016-06-02 02:23:15,722 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 57041 ms (timeout=60000 ms) for a response for sendEdits. No responses yet.
2016-06-02 02:23:16,723 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 58043 ms (timeout=60000 ms) for a response for sendEdits. No responses yet.
2016-06-02 02:23:17,725 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 59044 ms (timeout=60000 ms) for a response for sendEdits. No responses yet.
2016-06-02 02:23:18,611 INFO  BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - chooseUnderReplicatedBlocks selected  Total=0 Reset bookmarks? true
2016-06-02 02:23:18,611 INFO  BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2016-06-02 02:23:18,682 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.49:8485, 192.168.1.47:8485, 192.168.1.46:8485], stream=QuorumOutputStream starting at txid 2549780))
java.io.IOException: Timed out waiting 60000ms for a quorum of nodes to respond.
                at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
                at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
                at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
                at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
                at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
                at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
                at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
                at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
                at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)
                at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2470)
                at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
                at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688)
                at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
                at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
                at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:422)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
2016-06-02 02:23:18,682 WARN  client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 2549780
2016-06-02 02:23:21,611 INFO  BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - chooseUnderReplicatedBlocks selected  Total=0 Reset bookmarks? true
2016-06-02 02:23:21,611 INFO  BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2016-06-02 02:23:24,612 INFO  BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - chooseUnderReplicatedBlocks selected  Total=0 Reset bookmarks? true
2016-06-02 02:23:24,612 INFO  BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2016-06-02 02:23:24,625 WARN  client.QuorumJournalManager (IPCLoggerChannel.java:call(406)) - Took 65944ms to send a batch of 1 edits (205 bytes) to remote journal 192.168.1.47:8485
2016-06-02 02:23:24,773 WARN  client.QuorumJournalManager (IPCLoggerChannel.java:call(406)) - Took 66092ms to send a batch of 1 edits (205 bytes) to remote journal 192.168.1.49:8485
2016-06-02 02:23:24,898 WARN  client.QuorumJournalManager (IPCLoggerChannel.java:call(406)) - Took 66216ms to send a batch of 1 edits (205 bytes) to remote journal 192.168.1.46:8485
2016-06-02 02:23:25,201 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-06-02 02:23:25,258 INFO  namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at d1hdpmaster01/192.168.1.45
************************************************************/
2016-06-02 02:23:25,635 WARN  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(212)) - Unable to send metrics to collector by address:http://d1hdpmaster01:6188/ws/v1/timeline/metric
4 REPLIES

Super Guru

Took 65944ms to send a batch of 1 edits (205 bytes) to remote journal 192.168.1.47:8485

You either have a serious network problem between the nodes, or the underlying disk writes are very slow.
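
A few quick checks can help separate the two (a rough sketch; the journal directory below is only the typical location, so use whatever dfs.journalnode.edits.dir actually points to on your JournalNode hosts):

# From the NameNode host: round-trip latency to each JournalNode (should be low single-digit ms on a LAN)
ping -c 20 192.168.1.46
ping -c 20 192.168.1.47
ping -c 20 192.168.1.49

# On each JournalNode host: synchronous write latency on the edits disk
dd if=/dev/zero of=/hadoop/hdfs/journal/ddtest bs=4k count=1000 oflag=dsync
rm -f /hadoop/hdfs/journal/ddtest

# Watch disk service times and utilization while the NameNode is writing edits
iostat -x 5

Since the nodes are VMs on vSphere, contention on the underlying datastore is also worth ruling out.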

Super Guru

@Xavier LIEUTIER

How many JournalNodes do you have in the cluster? Make sure you have an odd number.

  • There must be at least three JournalNode daemons, since edit log modifications must be written to a majority of JournalNodes. This will allow the system to tolerate the failure of a single machine. You can also run more than three JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JournalNodes (three, five, seven, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally. If the requisite quorum is not available, the NameNode will not format or start, and you will see an error similar to this:
12/10/01 17:34:18 WARN namenode.FSEditLog: Unable to determine input streams from QJM to [10.0.1.10:8485, 10.0.1.10:8486, 10.0.1.10:8487]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
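
To confirm what the NameNode is actually using, something like the following should show the configured quorum and whether each JournalNode daemon is up (run the first command on a NameNode host, the others on each JournalNode host):

# The shared edits URI lists every JournalNode in the quorum
hdfs getconf -confKey dfs.namenode.shared.edits.dir

# Is the JournalNode process running and listening on its RPC port (8485 by default)?
jps | grep JournalNode
netstat -tln | grep 8485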

I have 3 JournalNodes in my cluster, but they don't seem to fail.

Hello @Xavier LIEUTIER.

These log messages indicate that there was a timeout condition when the NameNode attempted to call the JournalNodes. The NameNode must successfully call a quorum of JournalNodes: at least 2 out of 3. This means that the call timed out to at least 2 out of 3 of them. This is a fatal condition for the NameNode, so by design, it aborts.
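
The 60000 ms in the message is the QJM write timeout, dfs.qjournal.write-txns.timeout.ms (the upstream default is 20000 ms, so it appears to have already been raised on this cluster). You can check the values in effect, for example:

# QJM client timeouts currently in effect on the NameNode
hdfs getconf -confKey dfs.qjournal.write-txns.timeout.ms
hdfs getconf -confKey dfs.qjournal.start-segment.timeout.ms
hdfs getconf -confKey dfs.qjournal.select-input-streams.timeout.ms

Raising dfs.qjournal.write-txns.timeout.ms further (in hdfs-site.xml, via Ambari on HDP) only buys headroom; if the JournalNodes routinely take 60+ seconds to acknowledge a 205-byte batch of edits, the underlying cause still needs to be found.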

There are multiple potential reasons for this timeout condition. Reviewing logs from the NameNodes and JournalNodes would likely reveal more details. There are several common causes to watch for:

  1. A long stop-the-world garbage collection pause may surpass the timeout threshold for the call. Garbage collection logging would show what kind of garbage collection activity the process is doing. You might also see log messages about the "JvmPauseMonitor". Consider reviewing the article NameNode Garbage Collection Configuration: Best Practices and Rationale to make sure your cluster's heap and garbage collection settings match best practices; a sketch for enabling GC logging follows this list.
  2. In environments that integrate with LDAP for resolution of users' group memberships, load problems on the LDAP infrastructure can cause delays. In extreme cases, we have seen such timeouts at the JournalNodes cause edit logging calls to fail, which causes a NameNode abort and an HA failover. See Hadoop and LDAP: Usage, Load Patterns and Tuning for a more detailed description and potential mitigation steps.
  3. It is possible that there is a failure in network connectivity between the NameNode and the JournalNodes. This tends to be rare, because NameNodes and JournalNodes tend to be colocated on the same host or placed relatively close to one another in the network topology. Still, it is worth investigating that basic network connectivity between all NameNode hosts and all JournalNode hosts is working fine.
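
For cause 1, here is a minimal sketch of how you might enable GC logging for the NameNode and look for pauses it reports itself; the paths assume the usual HDP layout, so adjust them to yours:

# In hadoop-env.sh (managed through Ambari on HDP), add GC logging to the NameNode JVM options
# (flags shown are for a Java 7/8 JVM):
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop/hdfs/namenode-gc.log"

# The NameNode also logs long pauses detected by the JvmPauseMonitor; search for them with:
grep "Detected pause in JVM or host machine" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log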