
NameNode fails due to JournalNode de-sync/timeout

New Contributor

Hello,

 

We're currently running a fairly busy cluster on CDH 5.7.4, with our NameNode (NN) configured in HA with QJM running 3 JournalNodes. We've been experiencing periodic aborts of the NN for a while, but they seem to have become much more common after upgrading from CDH 5.4.10 to 5.7.4, which may or may not be related.

 

The topology looks like this:

  • master1 hosts RM, JN, HBase Master
  • master2 hosts NN, JN
  • master3 hosts RM, JN, NN, HBase Master

 

Our RM and HBase Master also run in HA configuration.

 

The failure mode looks something like this: the NN attempts to commit transactions to the JournalNode quorum. One node (master1) usually commits very quickly, while the other two typically exhibit one of two failure modes: a de-sync or a timeout. Timeouts and de-syncs appear to be more common on master2 than on master3, but they're fairly common on both. When this happens the NN exits and the hot-spare NN takes over, so we maintain availability, but this is obviously less than ideal operationally. Checking the logs, we see lots of complaints about long JN sync times, yet our monitoring shows no disk saturation on the affected servers. The volumes in question are RAID1 volumes of two physical disks each.
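To correlate the aborts with JN-side latency, the slow syncs can be pulled straight out of the JournalNode logs. A minimal sketch; the default log path is an assumption, so adjust it for your installation:

```shell
#!/bin/sh
# Extract "Sync of transaction range ... took Nms" durations from a
# JournalNode log and print the five slowest, in milliseconds.
# The default path below is an assumption; pass your own as $1.
log=${1:-/var/log/hadoop-hdfs/hadoop-hdfs-journalnode.log}
grep 'Sync of transaction range' "$log" \
  | sed -n 's/.*took \([0-9][0-9]*\)ms.*/\1/p' \
  | sort -n | tail -5
```

Spikes printed here can then be lined up against iostat and GC timestamps on the same host.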

 

For reference, the relevant logs are pasted below, with IP addresses scrubbed.

 

NameNode on master2:

2016-10-28 16:25:18,397 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:19,397 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9004 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:20,398 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10005 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:21,399 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11006 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:22,400 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12007 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:23,003 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2016-10-28 16:25:23,402 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13008 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:24,403 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:25,404 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:26,405 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16012 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:27,406 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:28,408 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:29,409 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [master1:8485]. Exceptions so far: [master3:8485: Journal disabled until next roll]
2016-10-28 16:25:30,394 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [master1:8485, master2:8485, master3:8485], stream=QuorumOutputStream starting at txid 23834184641))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.

 

JournalNode on master2:

2016-10-28 16:25:33,430 WARN org.apache.hadoop.hdfs.qjournal.server.Journal: Sync of transaction range 23834653773-23834653915 took 23036ms
2016-10-28 16:25:34,078 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client master2 threw exception [java.io.IOException: Connection reset by peer]
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:2641)
        at org.apache.hadoop.ipc.Server.access$2800(Server.java:134)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1488)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:774)
        at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:647)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:618)

JournalNode on master3:

2016-10-28 16:23:57,227 WARN org.apache.hadoop.hdfs.qjournal.server.Journal: Sync of transaction range 23834547943-23834547943 took 13986ms
2016-10-28 16:24:03,020 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from master2:52766 Call#95308256 Retry#0
org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 23834590082 expecting nextTxId=23834590042
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:485)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:371)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:149)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
2016-10-28 16:25:43,219 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Updating lastPromisedEpoch from 54 to 55 for client /master3
2 REPLIES

Re: NameNode fails due to JournalNode de-sync/timeout

Expert Contributor
2016-10-28 16:25:33,430 WARN org.apache.hadoop.hdfs.qjournal.server.Journal: Sync of transaction range 23834653773-23834653915 took 23036ms

 

Since your timeout is 20 seconds and this sync took 23 seconds, that is the direct cause of the failure. I suspect you already know that; I'm documenting it here for others.
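If the underlying latency can't be eliminated right away, one mitigation (a workaround, not a fix) is to raise the QJM write timeout in hdfs-site.xml. The property name below is the stock HDFS one; the value shown is only an example:

```xml
<!-- Default is 20000 (20s). Raising it tolerates occasional slow JN
     syncs at the cost of slower detection of a genuinely stuck quorum.
     The 60s value here is an example, not a recommendation. -->
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>
</property>
```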

The long sync could have many causes. For this kind of problem my gut reaction is to look at GC for the JournalNode JVM; if you have GC logging enabled, those logs should tell us whether GC pauses are part of the problem. Another possibility is resource contention: since you mentioned it happens on master2 and master3, which also host the NameNodes, you might check whether there is significant GC activity on the NN, or other high CPU spikes, at the same times.
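As a sketch of that GC check: JournalNode GC logging can typically be enabled via the JN's Java options (e.g. `HADOOP_JOURNALNODE_OPTS`, or the JournalNode Java options field in Cloudera Manager) with flags like `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/jn-gc.log`, which match the JDK 7/8 JVMs shipped with CDH 5.x. The log path below is an assumption:

```shell
#!/bin/sh
# Print pause durations ("N.N secs") from a JDK 7/8 GC log so they can
# be compared against the 20s QJM timeout. Pass the log path as $1.
gclog=${1:-/var/log/hadoop-hdfs/jn-gc.log}
awk 'match($0, /[0-9]+\.[0-9]+ secs/) { print substr($0, RSTART, RLENGTH) }' "$gclog"
```

Any pause in the tens of seconds lining up with a "Sync of transaction range ... took Nms" warning would point squarely at GC.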

Re: NameNode fails due to JournalNode de-sync/timeout

Cloudera Employee

In addition to what Ben has said, you may want to check whether the JournalNode is sharing its disk with any other roles. The JournalNode shouldn't share its disk with anything else (not even OS-level workloads), as that will affect JN response time under high load and cause slow-sync problems.
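A quick way to sanity-check that on the masters is to resolve the JournalNode edits directory to its backing filesystem. This is a sketch; `/dfs/jn` is an assumed path, so substitute the actual value of `dfs.journalnode.edits.dir` from your hdfs-site.xml:

```shell
#!/bin/sh
# Print the device and mount point backing the JournalNode edits dir.
# If NN metadata dirs, log dirs, or swap resolve to the same mount
# point, the JN is sharing spindles and its fsync latency will vary.
edits_dir=${1:-/dfs/jn}
df -P "$edits_dir" | awk 'NR==2 { print $1, $6 }'
```

Running the same command against the NameNode's `dfs.namenode.name.dir` and comparing devices shows immediately whether the two roles contend for the same RAID1 pair.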