NameNode crash while enabling G1GC on large heap

Explorer

We've enabled HA on our NameNodes. After increasing the heap size of the standby NameNode and manually failing over to make it active, we hit a crash. Previously we used CMS with a 120GB JVM heap; when we increased the heap to 180GB and switched to G1GC, the NameNode crashed after timing out while waiting for any of the JournalNodes to respond to its edit writes.
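
For anyone trying to reproduce the sequence: the manual failover can be triggered with hdfs haadmin (a sketch; nn1/nn2 below are placeholder NameService IDs, not our real ones, and the exact steps may differ if your cluster is managed by Cloudera Manager):

hdfs haadmin -failover nn1 nn2       # fail over from nn1 (currently active) to nn2 (the resized standby)
hdfs haadmin -getServiceState nn2    # confirm nn2 now reports "active"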

Here are the logs of the crashed NameNode:

2019-01-09 18:54:07,788 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2019-01-09 18:54:10,042 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:11,043 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:12,045 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8004 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:13,046 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:14,047 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10007 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:15,049 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11008 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:16,050 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12009 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:17,050 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13010 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:18,052 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14011 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:19,053 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15012 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:20,053 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16013 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:21,055 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17014 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:22,056 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:23,056 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:24,041 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.88.131.30:8485, 10.88.133.30:8485, 10.88.135.29:8485], stream=QuorumOutputStream starting at txid 46729238851))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:641)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3363)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
2019-01-09 18:54:24,042 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 46729238851
2019-01-09 18:54:24,045 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2019-01-09 18:54:42,494 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at metrics-hadoop-warehousestore-nn01.server.hulu.com/10.88.135.13
************************************************************/

No obvious GC pauses were detected. The only JvmPauseMonitor complaint appeared well before the timeout:

2019-01-09 18:50:44,772 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1997ms
No GCs detected

So I don't think the timeout was caused by a GC pause. Resource usage (CPU, disk and network I/O) was also low at that time, and since we use ShellBasedUnixGroupsMapping, the bottleneck is not in connections to an LDAP server.
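
For clarity, the group mapping we mean is the standard Hadoop setting in core-site.xml (shown only to illustrate which knob this is; it is not something we changed):

<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>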

On the JournalNode side, no warning logs were found during that period. It seems the NameNode failed to reach any of the JournalNodes.

Here are the GC options we set for G1GC:

-XX:+UseG1GC -XX:MaxGCPauseMillis=400 -XX:ParallelGCThreads=32 -XX:ConcGCThreads=8 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-hdfs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -verbose:gc
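
If it helps with debugging, safepoint and application-stopped-time logging can surface stop-the-world pauses that neither JvmPauseMonitor nor the GC log attributes to GC (a sketch, assuming a HotSpot JDK 8 JVM; these flags were removed or replaced by unified logging in JDK 9+):

-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1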

Could anyone share some thoughts?

Thanks!

2 REPLIES

Explorer

We missed some WARN logs printed after the SHUTDOWN_MSG:

2019-01-09 18:54:42,614 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 38573ms to send a batch of 11 edits (1261 bytes) to remote journal 10.88.135.29:8485
2019-01-09 18:54:42,639 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 38598ms to send a batch of 11 edits (1261 bytes) to remote journal 10.88.131.30:8485
2019-01-09 18:54:42,639 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 38598ms to send a batch of 11 edits (1261 bytes) to remote journal 10.88.133.30:8485

Somehow the QJM took 38s to send edits to the JournalNodes. It's hard to diagnose the timeout without a thread dump, so we decided to bump the following values (an hdfs-site.xml sketch follows the list):

dfs.qjournal.select-input-streams.timeout.ms = 60000
dfs.qjournal.start-segment.timeout.ms = 60000
dfs.qjournal.write-txns.timeout.ms = 60000
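
For reference, a minimal hdfs-site.xml sketch of those overrides (these are the standard QJM timeout properties; the NameNodes need a restart to pick them up, and a jstack thread dump of the NameNode during a stall would still be the more direct way to see where sendEdits is blocked):

<property>
  <name>dfs.qjournal.select-input-streams.timeout.ms</name>
  <value>60000</value>
</property>
<property>
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>60000</value>
</property>
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>
</property>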

New Contributor

Have you found the root cause? We are facing the same problem.