Member since: 01-13-2019
Posts: 6
Kudos Received: 0
Solutions: 0
01-27-2019
07:35 PM
Hi EricL, We are Cloudera customers, but there's no place to create tickets for Cloudera Impala. Could you help create a ticket for this? We're still on CDH 5.x and really need this patch. Thanks
01-23-2019
10:15 PM
Graceful shutdown has been supported since Impala 3.1 (IMPALA-1760). Will a later release of CDH 5.x merge this patch too? Thanks
Labels: Impala
01-23-2019
10:07 PM
Impala 2.x has supported the ORC format since IMPALA-5717 in the community. Will a later release of CDH 5.x merge this patch? We really need this feature, since most of our Hive tables are in ORC format for historical reasons. It's impossible to transform them into Parquet, not only because of the huge data size, but also because of the large number of legacy jobs that would need to change. Thanks!
Labels: Impala
01-21-2019
03:07 AM
We missed some WARN logs that appeared after the SHUTDOWN_MSG:
2019-01-09 18:54:42,614 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 38573ms to send a batch of 11 edits (1261 bytes) to remote journal 10.88.135.29:8485
2019-01-09 18:54:42,639 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 38598ms to send a batch of 11 edits (1261 bytes) to remote journal 10.88.131.30:8485
2019-01-09 18:54:42,639 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 38598ms to send a batch of 11 edits (1261 bytes) to remote journal 10.88.133.30:8485
Somehow QJM took 38s to write to the JournalNodes. It's hard to diagnose the timeout without a thread dump. We decided to bump the following values:
dfs.qjournal.select-input-streams.timeout.ms = 60000
dfs.qjournal.start-segment.timeout.ms = 60000
dfs.qjournal.write-txns.timeout.ms = 60000
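For reference, here is a minimal sketch of how those timeouts could be raised in hdfs-site.xml (the property names are the standard QJM client settings listed above; the 60-second values are simply what we chose, not a general recommendation):

<!-- Sketch: raise the NameNode's QJM client timeouts to 60s -->
<property>
  <name>dfs.qjournal.select-input-streams.timeout.ms</name>
  <value>60000</value>
</property>
<property>
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>60000</value>
</property>
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>
</property>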
01-14-2019
08:28 AM
We've enabled HA on our NameNodes. After increasing the heap size of the standby NameNode and manually failing over to make it active, we encountered a crash. Previously we used CMS on a 120GB JVM heap. When we increased the heap size to 180GB and enabled G1GC, the NameNode crashed because writes to the JournalNodes timed out. Here are the logs of the crashed NameNode:
2019-01-09 18:54:07,788 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2019-01-09 18:54:10,042 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:11,043 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:12,045 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8004 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:13,046 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:14,047 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10007 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:15,049 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11008 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:16,050 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12009 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:17,050 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13010 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:18,052 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14011 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:19,053 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15012 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:20,053 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16013 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:21,055 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17014 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:22,056 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:23,056 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2019-01-09 18:54:24,041 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.88.131.30:8485, 10.88.133.30:8485, 10.88.135.29:8485], stream=QuorumOutputStream starting at txid 46729238851))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:641)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3363)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
2019-01-09 18:54:24,042 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 46729238851
2019-01-09 18:54:24,045 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2019-01-09 18:54:42,494 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at metrics-hadoop-warehousestore-nn01.server.hulu.com/10.88.135.13
************************************************************/
No obvious GC pauses were detected. Only one complaint from the JvmPauseMonitor was logged, well before the timeout:
2019-01-09 18:50:44,772 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1997ms
No GCs detected
So I think the timeout is not due to a GC pause. However, resource usage (CPU, disk/network IO) was low at that time. We're using ShellBasedUnixGroupsMapping, so the bottleneck is not in connections to the LDAP server. On the JournalNodes, no warning logs were found in that period. It seems the NameNode failed to connect to any of the JournalNodes. Here are the GC options we set for G1GC: -XX:+UseG1GC -XX:MaxGCPauseMillis=400 -XX:ParallelGCThreads=32 -XX:ConcGCThreads=8 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-hdfs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -verbose:gc (a consolidated sketch of these options is below). Could anyone give some thoughts? Thanks!
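In case it helps to see the flags in context, here is a minimal sketch of that JVM configuration as it might appear in hadoop-env.sh. This assumes the options are set via HADOOP_NAMENODE_OPTS (on CDH they would normally be entered in the Cloudera Manager NameNode Java options field instead), and the 180GB heap flags are added here for completeness:

# Sketch: NameNode JVM options combining the 180GB heap with the G1GC flags quoted above.
# Note -verbose:gc (with a colon) is the correct form of the verbose GC flag.
export HADOOP_NAMENODE_OPTS="-Xms180g -Xmx180g \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=400 \
  -XX:ParallelGCThreads=32 -XX:ConcGCThreads=8 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hadoop-hdfs/gc.log \
  -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M \
  ${HADOOP_NAMENODE_OPTS}"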
Labels: Apache Hadoop