In CDH 5.13, some queries executed from Hue show a very high client fetch wait time.
The example below produced just 10 rows but ran for more than 3 hours. I think Hue does not close the fetch operation, so the Impala daemon assumes the client will fetch more. Even so, this does not make sense to me: the Impala daemon "knows" that 100% of the rows have been sent to the client, so why does it not cancel or close the query?
Here are the selected query stats:
Query Type: QUERY
Query State: FINISHED
Start Time: Aug 22, 2018 9:11:03 AM
End Time: Aug 22, 2018 12:26:40 PM
Duration: 3h, 15m
Rows Produced: 10
Admission Result: Admitted (queued)
Admission Wait Time: 5ms
Bytes Streamed: 353 B
Client Fetch Wait Time: 3.3h
Client Fetch Wait Time Percentage: 100
Connected User: hue/xxx
Estimated per Node Peak Memory: 32.0 MiB
File Formats: PARQUET/SNAPPY
HDFS Average Scan Range: 1.3 KiB
HDFS Bytes Read: 1.3 KiB
HDFS Bytes Read From Cache: 0 B
HDFS Bytes Read From Cache Percentage: 0
HDFS Local Bytes Read: 1.3 KiB
HDFS Local Bytes Read Percentage: 100
HDFS Remote Bytes Read: 0 B
HDFS Remote Bytes Read Percentage: 0
HDFS Scanner Average Read Throughput: 0 B/s
HDFS Short Circuit Bytes Read: 1.3 KiB
HDFS Short Circuit Bytes Read Percentage: 100
Impala Version: impalad version 2.10.0-cdh5.13.3 RELEASE (build 15a453e15865344e75ce0fc6c4c760696d50f626)
Out of Memory: false
Per Node Peak Memory Usage: 197.1 KiB
Planning Wait Time: 1ms
Planning Wait Time Percentage: 0
Pool: root.pool1
Query Status: OK
Session ID: 9647e779051c0b0b:302f01f9698839ba
Session Type: HIVESERVER2
Statistics Corrupt: false
Statistics Missing: true
Threads: CPU Time: 13ms
Threads: CPU Time Percentage: 78
Threads: Network Receive Wait Time: 0ms
Threads: Network Receive Wait Time Percentage: 0
Threads: Network Send Wait Time: 1ms
Threads: Network Send Wait Time Percentage: 11
Threads: Storage Wait Time: 1ms
Threads: Storage Wait Time Percentage: 11
I have a couple of questions:
- Is this a problem on the Impala side or on the Hue side?
- Impala has idle_session_timeout=7200 configured. Why did the Impala daemon not close the session after 2 hours of inactivity?
- Is this hanging query occupying a "slot" in the resource pool, i.e. does it count against Max Running Queries in Impala admission control? (My observation is yes, I just want to be sure.)
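To make the timeout question concrete, here is a minimal sketch that compares how long the query stayed open with the configured idle_session_timeout. It uses only the Start Time and End Time from the stats above. The timestamp format string and the helper name `open_seconds` are my own assumptions for illustration, not anything Impala or Hue provides.

```python
from datetime import datetime

# Values copied from the query stats above.
START = "Aug 22, 2018 9:11:03 AM"
END = "Aug 22, 2018 12:26:40 PM"
IDLE_SESSION_TIMEOUT_S = 7200  # idle_session_timeout as configured on the cluster

# Assumed layout of the timestamps as displayed in the stats (English month names).
FMT = "%b %d, %Y %I:%M:%S %p"


def open_seconds(start: str, end: str) -> int:
    """Return how long the query stayed open, in whole seconds (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


elapsed = open_seconds(START, END)
overrun = elapsed - IDLE_SESSION_TIMEOUT_S
print(f"query open for {elapsed} s, {overrun} s past idle_session_timeout")
```

If the timestamps are read correctly, the query stayed open well past the 7200-second mark, which is what prompts the second question.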
One more observation:
During the query "fetch time", the Impala daemon reports the query as:
"waiting to be closed"
But it has state=FINISHED, first row fetched, and scan progress 100%.
So my additional question is: why does Impala not automatically close queries once they are in the FINISHED state? Is this behaviour configurable?
Edit: adding a query timeout does not change this behaviour.
I configured a 30-second query timeout in Hue, but the query has been waiting to be closed for more than 2 minutes.
This is directly from the Query profile:
Query Options (set by configuration): MEM_LIMIT=419430400,QUERY_TIMEOUT_S=30
Query Options (set by configuration and planner): MEM_LIMIT=419430400,QUERY_TIMEOUT_S=30,MT_DOP=0
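For anyone who wants to check the effective options programmatically, the following is a rough sketch that splits the option string from the profile into a dictionary and compares the timeout with the wait I observed. The function name `parse_query_options` is hypothetical, and the 120-second figure is just my "more than 2 minutes" observation rounded down.

```python
def parse_query_options(line: str) -> dict:
    """Split 'K1=V1,K2=V2' into {'K1': 'V1', 'K2': 'V2'} (hypothetical helper)."""
    return dict(item.split("=", 1) for item in line.split(",") if item)


# Option string copied from the query profile above.
options = parse_query_options("MEM_LIMIT=419430400,QUERY_TIMEOUT_S=30,MT_DOP=0")

query_timeout_s = int(options["QUERY_TIMEOUT_S"])
mem_limit_mib = int(options["MEM_LIMIT"]) // (1024 * 1024)

# "More than 2 minutes" from my observation, taken as a lower bound.
observed_wait_s = 120

print(f"QUERY_TIMEOUT_S={query_timeout_s}, MEM_LIMIT={mem_limit_mib} MiB")
print("wait exceeded timeout:", observed_wait_s > query_timeout_s)
```

This only confirms that the option reached the query and that the observed wait was longer than the timeout. It does not explain why the query was not closed.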