Hi all,
Although I have partition the data table, the query duration still take a long time,
I have search into the query profile, most of the time spending in "SCAN HDFS",
How can I improve the query performance?
There are 6 Nodes in my cluster,
Please give me some advice, Thanks!
Paste the query info below:
============================================================
Query Info
Query ID: fd47ab1669e06b97:c243ead53ffd6baa
User: TL120
Database: default
Coordinator: host4.localdomain
Query Type: QUERY
Query State: FINISHED
Start Time: Mar 12, 2018 1:53:05 PM
End Time: Mar 12, 2018 1:54:05 PM
Duration: 1m
Rows Produced: 588
Admission Result: Unknown
Aggregate Peak Memory Usage: 7.9 GiB
Bytes Streamed: 1.4 MiB
Client Fetch Wait Time: 969ms
Client Fetch Wait Time Percentage: 2
Connected User: TL120
Estimated per Node Peak Memory: 9.0 GiB
File Formats: PARQUET/SNAPPY,TEXT/NONE
HDFS Average Scan Range: 105.1 MiB
HDFS Bytes Read: 183.2 GiB
HDFS Bytes Read From Cache: 0 B
HDFS Bytes Read From Cache Percentage: 0
HDFS Local Bytes Read: 183.2 GiB
HDFS Local Bytes Read Percentage: 100
HDFS Remote Bytes Read: 0 B
HDFS Remote Bytes Read Percentage: 0
HDFS Scanner Average Read Throughput: 624.1 MiB/s
HDFS Short Circuit Bytes Read: 183.2 GiB
HDFS Short Circuit Bytes Read Percentage: 100
Impala Version: impalad version 2.5.0-cdh5.7.x RELEASE (build 7ec1f598396347d722b8c56b8f4f458122ed3f4e)
Memory Accrual: 6,392,067,652 byte seconds
Memory Spilled: 0 B
Network Address: ::ffff:10.85.214.159:58564
Node with Peak Memory Usage: host5.localdomain:22000
Out of Memory: false
Per Node Peak Memory Usage: 7.9 GiB
Planning Wait Time: 37ms
Planning Wait Time Percentage: 0
Pool: default-pool
Query Status: OK
Session ID: b3485b36f5f668b7:ade833c16d1086
Session Type: BEESWAX
Statistics Missing: true
Threads: CPU Time: 2.5h
Threads: CPU Time Percentage: 94
Threads: Network Receive Wait Time: 3.9m
Threads: Network Receive Wait Time Percentage: 2
Threads: Network Send Wait Time: 19ms
Threads: Network Send Wait Time Percentage: 0
Threads: Storage Wait Time: 5.7m
Threads: Storage Wait Time Percentage: 4
Threads: Total Time: 2.6h
Work CPU Time: 2.5h
Download Profile...
Query Timeline
Start execution: 33.78us (33.78us)
Planning finished: 37ms (37ms)
Ready to start 11 remote fragments: 40ms (2ms)
All 11 remote fragments started: 47ms (7ms)
Rows available: 59.54s (59.49s)
First row fetched: 59.69s (156ms)
Unregister query: 1.0m (814ms)
Planner Timeline
Analysis finished: 13ms (13ms)
Equivalence classes computed: 14ms (1ms)
Single node plan created: 22ms (8ms)
Runtime filters computed: 22ms (20.58us)
Distributed plan created: 26ms (3ms)
Lineage info computed: 27ms (926.52us)
Planning finished: 30ms (3ms)
============================================================