Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

Impala slow on Group By

Explorer

Hi

 

I have created table

CREATE EXTERNAL TABLE tbw (   time_stamp STRING,     operator STRING,   datatype STRING,    band STRING ) PARTITIONED BY (   campaign STRING,    market STRING,    carrier STRING,    technology STRING ) STORED AS PARQUET LOCATION '/venkata/table/tbw' ;

 

Impala> explain
SELECT campaign,market, band, SUM(1) FROM
tbwGROUP BY 1,2,3;
Query: explain select campaign,market,band, SUM(1) FROM
tbw GROUP BY 1,2,3
+----------------------------------------------------------+
| Explain String |
+----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=44.00MB VCores=2 |
| |
| PLAN-ROOT SINK |
| | |
| 04:EXCHANGE [UNPARTITIONED] |
| | |
| 03:AGGREGATE [FINALIZE] |
| | output: sum:merge(1) |
| | group by: campaign, market,band |
| | |
| 02:EXCHANGE [HASH(campaign, market,band)] |
| | |
| 01:AGGREGATE [STREAMING] |
| | output: sum(1) |
| | group by: campaign, market,band |
| | |
| 00:SCAN HDFS [tbw] |
| partitions=6917/6917 files=6986 size=69.05GB |
+----------------------------------------------------------+
Fetched 18 row(s) in 0.04s

final when I ran :

 

Impala>SELECT campaign,market, band, SUM(1) FROM tbw GROUP BY 1,2,3;
Query: SELECT campaign,market, band, SUM(1) FROM tbw GROUP BY 1,2,3
Query submitted at: 2017-10-12 13:05:18 (Coordinator: http://ip-myIP.myflaovr.mycompany.com:25000)
Query progress can be monitored at: http://ip-myIP.myflaovr.mycompany.com:25000/query_plan?query_id=ed4d94c1d7bd4e6e:d0e1875e00000000

 

 

 

 

Fetched 26038 row(s) in 402.52s

 

It's taking around 7mints.

 

 

Thanks

Venkat.

4 REPLIES 4

Master Collaborator

What is your question?

 

If your question is around performance, please provide a query profile.

 

My guess is the query is CPU bound on the aggregation.

 

Would also be good to know your Impala version.

Explorer

Yes.

 

Impalad version 2.7.0-cdh5.10.1 

 

How to Improve my query performance(SELECT campaign, market, band, SUM(1) FROM

tbw GROUP BY 1,2,3;)?

 

 

Fetched 26038 row(s) in 402.52s

 

Table contains 35,342,034,607 records, 6986 files, parquet format and 70GB.

 

Impala runs on 10 Nodes(Max Memory 800Gib)

 

 

Query profile:

query memory.PNGquery plan.PNG

query summary.PNG

 

 

Query b54a1f635a0503a0:9e55c34f00000000
myflaovr.mycompany.com
Profile

Query (id=b54a1f635a0503a0:9e55c34f00000000):
Summary:
Session ID: 874330b779ec785f:6e37ce57b00ee4b7
Session Type: BEESWAX
Start Time: 2017-10-12 19:20:14.354494000
End Time:
Query Type: QUERY
Query State: CREATED
Query Status: OK
Impala Version: impalad version 2.7.0-cdh5.10.1 RELEASE (build 876895d2a90346e69f2aea02d5528c2125ae7a32)
User: venkata@my.com
Connected User: venkata@my.com
Delegated User:
Network Address: ::ffff:10.0.135.192:45724
Default Db: default
Sql Statement: select campaign,market, band, SUM(1) FROM
my. _tbw GROUP BY 1,2,3
Coordinator: ip-myIP.myflaovr.my.com:22000
Query Options (non default): MEM_LIMIT=2147483648
Plan:
----------------
Estimated Per-Host Requirements: Memory=44.00MB VCores=2

PLAN-ROOT SINK
|
04:EXCHANGE [UNPARTITIONED]
| hosts=10 per-host-mem=unavailable
| tuple-ids=1 row-size=58B cardinality=14130
|
03:AGGREGATE [FINALIZE]
| output: sum:merge(1)
| group by: campaign, market, band
| hosts=10 per-host-mem=10.00MB
| tuple-ids=1 row-size=58B cardinality=14130
|
02:EXCHANGE [HASH(campaign,market,band)]
| hosts=10 per-host-mem=0B
| tuple-ids=1 row-size=58B cardinality=14130
|
01:AGGREGATE [STREAMING]
| output: sum(1)
| group by: campaign, market, band
| hosts=10 per-host-mem=10.00MB
| tuple-ids=1 row-size=58B cardinality=14130
|
00:SCAN HDFS [my. _tbw, RANDOM]
partitions=6917/6917 files=6986 size=69.05GB
table stats: 35342034607 rows total
column stats: all
hosts=10 per-host-mem=24.00MB
tuple-ids=0 row-size=50B cardinality=35342034607
----------------
Estimated Per-Host Mem: 46137344
Estimated Per-Host VCores: 2
Request Pool: root.users
Admission result: Admitted immediately
Planner Timeline: 44.968ms
- Analysis finished: 1.680ms (1.680ms)
- Equivalence classes computed: 1.875ms (195.279us)
- Single node plan created: 18.650ms (16.774ms)
- Runtime filters computed: 18.706ms (55.973us)
- Distributed plan created: 18.892ms (186.353us)
- Lineage info computed: 18.961ms (69.433us)
- Planning finished: 44.968ms (26.006ms)
Query Timeline: 2m21s
- Query submitted: 0.000ns (0.000ns)
- Planning finished: 280.000ms (280.000ms)
- Submit for admission: 334.001ms (54.000ms)
- Completed admission: 334.001ms (0.000ns)
- Ready to start 21 fragment instances: 373.001ms (39.000ms)
- All 21 fragment instances started: 912.003ms (539.001ms)
- ComputeScanRangeAssignmentTimer: 11.000ms
ImpalaServer:
- ClientFetchWaitTimer: 0.000ns
- RowMaterializationTimer: 0.000ns
Execution Profile b54a1f635a0503a0:9e55c34f00000000:(Total: 539.001ms, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------

Fragment instance start latencies: Count: 21, 25th %-ile: 1ms, 50th %-ile: 2ms, 75th %-ile: 137ms, 90th %-ile: 153ms, 95th %-ile: 164ms, 99.9th %-ile: 176ms
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F02:
Instance b54a1f635a0503a0:9e55c34f00000000 (ip-myIP.myflaovr.my.com:22000):(Total: 40.000ms, non-child: 1.000ms, % non-child: 2.50%)
MemoryUsage(4s000ms): 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 8.00 KB (8192)
- PerHostPeakMemUsage: 64.09 MB (67201658)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
- TotalThreadsInvoluntaryContextSwitches: 0 (0)
- TotalThreadsTotalWallClockTime: 0.000ns
- TotalThreadsSysTime: 0.000ns
- TotalThreadsUserTime: 0.000ns
- TotalThreadsVoluntaryContextSwitches: 0 (0)
Fragment Instance Lifecycle Timings:
- OpenTime: 0.000ns
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 40.000ms
- ExecTreePrepareTime: 0.000ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 48 (48)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 1.60 GB (1717986944)
- PeakMemoryUsage: 784.00 KB (802816)
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
PLAN_ROOT_SINK:
- PeakMemoryUsage: 0
CodeGen:(Total: 39.000ms, non-child: 39.000ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 0.000ns
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.90 MB (1992592)
- NumFunctions: 0 (0)
- NumInstructions: 0 (0)
- OptimizationTime: 0.000ns
- PeakMemoryUsage: 0
- PrepareTime: 38.000ms
EXCHANGE_NODE (id=4):
BytesReceived(4s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
- BytesReceived: 0
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F01:(Total: 23.600ms, non-child: 0.000ns, % non-child: 0.00%)
split sizes: min: 0, max: 0, avg: 0, stddev: 0
- AverageThreadTokens: 1.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 2.80 MB (2934136)
- PerHostPeakMemUsage: 58.02 MB (60843115)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
- TotalThreadsInvoluntaryContextSwitches: 0 (0)
- TotalThreadsTotalWallClockTime: 0.000ns
- TotalThreadsSysTime: 0.000ns
- TotalThreadsUserTime: 0.000ns
- TotalThreadsVoluntaryContextSwitches: 0 (0)
Fragment Instance Lifecycle Timings:
- OpenTime: 0.000ns
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 23.600ms
- ExecTreePrepareTime: 800.000us
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 48 (48)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 1.60 GB (1717986944)
- PeakMemoryUsage: 784.00 KB (802816)
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=4):
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 2.24 KB (2296)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 149.800ms, non-child: 149.800ms, % non-child: 100.00%)
- CodegenTime: 2.499ms
- CompileTime: 43.400ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.90 MB (1992592)
- NumFunctions: 48 (48)
- NumInstructions: 984 (984)
- OptimizationTime: 84.400ms
- PeakMemoryUsage: 492.00 KB (503808)
- PrepareTime: 21.500ms
AGGREGATION_NODE (id=3):(Total: 800.000us, non-child: 800.000us, % non-child: 100.00%)
- BuildTime: 0.000ns
- GetNewBlockTime: 200.002us
- GetResultsTime: 0.000ns
- HTResizeTime: 0.000ns
- HashBuckets: 0 (0)
- LargestPartitionPercent: 0 (0)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 16 (16)
- PeakMemoryUsage: 2.31 MB (2419840)
- PinTime: 0.000ns
- RowsRepartitioned: 0 (0)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SpilledPartitions: 0 (0)
- UnpinTime: 0.000ns
EXCHANGE_NODE (id=2):
- BytesReceived: 0
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Fragment F01:
Instance b54a1f635a0503a0:12 (host=ip-myIP.myflaovr.my.com:22000):(Total: 20.000ms, non-child: 0.000ns, % non-child: 0.00%)

x 10 nodes.

 

 

 

Thanks.

 

Explorer

Can you reply me  @alex.behm ?

Master Collaborator

Thanks for the profile. Unfortunately, that profile is not useful to me because I need a profile of the query after it has completely finished (query state of the profile you sent is CREATED). Please run the query, let it finish, and then provide the profile of that successfully run query.

 

In any case, you should know that by default Impala runs all operations except scans in a single thread per node. This tends to work well for multi-user workloads, but might you might not be able to max out the CPU with one single query.

 

You may experiment with the MT_DOP query option which controls the intra-node paralellism. However, it only works for certain "simple" queries without distributed joins. I encourade you to read up on this option in the docs:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_mt_dop.html

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.