Created on 09-21-2016 03:43 PM - edited 09-16-2022 03:40 AM
Hi,
I am running impala 2.5 on cdh 5.7.3.
I am currently bechmarking a simple query :
select count(*),`session_id` from flat_table group by `session_id` limit 10;
Here is the results of 'summary' :
+--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+ | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | +--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+ | 04:EXCHANGE | 1 | 13.63us | 13.63us | 10 | 10 | 0 B | -1 B | UNPARTITIONED | | 03:AGGREGATE | 6 | 1.11s | 1.15s | 60 | 247.06M | 171.09 MB | 128.00 MB | FINALIZE | | 02:EXCHANGE | 6 | 86.76ms | 92.08ms | 12.94M | 247.06M | 0 B | 0 B | HASH(session_id) | | 01:AGGREGATE | 6 | 4.07s | 6.14s | 12.94M | 247.06M | 525.03 MB | 128.00 MB | STREAMING | | 00:SCAN HDFS | 6 | 337.83ms | 494.40ms | 268.67M | 247.06M | 145.36 MB | 88.00 MB | flat_table | +--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+
We can easily see that most of the time is going into the aggrerate part. And I have a lot of query that have the same botleneck.
I have control over hardware and impala configuration. The table is parquet table, cached in hdfs and with incremental stats for each partition.
Am I missing something or is this expected performances for a query like this?
Thanks
Created 09-23-2016 10:41 AM
No I don't think you're missing any obvious optimisation. Yes we only use a single core per aggregation per Impala daemon. This is obviously not ideal so we have a big push right now to do full parallelization of every operator.
Created 09-22-2016 12:46 PM
It's aggregating 10 million rows per core per second which is within expectations - the main factor affecting performance
We are currently working on multi-threaded joins and aggregation, which would increase the level of parallelism available in this case. There were also some improvements to the aggregation in Impala 2.6 (https://issues.cloudera.org/browse/IMPALA-3286) that might improve throughput a bit (I'd guess somewhere between 10% to 80% speedup depending on the input data).
Created on 09-22-2016 07:23 PM - edited 09-22-2016 08:26 PM
Hi,
I will update to 2.6 over the week end and post the results.
I have 32 cores per hosts available to impala daemon.
If you say that 10 million record are being process in parallel, I guess you imply that only one core is used by host (268M rows/6hosts/4 sec = ~11million).
Is it expected to have only 1 core use per Node ? Did I miss something in the configuration?
Or is it because of the multi-threaded aggregation improvement that you are working on ?
I just want to make sure I didn't miss any obvious optimization.
And just to tell you the column is of type "string".
thanks
Created 09-23-2016 10:41 AM
No I don't think you're missing any obvious optimisation. Yes we only use a single core per aggregation per Impala daemon. This is obviously not ideal so we have a big push right now to do full parallelization of every operator.
Created 09-26-2016 12:52 PM
Created 09-27-2016 09:50 AM
Thanks for the data point :).
We're tracking the parallelisation work here: https://issues.cloudera.org/browse/IMPALA-3902 . It's probably going to get enabled in phases - we may have parallelisation for aggregations before joins for example.