Support Questions

maurin · ‎09-21-2016

Hi,

I am running impala 2.5 on cdh 5.7.3.

I am currently bechmarking a simple query :

 select count(*),`session_id` from flat_table group by `session_id` limit 10;

Here is the results of 'summary' :

+--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                                  |
+--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+
| 04:EXCHANGE  | 1      | 13.63us  | 13.63us  | 10      | 10         | 0 B       | -1 B          | UNPARTITIONED                           |
| 03:AGGREGATE | 6      | 1.11s    | 1.15s    | 60      | 247.06M    | 171.09 MB | 128.00 MB     | FINALIZE                                |
| 02:EXCHANGE  | 6      | 86.76ms  | 92.08ms  | 12.94M  | 247.06M    | 0 B       | 0 B           | HASH(session_id)                |
| 01:AGGREGATE | 6      | 4.07s    | 6.14s    | 12.94M  | 247.06M    | 525.03 MB | 128.00 MB     | STREAMING                               |
| 00:SCAN HDFS | 6      | 337.83ms | 494.40ms | 268.67M | 247.06M    | 145.36 MB | 88.00 MB      | flat_table |
+--------------+--------+----------+----------+---------+------------+-----------+---------------+-----------------------------------------+

We can easily see that most of the time is going into the aggrerate part. And I have a lot of query that have the same botleneck.

I have control over hardware and impala configuration. The table is parquet table, cached in hdfs and with incremental stats for each partition.

Am I missing something or is this expected performances for a query like this?

Thanks

Tim Armstrong · ‎09-23-2016

No I don't think you're missing any obvious optimisation. Yes we only use a single core per aggregation per Impala daemon. This is obviously not ideal so we have a big push right now to do full parallelization of every operator.

View solution in original post

Tim Armstrong · ‎09-22-2016

It's aggregating 10 million rows per core per second which is within expectations - the main factor affecting performance

We are currently working on multi-threaded joins and aggregation, which would increase the level of parallelism available in this case. There were also some improvements to the aggregation in Impala 2.6 (https://issues.cloudera.org/browse/IMPALA-3286) that might improve throughput a bit (I'd guess somewhere between 10% to 80% speedup depending on the input data).

maurin · ‎09-22-2016

Hi,
I will update to 2.6 over the week end and post the results.
I have 32 cores per hosts available to impala daemon.
If you say that 10 million record are being process in parallel, I guess you imply that only one core is used by host (268M rows/6hosts/4 sec = ~11million).
Is it expected to have only 1 core use per Node ? Did I miss something in the configuration?
Or is it because of the multi-threaded aggregation improvement that you are working on ?
I just want to make sure I didn't miss any obvious optimization.

And just to tell you the column is of type "string".

thanks

Tim Armstrong · ‎09-23-2016

No I don't think you're missing any obvious optimisation. Yes we only use a single core per aggregation per Impala daemon. This is obviously not ideal so we have a big push right now to do full parallelization of every operator.

maurin · ‎09-26-2016

Hi,
I upgraded impala to 2.6. The query aggregation improved by about 15%.
I there a open ticket or an expected release date/version for the "full parallelization" ?

thanks

Tim Armstrong · ‎09-27-2016

Thanks for the data point :).

We're tracking the parallelisation work here: https://issues.cloudera.org/browse/IMPALA-3902 . It's probably going to get enabled in phases - we may have parallelisation for aggregations before joins for example.

Cloudera Community

Support Questions

AGGREGATE of query is to long