Created 05-21-2016 06:17 AM
Is it possible to take advantage of partition pruning without specifying the value of the partition column as a constant in the query? Is there a way to get it from another joined small table?
In this example, table wallet_month has 1 record with column month_of_year, that has value '2016-05'.
Table transaction_detail is partitioned on month_of_year, and has 400 partitions.
Looking at the Explain plan, below query still uses all 400 paritions from TD.
SELECT count(*)
FROM wallet_month WM
JOIN transaction_detail TD
AND TD.month_of_year = WM.month_of_year
Adding a where clause (WHERE WM.month_of_year = '2016-05') gives the desired behavior of only using 1 partition from TD, but I'd like to know if there's another way.
Created 05-23-2016 08:44 AM
Yes, that's right. It's enabled for some cases by default (broadcast joins) in Impala 2.5. To enable it for a wider category of joins you can set the query option runtime_filter_mode=global. This setting will become the default in Impala 2.7 because of the performance benefits.
Created 05-21-2016 07:12 AM
Hmm, looks like the functionality I'm looking for is called Dynamic Partition Pruning? (in Impala 2.5 / CDH 5.7)
https://dzone.com/articles/apache-impala-incubating-in-cdh-57-4x-faster-for-b
Created 05-23-2016 08:44 AM
Yes, that's right. It's enabled for some cases by default (broadcast joins) in Impala 2.5. To enable it for a wider category of joins you can set the query option runtime_filter_mode=global. This setting will become the default in Impala 2.7 because of the performance benefits.