Impala BETWEEN predicate not pushed down to Kudu?


We are running Kudu 1.3.0 with CDH 5.10 (the Kudu client version is supposed to be 1.2).

When running the TPC-DS queries with Impala on Kudu (according to, we found that the BETWEEN predicate in query 3 is not pushed down to Kudu, which causes Kudu to scan many rows and return them to Impala.

The following is what we found in the Impala query profile:



TPC-DS q3.sql snippet:



Any reply would be appreciated.



While reading the "Using Impala with Kudu" document, I saw that it says: "If the WHERE clause of your query includes comparisons with the operators =, <=, <, >, >=, BETWEEN, or IN, Kudu evaluates the condition directly and only returns the relevant results. This provides optimum performance, because Kudu only returns the relevant results to Impala."

But here, with TPC-DS query 3, the BETWEEN predicate is not pushed down to Kudu.
Is something wrong?

Finally, I found that OR predicates are not pushed down to Kudu:
explain select * from student where age=10 or age=20 or age=50 or age=60;
| Explain String |
| Estimated Per-Host Requirements: Memory=0B VCores=1 |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| preresearch.student |
| |
| | |
| | |
| 00:SCAN KUDU [preresearch.student] |
| predicates: age = 10 OR age = 20 OR age = 50 OR age = 60 |
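
The documentation quoted earlier in this thread lists IN among the operators Kudu evaluates directly, so one possible workaround for a disjunction of equalities on a single column is to write it as an IN list. The sketch below only checks that the two forms select the same rows. It uses Python's built-in sqlite3 module as a stand-in engine, so it says nothing about what Impala's planner actually does; the `student` table and `age` column come from the post, while the `id` column and the rows are made up for illustration.

```python
import sqlite3

# Stand-in engine (SQLite), NOT Impala/Kudu: this only compares result sets.
# The rows and the `id` column are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO student (id, age) VALUES (?, ?)",
    [(1, 10), (2, 15), (3, 20), (4, 50), (5, 60), (6, 70)],
)

# The disjunctive form from the post.
or_form = (
    "SELECT id FROM student "
    "WHERE age = 10 OR age = 20 OR age = 50 OR age = 60 ORDER BY id"
)
# The same filter written as an IN list.
in_form = "SELECT id FROM student WHERE age IN (10, 20, 50, 60) ORDER BY id"

or_rows = conn.execute(or_form).fetchall()
in_rows = conn.execute(in_form).fetchall()

# Both forms select the same ids.
print(or_rows == in_rows)
```

Whether the IN form is actually pushed down on a given Impala and Kudu version should be confirmed by running EXPLAIN on the rewritten query and checking where the predicate appears in the 00:SCAN KUDU node.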

Cloudera Employee

Hi @lewiss, that's correct, currently OR (disjunctive) predicates can't be pushed to Kudu.  In theory Impala could rewrite this query to be a union between a bunch of disjoint sub-selects each using a BETWEEN predicate, but I think that optimization is currently missing (it's not something that can be done in general, since the result sets could overlap).
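
To illustrate why the union rewrite is only safe for disjoint sub-selects, here is a minimal sketch. It again uses Python's built-in sqlite3 module as a stand-in engine rather than Impala, and it assumes the rewrite uses UNION ALL (the reply above only says "union"). The `student` table and `age` column come from the thread; the `id` column, the rows, and the `ids` helper are hypothetical.

```python
import sqlite3

# Stand-in engine (SQLite), NOT Impala/Kudu. Sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO student (id, age) VALUES (?, ?)",
    [(1, 10), (2, 15), (3, 20), (4, 25), (5, 30)],
)

def ids(sql):
    """Run a query and return the selected ids as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Disjoint ranges: the OR form and the UNION ALL form agree.
or_disjoint = ids(
    "SELECT id FROM student "
    "WHERE age BETWEEN 10 AND 15 OR age BETWEEN 25 AND 30"
)
union_disjoint = ids(
    "SELECT id FROM student WHERE age BETWEEN 10 AND 15 "
    "UNION ALL "
    "SELECT id FROM student WHERE age BETWEEN 25 AND 30"
)

# Overlapping ranges: rows matching both branches come back twice.
or_overlap = ids(
    "SELECT id FROM student "
    "WHERE age BETWEEN 10 AND 20 OR age BETWEEN 15 AND 30"
)
union_overlap = ids(
    "SELECT id FROM student WHERE age BETWEEN 10 AND 20 "
    "UNION ALL "
    "SELECT id FROM student WHERE age BETWEEN 15 AND 30"
)

print(or_disjoint == union_disjoint)  # same rows
print(or_overlap == union_overlap)    # differs: ids 2 and 3 are duplicated
```

With overlapping branches, a manual rewrite would need the ranges split so they no longer overlap, or a deduplicating UNION, which adds its own cost and would also collapse rows that are legitimately identical in the projected columns.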