
Impala 'BETWEEN' predicate not pushed down to Kudu?


We are running Kudu 1.3.0 with CDH 5.10 (the Kudu client version is supposed to be 1.2).

When running TPC-DS queries with Impala on Kudu (according to …), we found that the BETWEEN predicate in query 3 is not pushed down to Kudu, which causes Kudu to scan many rows and return them all to Impala.

The following is what we found in the Impala query profile:



TPC-DS q3.sql snippet:



Any reply will be appreciated.



While reading the "Using Impala with Kudu" document, I found that it says: "If the WHERE clause of your query includes comparisons with the operators =, <=, <, >, >=, BETWEEN, or IN, Kudu evaluates the condition directly and only returns the relevant results. This provides optimum performance, because Kudu only returns the relevant results to Impala."

But here, with TPC-DS query 3, the BETWEEN predicate is not pushed down to Kudu.
Is something wrong?

Finally, I found that OR predicates are not pushed down to Kudu:
```
explain select * from student where age=10 or age=20 or age=50 or age=60;
| Explain String |
| Estimated Per-Host Requirements: Memory=0B VCores=1 |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| preresearch.student |
| |
| | |
| | |
| 00:SCAN KUDU [preresearch.student] |
| predicates: age = 10 OR age = 20 OR age = 50 OR age = 60 |
```
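In Impala's EXPLAIN output for a Kudu scan, conditions handed to Kudu are normally listed under `kudu predicates:`, while `predicates:` (as in the output above) holds conditions that Impala evaluates itself after Kudu returns the rows. The minimal Python sketch below models that split, using only the operator list quoted from the documentation and the reply in this thread that OR predicates are not pushed. The classes `Compare`, `Between`, `InList`, `Or` and the functions `is_pushable` and `split_conjuncts` are hypothetical and do not mirror Impala's planner code, which checks more than this (for example, that a comparison is between a column and a constant). The point it illustrates: the whole WHERE clause of the `student` query is a single OR conjunct, so nothing is left to push down.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical, simplified predicate model; NOT Impala's real planner classes.


@dataclass(frozen=True)
class Compare:
    column: str
    op: str        # e.g. "=", "<", "<=", ">", ">="
    value: object


@dataclass(frozen=True)
class Between:
    column: str
    low: object
    high: object


@dataclass(frozen=True)
class InList:
    column: str
    values: tuple


@dataclass(frozen=True)
class Or:
    children: tuple  # a disjunction of other predicates


Predicate = Union[Compare, Between, InList, Or]

# Operators listed as pushable in the documentation quoted above.
PUSHABLE_OPS = {"=", "<", "<=", ">", ">="}


def is_pushable(p: Predicate) -> bool:
    """Model of the documented rule: simple comparisons, BETWEEN and IN can be
    evaluated by Kudu; an OR (disjunction) cannot, per the reply in this thread."""
    if isinstance(p, Compare):
        return p.op in PUSHABLE_OPS
    if isinstance(p, (Between, InList)):
        return True
    return False  # Or, or any operator this sketch does not model


def split_conjuncts(
    conjuncts: List[Predicate],
) -> Tuple[List[Predicate], List[Predicate]]:
    """Split top-level AND-ed conjuncts into (evaluated by Kudu, evaluated by Impala)."""
    pushed = [p for p in conjuncts if is_pushable(p)]
    residual = [p for p in conjuncts if not is_pushable(p)]
    return pushed, residual


if __name__ == "__main__":
    # WHERE age = 10 OR age = 20 OR age = 50 OR age = 60 is ONE conjunct (an OR).
    where = [Or(tuple(Compare("age", "=", v) for v in (10, 20, 50, 60)))]
    pushed, residual = split_conjuncts(where)
    print(len(pushed), len(residual))  # 0 1

    # WHERE age BETWEEN 10 AND 60 is a single pushable conjunct.
    pushed, residual = split_conjuncts([Between("age", 10, 60)])
    print(len(pushed), len(residual))  # 1 0
```

The split is done only on top-level AND-ed conjuncts, which is why a BETWEEN nested inside an OR is not pushed even though a standalone BETWEEN would be.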

Cloudera Employee

Hi @lewiss, that's correct: currently, OR (disjunctive) predicates can't be pushed down to Kudu. In theory, Impala could rewrite this query as a UNION of disjoint sub-selects, each using a BETWEEN predicate, but I think that optimization is currently missing (it can't be done in general, since the result sets could overlap).
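Until Impala does this automatically, the rewrite can be done by hand. Below is a minimal Python sketch, assuming an integer column and inclusive ranges (matching BETWEEN semantics), that first merges overlapping or adjacent ranges so they become disjoint, then emits one `BETWEEN` sub-select per range joined with `UNION ALL`. Merging is what sidesteps the overlap problem mentioned above: with disjoint ranges no row can match two branches, so `UNION ALL` returns the same rows as the OR query without needing duplicate elimination. The helper names `merge_ranges` and `union_all_rewrite` are hypothetical, table and column names are interpolated without quoting (so do not pass untrusted input), and whether each branch is really pushed down in a given Impala/Kudu version should be confirmed with EXPLAIN.

```python
from typing import List, Tuple

# Inclusive [low, high] range, matching SQL BETWEEN semantics.
# Assumes an INTEGER column (adjacent ranges like 1-5 and 6-9 are merged).
Range = Tuple[int, int]


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Sort and merge overlapping or adjacent ranges into pairwise-disjoint ones.
    Empty ranges (low > high) match nothing under BETWEEN and are dropped."""
    merged: List[Range] = []
    for low, high in sorted(r for r in ranges if r[0] <= r[1]):
        if merged and low <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def union_all_rewrite(table: str, column: str, ranges: List[Range]) -> str:
    """Hypothetical helper: build a UNION ALL of BETWEEN sub-selects, one per
    disjoint range. Names are NOT quoted or escaped; illustration only."""
    branches = [
        f"SELECT * FROM {table} WHERE {column} BETWEEN {low} AND {high}"
        for low, high in merge_ranges(ranges)
    ]
    return "\nUNION ALL\n".join(branches)


if __name__ == "__main__":
    # Overlapping input ranges collapse into disjoint ones.
    print(merge_ranges([(3, 8), (1, 5), (20, 30)]))  # [(1, 8), (20, 30)]
    # Equality ORs from the student query, expressed as single-value ranges.
    print(union_all_rewrite("student", "age", [(10, 10), (20, 20), (50, 50), (60, 60)]))
```

For the `student` query this yields four branches of the form `SELECT * FROM student WHERE age BETWEEN 10 AND 10`; for genuinely overlapping ranges, such as a disjunction of BETWEENs, the merge step is what keeps `UNION ALL` from returning a row twice.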