Explorer
Posts: 23
Registered: ‎11-29-2016

impala 'between predicate' not push down to kudu?

We are running Kudu 1.3.0 with CDH 5.10 (the Kudu client version is supposed to be 1.2).

While running the TPC-DS queries with Impala on Kudu (following https://github.com/cloudera/impala-tpcds-kit), we found that the BETWEEN predicate in query 3 is not pushed down to Kudu, so Kudu scans many rows and returns them all to Impala.

This is what we found in the Impala query profile:

[Attached image: predicate.png]

TPC-DS q3.sql snippet:

[Attached image: predicate-query.png]

Any reply would be appreciated.

Explorer
Posts: 23
Registered: ‎11-29-2016

Re: impala 'between predicate' not push down to kudu?

The "Using Impala with Kudu" document says: "If the WHERE clause of your query includes comparisons with the operators =, <=, <, >, >=, BETWEEN, or IN, Kudu evaluates the condition directly and only returns the relevant results. This provides optimum performance, because Kudu only returns the relevant results to Impala."

But here, with TPC-DS query 3, the BETWEEN predicate is not pushed down to Kudu.
Is something wrong?
Explorer
Posts: 23
Registered: ‎11-29-2016

Re: impala 'between predicate' not push down to kudu?

Finally, I found that OR predicates are not pushed down to Kudu:
explain select * from student where age=10 or age=20 or age=50 or age=60;
+------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=1 |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| preresearch.student |
| |
| PLAN-ROOT SINK |
| | |
| 01:EXCHANGE [UNPARTITIONED] |
| | |
| 00:SCAN KUDU [preresearch.student] |
| predicates: age = 10 OR age = 20 OR age = 50 OR age = 60 |
+------------------------------------------------------------------------------------+
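One possible workaround for an OR chain on a single column is an IN list, since the documentation quoted earlier in this thread names IN among the operators Kudu evaluates directly. Whether this particular Impala/Kudu combination actually pushes the IN list down is an assumption to verify with EXPLAIN. The sketch below only checks that the rewrite returns the same rows as the original query. It uses an in-memory SQLite table as a stand-in for the Kudu table, and the `student` schema and sample rows are made up for illustration.

```python
import sqlite3

# Original query from the post, and a candidate rewrite using an IN list.
OR_QUERY = "SELECT * FROM student WHERE age = 10 OR age = 20 OR age = 50 OR age = 60"
IN_QUERY = "SELECT * FROM student WHERE age IN (10, 20, 50, 60)"


def build_sample_db():
    """Create an in-memory SQLite table with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    ages = [10, 15, 20, 20, 35, 50, 60, 61]
    rows = [(i, f"s{i}", age) for i, age in enumerate(ages)]
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)
    return conn


def run(conn, sql):
    """Return the result rows sorted, so the comparison ignores row order."""
    return sorted(conn.execute(sql).fetchall())


conn = build_sample_db()
or_rows = run(conn, OR_QUERY)
in_rows = run(conn, IN_QUERY)

# The two forms select the same rows; only the predicate shape differs.
print("same rows:", or_rows == in_rows)
```

On Impala, run EXPLAIN on the rewritten query and look at the SCAN KUDU node: predicates that are pushed down are listed under `kudu predicates:`, while those still evaluated by Impala stay under `predicates:`, as in the plan above.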
Cloudera Employee
Posts: 20
Registered: ‎09-28-2015

Re: impala 'between predicate' not push down to kudu?

Hi @lewiss, that's correct: currently, OR (disjunctive) predicates can't be pushed to Kudu. In theory Impala could rewrite this query to be a union between a bunch of disjoint sub-selects each using a BETWEEN predicate, but I think that optimization is currently missing (it's not something that can be done in general, since the result sets could overlap).
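To illustrate the overlap caveat, here is a minimal sketch of the manual union rewrite. It is not how Impala plans the query. It runs against an in-memory SQLite table as a stand-in, so it shows only the result semantics, not pushdown. The `student` rows and the age ranges are hypothetical. With disjoint ranges, a UNION ALL of BETWEEN sub-selects returns the same rows as the OR form. With overlapping ranges, rows in the overlap come back twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, age INTEGER)")
# Made-up rows: ids 0..5 with the ages listed below.
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    list(enumerate([5, 12, 18, 25, 30, 42])),
)


def ids(sql):
    """Run a query and return the selected ids, sorted, duplicates preserved."""
    return sorted(row[0] for row in conn.execute(sql))


# Disjoint ranges: the OR form and the UNION ALL form agree.
disjoint_or = (
    "SELECT id FROM student "
    "WHERE age BETWEEN 10 AND 20 OR age BETWEEN 30 AND 40"
)
disjoint_union = (
    "SELECT id FROM student WHERE age BETWEEN 10 AND 20 "
    "UNION ALL "
    "SELECT id FROM student WHERE age BETWEEN 30 AND 40"
)

# Overlapping ranges: ages 18..25 satisfy both sub-selects.
overlap_or = (
    "SELECT id FROM student "
    "WHERE age BETWEEN 10 AND 25 OR age BETWEEN 18 AND 30"
)
overlap_union = (
    "SELECT id FROM student WHERE age BETWEEN 10 AND 25 "
    "UNION ALL "
    "SELECT id FROM student WHERE age BETWEEN 18 AND 30"
)

print("disjoint equal:", ids(disjoint_or) == ids(disjoint_union))
print("overlap equal: ", ids(overlap_or) == ids(overlap_union))
print("overlap union ids:", ids(overlap_union))
```

Replacing UNION ALL with UNION removes the duplicates in this example because `id` is unique. In general it would also collapse rows that are legitimately identical in the select list, which is one reason the rewrite cannot be applied blindly.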