Reply
Highlighted
Explorer
Posts: 25
Registered: ‎09-25-2016
Accepted Solution

Impala query to scan two records in different partitions

I have a table with two columns test, id and yyyymmdd and group_id are my partition columns.

 

When I run following query it runs fast as it only scans 1 partition.

 

select id, yyyymmdd, group_id, test from dwh.table where
(id='1a' and yyyymmdd=20170815 and group_id=1)

 

But when I tried to run following . It scans entire table. Even explain shows that it is going to perform full table scan. I think I'm missing somethign simple here. Appreciate community's help to find out what I'm missing here ?

 

select id, yyyymmdd, group_id, test from dwh.table where
(id='1a' and yyyymmdd=20170815 and group_id=1) OR
(id='2b' and yyyymmdd=20170811 and group_id=2)

 

How to scan two rows in two different partitions ?

 

 

 

Cloudera Employee
Posts: 290
Registered: ‎10-16-2013

Re: Impala query to scan two records in different partitions

I'm afraid Impala is not yet able to recognize that only two partitions need to be scanned. We're aware of the gap and that specific optimization is tracked by:

https://issues.apache.org/jira/browse/IMPALA-2108

 

For now, you can manually rewrite your query as suggested in the JIRA as follows:

select id, yyyymmdd, group_id, test from dwh.table where

((id='1a' and yyyymmdd=20170815 and group_id=1) OR (id='2b' and yyyymmdd=20170811 and group_id=2))

AND

((yyyymmdd=20170811 and group_id=2) OR (yyyymmdd=20170815 and group_id=1))

 

or alternatively, use a union:

 

select id, yyyymmdd, group_id, test from dwh.table where
id='1a' and yyyymmdd=20170815 and group_id=1

union all

select id, yyyymmdd, group_id, test from dwh.table where

id='2b' and yyyymmdd=20170811 and group_id=2

 

Announcements