Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Impala query to scan two records in different partitions

SOLVED Go to solution

Impala query to scan two records in different partitions

Contributor

I have a table with two columns test, id and yyyymmdd and group_id are my partition columns.

 

When I run following query it runs fast as it only scans 1 partition.

 

select id, yyyymmdd, group_id, test from dwh.table where
(id='1a' and yyyymmdd=20170815 and group_id=1)

 

But when I tried to run following . It scans entire table. Even explain shows that it is going to perform full table scan. I think I'm missing somethign simple here. Appreciate community's help to find out what I'm missing here ?

 

select id, yyyymmdd, group_id, test from dwh.table where
(id='1a' and yyyymmdd=20170815 and group_id=1) OR
(id='2b' and yyyymmdd=20170811 and group_id=2)

 

How to scan two rows in two different partitions ?

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Impala query to scan two records in different partitions

Master Collaborator

I'm afraid Impala is not yet able to recognize that only two partitions need to be scanned. We're aware of the gap and that specific optimization is tracked by:

https://issues.apache.org/jira/browse/IMPALA-2108

 

For now, you can manually rewrite your query as suggested in the JIRA as follows:

select id, yyyymmdd, group_id, test from dwh.table where

((id='1a' and yyyymmdd=20170815 and group_id=1) OR (id='2b' and yyyymmdd=20170811 and group_id=2))

AND

((yyyymmdd=20170811 and group_id=2) OR (yyyymmdd=20170815 and group_id=1))

 

or alternatively, use a union:

 

select id, yyyymmdd, group_id, test from dwh.table where
id='1a' and yyyymmdd=20170815 and group_id=1

union all

select id, yyyymmdd, group_id, test from dwh.table where

id='2b' and yyyymmdd=20170811 and group_id=2

 

1 REPLY 1

Re: Impala query to scan two records in different partitions

Master Collaborator

I'm afraid Impala is not yet able to recognize that only two partitions need to be scanned. We're aware of the gap and that specific optimization is tracked by:

https://issues.apache.org/jira/browse/IMPALA-2108

 

For now, you can manually rewrite your query as suggested in the JIRA as follows:

select id, yyyymmdd, group_id, test from dwh.table where

((id='1a' and yyyymmdd=20170815 and group_id=1) OR (id='2b' and yyyymmdd=20170811 and group_id=2))

AND

((yyyymmdd=20170811 and group_id=2) OR (yyyymmdd=20170815 and group_id=1))

 

or alternatively, use a union:

 

select id, yyyymmdd, group_id, test from dwh.table where
id='1a' and yyyymmdd=20170815 and group_id=1

union all

select id, yyyymmdd, group_id, test from dwh.table where

id='2b' and yyyymmdd=20170811 and group_id=2