Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Impala runtime filter not working as expected

avatar
Explorer

Hey all,

I'm using CDH 5.9, Impala 2.7.

I'm examining the runtime-filter feature, but it's not working as I expected.

 

Here's an example of my case:

 

-- Table 1 - Partitioned by day
CREATE TABLE adb.test_prt (ID string)
partitioned BY (day INT);

INSERT INTO adb.test_prt PARTITION(day)
SELECT 'a' AS ID ,20170102 AS day
UNION all
SELECT 'b' AS ID ,20170102 AS day
UNION ALL
SELECT 'c' AS ID,20170102 AS day
UNION ALL
SELECT 'd' AS ID ,20170103 AS day
UNION ALL
SELECT 'd' AS ID ,20170105 AS day
UNION all
SELECT 'g' AS ID ,20170105 AS day


SELECT * FROM adb.test_prt
show partitions adb.test_prt

-- Table 2 - raw data
CREATE TABLE adb.test_1 (day INT)

INSERT INTO adb.test_1
SELECT 20170102 AS day

SELECT * FROM adb.test_1

 

--###################################
-- explain 1
explain
SELECT *
FROM adb.test_prt t
WHERE t.day IN (SELECT day FROM adb.test_1)

 

output : partitions=3/3 files=3 size=12B

 

-- explain 2

explain
SELECT *
FROM adb.test_prt t
WHERE t.day IN (20170102)

 

output:  partitions=1/3 files=1 size=6B

 

I don't understand why there is a difference between the outputs. Table adb.test_1 has only one value which match to specific partition in adb.test_prt. I'm expecting from the runtime filter to figure this out.

What am I missing?

 

Another question: Is this feature support joins as well, rather then where clause?

Here's an example

 

explain
SELECT *
FROM adb.test_prt t
inner JOIN adb.test_1 a
ON t.day=a.day

 

output:  partitions=3/3 files=3 size=12B
runtime filters: RF000 -> t.day

 

Thanks!

Dror

1 ACCEPTED SOLUTION

avatar
Expert Contributor
Can you provide the whole of your 'explain 1'? I would expect to see
runtime filtering enabled in the plan.

The number of partitions scanned ('partitions=3/3') is correct. Runtime
filtering happens at, well, runtime, so the planner doesn't know how many
partitions it's going to skip. In your second example, the planner can
figure this out ahead of time.

Henry

View solution in original post

10 REPLIES 10

avatar
Rising Star

Alex,

 

Thank you again.

 

Subquery approach has been recommended to our team as a long term solution.

However, for short-tem solution to avoid regression impact, using view with limited partitions has been selected.

 

If I remember correctly, in MySQL `table A` data can be limited by `ON Clause` before joining so that cadidates for join can be reduced.

 

Thank you for your valuable comment.

Gatsby