Created on 01-17-2017 12:49 AM - edited 09-16-2022 03:55 AM
Hey all,
I'm using CDH 5.9, Impala 2.7.
I'm examining the runtime-filter feature, but it's not working as I expected.
Here's an example of my case:
-- Table 1 - Partitioned by day
CREATE TABLE adb.test_prt (ID string)
partitioned BY (day INT);
INSERT INTO adb.test_prt PARTITION(day)
SELECT 'a' AS ID ,20170102 AS day
UNION all
SELECT 'b' AS ID ,20170102 AS day
UNION ALL
SELECT 'c' AS ID,20170102 AS day
UNION ALL
SELECT 'd' AS ID ,20170103 AS day
UNION ALL
SELECT 'd' AS ID ,20170105 AS day
UNION all
SELECT 'g' AS ID ,20170105 AS day
SELECT * FROM adb.test_prt
show partitions adb.test_prt
-- Table 2 - raw data
CREATE TABLE adb.test_1 (day INT)
INSERT INTO adb.test_1
SELECT 20170102 AS day
SELECT * FROM adb.test_1
--###################################
-- explain 1
explain
SELECT *
FROM adb.test_prt t
WHERE t.day IN (SELECT day FROM adb.test_1)
output : partitions=3/3 files=3 size=12B
-- explain 2
explain
SELECT *
FROM adb.test_prt t
WHERE t.day IN (20170102)
output: partitions=1/3 files=1 size=6B
I don't understand why there is a difference between the outputs. Table adb.test_1 has only one value which match to specific partition in adb.test_prt. I'm expecting from the runtime filter to figure this out.
What am I missing?
Another question: Is this feature support joins as well, rather then where clause?
Here's an example
explain
SELECT *
FROM adb.test_prt t
inner JOIN adb.test_1 a
ON t.day=a.day
output: partitions=3/3 files=3 size=12B
runtime filters: RF000 -> t.day
Thanks!
Dror
Created 01-17-2017 10:14 AM
Created 03-21-2017 12:08 AM
Alex,
Thank you again.
Subquery approach has been recommended to our team as a long term solution.
However, for short-tem solution to avoid regression impact, using view with limited partitions has been selected.
If I remember correctly, in MySQL `table A` data can be limited by `ON Clause` before joining so that cadidates for join can be reduced.
Thank you for your valuable comment.
Gatsby