I have table (dictionary) with ~80M records, transactional, stored as ORC. This table is properly compacted (I even run compaciton manually and verified by looking on files on HDFS). I'm using TEZ and cost based optimizer
When I run query which left join this table i runs very slow, ca. 15 minutes. If I change "left join" to "join" it runs 5 sec...
I even tried simplified query, where problem still occurs:
create table t1 as select 1 id;
select * from t1 left join dictionary d on t1.id = d.id;
I've check generated query plan and there are two mapper phases and one reducer.
When I change query replacing "left join" to join, there is no reducer phase. Similarly if I change query to such one:
select * from (select 1 id) t1 left join dictionary d on t1.id = d.id;
also there is no reducer phase and it runs fast.
I also tried change join order (dictionary right join t1), but it didn't help.
Do you have any idea where could be the problem? Why left join is