Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

HIVE 3.1.0 - very slow left join

New Contributor

I have table (dictionary) with ~80M records, transactional, stored as ORC. This table is properly compacted (I even run compaciton manually and verified by looking on files on HDFS). I'm using TEZ and cost based optimizer

When I run query which left join this table i runs very slow, ca. 15 minutes. If I change "left join" to "join" it runs 5 sec...

I even tried simplified query, where problem still occurs:

create table t1 as select 1 id;
select * from t1 left join dictionary d on t1.id = d.id;

I've check generated query plan and there are two mapper phases and one reducer.

When I change query replacing "left join" to join, there is no reducer phase. Similarly if I change query to such one:

select * from (select 1 id) t1 left join dictionary d on t1.id = d.id;

also there is no reducer phase and it runs fast.

I also tried change join order (dictionary right join t1), but it didn't help.

Do you have any idea where could be the problem? Why left join is

1 REPLY 1

Expert Contributor

are the statistics up to date on the tables? It should be able to do a semi join reduction. Updating stats may help: https://cwiki.apache.org/confluence/display/Hive/Column%2BStatistics%2Bin%2BHive

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.