Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

HIVE 3.1.0 - very slow left join

Highlighted

HIVE 3.1.0 - very slow left join

New Contributor

I have table (dictionary) with ~80M records, transactional, stored as ORC. This table is properly compacted (I even run compaciton manually and verified by looking on files on HDFS). I'm using TEZ and cost based optimizer

When I run query which left join this table i runs very slow, ca. 15 minutes. If I change "left join" to "join" it runs 5 sec...

I even tried simplified query, where problem still occurs:

create table t1 as select 1 id;
select * from t1 left join dictionary d on t1.id = d.id;

I've check generated query plan and there are two mapper phases and one reducer.

When I change query replacing "left join" to join, there is no reducer phase. Similarly if I change query to such one:

select * from (select 1 id) t1 left join dictionary d on t1.id = d.id;

also there is no reducer phase and it runs fast.

I also tried change join order (dictionary right join t1), but it didn't help.

Do you have any idea where could be the problem? Why left join is

1 REPLY 1

Re: HIVE 3.1.0 - very slow left join

Expert Contributor

are the statistics up to date on the tables? It should be able to do a semi join reduction. Updating stats may help: https://cwiki.apache.org/confluence/display/Hive/Column%2BStatistics%2Bin%2BHive