Support Questions
Find answers, ask questions, and share your expertise

Can we see SMB join in explain query using Tez and Hive 1.2.1?

I have two bucketed tables on the same column, and I am trying to see if hive performs SMB join on these tables. But when I do an explain plan it only does a Map join.

Does SMB join work in Hive 1.2.1?

8 REPLIES 8

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

Rising Star

SMB should work on Hive 1.2.1. Make sure these properties are set

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinO...

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@icocio

I can see SMB join only working on MR mode but not in TEZ mode.Could you see SMB join been shown in the explain plan with those properties ?

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

Cloudera Employee
@vishwa

Yes the explain plan should show a 'Sort Merge Join Operator'.

In addition to the properties mentioned by @icocio you may also have to set below properties:

set hive.input.format= org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; 
set hive.enforce.sortmergebucketmapjoin=false; 

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@thussain

I still cannot find sort merge join operator.Below are my queries

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

set hive.enforce.sortmergebucketmapjoin=false;

set hive.auto.convert.sortmerge.join=true;

set hive.optimize.bucketmapjoin=true;

set hive.optimize.bucketmapjoin.sortedmerge=true;

set hive.enforce.bucketing=true;

set hive.enforce.sorting=true;

set hive.auto.convert.join=true;

drop table key_value_large; drop table key_value_small;

create table key_value_large ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

create table key_value_small ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key

It only shows map join

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

Cloudera Employee
@vishwa

Can you also try with MR execution engine by setting below property as well:

set hive.execution.engine=mr;

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@thussain

Yes I tried it on MR and I can clearly see Sort merge bucket map join in explain query.

Are you suggesting me that SMB does not work in TEZ ?

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@vishwa

SMB join should work in MR and TEZ as well. I have just tested in my playground. In tez engine check for something like sort order: +.

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@Bala Vignesh N V

I do see sort oder: + in Tez in explain plan. But on Tez it shows it as Map join operator(with sort oder:+ option seen under reduce output operator) and on MR it clearly shows it as Sort Merge Bucket map operator.

Can you please confirm I cannot see Sort Merge Bucket map operator in Tez explain plan, the only way I can confirm its SMB join by looking for something like

sort order: +. in tez explain plan