Support Questions
Find answers, ask questions, and share your expertise

Can we see SMB join in explain query using Tez and Hive 1.2.1?

Can we see SMB join in explain query using Tez and Hive 1.2.1?

I have two bucketed tables on the same column, and I am trying to see if hive performs SMB join on these tables. But when I do an explain plan it only does a Map join.

Does SMB join work in Hive 1.2.1?

8 REPLIES 8

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

Rising Star

SMB should work on Hive 1.2.1. Make sure these properties are set

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinO...

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@icocio

I can see SMB join only working on MR mode but not in TEZ mode.Could you see SMB join been shown in the explain plan with those properties ?

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

Cloudera Employee
@vishwa

Yes the explain plan should show a 'Sort Merge Join Operator'.

In addition to the properties mentioned by @icocio you may also have to set below properties:

set hive.input.format= org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; 
set hive.enforce.sortmergebucketmapjoin=false; 

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@thussain

I still cannot find sort merge join operator.Below are my queries

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

set hive.enforce.sortmergebucketmapjoin=false;

set hive.auto.convert.sortmerge.join=true;

set hive.optimize.bucketmapjoin=true;

set hive.optimize.bucketmapjoin.sortedmerge=true;

set hive.enforce.bucketing=true;

set hive.enforce.sorting=true;

set hive.auto.convert.join=true;

drop table key_value_large; drop table key_value_small;

create table key_value_large ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

create table key_value_small ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key

It only shows map join

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

Cloudera Employee
@vishwa

Can you also try with MR execution engine by setting below property as well:

set hive.execution.engine=mr;

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@thussain

Yes I tried it on MR and I can clearly see Sort merge bucket map join in explain query.

Are you suggesting me that SMB does not work in TEZ ?

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@vishwa

SMB join should work in MR and TEZ as well. I have just tested in my playground. In tez engine check for something like sort order: +.

Re: Can we see SMB join in explain query using Tez and Hive 1.2.1?

@Bala Vignesh N V

I do see sort oder: + in Tez in explain plan. But on Tez it shows it as Map join operator(with sort oder:+ option seen under reduce output operator) and on MR it clearly shows it as Sort Merge Bucket map operator.

Can you please confirm I cannot see Sort Merge Bucket map operator in Tez explain plan, the only way I can confirm its SMB join by looking for something like

sort order: +. in tez explain plan