
Is SMB Join or SMB Map join Enabled in TEZ


The conversion of a join to an SMB join seems to depend on the execution engine. If I run the commands below using MR:

set hive.execution.engine=mr;

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

set hive.auto.convert.sortmerge.join=true;

set hive.optimize.bucketmapjoin=true;

set hive.optimize.bucketmapjoin.sortedmerge=true;

set hive.enforce.bucketing=true;

set hive.enforce.sorting=true;

set hive.auto.convert.join=true;

drop table key_value_large; drop table key_value_small;

create table key_value_large ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

create table key_value_small ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;

I can see a 'Sorted Merge Bucket Map Join Operator' in the explain output. The result changes when I set the execution engine to Tez:

set hive.execution.engine=tez;

When I then run the same explain, I see 'Map Join Operator' instead of an SMB map join in the plan.

Some JIRA pages, and this hive-user mailing list thread, suggest that SMB join is not implemented in Tez:

http://mail-archives.apache.org/mod_mbox/hive-user/201508.mbox/%3c4D4BDAE9-F6A8-456F-A90A-A550D3C289...

Can someone confirm whether Tez can run an SMB join?

4 REPLIES

Re: Is SMB Join or SMB Map join Enabled in TEZ

@viswanath kammula

Please share the explain plans for both execution engines (Tez and MR), generated with:

explain <query>;
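
As a sketch of what that comparison could look like, reusing the query and tables already posted in the question (nothing here goes beyond the settings shown above):

```sql
-- Plan under MapReduce
set hive.execution.engine=mr;
explain select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;

-- Plan under Tez, same session, all other settings unchanged
set hive.execution.engine=tez;
explain select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
```

Running both in the same session keeps every other setting identical, so any difference in the join operator can be attributed to the engine switch.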

Re: Is SMB Join or SMB Map join Enabled in TEZ

You can see the plan below

Re: Is SMB Join or SMB Map join Enabled in TEZ

16381-mr.png

@Sindhu This is the explain plan for MR.

The query is:

explain select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;

I also had to run the following, for MR only:

set hive.enforce.sortmergebucketmapjoin=false;

Re: Is SMB Join or SMB Map join Enabled in TEZ

16383-tez.png

And this is the explain plan for Tez.
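
A possible follow-up experiment, sketched under an assumption that is not confirmed anywhere in this thread: Tez may be choosing the plain map join because automatic map-join conversion (hive.auto.convert.join) takes priority when the small table is tiny. Turning that conversion off while leaving sort-merge conversion on would show whether the planner then considers a sort-merge join. This is untested against the tables above, and the outcome may depend on the Hive version.

```sql
-- Assumption: the plain map join wins on Tez only because automatic
-- map-join conversion is enabled. Disable it and inspect the plan again.
set hive.execution.engine=tez;
set hive.auto.convert.join=false;
set hive.auto.convert.sortmerge.join=true;

explain select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
```

If the plan still shows no sort-merge operator, that would point toward the engine or version rather than the map-join threshold.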