Is SMB Join or SMB Map join Enabled in TEZ


The conversion of a join to an SMB join seems to depend on the execution engine. If I run the commands below using MR:

set hive.execution.engine=mr;

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

set hive.auto.convert.sortmerge.join=true;

set hive.optimize.bucketmapjoin=true;

set hive.optimize.bucketmapjoin.sortedmerge=true;

set hive.enforce.bucketing=true;

set hive.enforce.sorting=true;

set hive.auto.convert.join=true;

drop table key_value_large; drop table key_value_small;

create table key_value_large ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

create table key_value_small ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;

I can see a 'Sorted Merge Bucket Map Join Operator' in the explain output. But if I set the execution engine to Tez:

set hive.execution.engine=tez;

and then run the same explain, I see a 'Map Join Operator' instead of an SMB map join in the plan.

I have seen on some JIRA pages that SMB is not implemented in Tez:

http://mail-archives.apache.org/mod_mbox/hive-user/201508.mbox/%3c4D4BDAE9-F6A8-456F-A90A-A550D3C289...

Can someone confirm whether Tez can run an SMB join?

4 REPLIES

Re: Is SMB Join or SMB Map join Enabled in TEZ

@viswanath kammula

Please share the explain plan for both execution engines, Tez and MR, using:

explain <query>;


Re: Is SMB Join or SMB Map join Enabled in TEZ

You can see the plans below.


Re: Is SMB Join or SMB Map join Enabled in TEZ

16381-mr.png

@Sindhu This is the explain output for MR.

The query is:

explain select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;

I also had to run the following, for MR only:

set hive.enforce.sortmergebucketmapjoin=false;


Re: Is SMB Join or SMB Map join Enabled in TEZ

16383-tez.png

And this is the explain output for Tez.
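
One way to probe this further, offered as a sketch rather than a confirmed fix: the settings below assume that Tez picks the map join because hive.auto.convert.join=true lets the planner load the small table as a map-side hash table before a sort-merge conversion is considered. Turning that conversion off for the session, or shrinking hive.auto.convert.join.noconditionaltask.size (the value 1 is an arbitrary illustrative choice), and then re-running the explain should show whether the operator changes. Whether this actually produces an SMB join may depend on the Hive version and on table statistics.

```sql
-- Sketch only: session settings to test whether Tez moves away from a map join.
set hive.execution.engine=tez;
set hive.auto.convert.sortmerge.join=true;

-- Assumption: with automatic map-join conversion disabled,
-- the planner is free to consider a sort-merge join instead.
set hive.auto.convert.join=false;

-- Alternative (also an assumption): keep conversion on, but make the
-- in-memory size threshold so small that no table qualifies.
-- set hive.auto.convert.join=true;
-- set hive.auto.convert.join.noconditionaltask.size=1;

-- Same query as above; compare the join operator in the resulting plan.
explain select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
```

Comparing this plan with the two plans attached above would indicate whether the map-join conversion, rather than missing SMB support, explains the difference.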
