Created 06-11-2017 03:46 PM
The conversion of a join to a sort-merge-bucket (SMB) join seems to depend on the execution engine. If I run the commands below using MR:
set hive.execution.engine=mr;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.auto.convert.join=true;
drop table key_value_large; drop table key_value_small;
create table key_value_large ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
create table key_value_small ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
I can see a 'Sorted Merge Bucket Map Join Operator' in the explain output. But if I set the execution engine to Tez:
set hive.execution.engine=tez;
and then run the same explain statement, I see 'Map Join Operator' instead of the SMB map join in the plan.
Some JIRA pages suggest that SMB is not implemented in Tez.
Can someone confirm whether Tez can run an SMB join?
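One experiment that might help narrow this down is sketched below. It has not been tested here, and it rests on an assumption: that on Tez the automatic map-join conversion (hive.auto.convert.join) is what takes precedence over the SMB conversion when the smaller table is small enough. If that holds, turning it off for the session should change the operator shown in the plan. Both properties are standard Hive settings that already appear above.

```sql
-- Untested sketch: same tables and query as above, run on Tez.
set hive.execution.engine=tez;
set hive.auto.convert.sortmerge.join=true;
-- Assumption: automatic map-join conversion is what pre-empts the SMB join here,
-- so it is switched off for this session only.
set hive.auto.convert.join=false;
explain select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
```

If the plan still shows 'Map Join Operator' with this setting, the assumption is wrong for this Hive version and the cause lies elsewhere.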
Created 06-14-2017 05:48 AM
Created 06-15-2017 02:19 PM
You can see the plan below
Created on 06-14-2017 02:54 PM - edited 08-17-2019 09:51 PM
@Sindhu This is the explain output for MR.
The query is:
explain select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
I also had to set the following, just for MR:
set hive.enforce.sortmergebucketmapjoin=false;
Created on 06-14-2017 02:55 PM - edited 08-17-2019 09:51 PM
And this is the explain output for Tez.