Created 10-18-2017 03:07 PM
I have two bucketed tables on the same column, and I am trying to see if hive performs SMB join on these tables. But when I do an explain plan it only does a Map join.
Does SMB join work in Hive 1.2.1?
Created 10-19-2017 06:06 PM
SMB should work on Hive 1.2.1. Make sure these properties are set
set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true;
Created 10-23-2017 07:26 PM
I can see SMB join only working on MR mode but not in TEZ mode.Could you see SMB join been shown in the explain plan with those properties ?
Created 10-20-2017 01:44 AM
Created 10-23-2017 07:35 PM
I still cannot find sort merge join operator.Below are my queries
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.enforce.sortmergebucketmapjoin=false;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.auto.convert.join=true;
drop table key_value_large; drop table key_value_small;
create table key_value_large ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
create table key_value_small ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key
It only shows map join
Created 10-23-2017 07:37 PM
Can you also try with MR execution engine by setting below property as well:
set hive.execution.engine=mr;
Created 10-24-2017 02:31 PM
Yes I tried it on MR and I can clearly see Sort merge bucket map join in explain query.
Are you suggesting me that SMB does not work in TEZ ?
Created 10-24-2017 03:00 PM
SMB join should work in MR and TEZ as well. I have just tested in my playground. In tez engine check for something like sort order: +.
Created 10-24-2017 07:12 PM
I do see sort oder: + in Tez in explain plan. But on Tez it shows it as Map join operator(with sort oder:+ option seen under reduce output operator) and on MR it clearly shows it as Sort Merge Bucket map operator.
Can you please confirm I cannot see Sort Merge Bucket map operator in Tez explain plan, the only way I can confirm its SMB join by looking for something like
sort order: +. in tez explain plan