I have created two tables using bucketing and ordered them by the common key. I have use clustered by and order by while creating the table. When I do an inner join on these two tables and run an explain plan it shows the join operation as Map Join Operation instead of Sorted Merge Bucket Join Operator.
I have also set the below parameters
set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true;
Why is it not able to convert map join to SMB join.
From hive manual it looks like the properties you are setting is for converting SMB join to map-join SMB
Yes, But the explain plan on the query shows that its just a 'Map Join Operator' instead of 'Sorted Merge Bucket Map Join Operator' .
So effectively its neither converting SMB to map-join SMB, its just doing map join instead of SMB or SMB map join
The conversation of a join to SMB seems to be depending up on the execution engine. If I run the below commands on using MR
drop table key_value_large; drop table key_value_small;
create table key_value_large ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
create table key_value_small ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key
I can see a 'Sorted Merge Bucket Map Join Operator' in the explain statement,But If I set the execution engine to TEZ.
And then run the same explain plan I get to see 'Map Join Operator' instead of SMB map join in the plan.
Could anyone confirm if they could could get SMB map join in tez execution engine.
Was trying the above queries on HDP-126.96.36.199, with MR execution engine, getting below error message
0: jdbc:hive2://<hostname>:> explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key; Error: Error while compiling statement: FAILED: SemanticException [Error 10135]: Sort merge bucketed join could not be performed. If you really want to perform the operation, either set hive.optimize.bucketmapjoin.sortedmerge=false, or set hive.enforce.sortmergebucketmapjoin=false. (state=42000,code=10135)
With tez i could see Map Join not SMB Map Join
Try set hive.convert.join.bucket.mapjoin.tez=true;
Then you cans see Map Join gets converted to Bucket Map join.
If you set hive.enforce.sortmergebucketmapjoin=false; in MR then you will get through with that error