Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Unable to see Sorted Merge Bucket Join Operator in analyze query

Highlighted

Unable to see Sorted Merge Bucket Join Operator in analyze query

New Contributor

I have created two tables using bucketing and ordered them by the common key. I have use clustered by and order by while creating the table. When I do an inner join on these two tables and run an explain plan it shows the join operation as Map Join Operation instead of Sorted Merge Bucket Join Operator.

I have also set the below parameters

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; set hive.optimize.bucketmapjoin =true; set hive.optimize.bucketmapjoin.sortedmerge =true;

Why is it not able to convert map join to SMB join.

5 REPLIES 5

Re: Unable to see Sorted Merge Bucket Join Operator in analyze query

Contributor

From hive manual it looks like the properties you are setting is for converting SMB join to map-join SMB

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinO...

Re: Unable to see Sorted Merge Bucket Join Operator in analyze query

New Contributor

Yes, But the explain plan on the query shows that its just a 'Map Join Operator' instead of 'Sorted Merge Bucket Map Join Operator' .

So effectively its neither converting SMB to map-join SMB, its just doing map join instead of SMB or SMB map join

Re: Unable to see Sorted Merge Bucket Join Operator in analyze query

New Contributor

The conversation of a join to SMB seems to be depending up on the execution engine. If I run the below commands on using MR

set hive.execution.engine=mr;

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

set hive.auto.convert.sortmerge.join=true;

set hive.optimize.bucketmapjoin=true;

set hive.optimize.bucketmapjoin.sortedmerge=true;

set hive.enforce.bucketing=true;

set hive.enforce.sorting=true;

set hive.auto.convert.join=true;

drop table key_value_large; drop table key_value_small;

create table key_value_large ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

create table key_value_small ( key int, value string ) partitioned by (ds string) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key

I can see a 'Sorted Merge Bucket Map Join Operator' in the explain statement,But If I set the execution engine to TEZ.

set hive.execution.engine=tez;

And then run the same explain plan I get to see 'Map Join Operator' instead of SMB map join in the plan.

Could anyone confirm if they could could get SMB map join in tez execution engine.

Re: Unable to see Sorted Merge Bucket Join Operator in analyze query

Contributor

Was trying the above queries on HDP-2.6.1.0, with MR execution engine, getting below error message

0: jdbc:hive2://<hostname>:> explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key; Error: Error while compiling statement: FAILED: SemanticException [Error 10135]: Sort merge bucketed join could not be performed. If you really want to perform the operation, either set hive.optimize.bucketmapjoin.sortedmerge=false, or set hive.enforce.sortmergebucketmapjoin=false. (state=42000,code=10135)

With tez i could see Map Join not SMB Map Join

Re: Unable to see Sorted Merge Bucket Join Operator in analyze query

New Contributor

Try set hive.convert.join.bucket.mapjoin.tez=true;

Then you cans see Map Join gets converted to Bucket Map join.

If you set hive.enforce.sortmergebucketmapjoin=false; in MR then you will get through with that error