Support Questions
Find answers, ask questions, and share your expertise

BucketMap Join or SMB MapJoin not generating or creating???

BucketMap Join or SMB MapJoin not generating or creating???

Rising Star

I have two table bucket_small and bucket_big as following

hive> show create table bucket_big;
OK
CREATE TABLE `bucket_big`(
  `id` int,
  `student_id` string,
  `student_name` string,
  `course_id` int)
CLUSTERED BY (
  course_id)
INTO 4 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdp1.stratapps.com:8020/apps/hive/warehouse/bucket_big'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='4',
  'numRows'='10',
  'rawDataSize'='148',
  'totalSize'='158',
  'transient_lastDdlTime'='1451302285')
Time taken: 0.166 seconds, Fetched: 23 row(s)
hive>


hive> show create table bucket_small;
OK
CREATE TABLE `bucket_small`(
  `course_id` int,
  `course_name` string)
CLUSTERED BY (
  course_id)
INTO 2 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdp1.stratapps.com:8020/apps/hive/warehouse/bucket_small'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='2',
  'numRows'='6',
  'rawDataSize'='39',
  'totalSize'='45',
  'transient_lastDdlTime'='1451302349')
Time taken: 0.172 seconds, Fetched: 21 row(s)

And i inserted Data as following.

hive>insert overwrite table bucket_big select *from table_one;
hive>insert overwrite table bucket_small select *from table_two;

Now tables have buckets as i expected.And i set the following configurations.

hive>set hive.auto.convert.join=true;
hive>set hive.auto.convert.sortmerge.join=true;
hive>set hive.optimize.bucketmapjoin = true;
hive>set hive.optimize.bucketmapjoin.sortedmerge = true;

When i run the following query,

hive> explain select /*+ MAPJOIN(a) */ b.student_id,a.course_name FROM bucket_small a JOIN bucket_big b ON a.course_id = b.course_id;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3


STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        a
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        a
          TableScan
            alias: a
            filterExpr: course_id is not null (type: boolean)
            Statistics: Num rows: 6 Data size: 39 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: course_id is not null (type: boolean)
              Statistics: Num rows: 3 Data size: 19 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                condition expressions:
                  0 {course_name}
                  1 {student_id}
                keys:
                  0 course_id (type: int)
                  1 course_id (type: int)


  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: b
            filterExpr: course_id is not null (type: boolean)
            Statistics: Num rows: 10 Data size: 148 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: course_id is not null (type: boolean)
              Statistics: Num rows: 5 Data size: 74 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
              -----------------
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {course_name}
                  1 {student_id}
                keys:
                  0 course_id (type: int)
                  1 course_id (type: int)
                outputColumnNames: _col1, _col6
                Statistics: Num rows: 5 Data size: 81 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col6 (type: string), _col1 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 5 Data size: 81 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 5 Data size: 81 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work


  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink


Time taken: 0.299 seconds, Fetched: 70 row(s)
hive>

It is generating only Map Only join not Bucket Map join or SMB Map join.What is the problem ???

						
5 REPLIES 5
Highlighted

Re: BucketMap Join or SMB MapJoin not generating or creating???

Master Collaborator

For SMBJoin you will also need the sorted by (course_id) in your create table DDL.

Highlighted

Re: BucketMap Join or SMB MapJoin not generating or creating???

Rising Star

You need to set the bucketing parameters while loading the bucketed tables as well.

Highlighted

Re: BucketMap Join or SMB MapJoin not generating or creating???

Rising Star

@hrongali

I tried like that also ,

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.auto.convert.join=true;
Now ,i make the tables sorted along with bucketing then i got following error

hive> explain select * FROM bucket_small a JOIN bucket_big b ON a.key = b.key;
FAILED: SemanticException [Error 10135]: Sort merge bucketed join could not be performed. If you really want to perform the operation, either set hive.optimize.bucketmapjoin.sortedmerge=false, or set hive.enforce.sortmergebucketmapjoin=false.
when i make following property false and execute plan,map join is generating
 

hive> set hive.optimize.bucketmapjoin.sortedmerge=false;

Highlighted

Re: BucketMap Join or SMB MapJoin not generating or creating???

Rising Star

You can try setting the auto join option with below 2 additional properties.

set hive.auto.convert.join.noconditionaltask = true;

set hive.auto.convert.join.noconditionaltask.size = 10000000;

Also there is an option to set the big table selection policy using the following configuration:

set hive.auto.convert.sortmerge.join.bigtable.selection.policy = org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;

please try if that helps.

Highlighted

Re: BucketMap Join or SMB MapJoin not generating or creating???

Mentor

@Suresh Bonam has this been resolved? Can you post your solution or accept the best answer?