Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

BucketMap Join or SMB MapJoin not generating or creating???

BucketMap Join or SMB MapJoin not generating or creating???

Rising Star

I have two table bucket_small and bucket_big as following

hive> show create table bucket_big;
OK
CREATE TABLE `bucket_big`(
  `id` int,
  `student_id` string,
  `student_name` string,
  `course_id` int)
CLUSTERED BY (
  course_id)
INTO 4 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdp1.stratapps.com:8020/apps/hive/warehouse/bucket_big'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='4',
  'numRows'='10',
  'rawDataSize'='148',
  'totalSize'='158',
  'transient_lastDdlTime'='1451302285')
Time taken: 0.166 seconds, Fetched: 23 row(s)
hive>


hive> show create table bucket_small;
OK
CREATE TABLE `bucket_small`(
  `course_id` int,
  `course_name` string)
CLUSTERED BY (
  course_id)
INTO 2 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdp1.stratapps.com:8020/apps/hive/warehouse/bucket_small'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='2',
  'numRows'='6',
  'rawDataSize'='39',
  'totalSize'='45',
  'transient_lastDdlTime'='1451302349')
Time taken: 0.172 seconds, Fetched: 21 row(s)

And i inserted Data as following.

hive>insert overwrite table bucket_big select *from table_one;
hive>insert overwrite table bucket_small select *from table_two;

Now tables have buckets as i expected.And i set the following configurations.

hive>set hive.auto.convert.join=true;
hive>set hive.auto.convert.sortmerge.join=true;
hive>set hive.optimize.bucketmapjoin = true;
hive>set hive.optimize.bucketmapjoin.sortedmerge = true;

When i run the following query,

hive> explain select /*+ MAPJOIN(a) */ b.student_id,a.course_name FROM bucket_small a JOIN bucket_big b ON a.course_id = b.course_id;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3


STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        a
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        a
          TableScan
            alias: a
            filterExpr: course_id is not null (type: boolean)
            Statistics: Num rows: 6 Data size: 39 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: course_id is not null (type: boolean)
              Statistics: Num rows: 3 Data size: 19 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                condition expressions:
                  0 {course_name}
                  1 {student_id}
                keys:
                  0 course_id (type: int)
                  1 course_id (type: int)


  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: b
            filterExpr: course_id is not null (type: boolean)
            Statistics: Num rows: 10 Data size: 148 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: course_id is not null (type: boolean)
              Statistics: Num rows: 5 Data size: 74 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
              -----------------
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {course_name}
                  1 {student_id}
                keys:
                  0 course_id (type: int)
                  1 course_id (type: int)
                outputColumnNames: _col1, _col6
                Statistics: Num rows: 5 Data size: 81 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col6 (type: string), _col1 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 5 Data size: 81 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 5 Data size: 81 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work


  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink


Time taken: 0.299 seconds, Fetched: 70 row(s)
hive>

It is generating only Map Only join not Bucket Map join or SMB Map join.What is the problem ???

						
5 REPLIES 5

Re: BucketMap Join or SMB MapJoin not generating or creating???

Master Collaborator

For SMBJoin you will also need the sorted by (course_id) in your create table DDL.

Re: BucketMap Join or SMB MapJoin not generating or creating???

Rising Star

You need to set the bucketing parameters while loading the bucketed tables as well.

Re: BucketMap Join or SMB MapJoin not generating or creating???

Rising Star

@hrongali

I tried like that also ,

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.auto.convert.join=true;
Now ,i make the tables sorted along with bucketing then i got following error

hive> explain select * FROM bucket_small a JOIN bucket_big b ON a.key = b.key;
FAILED: SemanticException [Error 10135]: Sort merge bucketed join could not be performed. If you really want to perform the operation, either set hive.optimize.bucketmapjoin.sortedmerge=false, or set hive.enforce.sortmergebucketmapjoin=false.
when i make following property false and execute plan,map join is generating
 

hive> set hive.optimize.bucketmapjoin.sortedmerge=false;

Highlighted

Re: BucketMap Join or SMB MapJoin not generating or creating???

Rising Star

You can try setting the auto join option with below 2 additional properties.

set hive.auto.convert.join.noconditionaltask = true;

set hive.auto.convert.join.noconditionaltask.size = 10000000;

Also there is an option to set the big table selection policy using the following configuration:

set hive.auto.convert.sortmerge.join.bigtable.selection.policy = org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;

please try if that helps.

Re: BucketMap Join or SMB MapJoin not generating or creating???

Mentor

@Suresh Bonam has this been resolved? Can you post your solution or accept the best answer?