I'm trying to explore bucket map join. In theory, it is an optimized map join in which each mapper receives only the buckets of the small table that it actually needs, rather than the whole small table.
I have created two tables, user_plays_buck and small_user_subscription_buck. Both have 16 buckets and are bucketed on the same key, which is also the join key.
Even with bucket map join enabled, Hive still performs only a regular map join.
ast-bucket-map-join.txt contains the AST generated when I attempt a bucket map join.
ast-map-join.txt contains the AST generated when I attempt a map join.
ddl-small-user-subscription-buck.txt contains the DDL of the small table.
ddl-user-plays-buck.txt contains the DDL of the bigger table.
small-user-subscription-buck-files.txt lists the files present in the small table.
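For anyone reproducing this, the bucketing metadata and the physical file layout can be checked from the Hive CLI roughly as follows. This is a sketch: the HDFS path is an assumption (the default warehouse directory and the default database), so use the Location that DESCRIBE FORMATTED actually prints. The bucket map join optimizer generally expects one data file per declared bucket, so the file count is worth comparing against Num Buckets.

```sql
-- Show the table metadata Hive has recorded. The rows of interest are
-- "Num Buckets", "Bucket Columns" and "Location".
DESCRIBE FORMATTED small_user_subscription_buck;

-- List the table's data files from the Hive CLI.
-- ASSUMPTION: default warehouse dir and default database; replace this path
-- with the Location reported by DESCRIBE FORMATTED above.
dfs -ls /user/hive/warehouse/small_user_subscription_buck;
```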
Any insight on this will be highly appreciated.
Did you try the following steps, in order?

    create table abc(col0 string, col1 string, col2 string, col3 string, col4 string, col5 string, col6 string)
      clustered by (col0) into 16 buckets;
    create table xyz(col0 string, col1 string, col2 string, col3 string, col4 string, col5 string, col6 string)
      clustered by (col0) into 16 buckets;

    set hive.enforce.bucketing = true;

    -- populate both tables (source SELECT omitted)
    insert overwrite table abc select ... ;
    insert overwrite table xyz select ... ;

    set hive.optimize.bucketmapjoin = true;
    explain select /*+ MAPJOIN(xyz) */ abc.* from abc, xyz where abc.col0 = xyz.col0;
I have tried that, but it still uses only a regular map join.
Bucket map join is never invoked.
What is hive.enforce.bucketing set to in your setup? It should be true. Could you also rerun your EXPLAIN query after setting hive.ignore.mapjoin.hint=false? By default Hive ignores MAPJOIN hints, so without this setting the hint in the query has no effect.
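A minimal sketch of these suggestions applied to the question's own tables, run from the Hive CLI. The join column `user_id` and the aliases `p` and `s` are hypothetical (the real DDL is only in the attached files), so substitute the actual bucketing key. EXPLAIN EXTENDED is used instead of plain EXPLAIN because it prints more plan detail.

```sql
-- 1. Bucketing is enforced when data is WRITTEN, so this must be on before the
--    tables are (re)loaded; tables loaded while it was false should be reloaded.
SET hive.enforce.bucketing=true;
-- (reload user_plays_buck and small_user_subscription_buck here)

-- 2. Make Hive honor the MAPJOIN hint and allow the bucketed variant.
SET hive.ignore.mapjoin.hint=false;
SET hive.optimize.bucketmapjoin=true;

-- 3. Inspect the plan. HYPOTHETICAL: user_id stands in for the real join key.
EXPLAIN EXTENDED
SELECT /*+ MAPJOIN(s) */ p.*
FROM user_plays_buck p
JOIN small_user_subscription_buck s
  ON p.user_id = s.user_id;
```

When the bucketed variant is chosen, the EXPLAIN EXTENDED output typically contains a `Bucket Mapjoin Context` section; treat the exact label as version-dependent.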