Archives of Support Questions (Read Only)

rushikeshdeshmu · ‎03-12-2016

Hi,

Can anyone explain what a Sort Merge Bucket (SMB) join in Hive is, and when it is used?

rushikeshdeshmu · ‎03-12-2016

I found the following answer:

In an SMB join in Hive, each mapper reads a bucket from the first table and the corresponding bucket from the second table, and then performs a sort-merge join on them. SMB join is mainly used because it places no size limit on the files, partitions, or tables being joined, which makes it the best fit when the tables are large. For an SMB join, the tables must be bucketed and sorted on the join columns, and all tables must have the same number of buckets.
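A minimal Python sketch of the idea, not Hive's implementation: it assumes rows are `(join_key, payload)` pairs, uses Python's built-in `hash()` in place of Hive's bucketing hash function, and the function names `bucket_table`, `merge_join_bucket` and `smb_join` are hypothetical. Each table is split into the same number of buckets, each bucket is sorted on the key, and bucket k of one table is then merge-joined only with bucket k of the other.

```python
from typing import Any, List, Tuple

# Hypothetical row layout for this sketch: (join_key, payload).
Row = Tuple[Any, Any]


def bucket_table(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Split rows into buckets on the join key and sort each bucket by key.

    Stands in for a table stored as bucketed and sorted on the join column.
    Assumption: Python's hash() replaces Hive's own bucketing hash function.
    """
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[0]) % num_buckets].append(row)
    for bucket in buckets:
        bucket.sort(key=lambda r: r[0])
    return buckets


def merge_join_bucket(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Any]]:
    """Inner sort-merge join of two buckets that are already sorted on the key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1  # advance whichever side has the smaller key
        elif lkey > rkey:
            j += 1
        else:
            # Collect the run of equal keys on the right...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # ...and pair it with every left row that has the same key.
            while i < len(left) and left[i][0] == lkey:
                for _, rpayload in right[j:j_end]:
                    out.append((lkey, left[i][1], rpayload))
                i += 1
            j = j_end
    return out


def smb_join(left_buckets: List[List[Row]], right_buckets: List[List[Row]]) -> List[Tuple[Any, Any, Any]]:
    """Join bucket k of the left table only with bucket k of the right table."""
    if len(left_buckets) != len(right_buckets):
        # With hash(key) % N, a key maps to the same bucket index on both
        # sides only when N is the same for both tables.
        raise ValueError("this sketch requires both tables to have the same number of buckets")
    result = []
    # In Hive each bucket pair would go to its own mapper; here they run in sequence.
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(merge_join_bucket(lb, rb))
    return result


if __name__ == "__main__":
    orders = [(3, "o1"), (1, "o2"), (4, "o3"), (1, "o4")]
    customers = [(1, "alice"), (2, "bob"), (3, "carol")]
    joined = smb_join(bucket_table(orders, 2), bucket_table(customers, 2))
    print(sorted(joined))  # → [(1, 'o2', 'alice'), (1, 'o4', 'alice'), (3, 'o1', 'carol')]
```

Because both sides use `hash(key) % N`, matching rows only meet in the same bucket index when N is equal, which is why the sketch rejects unequal bucket counts. The merge step walks each pair of sorted buckets once, so neither side has to be loaded into an in-memory hash table.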

aervits · ‎03-12-2016

Please refer to the Hive wiki: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization

rushikeshdeshmu · ‎03-12-2016

@Artem Ervits, thanks for the reply and the link.

viswanath_kammu · ‎06-09-2017

Do the configurations mentioned on this page work with the Tez engine? I could only get SMB join working on MR.

shivanageshch · ‎09-19-2016

@Rushikesh Deshmukh

What is the purpose of merging the tables used in joins? Can you please explain?