Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

What is Sort Merge Bucket (SMB) Join in Hive? When is it used?

Hi,

Can anyone explain what a Sort Merge Bucket (SMB) join in Hive is, and when it is used?

1 ACCEPTED SOLUTION

I got the answer below:

In an SMB join in Hive, each mapper reads a bucket from the first table and the corresponding bucket from the second table, and then performs a sort-merge join on them. SMB join is mainly used because, unlike a map join, it does not require either table to fit in memory, so there is no limit on the size of the files, partitions, or tables being joined. This makes SMB join the best fit when the tables are large. For an SMB join, the tables must be bucketed and sorted on the join columns, and all tables must have the same number of buckets.
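To make the mechanics concrete, here is a minimal Python sketch of the idea. It is an illustration, not Hive's implementation. The function names `bucketize`, `merge_join`, and `smb_join` are hypothetical, and Python's built-in `hash()` stands in for Hive's own bucketing hash function. It shows why both tables need the same number of buckets (a key can only land in the same-numbered bucket on each side) and why sorting matters (each bucket pair can be joined in one forward pass without loading either side into a hash table).

```python
def bucketize(rows, key, num_buckets):
    """Hash-partition rows on `key` into num_buckets buckets, then sort each
    bucket on `key` (roughly what CLUSTERED BY ... SORTED BY ... does on write)."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        # Assumption: Python's hash() stands in for Hive's bucketing hash.
        buckets[hash(row[key]) % num_buckets].append(row)
    for bucket in buckets:
        bucket.sort(key=lambda r: r[key])
    return buckets


def merge_join(left, right, key):
    """Inner-join two lists already sorted on `key` in one forward pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key is smaller: advance left
        elif lk > rk:
            j += 1  # right key is smaller: advance right
        else:
            # Equal keys: find the run of rows with this key on each side
            # and emit their cross product (handles duplicate keys).
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append({**lrow, **rrow})
            i, j = i_end, j_end
    return out


def smb_join(left_buckets, right_buckets, key):
    """Join bucket k of one table with bucket k of the other, for every k."""
    # Same hash and same bucket count on both sides, so a given key can only
    # appear in the same-numbered bucket of each table.
    if len(left_buckets) != len(right_buckets):
        raise ValueError("SMB join needs the same number of buckets on both sides")
    result = []
    for lb, rb in zip(left_buckets, right_buckets):
        # In Hive, each bucket pair would be handled by its own mapper.
        result.extend(merge_join(lb, rb, key))
    return result


orders = [
    {"cust_id": 1, "order": "A"},
    {"cust_id": 2, "order": "B"},
    {"cust_id": 1, "order": "C"},
    {"cust_id": 5, "order": "D"},
]
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bo"},
    {"cust_id": 3, "name": "Cy"},
]
joined = smb_join(bucketize(orders, "cust_id", 4),
                  bucketize(customers, "cust_id", 4), "cust_id")
for row in joined:
    print(row)
```

Order D (customer 5) has no matching customer and customer 3 has no orders, so this inner join yields three rows: orders A and C joined to Ann, and order B joined to Bo. Because each bucket pair is independent, the per-bucket loop is the part that parallelizes across mappers.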

5 REPLIES

Master Mentor

@Artem Ervits, thanks for the reply and the link.

Do the configurations mentioned on this page work with the Tez engine? I could only see SMB join working on MR.

Expert Contributor

@Rushikesh Deshmukh

What is the purpose of merging the tables used in joins? Can you please explain?