What is Sort Merge Bucket (SMB) Join in Hive? When it is used?

Hi,

Can anyone explain what a Sort Merge Bucket (SMB) join in Hive is, and when it is used?

1 ACCEPTED SOLUTION

I got the answer below:

In an SMB join in Hive, each mapper reads a bucket from the first table and the corresponding bucket from the second table, and then performs a merge join on the two sorted buckets. SMB join is mainly used because it places no limit on file, partition, or table size: the buckets are already sorted, so the rows can be merged as they are read and no table has to fit in memory. It is best used when the tables are large. For an SMB join, the tables must be bucketed and sorted on the join columns, and all tables should have the same number of buckets.
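To make the bucket-by-bucket merge concrete, here is a minimal Python sketch of the idea. It is not Hive's implementation: the function names (bucketize, merge_join, smb_join) are made up for illustration, keys are assumed to be integers, and `key % num_buckets` is only a stand-in for Hive's bucket hashing.

```python
def bucketize(rows, num_buckets):
    """Split (key, value) rows into buckets and sort each bucket by key.

    Assumption: integer keys, with key % num_buckets standing in for
    the hash function a real system would use.
    """
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[key % num_buckets].append((key, value))
    return [sorted(b, key=lambda row: row[0]) for b in buckets]


def merge_join(left, right):
    """Inner-join two lists of (key, value) rows already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left[i][1], right_value))
                i += 1
            j = j_end
    return out


def smb_join(left_rows, right_rows, num_buckets=2):
    """Join bucket n of the left table only with bucket n of the right."""
    left_buckets = bucketize(left_rows, num_buckets)
    right_buckets = bucketize(right_rows, num_buckets)
    result = []
    for left_bucket, right_bucket in zip(left_buckets, right_buckets):
        result.extend(merge_join(left_bucket, right_bucket))
    return result


users = [(1, "alice"), (2, "bob"), (4, "dave")]
orders = [(3, "o3"), (1, "o1"), (2, "o2"), (1, "o1b")]
joined = smb_join(users, orders, num_buckets=2)
```

Both tables use the same number of buckets, so equal keys always land in the same bucket index, and each pair of buckets can be merged in a single pass without looking at any other bucket. In Hive the bucketing and sorting are done when the tables are written, not at join time as in this toy version.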

5 REPLIES

Master Mentor

@Artem Ervits, thanks for the reply and the link.

Contributor

Do the configurations mentioned on this page work on the Tez engine? I could see SMB working only on MR.

Expert Contributor

@Rushikesh Deshmukh

What is the purpose of merging the tables used in joins? Can you please explain?