What is Sort Merge Bucket (SMB) Join in Hive? When it is used?

Hi,

Can anyone explain what Sort Merge Bucket (SMB) join in Hive is, and when it is used?

1 ACCEPTED SOLUTION

Re: What is Sort Merge Bucket (SMB) Join in Hive? When it is used?

I got the answer below:

In an SMB join in Hive, each mapper reads a bucket from the first table and the corresponding bucket from the second table, and then performs a sort-merge join on the two. SMB join is mainly used because it places no limit on file, partition, or table size: unlike a map join, no table has to fit in memory. It is therefore best suited to large tables. For an SMB join, the tables must be bucketed and sorted on the join columns, and all tables must have the same number of buckets.
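
To make the mechanics above concrete, here is a minimal Python sketch of the idea. This is not Hive code and not how Hive implements it internally; the function names (`bucketize`, `merge_join`, `smb_join`), the dictionary row format, and the use of Python's `hash` for bucketing are all assumptions made for illustration. It only shows why both tables need the same bucket count and need to be sorted on the join key.

```python
def bucketize(rows, key, num_buckets):
    """Hash-partition rows on the join key, then sort each bucket by that key.

    Stands in for a table that is bucketed and sorted on the join column
    (illustrative only; Hive uses its own hash function).
    """
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[key]) % num_buckets].append(row)
    for bucket in buckets:
        bucket.sort(key=lambda r: r[key])
    return buckets


def merge_join(left, right, key):
    """Inner-join two lists that are already sorted on `key` in one forward pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def smb_join(left_rows, right_rows, key, num_buckets=4):
    """Join bucket i of the left table with bucket i of the right table only."""
    left_buckets = bucketize(left_rows, key, num_buckets)
    right_buckets = bucketize(right_rows, key, num_buckets)
    result = []
    # Each iteration plays the role of one mapper.
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(merge_join(lb, rb, key))
    return result


orders = [{"id": 1, "item": "a"}, {"id": 2, "item": "b"},
          {"id": 5, "item": "c"}, {"id": 1, "item": "d"}]
users = [{"id": 1, "name": "x"}, {"id": 5, "name": "y"}, {"id": 7, "name": "z"}]
joined = smb_join(orders, users, "id")
```

Because equal keys always hash to the same bucket number on both sides, bucket i only ever needs to be compared with bucket i, and because each bucket is sorted, the merge needs just one forward pass with no large in-memory hash table.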

5 REPLIES

Re: What is Sort Merge Bucket (SMB) Join in Hive? When it is used?

Mentor

Re: What is Sort Merge Bucket (SMB) Join in Hive? When it is used?

@Artem Ervits, thanks for the reply and the link.

Re: What is Sort Merge Bucket (SMB) Join in Hive? When it is used?

New Contributor

Do the configurations mentioned on this page work on the Tez engine? I could see SMB working only on MR.

Re: What is Sort Merge Bucket (SMB) Join in Hive? When it is used?

Contributor

@Rushikesh Deshmukh

What is the purpose of merging the tables used in the join? Can you please explain?