Created 03-12-2016 09:19 AM
Hi,
Can anyone explain what a Sort Merge Bucket (SMB) join is in Hive? When is it used?
Created 03-12-2016 10:04 AM
I got the answer below:
In an SMB join in Hive, each mapper reads a bucket from the first table and the corresponding bucket from the second table, and then performs a merge-sort join on the two. SMB join is mainly used because it places no limit on file, partition, or table size, so it works best when the tables are large. The tables must be bucketed and sorted on the join columns, and all tables must have the same number of buckets.
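To make the idea concrete, here is a rough Python sketch of what happens per bucket pair. It is only an illustration of the technique, not Hive's actual implementation: the function names (`bucketize`, `merge_join`, `smb_join`), the integer keys, and the `key % n` bucketing rule are all assumptions made for the example.

```python
def bucketize(rows, num_buckets):
    """Split (key, value) rows into buckets by key, then sort each bucket by key.

    Assumption for this sketch: keys are integers and the bucket is key % num_buckets.
    """
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[key % num_buckets].append((key, value))
    for bucket in buckets:
        bucket.sort(key=lambda row: row[0])
    return buckets


def merge_join(left, right):
    """Inner-join two lists of (key, value) rows that are already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Emit every pairing of the two runs.
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left_value, right_value))
            i, j = i_end, j_end
    return out


def smb_join(left_buckets, right_buckets):
    """Join bucket k of the left table with bucket k of the right table."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both tables need the same number of buckets")
    out = []
    for left_bucket, right_bucket in zip(left_buckets, right_buckets):
        out.extend(merge_join(left_bucket, right_bucket))
    return out


left = bucketize([(3, "c"), (1, "a"), (4, "d"), (1, "a2")], 2)
right = bucketize([(1, "x"), (2, "y"), (4, "z")], 2)
print(smb_join(left, right))
```

Because matching keys always land in the same bucket number and each bucket is already sorted, every bucket pair can be joined in a single forward pass without loading a whole table into memory, which is why the approach suits large tables.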
Created 03-12-2016 09:45 AM
Please refer to the Hive wiki: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
Created 03-12-2016 10:01 AM
@Artem Ervits, thanks for the reply and the link.
Created 06-09-2017 03:11 PM
Do the configurations mentioned on this page work on the Tez engine? I could only see SMB working on MR.
Created 09-19-2016 05:00 AM
What is the purpose of merging the tables used in joins? Can you please explain?