02-10-2022
12:16 PM
Hello, I have a question about optimizing Hive bucketed tables (bucketed only, no partitions).

I know that if we have table A and table B and want to join them on COL1 (A.COL1 = B.COL1), we should bucket both tables on COL1 into the same number of buckets, or into bucket counts where one is a multiple of the other.

But what if there are more than two tables? For example, I want to join table A with table B and with table C. Table A is joined with table B on COL1 (A.COL1 = B.COL1), and table A is joined with table C on COL2 (A.COL2 = C.COL2). Which columns should I cluster table A by? Should it be bucketed on both COL1 and COL2 (CLUSTERED BY (col1, col2))?

In summary, how can I optimize one table that is joined with more than one other table, using buckets only?

Thanks in advance.
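To make the concern concrete, here is a toy Python model of bucket assignment. It is only a sketch under assumptions: the function `bucket_of`, the `31 * h + value` hash combination, and the rule that an integer hashes to its own value are simplifications chosen for illustration, not a statement of how any particular Hive version computes buckets. It shows what happens to rows of table A that share a COL1 value when the table is clustered by COL1 alone versus by (COL1, COL2).

```python
def bucket_of(key_columns, num_buckets):
    """Toy bucket assignment: combine per-column hashes, then take the modulo.

    Assumption: each integer column hashes to its own value, and columns
    are combined as 31 * h + value. Real engines may differ.
    """
    h = 0
    for value in key_columns:
        h = 31 * h + value
    return (h & 0x7FFFFFFF) % num_buckets


NUM_BUCKETS = 4

# Hypothetical rows of table A as (col1, col2); all share col1 = 1.
rows_a = [(1, 10), (1, 11), (1, 12), (1, 13)]

# Buckets used when A is clustered by col1 only.
by_col1 = {bucket_of((c1,), NUM_BUCKETS) for c1, _ in rows_a}

# Buckets used when A is clustered by (col1, col2).
by_both = {bucket_of((c1, c2), NUM_BUCKETS) for c1, c2 in rows_a}

print("clustered by col1:        ", sorted(by_col1))
print("clustered by (col1, col2):", sorted(by_both))
```

In this toy model, clustering by COL1 alone keeps every row with COL1 = 1 in a single bucket, while clustering by (COL1, COL2) spreads those same rows over several buckets. If table B is bucketed on COL1 only, that spread would mean A's buckets no longer line up with B's, which is the core of the question above.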