About Soa

Soa · ‎02-10-2022

Hello, I have a question about optimizing Hive bucketed tables (bucketed only, no partitions).

I know that if we join table A and table B on A.COL1 = B.COL1, we should bucket both tables on COL1 into the same number of buckets, or into bucket counts that are multiples of each other.

But what if more than two tables are involved? For example, I want to join table A with table B and with table C. Table A is joined with table B on COL1 (A.COL1 = B.COL1), and table A is joined with table C on COL2 (A.COL2 = C.COL2). Which columns should I cluster table A by? Should it be bucketed on both COL1 and COL2 (clustered by col1,col2)?

In summary, how can I optimize one table that is joined with more than one other table, using buckets only?

Thanks in advance.
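To make the `clustered by col1,col2` option concrete, here is a minimal Python sketch of how bucket assignment behaves when a row's bucket is chosen as a hash of its clustering columns modulo the bucket count. Everything in it is an assumption for illustration: `toy_hash` is a stand-in and not Hive's real hash function, and the tables, the `bucket_of` and `aligned` helpers, and the bucket count are hypothetical. It only checks whether rows that match on COL1 land in the same bucket number on both sides.

```python
# Toy model: bucket = hash(clustering columns) mod number of buckets.
# toy_hash is a stand-in for illustration, NOT Hive's actual hash function.

NUM_BUCKETS = 4  # hypothetical bucket count, the same for both tables


def toy_hash(values):
    """Combine integer column values into one hash (illustrative only)."""
    h = 0
    for v in values:
        h = 31 * h + v
    return h


def bucket_of(row, cluster_cols):
    """Bucket number of a row, given the columns the table is clustered by."""
    return toy_hash([row[c] for c in cluster_cols]) % NUM_BUCKETS


def aligned(a_rows, a_cols, b_rows, b_cols, join_col):
    """True if every pair of rows matching on join_col shares a bucket number."""
    for a in a_rows:
        for b in b_rows:
            if a[join_col] == b[join_col]:
                if bucket_of(a, a_cols) != bucket_of(b, b_cols):
                    return False
    return True


# Hypothetical data: table A has col1 and col2, table B has only col1.
table_a = [{"col1": k, "col2": 7 * k + 3} for k in range(8)]
table_b = [{"col1": k} for k in range(8)]

# A clustered by col1 alone, B clustered by col1.
same_key = aligned(table_a, ["col1"], table_b, ["col1"], "col1")

# A clustered by (col1, col2), B clustered by col1.
composite = aligned(table_a, ["col1", "col2"], table_b, ["col1"], "col1")

print(same_key, composite)
```

In this toy model, clustering A by COL1 alone keeps matching rows in matching buckets, while clustering A by (COL1, COL2) does not, because COL2 also feeds the hash. That suggests a composite clustering key would probably not line up with either single-column join, but this is a sketch and should be checked against the query plan on a real cluster.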

Last Visited	‎02-10-2022 04:00 PM

Member Since	‎02-10-2022 11:56 AM
Posts	1

Cloudera Community

How bucketing helps in case of more than two table...