I have partitioned a 39 TB table on the date column. However, most of the queries join to this huge table on the id column. I was wondering whether adding bucketing on the id column would help. Please recommend.
Bucket the table on ID and sort the rows within each bucket. Also store it in ORC file format with compression, and set hive.exec.orc.split.strategy=BI;
COMMENT 'This is Employee table clustered by id sorted by age into 5 buckets'
CLUSTERED BY (ID) SORTED BY (AGE) INTO 5 BUCKETS
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");
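For reference, here is a rough sketch of how the complete statement might look. Only ID, AGE, the 5 buckets, ORC and SNAPPY come from the answer above; the table name employee, the name column, the column types and the partition column dt (standing in for your date column) are assumptions for illustration. The bucket count of 5 is just the example value and would probably need to be much larger for a 39 TB table.

```sql
-- Sketch only: table name, column types, the name column and dt are hypothetical.
CREATE TABLE employee (
  id   BIGINT,
  name STRING,
  age  INT
)
COMMENT 'This is Employee table clustered by id sorted by age into 5 buckets'
PARTITIONED BY (dt STRING)                        -- existing date partitioning
CLUSTERED BY (id) SORTED BY (age) INTO 5 BUCKETS  -- rows hashed on id into 5 buckets per partition
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");

-- Session setting mentioned above; BI skips reading ORC footers when generating splits.
SET hive.exec.orc.split.strategy=BI;
```

Note that bucketing mainly helps joins when the other table is also bucketed on id with a compatible number of buckets, so it is worth checking that side as well.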
Let me know if you face any issues.