Column recommendation to partition huge hive table

New Contributor

I partitioned a 39 TB table on the date column. However, most queries join to this huge table on the id column. I was wondering if adding bucketing on the id column would help. Please recommend.

1 REPLY

New Contributor

Use bucketing on the ID column together with sorting, store the table in ORC format with compression, and set hive.exec.orc.split.strategy=BI;

Example:

CREATE TABLE Employee (
  ID BIGINT,
  NAME STRING,
  AGE INT,
  SALARY BIGINT,
  DEPARTMENT STRING
)
COMMENT 'This is the Employee table, clustered by id and sorted by age into 5 buckets'
CLUSTERED BY (ID) SORTED BY (AGE) INTO 5 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY');
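Bucketing only pays off at join time if the data is written through the bucketing path and the bucket-aware join optimizations are enabled. A minimal sketch of the relevant session settings (the staging table name is hypothetical, and exact defaults vary by Hive version):

-- On Hive 2.x bucketing is enforced automatically; on older versions set:
SET hive.enforce.bucketing = true;

-- Bucketing is applied when data is written via INSERT, not via LOAD DATA:
-- INSERT OVERWRITE TABLE Employee SELECT * FROM staging_employee;

-- Enable bucket map joins and sort-merge bucket (SMB) joins
-- for joins on the bucketing key:
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.auto.convert.sortmerge.join = true;

One caveat: a sort-merge bucket join requires both tables to be bucketed and sorted on the join key itself, so if the goal is faster joins on ID, sorting with SORTED BY (ID) rather than AGE would be needed for that particular optimization.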


Let me know if you face any issues.