Reply
New Contributor
Posts: 1
Registered: ‎02-03-2019

Dynamic Partitioning in Hive external table

Hi All,

I have a fact table which populates data on daily basis and carries data for 10 years. This is in sql server and i want to get it incrementally loaded to hive. Is partitioning a good idea ? The table contains data for 10 years based on around 10 systemIDs. I was thinking of having partition per quarter till last business day and separate partition for todays business day. Does it looks like good approach ? Or are there other ways as thumb rule to be followed ? Also how it can be implemented ? Please guide me.

Highlighted
Cloudera Employee
Posts: 4
Registered: ‎08-16-2018

Re: Dynamic Partitioning in Hive external table

Hello,

 

Yes. Partitioning is a good idea. Having a partition per quarter will lead to 40 partitions in total for 10 years which is good. I am not sure about "separate partition for todays business day" that you have mentioned. 

 

You can also have the data partitioned by month meaning 120 partitions for 10 years. Having more than 10k partitions for a table might often lead to issues when the table is queried, Hence it is not recommended.

 

Thanks

Announcements