02-03-2019 04:00 AM
I have a fact table which populates data on daily basis and carries data for 10 years. This is in sql server and i want to get it incrementally loaded to hive. Is partitioning a good idea ? The table contains data for 10 years based on around 10 systemIDs. I was thinking of having partition per quarter till last business day and separate partition for todays business day. Does it looks like good approach ? Or are there other ways as thumb rule to be followed ? Also how it can be implemented ? Please guide me.
02-11-2019 02:50 AM
Yes. Partitioning is a good idea. Having a partition per quarter will lead to 40 partitions in total for 10 years which is good. I am not sure about "separate partition for todays business day" that you have mentioned.
You can also have the data partitioned by month meaning 120 partitions for 10 years. Having more than 10k partitions for a table might often lead to issues when the table is queried, Hence it is not recommended.