Is there a max limit o partitions in hive
If I create a table partitioned as below
with the partitions as (importdate,classifier, grouping,date)
where I can have 365 import dates, 1 classifier, 10 groupings, and again 365 businessdates
the numbe of partitiosn for a year is going to be huge? 365*1*10*365
Two main comments: 1) that will be too many partitions and 2) consider flattened partitions over nested. (See links below).
Thus, you should concatenate your partition strings and break this up into more than one table (each table partition designed around a query pattern around where clause).
@ Greg Keys - the dataset is not small
Each day is 3-4 million rows partitioned by classifier=object, grouping=artifact and date=2017-01-01
I could merge everything into 1 folder which would remove the importdate partition so the total partition would be reduced
to 1 (/classifier=X/) * 10 (grouping=Y) * 365 (date=Z)
However I then lose the ability if an error occurs during merge to recover easily
With the existing partitioning If an import fails I can simply blow away that folder and redo ( then re-adding the partitions using add partition)
The date is grouped to allow queries such as
select * from table where classifier = 'object' and grouping='artificact' and businesdate ='2017-01-01'
select * from table where businesdate ='2017-01-01'
select * from table where businesdate ='2017-01-01' and table_column = 'XYZ'
and of course the select count(*) from table
There's no real correlation between the import date and the businesses, which contributes to the problem