02-13-2017 09:17 AM
Hi Impala/Kudu gurus,
I'm extremely excited by the new Impala/Kudu release that supports non-covering range partitions, as described here: https://github.com/cloudera/kudu/blob/master/docs/design-docs/non-covering-range-partitions.md
and here: https://gerrit.cloudera.org/#/c/4856/
Yet I haven't figured out exactly how to use it to support the rolling-window data retention that our business needs. The syntax described in the second document above still seems to require a static partition specification.
What we need is the ability to auto-create new partitions based on a timestamp expression, so that each partition contains only x days of data. We can then drop old partitions according to our data retention policy on a per-table basis.
For comparison, similar functionality is provided by Oracle's range interval partitioning:
PARTITION BY RANGE (CREATION_DATE) INTERVAL (NUMTODSINTERVAL(7, 'DAY'))
and Vertica's partition key expression:
PARTITION BY (floor((((tbl.creation_ts)::date - '0001-12-31 BC'::date) / 3)))
03-22-2017 11:11 AM
Unfortunately, Kudu partitions must be predefined, as you suspected, so the Oracle syntax you described won't work for Impala. However, you can add and drop range partitions even after the table is created, so you can manually add the next hour/day/week partition and drop a historical partition. The syntax is described in the latest version of the CDH documentation:
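As a rough illustration of the manual approach (this is only a sketch, not taken from the CDH documentation), a scheduled job could generate the two statements for each roll of the window. The table name `events`, the 7-day partition width, the 28-day retention, and the assumption that the range column holds Unix seconds as an integer are all hypothetical. Please check the `ALTER TABLE ... ADD/DROP RANGE PARTITION` form against the documentation for your Impala version before relying on it.

```python
import calendar
from datetime import date, timedelta

# All of these values are assumptions for illustration only.
TABLE = "events"        # hypothetical Kudu table name
PARTITION_DAYS = 7      # width of each range partition
RETENTION_DAYS = 28     # should be a multiple of PARTITION_DAYS


def epoch_seconds(d: date) -> int:
    """Return midnight UTC of the given date as Unix seconds."""
    return calendar.timegm(d.timetuple())


def rolling_window_ddl(next_start: date,
                       table: str = TABLE,
                       partition_days: int = PARTITION_DAYS,
                       retention_days: int = RETENTION_DAYS) -> list:
    """Build the ADD and DROP statements for one roll of the window.

    Adds the partition [next_start, next_start + partition_days) and drops
    the partition that falls just outside the retention window once the
    new partition exists.
    """
    next_end = next_start + timedelta(days=partition_days)
    old_end = next_end - timedelta(days=retention_days)
    old_start = old_end - timedelta(days=partition_days)

    add = (f"ALTER TABLE {table} ADD RANGE PARTITION "
           f"{epoch_seconds(next_start)} <= VALUES < {epoch_seconds(next_end)}")
    drop = (f"ALTER TABLE {table} DROP RANGE PARTITION "
            f"{epoch_seconds(old_start)} <= VALUES < {epoch_seconds(old_end)}")
    return [add, drop]


if __name__ == "__main__":
    for statement in rolling_window_ddl(date(2017, 4, 3)):
        print(statement + ";")
```

The generated statements would still need to be run through impala-shell or whichever client you use, on whatever schedule matches your partition width. The sketch does not check whether a partition already exists or has already been dropped.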
04-13-2017 06:36 AM
It seems we are going down this path for now. It is close enough to what we have in Vertica, although Vertica does create new partitions automatically.