Impala/Kudu non-covering range partition to support rolling window data retention

Hi Impala/Kudu gurus,

 

I'm extremely excited by the new Impala/Kudu release that supports non-covering range partitions, as described here: https://github.com/cloudera/kudu/blob/master/docs/design-docs/non-covering-range-partitions.md

and here: https://gerrit.cloudera.org/#/c/4856/

 

However, I haven't figured out how to use it to support the rolling-window data retention that our business needs. The syntax described in the second document above still seems to require a static partition specification.

 

What we need is the ability to auto-create new partitions based on a timestamp expression, so that each partition contains only x days of data. We can then drop old partitions according to our data retention policy on a per-table basis.

 

For comparison, Oracle provides similar functionality with range interval partitioning:

PARTITION BY RANGE (CREATION_DATE)
INTERVAL (NUMTODSINTERVAL(7, 'DAY'))

and Vertica's partition key expression:

 

PARTITION BY (floor((((tbl.creation_ts)::date - '0001-12-31 BC'::date) / 3)))

 

Thanks,

Brian

2 ACCEPTED SOLUTIONS

Hi Brian,

 

Unfortunately, as you suspected, Kudu range partitions must be defined explicitly, so the Oracle-style interval syntax you described won't work in Impala. However, you can add and drop range partitions after the table is created, so you can manually add the next hour/day/week partition and drop historical partitions that fall outside your retention window. The syntax is described in the latest version of the CDH documentation:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_kudu.html#kudu_range_partitio...
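As a rough illustration of the manual add/drop approach, here is a minimal Python sketch that generates the two Impala statements a scheduled job could run for each window. Everything specific in it is an assumption rather than something from this thread: the table name `events`, a range column stored as BIGINT Unix epoch seconds, fixed-length windows counted from an arbitrary origin date, and the helper names themselves. The `ADD IF NOT EXISTS RANGE PARTITION` and `DROP IF EXISTS RANGE PARTITION` forms should be checked against the Impala documentation for your version.

```python
from datetime import date, datetime, timedelta, timezone


def epoch_seconds(day: date) -> int:
    """Midnight UTC of `day` as Unix epoch seconds."""
    return int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())


def window_bounds(origin: date, index: int, window_days: int):
    """Start (inclusive) and end (exclusive) dates of the window with this index."""
    start = origin + timedelta(days=index * window_days)
    return start, start + timedelta(days=window_days)


def rolling_window_statements(table, today, origin, window_days=7, retain_windows=4):
    """Build one ADD and one DROP statement for a rolling retention window.

    Hypothetical helper: assumes the range column holds epoch seconds and that
    windows are aligned to `origin`. Retention keeps the current window plus
    the (retain_windows - 1) windows before it.
    """
    current = (today - origin).days // window_days
    # Pre-create the next window so inserts never hit a missing partition.
    add_start, add_end = window_bounds(origin, current + 1, window_days)
    # Drop the window that has just fallen out of the retention period.
    drop_start, drop_end = window_bounds(origin, current - retain_windows, window_days)
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS RANGE PARTITION "
        f"{epoch_seconds(add_start)} <= VALUES < {epoch_seconds(add_end)}",
        f"ALTER TABLE {table} DROP IF EXISTS RANGE PARTITION "
        f"{epoch_seconds(drop_start)} <= VALUES < {epoch_seconds(drop_end)}",
    ]


if __name__ == "__main__":
    for stmt in rolling_window_statements("events", date.today(), date(2017, 1, 1)):
        print(stmt + ";")
```

The printed statements could be fed to impala-shell from a cron job or a workflow scheduler. Because of the `IF NOT EXISTS` and `IF EXISTS` clauses, re-running the job within the same window should be harmless.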

 

Best,

Matt

Hi Matt,

 

It seems we will go down this path for now. It is close enough to what we have in Vertica, although Vertica creates new partitions automatically.

 

Thanks,

Brian

