Reply
Explorer
Posts: 9
Registered: ‎04-18-2016

Creating dynamic range partition in kudu

I need to add hourly partition in kudu table , for eg a similar create table in impala for parquet table would be :

CREATE TABLE  tbl_impala_parquet( col1 int, col2 String, col3 String)  PARTITIONED BY (   hour TINYINT,    day TINYINT,    month TINYINT,    year INT ) STORED AS PARQUET;

I could have used range partition on hour with splits from 0-23; but will need year, month, day and then hour as partition. Any suggestion will be greatly appreciated.

 

Regards,

Kundan

Cloudera Employee
Posts: 46
Registered: ‎02-05-2016

Re: Creating dynamic range partition in kudu

Kudu tables can be range partitioned on multiple columns, though the syntax is slightly different than the Parquet syntax. Take a look at https://kudu.apache.org/docs/kudu_impala_integration.html#_specifying_tablet_partitioning to see what I mean.

 

https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables also has much more information.

Explorer
Posts: 9
Registered: ‎04-18-2016

Re: Creating dynamic range partition in kudu

Here is a simple exmaple of month and year wise partitioned table in kudu, thoush partition is not dynamic and will have to add more partition after the existing partitions exhausts:

CREATE TABLE sample_table (year INT,month INT,day INT,some_id INT,some_value INT, PRIMARY KEY(year,month,day,some_id) )
 PARTITION BY HASH (month) PARTITIONS 12,
 RANGE(year)
(
  PARTITION VALUE = (2017),
  PARTITION VALUE = (2018),
   PARTITION VALUE = (2019),
   PARTITION VALUE = (2020)
  
) 
 STORED AS KUDU;

Note: The partiton by column has to be declared in the first columns while createing tables, i didnt get the reason where as in parquet it had to be declared in the last.
This will create 48 partitions for replication of 3: (12*3)+(4*3)
After the year 2020 will have to add new partitons :

ALTER TABLE sales_by_year ADD RANGE PARTITION VALUE = 2021;
 
Highlighted
Cloudera Employee
Posts: 6
Registered: ‎09-28-2015

Re: Creating dynamic range partition in kudu

It's also possible to range partition on multiple columns, so you could potentially skip the hash partitions if you wanted (although it requires specifying many more explicit range partitions):

 

CREATE TABLE sample_table (year INT,month INT,day INT,some_id INT,some_value INT, PRIMARY KEY(year,month,day,some_id))
RANGE(year, month) (
PARTITION VALUE = (2017, 01),
PARTITION VALUE = (2017, 02),
PARTITION VALUE = (2017, 03),
PARTITION VALUE = (2017, 04),
PARTITION VALUE = (2017, 05),
PARTITION VALUE = (2017, 06),
PARTITION VALUE = (2017, 07),
PARTITION VALUE = (2017, 08),
PARTITION VALUE = (2017, 09),
PARTITION VALUE = (2017, 10),
PARTITION VALUE = (2017, 11),
PARTITION VALUE = (2017, 12),
PARTITION VALUE = (2018, 01),
PARTITION VALUE = (2018, 02),
) STORED AS KUDU;

 

I think the reason you had to specify the year and month up front in the column list is that currently Kudu tables have a restriction that the primary key columns must come first in the column list.

Announcements