Support Questions

kundansonuj · ‎03-28-2017

I need to add hourly partition in kudu table , for eg a similar create table in impala for parquet table would be :

CREATE TABLE  tbl_impala_parquet( col1 int, col2 String, col3 String)  PARTITIONED BY (   hour TINYINT,    day TINYINT,    month TINYINT,    year INT ) STORED AS PARQUET;

I could have used range partition on hour with splits from 0-23; but will need year, month, day and then hour as partition. Any suggestion will be greatly appreciated.

Regards,

Kundan

Adar · ‎03-28-2017

Kudu tables can be range partitioned on multiple columns, though the syntax is slightly different than the Parquet syntax. Take a look at https://kudu.apache.org/docs/kudu_impala_integration.html#_specifying_tablet_partitioning to see what I mean.

https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables also has much more information.

kundansonuj · ‎04-14-2017

Here is a simple exmaple of month and year wise partitioned table in kudu, thoush partition is not dynamic and will have to add more partition after the existing partitions exhausts:

CREATE TABLE sample_table (year INT,month INT,day INT,some_id INT,some_value INT, PRIMARY KEY(year,month,day,some_id) )
 PARTITION BY HASH (month) PARTITIONS 12,
 RANGE(year)
(
  PARTITION VALUE = (2017),
  PARTITION VALUE = (2018),
   PARTITION VALUE = (2019),
   PARTITION VALUE = (2020)
  
) 
 STORED AS KUDU;

Note: The partiton by column has to be declared in the first columns while createing tables, i didnt get the reason where as in parquet it had to be declared in the last.
This will create 48 partitions for replication of 3: (12*3)+(4*3)
After the year 2020 will have to add new partitons :

ALTER TABLE sales_by_year ADD RANGE PARTITION VALUE = 2021;

Dan Burkert · ‎04-14-2017

It's also possible to range partition on multiple columns, so you could potentially skip the hash partitions if you wanted (although it requires specifying many more explicit range partitions):

CREATE TABLE sample_table (year INT,month INT,day INT,some_id INT,some_value INT, PRIMARY KEY(year,month,day,some_id))
RANGE(year, month) (
 PARTITION VALUE = (2017, 01),
 PARTITION VALUE = (2017, 02),
 PARTITION VALUE = (2017, 03),
 PARTITION VALUE = (2017, 04),
 PARTITION VALUE = (2017, 05),
 PARTITION VALUE = (2017, 06),
 PARTITION VALUE = (2017, 07),
 PARTITION VALUE = (2017, 08),
 PARTITION VALUE = (2017, 09),
 PARTITION VALUE = (2017, 10),
 PARTITION VALUE = (2017, 11),
 PARTITION VALUE = (2017, 12),
 PARTITION VALUE = (2018, 01),
 PARTITION VALUE = (2018, 02),
) STORED AS KUDU;

I think the reason you had to specify the year and month up front in the column list is that currently Kudu tables have a restriction that the primary key columns must come first in the column list.

Althotta · ‎01-31-2024

Hi @Dan Burkert , i have tried exactly the same DDL and it give the error. I used hue/impala to create the table. Can you help?

ParseException: Syntax error in line 4:undefined: RANGE(year, month) ( ^ Encountered: RANGE Expected: CACHED, COMMENT, LOCATION, PARTITION, PARTITIONED, PRODUCED, ROW, SORT, STORED, TBLPROPERTIES, UNCACHED, WITH CAUSED BY: Exception: Syntax error

DE_dojo · ‎05-24-2022

Is there any way in Kudu, dynamic partitions can be created which can automatically add new partition when the new date(month) is available.

e.g. in the above post

RANGE(year)
(
  PARTITION VALUE = (2017),
  PARTITION VALUE = (2018),
   PARTITION VALUE = (2019),
   PARTITION VALUE = (2020)
  
)

Values after 2020 will get rejected, any workaround so create partitions based on interval

PARTITION BY RANGE (MY_DATE)
INTERVAL( NUMTODSINTERVAL(1,'YEAR')) -- Syntax from a famous relational DB

Cloudera Community

Support Questions

Creating dynamic range partition in kudu