- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Creating dynamic range partition in kudu
- Labels:
-
Apache Impala
-
Apache Kudu
Created on 03-28-2017 02:30 AM - edited 09-16-2022 04:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I need to add hourly partition in kudu table , for eg a similar create table in impala for parquet table would be :
CREATE TABLE tbl_impala_parquet( col1 int, col2 String, col3 String) PARTITIONED BY ( hour TINYINT, day TINYINT, month TINYINT, year INT ) STORED AS PARQUET;
I could have used range partition on hour with splits from 0-23; but will need year, month, day and then hour as partition. Any suggestion will be greatly appreciated.
Regards,
Kundan
Created 03-28-2017 10:54 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Kudu tables can be range partitioned on multiple columns, though the syntax is slightly different than the Parquet syntax. Take a look at https://kudu.apache.org/docs/kudu_impala_integration.html#_specifying_tablet_partitioning to see what I mean.
https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables also has much more information.
Created 04-14-2017 08:43 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here is a simple exmaple of month and year wise partitioned table in kudu, thoush partition is not dynamic and will have to add more partition after the existing partitions exhausts:
CREATE TABLE sample_table (year INT,month INT,day INT,some_id INT,some_value INT, PRIMARY KEY(year,month,day,some_id) ) PARTITION BY HASH (month) PARTITIONS 12, RANGE(year) ( PARTITION VALUE = (2017), PARTITION VALUE = (2018), PARTITION VALUE = (2019), PARTITION VALUE = (2020) ) STORED AS KUDU;
Note: The partiton by column has to be declared in the first columns while createing tables, i didnt get the reason where as in parquet it had to be declared in the last.
This will create 48 partitions for replication of 3: (12*3)+(4*3)
After the year 2020 will have to add new partitons :
ALTER TABLE sales_by_year ADD RANGE PARTITION VALUE = 2021;
Created 04-14-2017 10:35 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It's also possible to range partition on multiple columns, so you could potentially skip the hash partitions if you wanted (although it requires specifying many more explicit range partitions):
CREATE TABLE sample_table (year INT,month INT,day INT,some_id INT,some_value INT, PRIMARY KEY(year,month,day,some_id))
RANGE(year, month) (
PARTITION VALUE = (2017, 01),
PARTITION VALUE = (2017, 02),
PARTITION VALUE = (2017, 03),
PARTITION VALUE = (2017, 04),
PARTITION VALUE = (2017, 05),
PARTITION VALUE = (2017, 06),
PARTITION VALUE = (2017, 07),
PARTITION VALUE = (2017, 08),
PARTITION VALUE = (2017, 09),
PARTITION VALUE = (2017, 10),
PARTITION VALUE = (2017, 11),
PARTITION VALUE = (2017, 12),
PARTITION VALUE = (2018, 01),
PARTITION VALUE = (2018, 02),
) STORED AS KUDU;
I think the reason you had to specify the year and month up front in the column list is that currently Kudu tables have a restriction that the primary key columns must come first in the column list.
Created 01-31-2024 10:43 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Dan Burkert , i have tried exactly the same DDL and it give the error. I used hue/impala to create the table. Can you help?
- ParseException: Syntax error in line 4:undefined: RANGE(year, month) ( ^ Encountered: RANGE Expected: CACHED, COMMENT, LOCATION, PARTITION, PARTITIONED, PRODUCED, ROW, SORT, STORED, TBLPROPERTIES, UNCACHED, WITH CAUSED BY: Exception: Syntax error
Created 05-24-2022 06:07 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is there any way in Kudu, dynamic partitions can be created which can automatically add new partition when the new date(month) is available.
e.g. in the above post
RANGE(year) ( PARTITION VALUE = (2017), PARTITION VALUE = (2018), PARTITION VALUE = (2019), PARTITION VALUE = (2020) )
Values after 2020 will get rejected, any workaround so create partitions based on interval
PARTITION BY RANGE (MY_DATE)
INTERVAL( NUMTODSINTERVAL(1,'YEAR')) -- Syntax from a famous relational DB