Created on 01-20-2018 09:56 AM - edited 09-16-2022 05:45 AM
Hi,
I have a simple table with range partitions defined by upper and lower bounds.
CREATE TABLE work.sales_by_year (
year INT, sale_id INT, amount INT,
PRIMARY KEY (sale_id, year)
)
PARTITION BY RANGE (year) (
PARTITION VALUES < 2015,
PARTITION 2015 <= VALUES < 2016,
PARTITION 2016 <= VALUES
)
STORED AS KUDU;
So this table has three partitions:
+--------+-----------+----------+----------------+------------+
| # Rows | Start Key | Stop Key | Leader Replica | # Replicas |
+--------+-----------+----------+----------------+------------+
| -1     |           | 800007DF | host1:7050     | 3          |
| -1     | 800007DF  | 800007E0 | host2:7050     | 3          |
| -1     | 800007E0  |          | host3:7050     | 3          |
+--------+-----------+----------+----------------+------------+
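For readers wondering how the Start Key and Stop Key values relate to the years in the CREATE TABLE statement, here is a minimal Python sketch. It assumes the keys are 4-byte big-endian integers with the sign bit flipped, which is consistent with the values shown but is not confirmed anywhere in this thread. The function name `decode_int32_key` is made up for illustration.

```python
def decode_int32_key(hex_key: str) -> int:
    """Decode a hex-encoded range key into a signed 32-bit integer.

    Assumption: the key is 4 bytes, big-endian, with the sign bit flipped
    so that byte-wise ordering matches numeric ordering.
    """
    raw = int(hex_key, 16)
    unsigned = raw ^ 0x80000000  # flip the sign bit back
    # Reinterpret the 32-bit pattern as a signed value.
    return unsigned - (1 << 32) if unsigned >= (1 << 31) else unsigned


for key in ("800007DF", "800007E0"):
    print(key, "->", decode_int32_key(key))
```

Under that assumption, 800007DF decodes to 2015 and 800007E0 decodes to 2016, matching the partition bounds in the table definition.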
Now I would like to end the last range with 2017 and have another interval for values >= 2017.
I tried several syntaxes, but none of them work:
alter table work.sales_by_year add range partition 2016 <= VALUES < 2017;
Query: alter table work.sales_by_year add range partition 2016 <= VALUES < 2017
ERROR: ImpalaRuntimeException: Error adding range partition in table sales_by_year
CAUSED BY: NonRecoverableException: New range partition conflicts with existing range partition: 2016 <= VALUES < 2017
alter table work.sales_by_year add range partition VALUE = 2017;
Query: alter table work.sales_by_year add range partition VALUE = 2017
ERROR: ImpalaRuntimeException: Error adding range partition in table sales_by_year
CAUSED BY: NonRecoverableException: New range partition conflicts with existing range partition: 2017 <= VALUES < 2018
These error messages are misleading: when I run SHOW PARTITIONS, I still see only those three intervals, with nothing for 2017 or 2018.
Any hints on how to extend the range partitions?
Thanks
Created on 01-22-2018 09:24 PM - edited 01-22-2018 09:27 PM
The error messages you're seeing seem to indicate that such alterations are not supported. AFAICT this would entail splitting tablets, which isn't supported at the moment. See here for more details on what's currently implemented: https://kudu.apache.org/docs/schema_design.html#range-partitioning-example
The doc gives two examples, noting that Example 2 is more flexible because it can add further partitions. Your partition schema more closely resembles Example 1, which can't.
Depending on how much data you have, you might consider creating a new table with a more flexible partition schema (e.g. 2014, 2015, 2016, 2017, a la Example 2 in the linked docs), and re-inserting the data from the existing table into this new table.
Created 01-28-2018 05:00 AM
This is confusing, because the Apache Kudu User Guide says on p. 27:
Partitioning Limitations
• Tables must be manually pre-split into tablets using simple or compound primary keys. Automatic splitting is not yet possible. Range partitions may be added or dropped after a table has been created. See Schema Design for more information.
Created 01-28-2018 05:02 AM
And also on p.29:
New Features in Kudu 0.10.0
• Users may now manually manage the partitioning of a range-partitioned table. When a table is created, the user may specify a set of range partitions that do not cover the entire available key space. A user may add or drop range partitions to existing tables. This feature can be particularly helpful with time series workloads in which new partitions can be created on an hourly or daily basis. Old partitions may be efficiently dropped if the application does not need to retain historical data past a certain point.
Created 01-29-2018 01:52 AM
So the correct answer is that:
Tables with range partitions defined via upper and lower boundaries cannot be extended.
Tables with partitions defined as a single value can be extended.
Created 01-29-2018 11:00 AM
No, it's actually the opposite:
From the range partitioning docs:
"The second example [with upper/lower bounds specified] is more flexible than the first [with a split defined], because it allows range partitions for future years to be added to the table. In the first example, all writes for times after 2016-01-01 will fall into the last partition, so the partition may eventually become too large for a single tablet server to handle."
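To make the point from the docs concrete, here is a minimal Python sketch of the overlap rule that the error messages in the question suggest. It is only a model, not Kudu's implementation: partitions are treated as half-open integer intervals, and the names `overlaps`, `can_add`, `unbounded` and `bounded` are made up for illustration. The `bounded` layout is a hypothetical alternative schema, not one that appears in the question.

```python
import math


def overlaps(a, b):
    """Two half-open intervals [lo, hi) overlap when each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def can_add(existing, new):
    """A new range can be added only if it overlaps no existing range."""
    return not any(overlaps(p, new) for p in existing)


INF = math.inf

# Layout from the question: the first and last partitions are unbounded.
unbounded = [(-INF, 2015), (2015, 2016), (2016, INF)]

# Hypothetical alternative: every partition has both a lower and an upper bound.
bounded = [(2014, 2015), (2015, 2016), (2016, 2017)]

print(can_add(unbounded, (2016, 2017)))  # first ALTER attempt from the question
print(can_add(unbounded, (2017, 2018)))  # VALUE = 2017, i.e. 2017 <= VALUES < 2018
print(can_add(bounded, (2017, 2018)))    # same range against the bounded layout
```

In this model the first two checks fail because the existing `2016 <= VALUES` partition already covers every later year, while the third succeeds because nothing in the bounded layout reaches 2017.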