Created on 08-20-2019 01:37 AM - edited 08-20-2019 01:38 AM
I would like to drop multiple partitions of an impala table at once.
Can I use the following statement in order to delete all the partitions of January ?
ALTER TABLE schema.Table DROP IF EXISTS PARTITION (part_col='201801*') PURGE ;
Can I use Star Wildcard just as unix?
Created on 09-24-2019 01:46 AM - edited 09-24-2019 01:49 AM
@eMazarakis, later releases do not support asterisk either, it will be treated as a literal. The expressions that are available can be found here in chapter 'To drop or alter multiple partitions'.
Previously, I was referring to the intention behind "part_col='201801*' ", it suggests that the desired outcome of this expression would be to remove all data from January 2018 in one operation. However, as it is not possible in CDH 5.9, I was proposing to choose a different partition strategy if multiple partitions have to be dropped frequently and the size of the data allows.
For example, if after ingestion only 1 analytic query is executed on the data, then the days have to be dropped one-by-one, which is 32 operations. Therefore, if the size of the data allows, the number of operations could be reduced to 2 with a different partition strategy where the table is partitioned by month.
Created 09-05-2019 01:59 AM
Hi eMazarakis,
Multiple partitions can be dropped with the following syntax:
alter table historical_data drop partition (year = 1996 and month between 1 and 6);
Please see our ALTER TABLE Statement documentation for more details, the multiple partition drop can be found in section: To drop or alter multiple partitions.
Created 09-05-2019 03:54 AM
@tmater my Cloudera version is 5.9 . I forgot to mentioned that.
Created 09-16-2019 01:45 AM
Hi @eMazarakis,
Thank you for the additional information. Altering multiple partitions was implemented in IMPALA-1654, this feature is available from Impala 2.8+ which is part of CDH 5.10+.
Although, I am not aware of the workflow and the amount of data behind a partition, this specific expression part_col='201801* removes a whole month. If these requests are frequent and the workload/workflow allows it re-partitioning based on months could be a feasible workaround.
Created 09-17-2019 12:11 AM
@tmater unfortunately this kind of expression ( part_col='201801*' ) does not work.
Created 09-17-2019 01:12 AM
@eMazarakis, apologies, I meant the part_col='201801*' intends to remove a whole month. If possible might worth reconsidering the partition strategy or the drop operation could be done in Hive separately.
From CDH 5.10+ partition expressions can be specified, please see my response here for details.
Created 09-23-2019 04:26 AM
@tmater the following command , did not remove the partitions/dirs in my cloudera version.
ALTER TABLE tblName DROP IF EXISTS PARTITION ( part_col='201801*' ) PURGE.
I think special characters such as * cannot be used in this version.
Created on 09-24-2019 01:46 AM - edited 09-24-2019 01:49 AM
@eMazarakis, later releases do not support asterisk either, it will be treated as a literal. The expressions that are available can be found here in chapter 'To drop or alter multiple partitions'.
Previously, I was referring to the intention behind "part_col='201801*' ", it suggests that the desired outcome of this expression would be to remove all data from January 2018 in one operation. However, as it is not possible in CDH 5.9, I was proposing to choose a different partition strategy if multiple partitions have to be dropped frequently and the size of the data allows.
For example, if after ingestion only 1 analytic query is executed on the data, then the days have to be dropped one-by-one, which is 32 operations. Therefore, if the size of the data allows, the number of operations could be reduced to 2 with a different partition strategy where the table is partitioned by month.