Support Questions
Find answers, ask questions, and share your expertise

How can I Alter multiple partitions at once on Impala ?

Contributor

I would like to drop multiple partitions of an impala table at once.

Can I use the following statement  in order to delete all the partitions of January ?

 

ALTER TABLE schema.Table  DROP IF EXISTS PARTITION (part_col='201801*') PURGE ;

 

Can I use Star Wildcard  just as unix?

1 ACCEPTED SOLUTION

Contributor

@eMazarakis, later releases do not support asterisk either, it will be treated as a literal. The expressions that are available can be found here in chapter 'To drop or alter multiple partitions'.

 

Previously, I was referring to the intention behind "part_col='201801*' ", it suggests that the desired outcome of this expression would be to remove all data from January 2018 in one operation. However, as it is not possible in CDH 5.9, I was proposing to choose a different partition strategy if multiple partitions have to be dropped frequently and the size of the data allows.

For example, if after ingestion only 1 analytic query is executed on the data, then the days have to be dropped one-by-one, which is 32 operations. Therefore, if the size of the data allows, the number of operations could be reduced to 2 with a different partition strategy where the table is partitioned by month.

View solution in original post

7 REPLIES 7

Contributor

Hi eMazarakis,

 

Multiple partitions can be dropped with the following syntax:

alter table historical_data drop partition (year = 1996 and month between 1 and 6);

Please see our ALTER TABLE Statement documentation for more details, the multiple partition drop can be found in section: To drop or alter multiple partitions.

 

Contributor

@tmater   my Cloudera version is 5.9 . I forgot to mentioned that.

Contributor

Hi @eMazarakis,

Thank you for the additional information. Altering multiple partitions was implemented in IMPALA-1654, this feature is available from Impala 2.8+ which is part of CDH 5.10+.

Although, I am not aware of the workflow and the amount of data behind a partition, this specific expression part_col='201801* removes a whole month. If these requests are frequent and the workload/workflow allows it re-partitioning based on months could be a feasible workaround.

 

 

Contributor

@tmater  unfortunately this kind of expression ( part_col='201801*' ) does not work.

Contributor

@eMazarakis, apologies, I meant the part_col='201801*' intends to remove a whole month. If possible might worth reconsidering the partition strategy or the drop operation could be done in Hive separately.

From CDH 5.10+ partition expressions can be specified, please see my response here for details.

Contributor

@tmater    the following command , did not remove the partitions/dirs in my cloudera version.

 

ALTER TABLE  tblName DROP IF EXISTS PARTITION ( part_col='201801*' ) PURGE.

 

I think special characters such as * cannot be used in this version.

 

Contributor

@eMazarakis, later releases do not support asterisk either, it will be treated as a literal. The expressions that are available can be found here in chapter 'To drop or alter multiple partitions'.

 

Previously, I was referring to the intention behind "part_col='201801*' ", it suggests that the desired outcome of this expression would be to remove all data from January 2018 in one operation. However, as it is not possible in CDH 5.9, I was proposing to choose a different partition strategy if multiple partitions have to be dropped frequently and the size of the data allows.

For example, if after ingestion only 1 analytic query is executed on the data, then the days have to be dropped one-by-one, which is 32 operations. Therefore, if the size of the data allows, the number of operations could be reduced to 2 with a different partition strategy where the table is partitioned by month.

; ;