Hive: deleting records across multiple partitions on a condition

Hi all,

I have a CSV dataset that arrives daily under a date folder. Each CSV file also contains a date column, say dt2.

The CSV file in a given date folder can contain one or many distinct dt2 values.

E.g. folder date 20210218, CSV file contents:

id   dt2        folder date   other columns...
1    20210218   20210218
2    20200215   20210218
3    20200214   20210218

 

About 90% of queries filter on dt2, so I am thinking of partitioning the table on dt2.

However, I also occasionally need to delete all records that came from a given folder date.
Since the partitioning is on dt2, the 3 records in this example are spread across 3 partitions.
What is the best way to design the Hive schema to support such a delete statement?
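
To make the question concrete, here is a minimal sketch of the layout I am considering. The table name daily_data, the column name folder_date and the column types are placeholders I made up for illustration. I am also assuming an ACID (transactional) ORC table, since as far as I understand Hive only supports DELETE on transactional tables.

```sql
-- Hypothetical table: names and types are assumptions, not the real schema.
CREATE TABLE daily_data (
  id          INT,
  folder_date STRING   -- the date folder the row arrived in, e.g. '20210218'
  -- other columns ...
)
PARTITIONED BY (dt2 STRING)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');  -- assumed: needed for DELETE

-- Typical query (about 90% of the workload): pruned to a single dt2 partition.
SELECT * FROM daily_data WHERE dt2 = '20200215';

-- Occasional cleanup: folder_date is not a partition column, so there is no
-- partition pruning and the matching rows may sit in many dt2 partitions.
DELETE FROM daily_data WHERE folder_date = '20210218';
```

With the sample file above, the DELETE would have to touch the partitions dt2=20210218, dt2=20200215 and dt2=20200214, which is the part I am unsure how to design for.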

 

Thank you