Hive 10TB table add partition performance issue

Techies,

Background: We have an existing 10 TB Hive table that is range-partitioned on column A. The business case has changed, and we now need to add column B as a partition column in addition to column A.

Problem statement: Because the data on HDFS is so large and has to be restructured to pick up the new partition column B, we are having difficulty copying the table to a backup and re-ingesting it into the main table with a simple Impala INSERT OVERWRITE.

We would like to know whether there is an efficient way to add partition columns to a table this large.
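
To make the re-ingest step concrete, here is a minimal Python sketch that only builds the INSERT OVERWRITE statements as strings, one per existing value of A, with B filled in as a dynamic partition. Everything named here is hypothetical and for illustration only: the tables `backup_tbl` and `main_tbl`, the data columns `c1` and `c2`, the lowercase partition column names `a` and `b`, and the assumption that A holds string values. Batching per value of A is also an assumption about how the job could be split, not something we have measured.

```python
def build_reingest_statements(source_table, target_table, data_columns, a_values):
    """Build one INSERT OVERWRITE statement per value of partition column a.

    Column a is given as a static partition value, while b is left dynamic,
    so b must be the last column in the SELECT list.
    """
    select_list = ", ".join(list(data_columns) + ["b"])
    statements = []
    for a_value in a_values:
        statements.append(
            f"INSERT OVERWRITE TABLE {target_table} "
            f"PARTITION (a='{a_value}', b) "
            f"SELECT {select_list} FROM {source_table} "
            f"WHERE a='{a_value}'"
        )
    return statements
```

Calling `build_reingest_statements("backup_tbl", "main_tbl", ["c1", "c2"], ["2020-01", "2020-02"])` returns two statements, which could then be submitted one at a time instead of as a single 10 TB job.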