Support Questions

Find answers, ask questions, and share your expertise

How to change partition on existing Hive Table with data

avatar
Rising Star

Hi,

 

We have an existing external Hive Table containing millions of rows partitioned by columnA of type string.

 

We want to change this to ColumnB of type timestamp

 

What's the most efficient way to go about this, considering we have all this rows of data already stored in the existing partition structure

 

6 REPLIES 6

avatar
Super Guru
Hi,

The easiest way I can see is to:

1. drop the table (since it is external, data should NOT be dropped)
2. create a new table on top of it and specify as partitioned by ColumnA of type timestamp (the column name should remain the same as before, can't be changed to ColumnB, otherwise step 3 will not be able to pick it up)
3. run "msck repair table {tablename}" to recover the partitions

This assumes that the partition values will remain unchanged.

Hope this can help.

avatar
Rising Star

I was considering ....

 

1. Create new ext table with new partition

2.  insert into newtable select ... from oldtable ... to new hdfs location

3. drop old table and delete hdfs folders

 

problem here is... at some point both tables will have to exists

 

 

avatar
Super Guru
Have you tried my suggestion, which won't need to do data copying?

Cheers
Eric

avatar
Rising Star

Hi EricL

 

ColumnA is of a different data type than ColumnB

 

ColumnA contains Department Names (string) and ColumnB contains TimeStamps (Date-Time)

 

the table is already paritioned by the department names which is strings

 

now we want to change and partition by the TimeStamp column (date-time)

 

could you explain your process little more

avatar
Super Guru
Hi,

I thought you just update the partition column type. If they contain different data, how will you plan to manage the old partitions? This will affect how you implement the change.

Cheers
Eric

avatar
Rising Star

We do not want to keep the old partitions.

 

We just want to re-partition the data using the timestamps value.

 

The data only exists currently as partitioned by the string value