Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Partition rename in Hive & HDFS path

avatar

I have an external table created as TEXTFILE with partion on load_date. I have inserted data for one partitions say for example that particular hive table has partition (load_date='2017-11-09'). Now i wanted to rename the partition which I have did by using

ALTER TABLE tbl_name PARTITION (load_date='2017-11-09') RENAME TO PARTITION (load_date='2017-11-10');

After performing this operation, If i query the table Im able to see the new value for the partition. However the underlying HDFS still show the old partition path and its sub-directory still points to /hive/warehouse/default/tbl_name/load_date=2017-11-09.

Is this an known issue?

1 ACCEPTED SOLUTION

avatar
Explorer

@Bala Vignesh N V

This is normal behaviour for EXTERNAL tables since Hive is not managing the underlying data. You can see in the metastore database that Hive keeps a mapping of the partition name to the location on HDFS. Your ALTER command will update this mapping and change the PART_NAME value.

hive=> SELECT t."TBL_NAME", s."LOCATION", p."PART_NAME" FROM "SDS" s, "TBLS" t, "PARTITIONS" p  WHERE t."TBL_ID"=p."TBL_ID" AND s."SD_ID"=p."SD_ID";
 TBL_NAME |                   LOCATION                   | PART_NAME 
----------+----------------------------------------------+-----------
 mytable  | hdfs://my.example.com:8020/tmp/mytable/part=a | part=z
 mytable  | hdfs://my.example.com:8020/tmp/mytable/part=b | part=b

If you want to rename the partition name on HDFS and have it reflected in the data then you do the rename on HDFS and then DROP the partition in the external table and ADD the renamed one.

View solution in original post

3 REPLIES 3

avatar
Explorer

@Bala Vignesh N V

This is normal behaviour for EXTERNAL tables since Hive is not managing the underlying data. You can see in the metastore database that Hive keeps a mapping of the partition name to the location on HDFS. Your ALTER command will update this mapping and change the PART_NAME value.

hive=> SELECT t."TBL_NAME", s."LOCATION", p."PART_NAME" FROM "SDS" s, "TBLS" t, "PARTITIONS" p  WHERE t."TBL_ID"=p."TBL_ID" AND s."SD_ID"=p."SD_ID";
 TBL_NAME |                   LOCATION                   | PART_NAME 
----------+----------------------------------------------+-----------
 mytable  | hdfs://my.example.com:8020/tmp/mytable/part=a | part=z
 mytable  | hdfs://my.example.com:8020/tmp/mytable/part=b | part=b

If you want to rename the partition name on HDFS and have it reflected in the data then you do the rename on HDFS and then DROP the partition in the external table and ADD the renamed one.

avatar

Thanks @Chris Cotter. I understand for external table its not maintained by Hcatalog. But the folder mapping is done properly even im able to see the new partition value when I query the table. The only issue is the folder partition value is not changed which Im able to understand the reason. How come when I query the table I'm able to see the new partition value when its underlying folder value doesn't change?

avatar
Explorer

@Bala Vignesh N V

So you've renamed the partition in Hive and can see the new name there but when you look on HDFS it still has the original partition name, correct?

In my example in the previous post I originally had 2 partitions (part=a and part=b) in Hive and I renamed part=a to part=z. On HDFS, part=a never changed but the PART_NAME column in the metastore database was updated to part=z. In Hive, I can only see part=z and part=b and if I do a SELECT for the data in part=z then it will lookup the LOCATION column from the metastore database for part=z, which still points to the part=a directory on HDFS, and read data for part=z from there. So this way for external tables, you can rename the partitions in Hive to whatever you like without affecting the underlying data on HDFS.