Support Questions

Find answers, ask questions, and share your expertise

Bug: Partioning In Hive

avatar

I have a hive managed table which is partitioned by country. Now I have inserted data for different countries--> India, Japan, China. Total count of records in hive table is 1000. I have truncated a particular partition say for example I have truncated Japan partition, now the count of rows is reduced to 700. Then If I run the query select distinct country from mytable it displays India,China and Japan even i have no data in the partition country=Japan. I have checked the hadoop file location as well and there are no files underneath the partition. But the distinct of the partitioned column displays Japan as well which should not be. Distinct values of table should return based on the data available on the partitioned folder created in hdfs and no the folder name in HDFS.

1 ACCEPTED SOLUTION

avatar

Hi @Bala Vignesh N V,

Truncating the partition will not remove the partition metadata. To remove the metadata for all partitions you'll want to issue the CASCADE statement in an ALTER TABLE statement. This should remove the column metadata for all partitions.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnNa...

View solution in original post

2 REPLIES 2

avatar

Hi @Bala Vignesh N V,

Truncating the partition will not remove the partition metadata. To remove the metadata for all partitions you'll want to issue the CASCADE statement in an ALTER TABLE statement. This should remove the column metadata for all partitions.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnNa...

avatar

Thanks @Scott Shaw. Does it mean I have to update the metadata each time after I truncate the partition? Even if the metadata exists it should not display wrong results. In my case select distinct country from mytable should display only India.