Created 04-02-2017 09:25 AM
I have a hive managed table which is partitioned by country. Now I have inserted data for different countries--> India, Japan, China. Total count of records in hive table is 1000. I have truncated a particular partition say for example I have truncated Japan partition, now the count of rows is reduced to 700. Then If I run the query select distinct country from mytable it displays India,China and Japan even i have no data in the partition country=Japan. I have checked the hadoop file location as well and there are no files underneath the partition. But the distinct of the partitioned column displays Japan as well which should not be. Distinct values of table should return based on the data available on the partitioned folder created in hdfs and no the folder name in HDFS.
Created 04-02-2017 11:54 AM
Truncating the partition will not remove the partition metadata. To remove the metadata for all partitions you'll want to issue the CASCADE statement in an ALTER TABLE statement. This should remove the column metadata for all partitions.
Created 04-02-2017 11:54 AM
Truncating the partition will not remove the partition metadata. To remove the metadata for all partitions you'll want to issue the CASCADE statement in an ALTER TABLE statement. This should remove the column metadata for all partitions.
Created 04-02-2017 01:56 PM
Thanks @Scott Shaw. Does it mean I have to update the metadata each time after I truncate the partition? Even if the metadata exists it should not display wrong results. In my case select distinct country from mytable should display only India.