
Partition Retention Period not working on Hive 'Managed' tables

Contributor

I am trying to set partition retention times on existing Hive managed tables using the following:

 

ALTER TABLE <table name> SET TBLPROPERTIES ('discover.partitions'='true');
ALTER TABLE <table name> SET TBLPROPERTIES ('partition.retention.period'='1d');

 

as described on the page below. However, I am still able to query partitions older than a day, so it appears not to be working. The page does mention that this applies to 'external' tables; can anyone let me know whether this kind of 'age off' retention period is possible on managed tables, or whether I am missing any commands?

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive-set-partition-retent...

 

Thanks in advance


9 REPLIES

Expert Contributor

Partition retention is disabled by default in HDP-3.1.4.x.

Set metastore.msck.repair.enable.partition.retention=true in Ambari -> Hive -> custom hms-site and restart HMS. HMS should then take care of it automatically.

Contributor

Thanks for the reply @nramanaiah. I seem to be unable to find the option 'metastore.msck.repair.enable.partition.retention'. Does it need to be added as a custom option, and if so, under which drop-down? Thanks

Expert Contributor

Can you let me know your cluster version?

Please check in the HMS logs that the AutoPartitionDiscovery thread (PartitionManagementTask) is running.

If it is running, you should see log lines like the following in HMS. Make sure you don't have a specific db/table pattern configured.

 

HMS
org.apache.hadoop.hive.metastore.PartitionManagementTask: [Metastore Scheduled Worker 3]: Looking for tables using catalog: hive dbPattern: * tablePattern: * found: 107

HS2/HMS
org.apache.hadoop.hive.metastore.HiveMetaStoreChecker: [HiveServer2-Background-Pool: Thread-123]: Number of partitionsNotInMs=[], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[]

org.apache.hadoop.hive.metastore.Msck: [HiveServer2-Background-Pool: Thread-123]: hive.default.test_table - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true  true)

 

In a CDP cluster managed by Cloudera Manager, set metastore.msck.repair.enable.partition.retention=true under Hive -> Configuration -> Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml.

 

In HDP-3.x managed by Ambari, set metastore.msck.repair.enable.partition.retention=true in Hive -> custom hms-site.

Restart the applicable services. expiredPartitions should then be visible in the logs, and they should be removed by HMS at the scheduled interval once they become eligible.
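
Once it is enabled, you can also confirm the table properties and watch the partitions from Beeline. A minimal check, assuming a hypothetical table named test_table:

-- Confirm the retention-related properties are set (test_table is a placeholder name)
SHOW TBLPROPERTIES test_table;

-- List current partitions; re-run after the scheduled HMS task to see expired partitions drop off
SHOW PARTITIONS test_table;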

Contributor

@nramanaiah thanks very much for the help!

Contributor

@nramanaiah I'm still experiencing some issues with this. I have applied metastore.msck.repair.enable.partition.retention=true and restarted, and all looks good as below. I have applied the ALTER TABLE statements to set a retention period of 1 day on a test table without error, but when I run a SELECT statement in Beeline I can still see data from last week.

[screenshots: applied configuration and Beeline query output]

Any idea what I'm missing?

 

Expert Contributor

@Griggsy 

Partition discovery & retention is only for external table & data purge is disabled by default for external tables.

ie., If tblproperties has ('discover.partitions'='true' & 'partition.retention.period'='2m' ) without 'external.table.purge'='true'. Consecutive iterations will repeatedly add/remove partition till underlying HDFS partition folder is removed. I suspect you are seeing the same situation.

eg., 

 

==> Expired partition is removed from Metadata in Iteration 1 <== 
2022-09-23T02:43:36,786 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true  true)
2022-09-23T02:43:36,822 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse]


==> As HDFS partition folder exist, discover.partition added it back in Iteration 2 <==
2022-09-23T02:48:36,818 INFO  [PartitionDiscoveryTask-1]: metastore.HiveMetaStoreChecker (HiveMetaStoreChecker.java:checkMetastore(144)) - Number of partitionsNotInMs=[test:dept=cse], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[]
2022-09-23T02:48:36,818 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 1 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true  true)

==> Expired partition is removed from Metadata in Iteration 3 <== 
2022-09-23T02:53:36,805 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true  true)
2022-09-23T02:53:36,837 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse]
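
A minimal sketch of a setup that reproduces this cycle, assuming a hypothetical external table named test partitioned by dept (matching the log excerpt above):

-- Hypothetical external table; note there is NO 'external.table.purge' property,
-- so the HDFS partition folder survives expiry and discover.partitions re-adds it each cycle
CREATE EXTERNAL TABLE test (id INT) PARTITIONED BY (dept STRING);
ALTER TABLE test SET TBLPROPERTIES (
  'discover.partitions'='true',
  'partition.retention.period'='2m'
);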

 

Please use the following three table properties together to make it work as you expect (see the combined statement after this list):

1) 'discover.partitions'='true' -- to discover partitions from the HDFS path, or remove partition metadata from HMS when the HDFS path is removed.

2) 'partition.retention.period'='<retention_period>' -- to remove partitions after the retention period.

3) 'external.table.purge'='true' -- to remove the HDFS partition folder, so that partition discovery won't add it back to HMS.
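
A minimal sketch combining all three, assuming a hypothetical table named test_table (the 1d retention value is just an example):

-- Set all three properties in one statement (test_table is a placeholder name)
ALTER TABLE test_table SET TBLPROPERTIES (
  'discover.partitions'='true',
  'partition.retention.period'='1d',
  'external.table.purge'='true'
);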

PS: To enable external table purge globally, set hive.external.table.purge.default=true cluster-wide in Ambari -> Hive -> custom hive-site.

Expert Contributor

@Griggsy Hope the above suggestion helped. Let me know if not.

Contributor

@nramanaiah I haven't had a chance to do further testing yet; I will let you know ASAP. Thanks again for the help.

Contributor

@nramanaiah I have been able to run further testing and can confirm that my partitions are purging as expected! Thanks again for the assistance!