Hi, I am having problems with automatic compaction triggering in Hive on HDP 3.0.1.
My Hive tables are populated by NiFi with the Hive3Streaming processor, which generates a lot of delta files. Hive owns the partitions (NiFi inserts the data using the Hive keytab).
What happens is:
- Hive triggers an initial MAJOR compaction for each partition because the partition has no base directory (the "no base" condition is met).
- Hive never tries to compact the same partition again; judging by the logs, it does not even seem to check the partition for compaction.
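To make the "no base" condition concrete, here is a minimal sketch that classifies the subdirectory names of one partition (for example, copied from an `hdfs dfs -ls` listing of the partition path). It assumes the Hive ACID naming layout `base_<writeId>` and `[delete_]delta_<minWriteId>_<maxWriteId>[_<stmtId>]`; `summarize_partition` is a hypothetical helper, not part of Hive. A delta whose max write id is at or below the latest base is already contained in that base, so only the remaining "live" deltas would matter to a later compaction check.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper (not part of Hive): names are assumed to follow the ACID
# layout base_<writeId>[...] and [delete_]delta_<minId>_<maxId>[_<stmtId>].
BASE_RE = re.compile(r"^base_(\d+)")
DELTA_RE = re.compile(r"^(?:delete_)?delta_(\d+)_(\d+)")


def summarize_partition(dir_names: List[str]) -> Tuple[Optional[int], List[str]]:
    """Return (latest base write id or None, deltas not yet covered by that base)."""
    latest_base: Optional[int] = None
    deltas: List[Tuple[str, int]] = []
    for name in dir_names:
        base = BASE_RE.match(name)
        if base:
            write_id = int(base.group(1))
            if latest_base is None or write_id > latest_base:
                latest_base = write_id
            continue
        delta = DELTA_RE.match(name)
        if delta:
            # Keep the upper bound (max write id) of the delta's range.
            deltas.append((name, int(delta.group(2))))
    # A delta with max write id <= the base write id is already folded into the
    # base (the Cleaner is expected to remove it); the rest are still "live".
    live = sorted(n for n, max_id in deltas if latest_base is None or max_id > latest_base)
    return latest_base, live


if __name__ == "__main__":
    before = ["delta_0000001_0000010", "delta_0000011_0000020"]
    after = before + ["base_0000020", "delta_0000021_0000030_0000"]
    print(summarize_partition(before))  # (None, ['delta_0000001_0000010', 'delta_0000011_0000020'])
    print(summarize_partition(after))   # (20, ['delta_0000021_0000030_0000'])
```

A `None` base id corresponds to the "no base" situation before the first MAJOR compaction; after it, only deltas written past the base's write id should count toward the next check.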
I have tried several other configurations:
- When Hive does not own the partitions, it fails to impersonate the owning user over Kerberos while cleaning up the delta files (even with users created by Ambari, such as nifi).
- Creating the database and tables as the hive user does not change anything, as expected.
- Setting hive.compactor.delta.pct.threshold = 1000 does not prevent Hive from triggering the first MAJOR compaction, because the "no base" condition is met.
Does anyone have an idea what the problem could be?
I am using the following configuration for Hive compactions:
hive.compactor.check.interval = 300
hive.compactor.delta.num.threshold = 10
hive.compactor.delta.pct.threshold = 0.1f
hive.compactor.worker.threads = 5
hive.compactor.abortedtxn.threshold = 1000
hive.compactor.initiator.on = true
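For reasoning about how these thresholds interact, here is a simplified model of the per-partition decision. It is an approximation built from the documented meaning of the properties, not Hive's Initiator source. Assumptions (labelled in the code too): the size rule compares total delta size to base size and is skipped when there is no base; the count rule needs strictly more live deltas than hive.compactor.delta.num.threshold; and a partition with no base is escalated to MAJOR. `CompactorConfig` and `choose_compaction` are hypothetical names. Aborted-transaction handling (hive.compactor.abortedtxn.threshold) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompactorConfig:
    # Defaults mirror the settings listed above.
    delta_num_threshold: int = 10      # hive.compactor.delta.num.threshold
    delta_pct_threshold: float = 0.1   # hive.compactor.delta.pct.threshold


def choose_compaction(base_size: int, delta_sizes: List[int],
                      cfg: CompactorConfig) -> Optional[str]:
    """Simplified model: return "MAJOR", "MINOR", or None for one partition.

    base_size is the total size in bytes of the base directory (0 if none);
    delta_sizes holds one total size per live delta directory.
    """
    delta_size = sum(delta_sizes)
    no_base = base_size == 0 and delta_size > 0
    if base_size > 0:
        # Size rule: only meaningful when a base exists to compare against.
        if delta_size / base_size > cfg.delta_pct_threshold:
            return "MAJOR"
    # Count rule (assumed strict): more deltas than the threshold are needed.
    if len(delta_sizes) <= cfg.delta_num_threshold:
        return None
    # Assumption: with no base the compaction is escalated to MAJOR, so
    # delta_pct_threshold never affects the first compaction.
    return "MAJOR" if no_base else "MINOR"


if __name__ == "__main__":
    cfg = CompactorConfig()
    print(choose_compaction(0, [100] * 11, cfg))                                       # MAJOR
    print(choose_compaction(0, [100] * 11, CompactorConfig(delta_pct_threshold=1000)))  # MAJOR
    print(choose_compaction(10_000, [50] * 11, cfg))                                   # MINOR
```

Under these assumptions, raising delta.pct.threshold to 1000 cannot suppress the first MAJOR compaction, which matches what is described above. After that compaction, a partition that keeps receiving streaming deltas should eventually satisfy one of the two rules.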