Hive compactions not triggered automatically

Hi,

I have a problem with automatic compactions not being triggered in Hive on HDP 3.0.1.

My Hive tables are populated by NiFi with the Hive3Streaming processor, which generates a lot of delta files. Hive is the owner of the partitions (NiFi inserts the data using the Hive keytab).

What happens is:

- Hive triggers a first MAJOR compaction for each partition because there is no base file in the partition (the "no base" condition is fulfilled).

- Hive does not try to compact the same partition again (judging by the logs, it does not even seem to check the partition for compaction).


I have tried several other configurations:

- When Hive is not the owner of the partitions, it fails to impersonate the user over Kerberos during the delta file cleaning process (even for users created by Ambari, such as nifi).

- Creating the database and tables as the hive user doesn't change anything, as expected.

- Setting hive.compactor.delta.pct.threshold = 1000 doesn't prevent Hive from triggering the first MAJOR compaction, because the "no base" condition is fulfilled.


Does anyone have any idea what the problem could be?


I am using the following configuration for Hive compactions:

hive.compactor.check.interval = 300
hive.compactor.delta.num.threshold = 10
hive.compactor.delta.pct.threshold = 0.1f
hive.compactor.worker.threads = 5
hive.compactor.abortedtxn.threshold = 1000
hive.compactor.initiator.on = true
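
To make the expected behavior concrete, here is a minimal Python sketch of how these thresholds would be expected to interact for a single partition. It is a simplified model built on assumptions, not Hive's actual initiator code: the function name `expected_compaction` and the byte-size inputs are hypothetical, and the ordering of the checks (size ratio first, then delta count, with "no base" forcing MAJOR) is assumed from the documented meaning of the properties and from the behavior described above.

```python
from typing import List, Optional


def expected_compaction(base_size: int,
                        delta_sizes: List[int],
                        delta_num_threshold: int = 10,
                        delta_pct_threshold: float = 0.1) -> Optional[str]:
    """Hypothetical, simplified model of the compaction decision for one partition.

    base_size   -- total bytes in the base directory (0 if there is no base)
    delta_sizes -- bytes in each delta directory
    Returns "MAJOR", "MINOR", or None when no compaction would be expected.
    """
    delta_total = sum(delta_sizes)
    # Assumption: "no base" means there is delta data but no base data at all.
    no_base = base_size == 0 and delta_total > 0

    # Assumption: the size ratio is only evaluated when a base already exists,
    # so hive.compactor.delta.pct.threshold cannot block the "no base" case.
    if base_size > 0 and delta_total / base_size > delta_pct_threshold:
        return "MAJOR"

    # Assumption: otherwise the number of deltas must exceed the threshold.
    if len(delta_sizes) <= delta_num_threshold:
        return None

    return "MAJOR" if no_base else "MINOR"


# No base yet, 11 small deltas: first compaction is MAJOR.
print(expected_compaction(0, [5] * 11))
# Same partition with the pct threshold raised to 1000: still MAJOR.
print(expected_compaction(0, [5] * 11, delta_pct_threshold=1000))
# Base exists, 11 deltas totalling 5.5% of the base: MINOR.
print(expected_compaction(1000, [5] * 11))
# Base exists, 2 deltas totalling 12% of the base: MAJOR.
print(expected_compaction(1000, [60, 60]))
# Base exists, 3 small deltas: nothing expected yet.
print(expected_compaction(1000, [5] * 3))
```

Under this model, a partition that already has a base would still be expected to qualify again once more than 10 deltas accumulate or the deltas grow past 10% of the base size, which is the follow-up compaction that is not being observed here.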