Hive Compaction for ACID Transactions

Hello Hive SMEs,

I am setting up Hive tables for ACID transactions. A fair (not high) number of inserts and updates is expected on each table.

Should compactions be scheduled every day, or should I let Hive manage compaction? Are there any pros or cons to Hive-managed compactions?

Hive Version 0.14

Thank You

Pranay Vyas

How often to run compaction is a function of how quickly you are generating delta files (see https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-BasicDesign for more info). Less frequent compaction makes reads more expensive, because each read has to merge more delta files with the base. Keep in mind that this system is designed for slowly changing data. Updating 1 row of a 1-billion-row table every second will not work well. The cost of executing a SQL UPDATE statement is roughly the same whether it matches 1 row or 10K rows.
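If you do decide to trigger compaction yourself (for example from a nightly job), a minimal sketch might look like the following. The table name sales_txn and the partition ds = '2015-11-01' are hypothetical placeholders, not anything from this thread. The statements only queue a compaction request; the work is picked up by the compactor threads running in the metastore, so those still need to be enabled.

```sql
-- Hypothetical table and partition names, for illustration only.

-- Queue a minor compaction: rewrites the existing delta files
-- into a single delta file per bucket.
ALTER TABLE sales_txn PARTITION (ds = '2015-11-01') COMPACT 'minor';

-- Queue a major compaction: rewrites the base and delta files
-- into a new base file per bucket.
ALTER TABLE sales_txn PARTITION (ds = '2015-11-01') COMPACT 'major';

-- List queued, running, and recently processed compaction requests.
SHOW COMPACTIONS;
```

If automatic compaction is left on, Hive decides when to compact based on the number and relative size of the delta files, so manual requests like these would mostly matter when you want compaction to happen at a predictable time.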

The other response regarding the state of this feature in Hive 0.14 is still valid.

Thank you, @Eugene Koifman. This helps.