Hive Compaction for ACID Transactions

Super Collaborator

Hello Hive SMEs,

I am setting up Hive tables for ACID transactions. A fair (not high) number of inserts/updates is expected on each table.

Should compactions be scheduled every day, or should I let Hive manage compaction? Are there any pros/cons to Hive-managed compactions?

Hive version: 0.14

Thank you

Pranay Vyas

Super Collaborator

How often to run compaction depends on how quickly you are generating delta files (see https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-BasicDesign for more info). Less frequent compaction makes reads more expensive, because each read has to merge more delta files with the base. Keep in mind that this system is designed for slowly changing data. Updating 1 row of a 1-billion-row table every second will not work well. The cost of executing an SQL UPDATE statement that matches 1 row is roughly the same as the cost of one that matches 10K rows.
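To make the "function of how quickly you are generating delta files" point concrete, here is a rough Python sketch of threshold-driven compaction. It is a simplified model, not Hive's actual initiator logic. The default values mirror the documented defaults of `hive.compactor.delta.num.threshold` (10) and `hive.compactor.delta.pct.threshold` (0.1); the function names, the exact comparison operators, and the "one delta directory per write transaction" rule are assumptions made for illustration.

```python
# Simplified model of threshold-driven compaction for one table or partition.
# NOT Hive's real code: function names and comparisons are illustrative assumptions.

def pick_compaction(num_deltas, delta_bytes, base_bytes,
                    delta_num_threshold=10, delta_pct_threshold=0.1):
    """Return 'major', 'minor', or None."""
    # Major: the deltas have grown large relative to the base files.
    if base_bytes > 0 and delta_bytes / base_bytes > delta_pct_threshold:
        return "major"
    # Minor: many delta directories have piled up, even if they are small.
    if num_deltas >= delta_num_threshold:
        return "minor"
    return None


def hours_until_minor(txns_per_hour, delta_num_threshold=10):
    """Estimate hours until the delta-count threshold is reached.

    Assumption: each committed write transaction adds one delta directory.
    """
    if txns_per_hour <= 0:
        return float("inf")
    return delta_num_threshold / txns_per_hour


if __name__ == "__main__":
    # Few small deltas: nothing to do yet.
    print(pick_compaction(num_deltas=3, delta_bytes=1_000, base_bytes=1_000_000))
    # Many small deltas: a minor compaction would be due.
    print(pick_compaction(num_deltas=12, delta_bytes=1_000, base_bytes=1_000_000))
    # Deltas are 20% of the base size: a major compaction would be due.
    print(pick_compaction(num_deltas=4, delta_bytes=200_000, base_bytes=1_000_000))
    # At 5 write transactions per hour, the count threshold is hit in about 2 hours.
    print(hours_until_minor(txns_per_hour=5))
```

Under this model, a table with only a handful of writes per day takes a long time to reach either threshold, so a fixed daily schedule may add little. If a compaction does need to be forced, Hive accepts a manual request of the form `ALTER TABLE <table> COMPACT 'major'`.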

The other response regarding the state of this feature in Hive 0.14 is still valid.

Super Collaborator

Thank you @Eugene Koifman. This helps.