Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Hive Compaction for ACID Transactions

Super Collaborator

Hello Hive SMEs,

I am setting up Hive tables for ACID transactions. A moderate (not high) number of inserts/updates is expected on each table.

Should compactions be scheduled every day, or should I let Hive manage compaction? Are there any pros/cons to Hive-managed compaction?

Hive Version 0.14

Thank You

Pranay Vyas

1 ACCEPTED SOLUTION

4 REPLIES

Super Collaborator

How often to run compaction is a function of how quickly you are generating delta files (see https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-BasicDesign for more info). Less frequent compaction makes reads more expensive, because readers have to merge more delta files with the base. Keep in mind that this system is designed for slowly changing data. Updating 1 row of a 1-billion-row table every second will not work well. The cost of executing a SQL UPDATE statement is roughly the same whether it matches 1 row or 10K rows.
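To make the "function of how quickly you are generating delta files" point concrete, here is a rough Python sketch of the kind of threshold check that Hive-managed compaction relies on. It is a simplified model, not Hive's actual initiator code. The two constants mirror the documented defaults of hive.compactor.delta.num.threshold and hive.compactor.delta.pct.threshold; the PartitionState class, the function names, and the sample table name are hypothetical and only there for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Documented defaults of hive.compactor.delta.num.threshold and
# hive.compactor.delta.pct.threshold (assumed unchanged on your cluster).
DELTA_NUM_THRESHOLD = 10
DELTA_PCT_THRESHOLD = 0.1


@dataclass
class PartitionState:
    """Hypothetical snapshot of one table or partition directory."""
    table: str
    delta_dirs: int    # delta directories written since the last compaction
    delta_bytes: int   # combined size of those deltas
    base_bytes: int    # size of the current base


def choose_compaction(state: PartitionState) -> Optional[str]:
    """Return 'major', 'minor', or None (simplified threshold model)."""
    # Deltas that are large relative to the base suggest rewriting the base.
    if state.base_bytes > 0 and state.delta_bytes / state.base_bytes > DELTA_PCT_THRESHOLD:
        return "major"
    # Many small deltas suggest merging the deltas only.
    if state.delta_dirs > DELTA_NUM_THRESHOLD:
        return "minor"
    return None


def compaction_statement(state: PartitionState) -> Optional[str]:
    """Build the manual HiveQL request for the chosen compaction, if any."""
    kind = choose_compaction(state)
    if kind is None:
        return None
    return f"ALTER TABLE {state.table} COMPACT '{kind}';"


if __name__ == "__main__":
    # Hypothetical table: 12 small deltas on top of a much larger base.
    print(compaction_statement(PartitionState("orders", 12, 5, 1000)))
```

With a moderate write rate, a check like this fires only occasionally, which is the argument for letting Hive decide rather than compacting on a fixed daily schedule. The ALTER TABLE ... COMPACT statement remains available if you want to force a compaction at a quiet time.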

The other response regarding the state of this feature in Hive 0.14 is still valid.

Super Collaborator

Thank you, @Eugene Koifman. This helps.