Created 02-07-2016 09:25 PM
Hello Hive SMEs,
I am setting up Hive tables for ACID transactions. A fair (not high) number of inserts/updates is expected on each table.
Should compactions be scheduled every day, or should I let Hive manage compaction? Are there any pros/cons to Hive-managed compactions?
Hive Version 0.14
Thank You
Pranay Vyas
Created 02-07-2016 09:34 PM
Please see this before you do this in prod
https://community.hortonworks.com/content/kbentry/4321/hive-acid-current-state.html
Created 02-07-2016 09:39 PM
Thanks @Neeraj Sabharwal
Created 02-08-2016 07:24 PM
How often to run compaction is a function of how quickly you are generating delta files (see https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-BasicDesign for more info). Less frequent compactions will make reads more expensive, because each read has to merge more delta files with the base. Keep in mind that this system is designed for slowly changing data. Updating 1 row out of a 1-billion-row table every second will not work well. The cost of executing an SQL UPDATE statement that matches 1 row is roughly the same as one that matches 10K rows.
The other response regarding the state of this feature in Hive 0.14 is still valid.
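To make the "function of how quickly you are generating delta files" point concrete, here is a rough Python sketch of the kind of threshold check the automatic compactor is documented to apply. It is a simplified model, not Hive's actual code: the function name `compaction_needed` and its inputs are hypothetical, the two defaults are taken from the documented `hive.compactor.delta.num.threshold` and `hive.compactor.delta.pct.threshold` settings and should be checked against your Hive version, and the special handling of tables with no base file is left out.

```python
from typing import Optional

# Assumed defaults, mirroring the documented settings
# hive.compactor.delta.num.threshold and hive.compactor.delta.pct.threshold.
# Verify them against your own Hive version and configuration.
DELTA_NUM_THRESHOLD = 10
DELTA_PCT_THRESHOLD = 0.1


def compaction_needed(num_deltas: int,
                      delta_bytes: int,
                      base_bytes: int,
                      num_threshold: int = DELTA_NUM_THRESHOLD,
                      pct_threshold: float = DELTA_PCT_THRESHOLD) -> Optional[str]:
    """Hypothetical helper: return 'major', 'minor', or None.

    Simplified model of the automatic trigger for one table or partition.
    """
    if num_deltas == 0:
        # Nothing has been written since the last compaction.
        return None
    # Deltas that are large relative to the base suggest a major compaction,
    # which rewrites base and deltas into a new base.
    if base_bytes > 0 and delta_bytes / base_bytes > pct_threshold:
        return "major"
    # Many small deltas suggest a minor compaction, which merges the
    # delta files into one delta.
    if num_deltas > num_threshold:
        return "minor"
    return None


if __name__ == "__main__":
    # Illustrative numbers only: (delta count, delta bytes, base bytes).
    for state in [(3, 50, 1000), (11, 50, 1000), (3, 200, 1000)]:
        print(state, "->", compaction_needed(*state))
```

With a modest write rate, checks like these fire only occasionally, so a fixed daily schedule mostly matters if deltas pile up faster than the thresholds catch them, or if you want compaction to run in a quiet window.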
Created 02-10-2016 04:57 PM
Thank you @Eugene Koifman. This helps.