Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

HIVE and ACID table performance for updates

avatar
Rising Star

All,

If I turn on ACID for HIVE, is there a performance impact? And if so, is there some best practices to mitigate or address this with tools like tuning in TEZ or the number of mappers?

The goal is to update tables/records and identify if it makes more sense to do nightly batches vs incremental updates throughout the day.

1 ACCEPTED SOLUTION

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
5 REPLIES 5

avatar

Hi @devers, ACID does require some performance considerations especially if you have a high number of inserts and deletes. The performance will slowly degrade over time until a compaction is executed. After compaction your performance will normalize. Based on your use case you'll want to adjust the compaction frequency to find your unique performance sweet spot.

https://community.hortonworks.com/questions/15095/hive-compaction-for-acid-transactions.html

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Rising Star

Good info. The concurrency metric you state is inline with other recommendations I have seen when dealing with large numbers of updates. Thankyou!

avatar
Rising Star

Thanks Scott! Does there exist a working example or set of metrics published to help admins with determining the frequency or identifying a threshold?

I see several generalized statements but not so much with test numbers. Maybe a follow up HCC article in the making.

avatar
Super Collaborator

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration has a list of configuration options. For example, hive.compactor.delta.num.threshold, lets you control how often minor compaction runs wrt number of SQL operations. Each SQL insert/update/delete generates 1 delta file. Each minor compaction, combines whatever delta files it finds into a single new delta that includes all the info. There is not specific guidance available since depends on specific set up/requirements and has to tested for. There other options listed there that can help.