Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

HIVE and ACID table performance for updates

Solved Go to solution

HIVE and ACID table performance for updates

Contributor

All,

If I turn on ACID for HIVE, is there a performance impact? And if so, is there some best practices to mitigate or address this with tools like tuning in TEZ or the number of mappers?

The goal is to update tables/records and identify if it makes more sense to do nightly batches vs incremental updates throughout the day.

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: HIVE and ACID table performance for updates

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-BasicDesignfor ... ACID transactions shouldn't impact analytical queries while inserts are happening. A read can take a version and a write can put a new version of the data without waiting on locks. But this adds overhead of reading the delta files when you read the main ORCFile. As delta files accumulate you'll need to compact/consolidate the edits which will use cluster resources. The impact depends on the number of updates. Hive ACID should be used for low concurrency 50 or fewer concurrent users.

In general, I recommend using Hive best practices for Hive query performance - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_performance_tuning/content/ch_hive_hi_per...

View solution in original post

5 REPLIES 5
Highlighted

Re: HIVE and ACID table performance for updates

Hi @devers, ACID does require some performance considerations especially if you have a high number of inserts and deletes. The performance will slowly degrade over time until a compaction is executed. After compaction your performance will normalize. Based on your use case you'll want to adjust the compaction frequency to find your unique performance sweet spot.

https://community.hortonworks.com/questions/15095/hive-compaction-for-acid-transactions.html

Highlighted

Re: HIVE and ACID table performance for updates

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-BasicDesignfor ... ACID transactions shouldn't impact analytical queries while inserts are happening. A read can take a version and a write can put a new version of the data without waiting on locks. But this adds overhead of reading the delta files when you read the main ORCFile. As delta files accumulate you'll need to compact/consolidate the edits which will use cluster resources. The impact depends on the number of updates. Hive ACID should be used for low concurrency 50 or fewer concurrent users.

In general, I recommend using Hive best practices for Hive query performance - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_performance_tuning/content/ch_hive_hi_per...

View solution in original post

Highlighted

Re: HIVE and ACID table performance for updates

Contributor

Good info. The concurrency metric you state is inline with other recommendations I have seen when dealing with large numbers of updates. Thankyou!

Highlighted

Re: HIVE and ACID table performance for updates

Contributor

Thanks Scott! Does there exist a working example or set of metrics published to help admins with determining the frequency or identifying a threshold?

I see several generalized statements but not so much with test numbers. Maybe a follow up HCC article in the making.

Highlighted

Re: HIVE and ACID table performance for updates

Expert Contributor

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration has a list of configuration options. For example, hive.compactor.delta.num.threshold, lets you control how often minor compaction runs wrt number of SQL operations. Each SQL insert/update/delete generates 1 delta file. Each minor compaction, combines whatever delta files it finds into a single new delta that includes all the info. There is not specific guidance available since depends on specific set up/requirements and has to tested for. There other options listed there that can help.

Don't have an account?
Coming from Hortonworks? Activate your account here