Write Conflict when multiple updates in Hive ACID
Labels: Apache Hive, Apache Tez
Created 11-04-2019 11:22 AM
We're currently looking to upgrade our production cluster to 3.1.0, but we're running into some pain points in a test environment that are stopping us from proceeding.
Versions we're interacting with:
HDFS 3.1.1.3.1
Hive 3.0.0.3.1
The problem:
When we run multiple updates on the same table, the later one fails due to a write conflict. This worked in previous versions, but in Hive 3, if a second UPDATE statement runs while the first is still in progress, it throws the exception below and fails (schema and table names are generalized). We are running these through the new Hive client (Beeline).
Error: Error while processing statement: FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:306237872. Reason: Aborting [txnid:306237872,306237873] due to a write conflict on <SCHEMA>/<TABLE> committed by [txnid:306237871,306237872] u/u) (state=42000,code=12)
The table properties are:
TBLPROPERTIES( 'bucketing_version'='2','transactional'='true','transactional_properties'='default','transient_lastDdlTime'='1572894940')
Created 11-05-2019 07:39 AM
Basically, I need to know whether this is standard behavior for Hive going forward. We have a series of imports that each record the last value imported, per table, in a single shared table, and this collision is breaking that process. The only workaround I see is to split this metadata table into multiple tables, one per import, but I don't like it.
Any ideas or am I missing something?
Created 11-08-2019 10:47 PM
@Eric_B - Updating a table from two different processes at the same time, even when they touch two different rows, is not possible at the moment for ACID tables.
Currently, the ACID concurrency management mechanism works at the partition level for partitioned tables and at the table level for non-partitioned tables (which I believe is your case). What the system wants to prevent is two parallel transactions updating the same row. Unfortunately, it can't track this at the individual row level; it tracks it at the partition level or the table level, respectively.
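To make the granularity point concrete, here is a minimal toy model in Python of first-committer-wins conflict detection. It is only a sketch of the behavior described above, not Hive's actual implementation; the class `ToyTxnManager` and the table and partition names (`import_meta`, `import_meta_p`, `source=...`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    txn_id: int
    start_seq: int  # number of commits that existed when this transaction began
    write_set: set = field(default_factory=set)  # (table, partition) pairs updated


class ToyTxnManager:
    """Toy first-committer-wins model; NOT Hive's real transaction manager."""

    def __init__(self):
        self.commit_seq = 0
        self.committed = []  # list of (commit_seq, write_set)
        self.next_id = 1

    def begin(self):
        txn = Txn(self.next_id, self.commit_seq)
        self.next_id += 1
        return txn

    def update(self, txn, table, partition=None):
        # Row identity is deliberately ignored: only table/partition is recorded.
        txn.write_set.add((table, partition))

    def commit(self, txn):
        # Abort if anything committed after this txn began touched the same unit.
        for seq, write_set in self.committed:
            if seq > txn.start_seq and write_set & txn.write_set:
                return False
        self.commit_seq += 1
        self.committed.append((self.commit_seq, frozenset(txn.write_set)))
        return True


mgr = ToyTxnManager()

# Two overlapping updates on an unpartitioned table: the second commit loses,
# even if the statements touched different rows.
a, b = mgr.begin(), mgr.begin()
mgr.update(a, "import_meta")
mgr.update(b, "import_meta")
unpartitioned = (mgr.commit(a), mgr.commit(b))

# Two overlapping updates on different partitions of a partitioned table.
c, d = mgr.begin(), mgr.begin()
mgr.update(c, "import_meta_p", partition="source=orders")
mgr.update(d, "import_meta_p", partition="source=customers")
partitioned = (mgr.commit(c), mgr.commit(d))

print(unpartitioned, partitioned)
```

In this model the unpartitioned pair ends with one commit and one abort, while the partitioned pair both commit, which is why partitioning (or splitting the table) sidesteps the error.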
Created 11-11-2019 11:31 AM
This is a new type of issue, though. Is it specific to Hive 3 ACID tables, and will it be solved in the future?
Regardless, it seems like partitioning the data (or just segmenting it) is the solution at this time.
Thank you.
Created 12-02-2019 06:58 AM
Hello, I have the same issue. However, my code starts fine and performs the updates, but after a few minutes it stops with the same error message as yours. Did you find a solution? Or can you share a link about the partitioning approach?
Thanks in advance.
Created 04-09-2020 08:24 AM
@Ellyly
There is no permanent solution to this type of collision. The workaround I used was to split the table into several smaller tables, so that no collisions occur. Not a great solution, but it worked for my needs.
Created 04-14-2020 07:54 AM
Hello,
@Eric_B, thanks for the reply. After some searching, I haven't found a permanent solution either. To get around the issue, we now only insert (for all data) and handle the updates through a Hive materialized view. It works well and is fast for my use case.
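As a rough illustration of that insert-only pattern, the sketch below uses plain Python rather than Hive: every import appends a new row instead of updating an existing one, and the current state is derived by keeping the newest row per key, which is the job the materialized view does in our setup. The names `ImportMark`, `latest_per_import`, and the `seq` ordering column are hypothetical.

```python
from typing import NamedTuple


class ImportMark(NamedTuple):
    import_name: str  # hypothetical key: which import wrote the row
    last_value: int   # last value imported by that run
    seq: int          # increasing insert order (for example, a timestamp)


def latest_per_import(rows):
    """Return {import_name: last_value}, keeping the row with the highest seq."""
    latest = {}
    for row in rows:
        current = latest.get(row.import_name)
        if current is None or row.seq > current.seq:
            latest[row.import_name] = row
    return {name: row.last_value for name, row in latest.items()}


# Append-only log: the second "orders" row supersedes the first without an UPDATE.
log = [
    ImportMark("orders", 100, 1),
    ImportMark("customers", 40, 2),
    ImportMark("orders", 250, 3),
]
current_state = latest_per_import(log)
print(current_state)
```

The trade-off is that the table keeps growing, so the "latest row per key" step has to stay cheap.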
Best Regards