We're currently looking to upgrade our production cluster to 3.1.0, but we're running into some pain points in a test environment that are stopping us from proceeding.
Versions we're interacting with:
When attempting to run multiple updates on the same table, it fails due to a write conflict. This worked in previous versions, but in Hive 3, if the second UPDATE statement runs while the first is still in flight, it throws the exception below and fails (schema and table names generalized). This is through the new Hive client (beeline).
Error: Error while processing statement: FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:306237872. Reason: Aborting [txnid:306237872,306237873] due to a write conflict on <SCHEMA>/<TABLE> committed by [txnid:306237871,306237872] u/u) (state=42000,code=12)
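For anyone trying to reproduce this, here is a minimal sketch of the scenario described above. The table and row names are hypothetical; the point is that the two UPDATEs come from two separate beeline sessions and overlap in time:

```sql
-- Hypothetical metadata table (full ACID, ORC)
CREATE TABLE meta (name STRING, last_value BIGINT)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Session 1:
UPDATE meta SET last_value = 1 WHERE name = 'import_a';

-- Session 2, started before session 1 commits:
UPDATE meta SET last_value = 2 WHERE name = 'import_b';
```

Even though the two statements touch different rows, the second transaction to commit is aborted with the LockException write-conflict error shown above.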
The table properties are:
Basically, I need to know if this is the standard behavior for Hive going forward. We have a series of imports that record the last value imported per table in a single metadata table, and this collision is breaking that process. The only workaround I see is splitting the metadata table into multiple tables, one per import, but I don't like it.
Any ideas or am I missing something?
@Eric_B - This scenario, where two different processes update the same table (even two different rows) at the same time, is not possible at the moment for ACID tables.
Currently, the ACID concurrency-management mechanism works at the partition level for partitioned tables and at the table level for non-partitioned tables (which I believe is our case). Basically, what the system wants to prevent is two parallel transactions updating the same row. Unfortunately, it can't keep track of this at the individual row level, so it does it at the partition and table level, respectively.
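Because conflict detection is per partition, concurrent updates that are pruned to different partitions should not collide. A hedged sketch, with hypothetical table and partition names:

```sql
-- Hypothetical partitioned ACID table
CREATE TABLE import_meta (
  import_name STRING,
  last_value  BIGINT
)
PARTITIONED BY (import_group STRING)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Session 1 writes only to partition import_group='a':
UPDATE import_meta SET last_value = 100
WHERE import_group = 'a' AND import_name = 'orders';

-- Session 2 writes only to partition import_group='b',
-- so its write set does not overlap with session 1's:
UPDATE import_meta SET last_value = 200
WHERE import_group = 'b' AND import_name = 'customers';

-- Two concurrent updates inside the SAME partition would still conflict.
```

The partition predicate must actually prune the write to a single partition for this to help; an UPDATE without the partition column in the WHERE clause can still conflict at the table level.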
This is a new type of issue, though. Does it affect only Hive 3 ACID tables, and will it be solved in a future release?
Regardless, it seems like partitioning the data (or just segmenting it) is the solution at this time.
Hello, I have the same issue. My code starts fine and performs the updates, but after a few minutes it stops with the same error message as yours. Did you find any solution? Or can you share a link about the partitioning approach?
Thanks in advance.
There's no permanent solution to this type of collision. The workaround I used was to split the table into several smaller tables so that no collisions occur. Not a great solution, but it worked for my need.
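For reference, the split-table workaround looks something like this (table names are hypothetical). Each import gets its own metadata table, so concurrent writers never share a write set:

```sql
-- One small metadata table per import instead of one shared table
CREATE TABLE meta_import_a (last_value BIGINT)
STORED AS ORC TBLPROPERTIES ('transactional'='true');

CREATE TABLE meta_import_b (last_value BIGINT)
STORED AS ORC TBLPROPERTIES ('transactional'='true');

-- Writer A and writer B now update different tables,
-- so their transactions cannot write-conflict:
UPDATE meta_import_a SET last_value = 100;
UPDATE meta_import_b SET last_value = 200;
```

The cost is that reading "all watermarks" now requires a UNION ALL across the per-import tables.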
@Eric_B, thanks for the reply. After some searching, I also found no permanent solution, but to overcome this issue we switched to inserts (for all data) and handle the update logic with a Hive materialized view. It works fine and fast for my use case.
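A sketch of that insert-only pattern, with hypothetical names. Each writer appends a new row instead of updating in place (concurrent INSERTs don't produce this write conflict), and a materialized view exposes the latest value per import:

```sql
-- Append-only watermark log: writers only INSERT, never UPDATE
CREATE TABLE import_watermarks (
  table_name STRING,
  last_value BIGINT,
  updated_at TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Each import appends its new watermark
INSERT INTO import_watermarks
VALUES ('orders', 42, CURRENT_TIMESTAMP);

-- Materialized view keeps only the latest value per table
CREATE MATERIALIZED VIEW latest_watermark AS
SELECT table_name, MAX(last_value) AS last_value
FROM import_watermarks
GROUP BY table_name;

-- Refresh after new rows arrive
ALTER MATERIALIZED VIEW latest_watermark REBUILD;
```

Depending on your Hive version, REBUILD may run incrementally; you would also want to periodically compact or trim the append-only log so it doesn't grow without bound.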