How to Update records in a Hive table concurrently?

I am trying to update two different records of an ACID transaction-enabled Hive table from two different sessions, but I am getting a LockException reporting a write conflict. Is there a configuration parameter in Hive that allows this?

1 ACCEPTED SOLUTION

@Harish Nerella

Updating a table from two different processes at the same time, even when they touch two different rows, is not possible at the moment for ACID tables.

Currently, the ACID concurrency management mechanism works at the partition level for partitioned tables and at the table level for non-partitioned tables (which I believe is the case here). What the system wants to prevent is two parallel transactions updating the same row. Unfortunately, it can't track this at the level of individual rows; it does so at the partition or table level instead.

Refer to Jira HIVE-13395 for more details.
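
As a rough illustration of the partition-level behaviour described above, the sketch below partitions a table so that each session confines its UPDATE to a different partition. The table name customer_acid, its columns, and the region partition key are all hypothetical, and the sketch assumes the ACID session settings are already in place. Whether the two statements really avoid the write conflict depends on your Hive version, so treat this as something to try, not a guaranteed fix.

```sql
-- Hypothetical table, partitioned so that conflicts are tracked per partition.
-- Older Hive releases also require ACID tables to be bucketed and stored as ORC.
CREATE TABLE customer_acid (
  id      INT,
  name    STRING,
  balance DECIMAL(10,2)
)
PARTITIONED BY (region STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Session A: the partition predicate restricts the update to region='east'.
UPDATE customer_acid SET balance = 100.00 WHERE region = 'east' AND id = 1;

-- Session B (run at the same time): restricted to a different partition.
UPDATE customer_acid SET balance = 250.00 WHERE region = 'west' AND id = 2;
```

If both sessions target the same partition, or the table is not partitioned, the write conflict from the question is still expected, and the second statement has to be rerun after the first one commits.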

REPLIES

@Harish Nerella

If a table is to be used for ACID writes (insert, update, delete), the table property "transactional=true" must be set on that table. Also, hive.txn.manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager, either in hive-site.xml or at the beginning of the session before any query is run.

Refer to https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions for more details.
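
For reference, a minimal session-level sketch of those settings might look like the following. The table orders_acid and its columns are hypothetical. The hive.support.concurrency and hive.exec.dynamic.partition.mode lines are not mentioned in the reply above; they are included on the assumption that your cluster follows the setup described on the Hive Transactions wiki page, which also lists metastore-side compactor settings not shown here.

```sql
-- Run at the start of the session, before any other query.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical ACID table; older Hive releases require bucketing and ORC storage.
CREATE TABLE orders_acid (
  order_id INT,
  status   STRING
)
CLUSTERED BY (order_id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- With the settings above, row-level DML is accepted on this table.
UPDATE orders_acid SET status = 'shipped' WHERE order_id = 42;
```

These settings only enable ACID writes; they do not by themselves allow two sessions to update the same table or partition at the same time.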

Thank you for your answer.

I set all the ACID transaction-related configurations in both sessions, but I still get a write conflict. The caveat is that if I run the update statements sequentially (starting one only after the query in the other session has completed), the update goes through.

Thank You. This helped.