Hive concurrency - lost update

Hi, I am seeing situations where two Hive SQL commands run concurrently and produce a lost update. I am running Hive 2.3.6 on EMR with hive.support.concurrency = true, and based on my understanding of Hive table locking this shouldn't happen. (I am not using ACID transactions, but as far as I know table locking should still prevent a lost update.)

 

Specifically, I have a "load data" statement loading data into table T from an S3 location. An "insert overwrite T select * from T" statement runs concurrently from another Hive connection; it deletes some rows from T but should not affect the rows added by the load data statement. However, the data from the load data statement disappears after the insert overwrite finishes. My understanding is that load data and insert overwrite each acquire an exclusive table lock on T, so each should let the other finish before reading or writing T. (I checked this using "show locks", and they definitely do take an exclusive lock.)
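
For reference, the scenario described above can be sketched roughly as follows. This is only an illustration of the setup, not the poster's actual code: the S3 path, the column name event_date, and the filter predicate are hypothetical placeholders, since the original statements were not posted.

```sql
-- Print the current value of the concurrency setting (expected: true).
SET hive.support.concurrency;

-- Connection 1: load new files into T.
-- The S3 path is a hypothetical placeholder.
LOAD DATA INPATH 's3://example-bucket/incoming/' INTO TABLE T;

-- Connection 2 (running concurrently): rewrite T, keeping only some rows.
-- The column and predicate are hypothetical placeholders.
INSERT OVERWRITE TABLE T
SELECT * FROM T
WHERE event_date >= '2020-01-01';

-- Any connection: inspect the locks currently held or requested on T.
SHOW LOCKS T;
SHOW LOCKS T EXTENDED;
```

Running SHOW LOCKS while both statements are in flight is one way to confirm whether each statement really holds or is waiting on an exclusive lock for T.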

 

 

Has anyone seen this issue before and are there any Hive settings I can try changing to prevent this behavior?

1 ACCEPTED SOLUTION

This was a result of a bug in my code and not anything to do with Hive itself - please ignore.
