Any disadvantage to enabling concurrency in Hive?

Rising Star

I've been asked to set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager and hive.support.concurrency = true. A subset of users is concerned about dirty reads on an external table while an external job consolidates small files within a partition, so they want to take an exclusive lock during the consolidation.

Does anyone know of a reason I should be wary of the above settings? Is there potential for performance impact on other jobs/users that have no need for these settings?

I guess another question would be: does "lock table" even work on an external table?

Thx,

-Vince

3 REPLIES

Rising Star

From what I'm hearing from other sources, this answer was inaccurate and fails to take into consideration how our cluster is being used. I disagree with it being tagged "best answer".

avatar

The two properties hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager and hive.support.concurrency = true are the settings used to enable ACID tables. External tables cannot be ACID tables, because the ACID compactor cannot control the data behind an external table.

Rising Star

I thought those two settings pre-dated the introduction of ACID tables. I can understand the "External tables cannot be ACID tables..." part, but I would think those settings could still be used to let users take an exclusive lock on an external table, to prevent reading from it through Hive while external jobs manipulate the underlying files.
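
For anyone following along, the locking workflow described above would look roughly like the HiveQL below. This is only a sketch: the table name ext_events and the partition value dt='2016-01-01' are made-up placeholders, and it assumes hive.support.concurrency = true is already configured on the cluster. As far as I know, explicit LOCK TABLE belongs to the legacy lock manager and may be rejected when hive.txn.manager is set to DbTxnManager, so it is worth trying on a non-production table first.

```sql
-- Sketch only. ext_events and dt='2016-01-01' are hypothetical placeholders.
-- Assumes hive.support.concurrency=true is already in effect.
-- Assumption: explicit locks may not be accepted under DbTxnManager.

-- Take an exclusive lock on the partition before the external
-- small-file consolidation job starts.
LOCK TABLE ext_events PARTITION (dt='2016-01-01') EXCLUSIVE;

-- Check which locks are currently held on that partition.
SHOW LOCKS ext_events PARTITION (dt='2016-01-01');

-- ... run the external consolidation job here ...

-- Release the lock once the files have been rewritten.
UNLOCK TABLE ext_events PARTITION (dt='2016-01-01');
```

Note that a lock like this only affects queries that go through Hive. Jobs that read the underlying files directly would not see it.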