Question regarding Hive concurrency with insert overwrite

New Contributor

I have a couple of tables:

TBL_A (COL_1, COL_2) and

TBL_B (COL_1, COL_2) and

a view TBL_A_B_VIEW (select COL_1, COL_2 from TBL_A union all select COL_1, COL_2 from TBL_B).

 

Periodically, I want to move data from TBL_B into TBL_A, and while the move is happening I want to avoid duplicates being returned to users querying the view. Suppose I do the move with a multi-insert that ends in an insert overwrite:

FROM TBL_B
INSERT INTO TABLE TBL_A SELECT COL_1, COL_2
INSERT OVERWRITE TABLE TBL_B SELECT COL_1, COL_2 WHERE 1=0;

Hive Version: 1.10
hive.support.concurrency: true

 

Questions:
1. While the insert overwrite is running, is it possible that anyone querying the view will get duplicate rows? My understanding is that it should not be possible because of the way Hive locking works.
2. Can the insert into either table be applied only partially if there is an unexpected error or failure?
3. Is it possible that the data is moved from TBL_B into TBL_A but TBL_B is not updated at all?

1 REPLY

Guru
1. Yes, your understanding is correct: table locking (an exclusive lock) will be in place to prevent users from reading the table while the insert overwrite runs.

2. It should either succeed with all the data or write no data at all; it won't be partial.

3. I think it is possible, but I am not 100% sure.
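
One way to check the locking described in point 1 on your own cluster is to look at the locks from a second session while the multi-insert is running. This is only a sketch: it assumes hive.support.concurrency is true (as in the question) and that a lock manager is configured and working. The output format depends on the Hive version and lock manager, so none is shown here.

```sql
-- Run in a second Hive session while the multi-insert from the question is in progress.
-- Assumption: hive.support.concurrency=true with a working lock manager.

-- List the locks currently held on each table.
SHOW LOCKS TBL_A;
SHOW LOCKS TBL_B;

-- EXTENDED adds details such as the query that holds the lock.
SHOW LOCKS TBL_B EXTENDED;
```

For point 3, comparing select count(*) on TBL_A and TBL_B before and after a test run should show whether both halves of the statement were applied.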