New Contributor
Posts: 1
Registered: ‎08-14-2018

Question regarding Hive concurrency with insert overwrite

I have a couple of tables:

TBL_A (COL_1, COL_2) and

TBL_B (COL_1, COL_2) and

a view TBL_A_B_VIEW (select COL_1, COL_2 from TBL_A union all select COL_1, COL_2 from TBL_B).
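For reference, the setup described above could be written roughly as the following DDL. This is only a sketch: the STRING column types are an assumption, since the actual types are not given here, and the union is wrapped in a subquery simply because that form is accepted by older Hive versions as well.

```sql
-- Assumed column types (STRING); replace with the real ones.
CREATE TABLE IF NOT EXISTS TBL_A (COL_1 STRING, COL_2 STRING);
CREATE TABLE IF NOT EXISTS TBL_B (COL_1 STRING, COL_2 STRING);

-- The view returns the rows of both tables without de-duplication.
CREATE VIEW IF NOT EXISTS TBL_A_B_VIEW AS
SELECT u.COL_1, u.COL_2
FROM (
  SELECT COL_1, COL_2 FROM TBL_A
  UNION ALL
  SELECT COL_1, COL_2 FROM TBL_B
) u;
```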

 

Periodically, I want to move data from TBL_B into TBL_A. While the move is happening, I want to avoid duplicate rows being returned to users who query the view. I plan to do the move with a multi-insert statement that uses insert overwrite:

 

FROM TBL_B
INSERT INTO TABLE TBL_A SELECT COL_1, COL_2
INSERT OVERWRITE TABLE TBL_B SELECT COL_1, COL_2 WHERE 1=0;

 

Hive Version: 1.10
hive.support.concurrency: true
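To confirm which locking settings a session actually sees, SET with a property name and no value prints the current value. This is only a sketch: hive.txn.manager and hive.lock.manager are standard Hive properties, but their values on this cluster are not stated here.

```sql
-- Print the current value of each property for this session.
SET hive.support.concurrency;
SET hive.txn.manager;
SET hive.lock.manager;
```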

 

Questions:
1. While the insert overwrite is running, is it possible that anyone querying the view will get duplicate rows? My understanding is that this should not happen because of the way Hive locking works.
2. Can the insert into either of the tables happen partially in case of unexpected errors/failures?
3. Is it possible that the data is moved from TBL_B into TBL_A but TBL_B is not updated at all?

Cloudera Employee
Posts: 375
Registered: ‎03-23-2015

Re: Question regarding Hive concurrency with insert overwrite

1. Yes, your understanding is correct: an exclusive table lock will be in place while the insert overwrite runs, which prevents users from reading the table until it finishes.

2. It should either succeed with all the data or write no data at all; it won't be partial.

3. I think it is possible, but I am not 100% sure.
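A rough way to check these points on your own cluster is sketched below. It assumes the lock manager is active (hive.support.concurrency is true, as in the question) and that rows are otherwise unique on (COL_1, COL_2), which the question does not state.

```sql
-- While the move is running, list the locks currently held on each table.
SHOW LOCKS TBL_A;
SHOW LOCKS TBL_B EXTENDED;

-- After the move, look for rows that the view returns more than once.
-- Assumption: (COL_1, COL_2) is unique in the source data; otherwise
-- legitimate repeats will show up here too.
SELECT COL_1, COL_2, COUNT(*) AS cnt
FROM TBL_A_B_VIEW
GROUP BY COL_1, COL_2
HAVING COUNT(*) > 1;

-- For question 3: TBL_B should be empty if the overwrite was applied.
SELECT COUNT(*) AS remaining_rows FROM TBL_B;
```

If the last query still returns rows after the statement reports success, the data was copied into TBL_A without TBL_B being cleared, which is the case raised in question 3.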