I have two tables and a view:
TBL_A (COL_1, COL_2)
TBL_B (COL_1, COL_2)
TBL_A_B_VIEW, defined as SELECT COL_1, COL_2 FROM TBL_A UNION ALL SELECT COL_1, COL_2 FROM TBL_B
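For reference, the setup above can be sketched in HiveQL as follows. The column types (STRING) and the plain managed, unpartitioned tables are assumptions, since the real definitions aren't shown here.

```sql
-- Hypothetical DDL: column types and storage are assumed, not the real schema.
CREATE TABLE TBL_A (COL_1 STRING, COL_2 STRING);
CREATE TABLE TBL_B (COL_1 STRING, COL_2 STRING);

-- UNION ALL does not remove duplicates, so any row that exists in
-- both tables at query time is returned twice by the view.
CREATE VIEW TBL_A_B_VIEW AS
SELECT COL_1, COL_2 FROM TBL_A
UNION ALL
SELECT COL_1, COL_2 FROM TBL_B;
```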
Periodically, I want to move the data from TBL_B into TBL_A. While the move is in progress, users querying the view must not see duplicate rows. To do the move, I run a single multi-insert statement (INSERT INTO for TBL_A, INSERT OVERWRITE to empty TBL_B):
FROM TBL_B
INSERT INTO TABLE TBL_A SELECT COL_1, COL_2
INSERT OVERWRITE TABLE TBL_B SELECT COL_1, COL_2 WHERE 1=0;
Hive Version: 1.10
hive.support.concurrency: true
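To see which locks the statement actually takes on each table while it runs, one option is to run SHOW LOCKS from a second session during the move. This is a sketch only; the exact lock modes reported depend on the configured lock manager, which isn't stated here.

```sql
-- Run from a separate session while the multi-insert is executing.
-- EXTENDED adds details such as the lock mode and the query holding it.
SHOW LOCKS TBL_A EXTENDED;
SHOW LOCKS TBL_B EXTENDED;
```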
Questions:
1. While this statement is running, can anyone querying the view get duplicate rows? My understanding is that this should not happen because of the way Hive locking works.
2. If an unexpected error or failure occurs, can the insert into either table be left partially applied?
3. Is it possible for the data to be copied into TBL_A while TBL_B is not emptied at all?