Created 10-04-2016 09:02 AM
I have a table `old_data` and a table `new_data`. I want to merge these table using so that `old_data` gets updated with `new_data`
1. Rows in old_data stay there
2. New rows in new_data get added to old_data
3. unique key is `id` so rows with `id` in `new_data` should update existing ones in `old_data`
I think it is possible using join so I can do something like
INSERT OVERWRITE old_data SELECT..Example:
Table a: id count 1 2 2 19 3 4
Table b:
id count 2 22 5 7
I need a SELECT statement that gives me
id count 1 2 2 22 3 4 5 7
Created 10-04-2016 11:24 AM
If you do not have to worry about partitions, it is as you state:
INSERT OVERWRITE old_data SELECT <statement> FROM new_data;
If you have a partition you must specify it as
INSERT OVERWRITE old_data PARTITION (id = <value>) SELECT <statement filtering by id> FROM new_data;
Note for the SELECT statement you have to select the same columns and column order as those you are inserting into.
See the following for more color:
https://community.hortonworks.com/questions/28683/overwriting-a-column-in-hive.html
https://community.hortonworks.com/questions/5579/insert-overwrite-of-2-gb-data.html
Created 10-04-2016 11:33 AM
PLease check the update. I think I didn't explain it properly
Created 10-04-2016 11:34 AM
PLease check the update. I think I didn't explain it properly
Created 10-04-2016 06:34 PM
Assuming 'id' is never null in either table
select
case when s.id is null then t.id else s.id end,
case when s.id is null then t.count else s.count end
from old_data t full outer join new_data s on t.id = s.id
though what you really want is the MERGE statement which is WIP (https://issues.apache.org/jira/browse/HIVE-10924)