- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
insert or update if exists in hive
- Labels:
-
Apache Hive
Created ‎10-04-2016 09:02 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a table `old_data` and a table `new_data`. I want to merge these table using so that `old_data` gets updated with `new_data`
1. Rows in old_data stay there
2. New rows in new_data get added to old_data
3. unique key is `id` so rows with `id` in `new_data` should update existing ones in `old_data`
I think it is possible using join so I can do something like
INSERT OVERWRITE old_data SELECT..Example:
Table a: id count 1 2 2 19 3 4
Table b:
id count 2 22 5 7
I need a SELECT statement that gives me
id count 1 2 2 22 3 4 5 7
Created ‎10-04-2016 11:24 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you do not have to worry about partitions, it is as you state:
INSERT OVERWRITE old_data SELECT <statement> FROM new_data;
If you have a partition you must specify it as
INSERT OVERWRITE old_data PARTITION (id = <value>) SELECT <statement filtering by id> FROM new_data;
Note for the SELECT statement you have to select the same columns and column order as those you are inserting into.
See the following for more color:
https://community.hortonworks.com/questions/28683/overwriting-a-column-in-hive.html
https://community.hortonworks.com/questions/5579/insert-overwrite-of-2-gb-data.html
Created ‎10-04-2016 11:33 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
PLease check the update. I think I didn't explain it properly
Created ‎10-04-2016 11:34 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
PLease check the update. I think I didn't explain it properly
Created ‎10-04-2016 06:34 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Assuming 'id' is never null in either table
select
case when s.id is null then t.id else s.id end,
case when s.id is null then t.count else s.count end
from old_data t full outer join new_data s on t.id = s.id
though what you really want is the MERGE statement which is WIP (https://issues.apache.org/jira/browse/HIVE-10924)
