Support Questions

simran_k · ‎10-04-2016

I have a table `old_data` and a table `new_data`. I want to merge these table using so that `old_data` gets updated with `new_data`

1. Rows in old_data stay there

2. New rows in new_data get added to old_data

3. unique key is `id` so rows with `id` in `new_data` should update existing ones in `old_data`

I think it is possible using join so I can do something like

INSERT OVERWRITE old_data
SELECT..Example:

Table a:
id          count 
1             2
2            19 
3              4

Table b:

    id             count
    2              22
    5              7

I need a SELECT statement that gives me

   id            count 
    1              2 
    2              22
    3              4
    5              7

gkeys · ‎10-04-2016

If you do not have to worry about partitions, it is as you state:

INSERT OVERWRITE old_data SELECT <statement> FROM new_data;

If you have a partition you must specify it as

INSERT OVERWRITE old_data PARTITION (id = <value>) SELECT <statement filtering by id> FROM new_data;

Note for the SELECT statement you have to select the same columns and column order as those you are inserting into.

See the following for more color:

simran_k · ‎10-04-2016

PLease check the update. I think I didn't explain it properly

simran_k · ‎10-04-2016

PLease check the update. I think I didn't explain it properly

ekoifman · ‎10-04-2016

Assuming 'id' is never null in either table

select

case when s.id is null then t.id else s.id end,

case when s.id is null then t.count else s.count end

from old_data t full outer join new_data s on t.id = s.id

though what you really want is the MERGE statement which is WIP (https://issues.apache.org/jira/browse/HIVE-10924)

insert or update if exists in hive