Member since
11-16-2018
1
Post
0
Kudos Received
0
Solutions
11-19-2018
10:21 PM
Here's the detailed implementation of slowly changing dimension type 2 in Hive using exclusive join approach. Assuming that the source is sending a complete data file i.e. old, updated and new records. Steps:
Load the recent file data to STG table Select all the expired records from HIST table select * from HIST_TAB where exp_dt != '2099-12-31' Select all the records which are not changed from STG and HIST using inner join and filter on HIST.column = STG.column as below select hist.* from HIST_TAB hist
inner join STG_TAB stg
on hist.key = stg.key
where hist.column = stg.column Select all the new and updated records which are changed from STG_TAB using exclusive left join with HIST_TAB and set expiry and effective date as below select stg.*, eff_dt (yyyy-MM-dd), exp_dt (2099-12-31)
from STG_TAB stg
left join
(select * from HIST_TAB where exp_dt = '2099-12-31') hist
on hist.key = stg.key
where hist.key is null
or hist.column != stg.column Select all updated old records from the HIST table using exclusive left join with STG table and set their expiry date as shown below: select hist.*, exp_dt(yyyy-MM-dd) from
(select * from HIST_TAB where exp_dt = '2099-12-31') hist
left join STG_TAB stg
on hist.key= stg.key
where hist.key is null
or hist.column!= stg.column unionall queries from 2-5 and insert overwrite result to HIST table More detailed implementation of SCD type 2 can be found here- https://github.com/sahilbhange/slowly-changing-dimension Hope this helps!
... View more