<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Best and Easy way to implement and create SCD2 in Hive and in Pig? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-and-Easy-way-to-implement-and-create-SCD2-in-Hive-and/m-p/182060#M65103</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/22218/unnhadoop.html" nodeid="22218"&gt;@Upendra N&lt;/A&gt;&lt;P&gt;I think you probably realize what makes SCD type 2 difficult in Hadoop (hive/Pig) is that you cannot update records (With new Hive ACID you can but under the hood its doing the magic, that you can also do your self).&lt;/P&gt;&lt;P&gt;Rather than reprinting the process here, here is one link that describes implementing doing SCD Type 2 in Hadoop using Hive. Hope this helps.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.softserveinc.com/en-us/tech/blogs/process-slowly-changing-dimensions-hive/" target="_blank"&gt;https://www.softserveinc.com/en-us/tech/blogs/process-slowly-changing-dimensions-hive/&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 18 Jul 2017 08:33:21 GMT</pubDate>
    <dc:creator>mqureshi</dc:creator>
    <dc:date>2017-07-18T08:33:21Z</dc:date>
    <item>
      <title>Best and Easy way to implement and create SCD2 in Hive and in Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-and-Easy-way-to-implement-and-create-SCD2-in-Hive-and/m-p/182059#M65102</link>
      <description>&lt;P&gt;How to write and learn SCD 2 type in Hive and Pig?&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 07:00:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-and-Easy-way-to-implement-and-create-SCD2-in-Hive-and/m-p/182059#M65102</guid>
      <dc:creator>horton</dc:creator>
      <dc:date>2017-07-18T07:00:56Z</dc:date>
    </item>
    <item>
      <title>Re: Best and Easy way to implement and create SCD2 in Hive and in Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-and-Easy-way-to-implement-and-create-SCD2-in-Hive-and/m-p/182060#M65103</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/22218/unnhadoop.html" nodeid="22218"&gt;@Upendra N&lt;/A&gt;&lt;P&gt;I think you probably realize what makes SCD type 2 difficult in Hadoop (hive/Pig) is that you cannot update records (With new Hive ACID you can but under the hood its doing the magic, that you can also do your self).&lt;/P&gt;&lt;P&gt;Rather than reprinting the process here, here is one link that describes implementing doing SCD Type 2 in Hadoop using Hive. Hope this helps.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.softserveinc.com/en-us/tech/blogs/process-slowly-changing-dimensions-hive/" target="_blank"&gt;https://www.softserveinc.com/en-us/tech/blogs/process-slowly-changing-dimensions-hive/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 08:33:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-and-Easy-way-to-implement-and-create-SCD2-in-Hive-and/m-p/182060#M65103</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2017-07-18T08:33:21Z</dc:date>
    </item>
    <item>
      <title>Re: Best and Easy way to implement and create SCD2 in Hive and in Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-and-Easy-way-to-implement-and-create-SCD2-in-Hive-and/m-p/182061#M65104</link>
      <description>&lt;P&gt;Here's the detailed implementation of slowly changing dimension type 2 in Hive using exclusive join approach.&lt;/P&gt;&lt;P&gt;Assuming that the source is sending a complete data file i.e. old, updated and new records.&lt;/P&gt;&lt;P&gt;Steps:&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;Load the recent file data to STG table&lt;/LI&gt;&lt;LI&gt;Select all the expired records from HIST table&lt;P&gt;&lt;CODE&gt;select * from HIST_TAB where exp_dt != '2099-12-31'&lt;/CODE&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;Select all the records which are not changed from STG and HIST using inner join and filter on HIST.column = STG.column as below&lt;P&gt;&lt;CODE&gt;select hist.* from HIST_TAB hist
inner join STG_TAB stg
on hist.key = stg.key
where hist.column  = stg.column&lt;/CODE&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;Select all the new and updated records which are changed from STG_TAB using exclusive left join with HIST_TAB and set expiry and effective date as below&lt;P&gt;&lt;CODE&gt;select stg.*, eff_dt (yyyy-MM-dd), exp_dt (2099-12-31)
from STG_TAB stg
left join 
(select * from HIST_TAB where exp_dt = '2099-12-31') hist
&lt;BR /&gt;on hist.key = stg.key
where hist.key is null
or hist.column  != stg.column&lt;/CODE&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;Select all updated old records from the HIST table using exclusive left join with STG table and set their expiry date as shown below:&lt;P&gt;&lt;CODE&gt;select hist.*, exp_dt(yyyy-MM-dd) from
(select * from HIST_TAB where exp_dt = '2099-12-31') hist
left join STG_TAB stg
&lt;BR /&gt;on hist.key= stg.key
where hist.key is null
or hist.column!= stg.column&lt;/CODE&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;CODE&gt;unionall&lt;/CODE&gt; queries from 2-5 and insert overwrite result to HIST table&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;More detailed implementation of SCD type 2 can be found here-&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/sahilbhange/slowly-changing-dimension" target="_blank"&gt;https://github.com/sahilbhange/slowly-changing-dimension&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
      <pubDate>Tue, 20 Nov 2018 06:21:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-and-Easy-way-to-implement-and-create-SCD2-in-Hive-and/m-p/182061#M65104</guid>
      <dc:creator>sahilbhange</dc:creator>
      <dc:date>2018-11-20T06:21:39Z</dc:date>
    </item>
  </channel>
</rss>

