<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: hive incremental approach in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-incremental-approach/m-p/56810#M64149</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/21351"&gt;@Freakabhi&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Consider a few more points before choosing one of the approaches:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1.&amp;nbsp;&lt;STRONG&gt;Number of records:&lt;/STRONG&gt; approach 1 is fine for a very large number of records, and approach 2 is OK for a smaller number of records.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. How do you handle it if something goes wrong?&lt;/STRONG&gt; The fourth step in approach 2 deletes the base table and recreates it with the new data. Suppose you notice a data issue a couple of days later: how do you get the deleted base_table back? If you have an answer, go for approach 2.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3. Approach 3:&lt;/STRONG&gt; You are choosing approach 1 because HBase supports updates but Hive does not (I guess this is your understanding). That was correct for old Hive versions, but UPDATE is available starting with Hive 0.14.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 03 Jul 2017 02:33:54 GMT</pubDate>
    <dc:creator>saranvisa</dc:creator>
    <dc:date>2017-07-03T02:33:54Z</dc:date>
    <item>
      <title>hive incremental approach</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-incremental-approach/m-p/56808#M64148</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like suggestions on the incremental strategy to implement for our tables.&lt;BR /&gt;We have a set of source tables that are refreshed daily in the source (DB2),&lt;BR /&gt;and we need to refresh them in the Hive database as well. Which approach would you suggest?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The source tables receive new inserts as well as updates to existing records.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) Approach 1: Use HBase to store the data, since updates are allowed, and build a Hive external table referring to it.&amp;nbsp;I wonder whether this will affect queries that join the Hive-on-HBase table with large ORC Hive tables?&lt;/P&gt;&lt;P&gt;2) Approach 2: Use the four-step incremental table approach suggested by HDP?&lt;/P&gt;&lt;P&gt;&lt;A href="https://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/" target="_blank"&gt;https://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:52:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-incremental-approach/m-p/56808#M64148</guid>
      <dc:creator>Freakabhi</dc:creator>
      <dc:date>2022-09-16T11:52:39Z</dc:date>
    </item>
    <item>
      <title>Re: hive incremental approach</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-incremental-approach/m-p/56810#M64149</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/21351"&gt;@Freakabhi&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Consider a few more points before choosing one of the approaches:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1.&amp;nbsp;&lt;STRONG&gt;Number of records:&lt;/STRONG&gt; approach 1 is fine for a very large number of records, and approach 2 is OK for a smaller number of records.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. How do you handle it if something goes wrong?&lt;/STRONG&gt; The fourth step in approach 2 deletes the base table and recreates it with the new data. Suppose you notice a data issue a couple of days later: how do you get the deleted base_table back? If you have an answer, go for approach 2.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3. Approach 3:&lt;/STRONG&gt; You are choosing approach 1 because HBase supports updates but Hive does not (I guess this is your understanding). That was correct for old Hive versions, but UPDATE is available starting with Hive 0.14.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jul 2017 02:33:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-incremental-approach/m-p/56810#M64149</guid>
      <dc:creator>saranvisa</dc:creator>
      <dc:date>2017-07-03T02:33:54Z</dc:date>
    </item>
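    <!--
    The thread weighs approach 2 (rebuilding the base table from the base plus
    the daily increment) against the risk of losing the old base table. The
    block below is only a rough sketch of that idea in plain Python, not Hive
    code and not the method from the linked blog post. It assumes each record
    has a unique key and a last-modified value; the field names "id" and
    "modified", the function name reconcile, and the sample rows are all
    hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def reconcile(base: Iterable[Record], incremental: Iterable[Record]) -> List[Record]:
    """Return one row per id, keeping the row with the latest 'modified' value."""
    latest: Dict[object, Record] = {}
    for row in list(base) + list(incremental):
        current = latest.get(row["id"])
        # Incremental rows are scanned second, so a tie goes to the newer load.
        if current is None or row["modified"] >= current["modified"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["id"])


# Hypothetical sample data: yesterday's base table and today's increment.
base_table = [
    {"id": 1, "name": "alpha", "modified": "2017-06-30"},
    {"id": 2, "name": "beta", "modified": "2017-06-30"},
]
incremental_table = [
    {"id": 2, "name": "beta v2", "modified": "2017-07-02"},  # update to an existing record
    {"id": 3, "name": "gamma", "modified": "2017-07-02"},    # new insert
]

# Keep the old base around so a bad load can still be rolled back.
previous_base = list(base_table)
base_table = reconcile(previous_base, incremental_table)
```

    Holding on to previous_base is one possible answer to point 2 of the
    reply: the rebuilt table replaces the old one only after a copy of the
    old rows has been set aside.
    -->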
  </channel>
</rss>

