<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Updating hive table with sqoop from mysql table in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Updating-hive-table-with-sqoop-from-mysql-table/m-p/218303#M66629</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/17410/ingenieroandresangel.html" nodeid="17410"&gt;@Andres Urrego&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;What you are looking for (upserts) isn't available in Sqoop import.&lt;/P&gt;&lt;P&gt;There are several approaches to updating data in Hive. One of them is described here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-access/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-access/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Other approaches include side-loading the incremental data and merging it, either as a post-Sqoop step or as a scheduled job/process.&lt;/P&gt;&lt;P&gt;You can also look at Hive ACID transactions, or at the Hive-HBase integration package.&lt;/P&gt;&lt;P&gt;Choosing the right approach is not trivial and depends on the initial volume, the incremental volumes, the frequency of incremental jobs, the probability of updates, the ability to identify records uniquely, the acceptable latency, etc.&lt;/P&gt;</description>
    <pubDate>Wed, 16 Aug 2017 05:54:02 GMT</pubDate>
    <dc:creator>eberezitsky</dc:creator>
    <dc:date>2017-08-16T05:54:02Z</dc:date>
    <item>
      <title>Updating hive table with sqoop from mysql table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Updating-hive-table-with-sqoop-from-mysql-table/m-p/218302#M66628</link>
      <description>&lt;P&gt;Hi everyone, I already have a Hive table called roles. I need to update this table with info coming from MySQL, so I used this script thinking that it would add new data and update existing data in my Hive table:&lt;/P&gt;&lt;PRE&gt;sqoop import --connect jdbc:mysql://xxxx/retail_export --username xxxx --password xxx \ --table roles --split-by id_emp --check-column id_emp --last-value 5 --incremental append \ --target-dir /user/ingenieroandresangel/hive/roles --hive-import --hive-database poc --hive-table roles&lt;/PRE&gt;&lt;P&gt;Unfortunately, that only inserts the new data; I can't update the records that already exist. Before you ask, a couple of notes:&lt;/P&gt;&lt;P&gt;* the table doesn't have a PK&lt;/P&gt;&lt;P&gt;* if I don't specify --last-value as a parameter, I get duplicate records for the rows that already exist.&lt;/P&gt;&lt;P&gt;How could I work this out without truncating the table or recreating it with a PK? Is there a way?&lt;/P&gt;&lt;P&gt;Thanks, guys.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2017 02:54:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Updating-hive-table-with-sqoop-from-mysql-table/m-p/218302#M66628</guid>
      <dc:creator>AndresUrrego</dc:creator>
      <dc:date>2017-08-15T02:54:22Z</dc:date>
    </item>
    <item>
      <title>Re: Updating hive table with sqoop from mysql table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Updating-hive-table-with-sqoop-from-mysql-table/m-p/218303#M66629</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/17410/ingenieroandresangel.html" nodeid="17410"&gt;@Andres Urrego&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;What you are looking for (upserts) isn't available in Sqoop import.&lt;/P&gt;&lt;P&gt;There are several approaches to updating data in Hive. One of them is described here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-access/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-access/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Other approaches include side-loading the incremental data and merging it, either as a post-Sqoop step or as a scheduled job/process.&lt;/P&gt;&lt;P&gt;You can also look at Hive ACID transactions, or at the Hive-HBase integration package.&lt;/P&gt;&lt;P&gt;Choosing the right approach is not trivial and depends on the initial volume, the incremental volumes, the frequency of incremental jobs, the probability of updates, the ability to identify records uniquely, the acceptable latency, etc.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 05:54:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Updating-hive-table-with-sqoop-from-mysql-table/m-p/218303#M66629</guid>
      <dc:creator>eberezitsky</dc:creator>
      <dc:date>2017-08-16T05:54:02Z</dc:date>
    </item>
  </channel>
</rss>