<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Sqoop Hive-import not deleting old data in warehouse in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Sqoop-Hive-import-not-deleting-old-data-in-warehouse/m-p/301969#M220952</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;When I import data from DB2 into Hive via Sqoop, the amount of data stored in /warehouse/tablespace/managed/hive/databasename/tablename/ keeps growing.&lt;/P&gt;&lt;P&gt;Every import (run with --hive-import and --hive-overwrite set) creates a new "base_000000n" folder, so the parent folder grows with each run. Is there a way to delete the old folders before importing new data with Sqoop?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
    <pubDate>Thu, 27 Aug 2020 09:13:25 GMT</pubDate>
    <dc:creator>Muffex</dc:creator>
    <dc:date>2020-08-27T09:13:25Z</dc:date>
    <item>
      <title>Sqoop Hive-import not deleting old data in warehouse</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Sqoop-Hive-import-not-deleting-old-data-in-warehouse/m-p/301969#M220952</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;When I import data from DB2 into Hive via Sqoop, the amount of data stored in /warehouse/tablespace/managed/hive/databasename/tablename/ keeps growing.&lt;/P&gt;&lt;P&gt;Every import (run with --hive-import and --hive-overwrite set) creates a new "base_000000n" folder, so the parent folder grows with each run. Is there a way to delete the old folders before importing new data with Sqoop?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Thu, 27 Aug 2020 09:13:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Sqoop-Hive-import-not-deleting-old-data-in-warehouse/m-p/301969#M220952</guid>
      <dc:creator>Muffex</dc:creator>
      <dc:date>2020-08-27T09:13:25Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive-import not deleting old data in warehouse</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Sqoop-Hive-import-not-deleting-old-data-in-warehouse/m-p/301981#M220963</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/69195"&gt;@Muffex&lt;/a&gt; My recommendation for any ingestion process is to always use staging (temporary) tables that are managed separately from the master table where the data finally needs to land. This lets you operate on the staging tables before or after their contents are added to the master table, without affecting the master table.&lt;/P&gt;&lt;P&gt;In your use case, the ingestion process would Sqoop into a temporary table, insert from the temporary table into the master table, and then drop the temporary location.&lt;/P&gt;&lt;P&gt;In some of my past implementations of this pattern, the temporary tables were organized hourly and stayed active for at least 7 days before a decoupled cleanup job removed anything 7 or more days old. That was done for auditing purposes; normally I would create and destroy the temporary data within the ingestion procedure itself.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Steven @ DFHZ&lt;/P&gt;</description>
      <pubDate>Thu, 27 Aug 2020 13:18:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Sqoop-Hive-import-not-deleting-old-data-in-warehouse/m-p/301981#M220963</guid>
      <dc:creator>stevenmatison</dc:creator>
      <dc:date>2020-08-27T13:18:32Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive-import not deleting old data in warehouse</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Sqoop-Hive-import-not-deleting-old-data-in-warehouse/m-p/302200#M221024</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/60150"&gt;@stevenmatison&lt;/a&gt; Thanks for your answer. My tables are relatively small and are only used to duplicate existing data, so is there any way to remove the existing folders before importing new data?&lt;BR /&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 01 Sep 2020 16:16:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Sqoop-Hive-import-not-deleting-old-data-in-warehouse/m-p/302200#M221024</guid>
      <dc:creator>Muffex</dc:creator>
      <dc:date>2020-09-01T16:16:16Z</dc:date>
    </item>
  </channel>
</rss>

