<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How do I drop empty Impala partitions? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/351812#M236378</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/83846"&gt;@amallegni&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry for the delay in answering.&lt;BR /&gt;And thank you for your response!&lt;/P&gt;</description>
    <pubDate>Fri, 09 Sep 2022 02:23:59 GMT</pubDate>
    <dc:creator>yassan</dc:creator>
    <dc:date>2022-09-09T02:23:59Z</dc:date>
    <item>
      <title>How do I drop empty Impala partitions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/346496#M234895</link>
      <description>&lt;P&gt;Hi, All !&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am having the same problem as below.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/62405549/drop-empty-impala-partitions" target="_blank" rel="noopener"&gt;Drop empty Impala partitions - Stack Overflow&lt;/A&gt;&lt;/P&gt;
&lt;DIV class="s-prose js-post-body"&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Impala external table partitions still show up in stats with row count 0 after deleting the data in HDFS and altering (like &lt;SPAN class="code"&gt;ALTER TABLE table RECOVER PARTITIONS&lt;/SPAN&gt;) refreshing (&lt;SPAN class="code"&gt;REFRESH table&lt;/SPAN&gt;) and invalidation of metadata.&lt;/P&gt;
&lt;P&gt;Trying to drop partitions one by one works, but there are tens of partitions which should be removed and it would be quite tedious.&lt;/P&gt;
&lt;P&gt;Dropping and recreating the table would also be an option but that way all the statistics would be dropped together with the table.&lt;/P&gt;
&lt;P&gt;Is there any kind of other options in impala to get this done?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Is this only a workaround using Hive below?&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="mt24 mb12"&gt;
&lt;DIV class="post-taglist d-flex gs4 gsy fd-column"&gt;
&lt;DIV class="d-flex ps-relative fw-wrap"&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Found a workaround through HIVE.&lt;/P&gt;
&lt;P&gt;By issuing &lt;SPAN class="code"&gt;MSCK REPAIR TABLE tablename SYNC PARTITIONS&lt;/SPAN&gt; then refreshing the table in impala, the empty partitions disappear.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Also, I could not find an Issue in &lt;A href="https://issues.apache.org/jira/browse/IMPALA" target="_self"&gt;Impala JIRA&lt;/A&gt;.&lt;BR /&gt;If anyone knows of an Issue in Impala JIRA, please let me know.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My environment is below.&lt;/P&gt;
&lt;P&gt;CDH v6.3.2&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;reference:&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_refresh.html" target="_blank" rel="noopener"&gt;REFRESH Statement | 6.3.x | Cloudera Documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_alter_table.html#alter_table" target="_blank" rel="noopener"&gt;ALTER TABLE Statement | 6.3.x | Cloudera Documentation&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:46:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/346496#M234895</guid>
      <dc:creator>yassan</dc:creator>
      <dc:date>2022-09-16T14:46:26Z</dc:date>
    </item>
    <item>
      <title>Re: How do I drop empty Impala partitions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/348239#M235344</link>
      <description>&lt;P&gt;Hi Yassan,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;first recommendation I have: when you need to drop a partition, it is better to do it via SQL statement either on Impala/Hive or with Spark SQL.&lt;BR /&gt;For example, assuming that "year" is my partitioning field:&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;alter table my_partitioned_table drop partition (year = 2020);&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you drop a partition at file system level, there are two things you should do in order to have everything aligned on Impala:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;first run "MSCK REPAIR TABLE my_partitioned_table" on Hive, in order to refresh the metastore with the correct partitions' information&lt;/LI&gt;&lt;LI&gt;once point 1 is done, run "INVALIDATE METADATA" on Impala, so to refresh Impala cache&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Let me know if this helps.&lt;BR /&gt;Regards&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2022 15:16:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/348239#M235344</guid>
      <dc:creator>amallegni</dc:creator>
      <dc:date>2022-07-20T15:16:25Z</dc:date>
    </item>
    <item>
      <title>Re: How do I drop empty Impala partitions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/348453#M235371</link>
      <description>&lt;P&gt;&lt;FONT color="#000000"&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/83846"&gt;@amallegni&lt;/a&gt; ,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;Thank you for your response.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;My important points are as follows&lt;/FONT&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;FONT color="#000000"&gt;&amp;nbsp;impala external table partitions&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT color="#000000"&gt;Trying to drop partitions one by one work, but there are tens of partitions which should be removed and it would be quite tedious.&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT color="#000000"&gt;&amp;nbsp;I don't want to Dropping and recreating tables.&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT color="#000000"&gt;&amp;nbsp;is the only way to use Hive's "MSCK REPAIR TABLE tablename SYNC PARTITIONS"&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#000000"&gt;(i.e., can't Impala just complete it?)&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;Please let me know if you have any solutions for the above.&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2022 16:34:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/348453#M235371</guid>
      <dc:creator>yassan</dc:creator>
      <dc:date>2022-07-21T16:34:34Z</dc:date>
    </item>
    <item>
      <title>Re: How do I drop empty Impala partitions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/348526#M235382</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/30784"&gt;@yassan&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;let's recap some important concept, then I will add my comment to your points.&lt;BR /&gt;Generally speaking:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;If you issue a SQL statement Hive/Impala, the engine is fully aware of what has been requested and will do everything needed. E.g. if you drop a partition, the engine knows that metadata has to be updated and knows if data has to be purged to or not (e.g. if it's an external table, data on filesystem won't be deleted by default).&lt;BR /&gt;NOTE: if you want drop statements to delete data as well, you would need a managed (non external) table. You might also try to alter your table and set this&amp;nbsp;&lt;EM&gt;TBLPROPERTIES ("external.table.purge"="true")&lt;/EM&gt;. Honestly I'm not sure if this is available in your version of Hive, it is for sure in more up to date versions (e.g. Cloudera Data Platform).&lt;/LI&gt;&lt;LI&gt;If you delete data directly on the filesystem (e.g. via a Spark job or via hdfs CLI), there is no way for Hive/Impala engine to know that it happened unless you explicitly tell them. This is something you can do by launching a "MSCK REPAIR TABLE [...]" on Hive or by launching an "ALTER TABLE tablename DROP PARTITION [...]" either on Hive or Impala.&lt;BR /&gt;Well actually, if you are using Spark you could rely on sparksql and issue a drop partition statement (see the summary at the end of this post).&lt;/LI&gt;&lt;LI&gt;Impala relies on Hive metastore but caches metadata. If you make some changes to metadata via Hive, then you'll have to launch an "INVALIDATE METADATA" on Impala in order to refresh the cache.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;These are key points to be taken into account.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Commenting your last post:&lt;/P&gt;&lt;P&gt;If you have a lot of partitions you have a couple of ways to lower down the effort of launching a number of drop partitions statements:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;of course, you can script it (or you could develop a spark job, or you could maybe come up with some other automation strategy)&lt;/LI&gt;&lt;LI&gt;if the partition field allows for it, you could drop a range of partition with a single statement. E.g. it could be something like "ALTER TABLE my_table_partitioned_by_year DROP PARTITION (year &amp;lt; 2020)". You can do this from Impala if you prefer, so that you won't have to refresh the Impala cache.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;This will drop partitions but won't ever drop the table.&lt;/P&gt;&lt;P&gt;Now, summarizing everything we've shared so far, you have two alternatives three possible ways to go:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Do it via Impala by using the drop partition SQL statement.&lt;/LI&gt;&lt;LI&gt;Delete data directly on the filesystem and lately tell Impala to drop the partition (drop partition statements in Impala or MSCK REPAIR on Hive + INVALIDATE METADATA on Impala).&lt;/LI&gt;&lt;LI&gt;Use a Spark job and issue a drop partition statement via Spark SQL + INVALIDATE METADATA on Impala (since the Spark job would directly act on the Hive metastore, out of Impala's line of sight).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2022 10:44:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/348526#M235382</guid>
      <dc:creator>amallegni</dc:creator>
      <dc:date>2022-07-22T10:44:46Z</dc:date>
    </item>
    <item>
      <title>Re: How do I drop empty Impala partitions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/351812#M236378</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/83846"&gt;@amallegni&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry for the delay in answering.&lt;BR /&gt;And thank you for your response!&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2022 02:23:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-I-drop-empty-Impala-partitions/m-p/351812#M236378</guid>
      <dc:creator>yassan</dc:creator>
      <dc:date>2022-09-09T02:23:59Z</dc:date>
    </item>
  </channel>
</rss>

