<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Impala compute incremental stats on specific columns in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/268560#M206281</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to gather stats on a big partitioned table,&lt;/P&gt;&lt;P&gt;but only on some of the partitions and not on all the columns, because that can generate a lot of data.&lt;/P&gt;&lt;P&gt;I don't see an option for this in the documentation of "compute&amp;nbsp;incremental stats".&lt;/P&gt;&lt;P&gt;How can I run stats only on some of the partitions and some/none of the columns?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Wed, 21 Aug 2019 15:17:08 GMT</pubDate>
    <dc:creator>hores</dc:creator>
    <dc:date>2019-08-21T15:17:08Z</dc:date>
    <item>
      <title>Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/268560#M206281</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to gather stats on a big partitioned table,&lt;/P&gt;&lt;P&gt;but only on some of the partitions and not on all the columns, because that can generate a lot of data.&lt;/P&gt;&lt;P&gt;I don't see an option for this in the documentation of "compute&amp;nbsp;incremental stats".&lt;/P&gt;&lt;P&gt;How can I run stats only on some of the partitions and some/none of the columns?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 21 Aug 2019 15:17:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/268560#M206281</guid>
      <dc:creator>hores</dc:creator>
      <dc:date>2019-08-21T15:17:08Z</dc:date>
    </item>
    <item>
      <title>Re: Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/268923#M206495</link>
      <description>&lt;P&gt;&lt;SPAN&gt;How can I run stats &lt;U&gt;&lt;STRONG&gt;only on some of the partitions&lt;/STRONG&gt;&lt;/U&gt; and &lt;EM&gt;some/none of the columns&lt;/EM&gt;?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/24831"&gt;@hores&lt;/a&gt;&amp;nbsp;To compute stats for a specific partition, as you mentioned above, run the following command:&lt;/P&gt;&lt;PRE&gt;COMPUTE INCREMENTAL STATS [db_name.]table_name [PARTITION (partition_spec)]&lt;/PRE&gt;&lt;PRE&gt;partition_spec ::= partition_col=constant_value&lt;/PRE&gt;&lt;P&gt;For further info, see the documentation:&amp;nbsp;&lt;A href="https://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_compute_stats.html" target="_blank" rel="noopener"&gt;impala_compute_stats.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Aug 2019 09:08:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/268923#M206495</guid>
      <dc:creator>eMazarakis</dc:creator>
      <dc:date>2019-08-26T09:08:46Z</dc:date>
    </item>
    <item>
      <title>Re: Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269067#M206608</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/32123"&gt;@eMazarakis&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;but I mean stats on specific partitions AND specific columns (BOTH).&lt;/P&gt;&lt;P&gt;If I run it as you suggest, it will collect statistics on all the columns, which we don't want.&lt;/P&gt;&lt;P&gt;So how can I collect stats on specific partitions AND specific columns?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Aug 2019 14:27:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269067#M206608</guid>
      <dc:creator>hores</dc:creator>
      <dc:date>2019-08-27T14:27:57Z</dc:date>
    </item>
    <item>
      <title>Re: Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269115#M206643</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/24831"&gt;@hores&lt;/a&gt;&amp;nbsp;As you can see in the doc, there is nothing about computing stats for specific table columns. Stats are for the whole table.&lt;/P&gt;&lt;P&gt;How can you know which statistics the impalad daemons might want to use during join queries?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2019 07:35:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269115#M206643</guid>
      <dc:creator>eMazarakis</dc:creator>
      <dc:date>2019-08-28T07:35:51Z</dc:date>
    </item>
    <item>
      <title>Re: Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269118#M206646</link>
      <description>You can't specify columns, as Impala collects stats for all of them, but you can limit collection at the partition level.</description>
      <pubDate>Wed, 28 Aug 2019 07:38:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269118#M206646</guid>
      <dc:creator>EricL</dc:creator>
      <dc:date>2019-08-28T07:38:15Z</dc:date>
    </item>
    <item>
      <title>Re: Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269271#M206747</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/10115"&gt;@EricL&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/32123"&gt;@eMazarakis&lt;/a&gt;&amp;nbsp;We have tables with lots of columns that are never used in filters, so I know we don't want to collect statistics on them. The Impala docs say:&lt;/P&gt;&lt;P&gt;"&lt;SPAN&gt;For a table with a huge number of partitions and many columns, the approximately 400 bytes of metadata per column per partition can add up to significant memory overhead, as it must be cached on the C&lt;/SPAN&gt;&lt;SPAN class="keyword cmdname"&gt;atalogD&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;host and on every I&lt;/SPAN&gt;&lt;SPAN class="keyword cmdname"&gt;mpalaD&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;host that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB, you might experience service downtime."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So to me, it's strange that users don't have an option to minimize the statistics on tables.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Hive has this option, but if I use it, the stats won't sync to Impala:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;If you run the Hive statement&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS&lt;SPAN&gt;, Impala can only use the resulting column statistics if the table is unpartitioned. Impala cannot use Hive-generated column statistics for a partitioned table."&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Aug 2019 07:37:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269271#M206747</guid>
      <dc:creator>hores</dc:creator>
      <dc:date>2019-08-29T07:37:18Z</dc:date>
    </item>
    <item>
      <title>Re: Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269276#M206749</link>
      <description>&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/24831"&gt;@hores&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;You are right! It looks like the latest Impala in CDH 6.x supports column-level stats:&lt;BR /&gt;&lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_compute_stats.html" target="_blank"&gt;https://www.cloudera.com/documentation/enterprise/latest/topics/impala_compute_stats.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;COMPUTE STATS [db_name.]table_name [ ( column_list ) ] [TABLESAMPLE SYSTEM(percentage) [REPEATABLE(seed)]]&lt;BR /&gt;&lt;BR /&gt;column_list ::= column_name [ , column_name, ... ]&lt;BR /&gt;&lt;BR /&gt;What version are you using?&lt;BR /&gt;&lt;BR /&gt;Cheers&lt;BR /&gt;Eric</description>
      <pubDate>Thu, 29 Aug 2019 08:20:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269276#M206749</guid>
      <dc:creator>EricL</dc:creator>
      <dc:date>2019-08-29T08:20:18Z</dc:date>
    </item>
    <item>
      <title>Re: Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269311#M206760</link>
      <description>&lt;P&gt;It only supports a column list for table stats, not for per-partition stats (incremental stats).&lt;/P&gt;&lt;P&gt;Your link says:&lt;/P&gt;&lt;P&gt;"&lt;SPAN&gt;For non-incremental&amp;nbsp;&lt;/SPAN&gt;COMPUTE STATS&lt;SPAN&gt;&amp;nbsp;statement, the columns for which statistics are computed can be specified with an optional comma-separated list of columns."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;So it looks like column specific is only on a table without partitions (&lt;SPAN&gt;non-incremental&lt;/SPAN&gt;)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's really strange that it only works this way.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Aug 2019 09:11:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/269311#M206760</guid>
      <dc:creator>hores</dc:creator>
      <dc:date>2019-08-29T09:11:02Z</dc:date>
    </item>
    <item>
      <title>Re: Impala compute incremental stats on specific columns</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/277966#M207749</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;P&gt;So it looks like column specific is only on a table without partitions (&lt;SPAN&gt;non-incremental&lt;/SPAN&gt;)&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/24831"&gt;@hores&lt;/a&gt;&amp;nbsp;that's incorrect. Non-incremental compute stats works on partitioned tables and is generally the preferred method for collecting stats on partitioned tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We've generally tried to steer people away from incremental stats because of the size issues on large tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Supporting a column list for incremental stats would also be error-prone to use correctly and complex to implement: what happens if you compute incremental stats with different subsets of the columns? You can end up with different subsets of the columns on different partitions, and then you have to somehow reconcile it all each time.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2019 17:11:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impala-compute-incremental-stats-on-specific-columns/m-p/277966#M207749</guid>
      <dc:creator>Tim Armstrong</dc:creator>
      <dc:date>2019-09-20T17:11:27Z</dc:date>
    </item>
  </channel>
</rss>

