<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Impala won't update stats on Hive Avro table in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29553#M6544</link>
    <description>&lt;P&gt;Impala is able to handle Snappy-compressed Avro files, so I don't think that's the problem.&lt;/P&gt;&lt;P&gt;You may be hitting&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/HIVE-6308" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-6308&lt;/A&gt; since you created the tables through Hive without column definitions.&lt;/P&gt;&lt;P&gt;You could try to create the tables through Impala, or create them with Hive but with column definitions.&lt;/P&gt;</description>
    <pubDate>Tue, 14 Jul 2015 17:56:52 GMT</pubDate>
    <dc:creator>alex.behm</dc:creator>
    <dc:date>2015-07-14T17:56:52Z</dc:date>
    <item>
      <title>Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29495#M6541</link>
      <description>&lt;P&gt;Using Cloudera Express 5.4.1, I have created a Snappy-compressed Avro Hive table partitioned by year, month, day, and hour:&lt;/P&gt;&lt;PRE&gt;set hive.exec.compress.output=true;
set avro.output.codec=snappy;
CREATE EXTERNAL TABLE my_table
PARTITIONED BY (year smallint, month tinyint, day tinyint, hour tinyint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://localhost:8020/my-folder/'
TBLPROPERTIES ('avro.schema.url'='hdfs://localhost/avro/my_table.avsc');
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (year=2015, month=7, day=1, hour=0);
INVALIDATE METADATA;&lt;/PRE&gt;&lt;P&gt;Then, using impala-shell, I compute the stats for one partition:&lt;/P&gt;&lt;PRE&gt;REFRESH my_table;
COMPUTE INCREMENTAL STATS my_table PARTITION (year=2015, month=7, day=1, hour=0);&lt;/PRE&gt;&lt;P&gt;The statement returns the following summary:&lt;/P&gt;&lt;PRE&gt;+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 0 partition(s) and 46 column(s). |
+------------------------------------------+&lt;/PRE&gt;&lt;P&gt;When I then run SHOW TABLE STATS my_table, I get the following:&lt;/P&gt;&lt;PRE&gt;+-------+-------+-----+------+----------+--------+----------+--------------+-------------------+--------+-------------------+
| year  | month | day | hour | #Rows    | #Files | Size     | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+-------+-----+------+----------+--------+----------+--------------+-------------------+--------+-------------------+
| 2015  | 7     | 1   | 0    | -1       | 259    | 15.58GB  | NOT CACHED   | NOT CACHED        | AVRO   | false             |
+-------+-------+-----+------+----------+--------+----------+--------------+-------------------+--------+-------------------+&lt;/PRE&gt;&lt;P&gt;This shows that the stats were not updated: #Rows is still -1 and Incremental stats is false. Furthermore, when I run an Impala query and look at the profile, I see the following warning:&lt;/P&gt;&lt;PRE&gt;WARNING: The following tables are missing relevant table and/or column statistics.
default.my_table&lt;/PRE&gt;&lt;P&gt;I am not sure whether this is because my Avro files are Snappy compressed and Impala is unable to compute stats on Hive tables with compressed Avro files.&lt;/P&gt;&lt;P&gt;Any help would be greatly appreciated. Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:33:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29495#M6541</guid>
      <dc:creator>kylebush</dc:creator>
      <dc:date>2022-09-16T09:33:54Z</dc:date>
    </item>
    <item>
      <title>Re: Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29532#M6542</link>
      <description>&lt;P&gt;I cannot say for sure what is going wrong here, but I suspect that you are hitting an edge case for an empty table/partition.&lt;/P&gt;&lt;P&gt;Have you tried the same thing with a non-empty table/partition?&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jul 2015 01:53:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29532#M6542</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2015-07-14T01:53:21Z</dc:date>
    </item>
    <item>
      <title>Re: Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29548#M6543</link>
      <description>The partitions I tested contained several non-empty Avro files. Do you think the problem could be that they are Snappy compressed?</description>
      <pubDate>Tue, 14 Jul 2015 13:39:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29548#M6543</guid>
      <dc:creator>kylebush</dc:creator>
      <dc:date>2015-07-14T13:39:02Z</dc:date>
    </item>
    <item>
      <title>Re: Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29553#M6544</link>
      <description>&lt;P&gt;Impala is able to handle Snappy-compressed Avro files, so I don't think that's the problem.&lt;/P&gt;&lt;P&gt;You may be hitting&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/HIVE-6308" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-6308&lt;/A&gt; since you created the tables through Hive without column definitions.&lt;/P&gt;&lt;P&gt;You could try to create the tables through Impala, or create them with Hive but with column definitions.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jul 2015 17:56:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29553#M6544</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2015-07-14T17:56:52Z</dc:date>
    </item>
    <item>
      <title>Re: Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29619#M6545</link>
      <description>&lt;P&gt;The best solution I found was to run the following without specifying a partition:&lt;/P&gt;&lt;PRE&gt;COMPUTE INCREMENTAL STATS my_table;&lt;/PRE&gt;</description>
      <pubDate>Thu, 16 Jul 2015 01:48:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29619#M6545</guid>
      <dc:creator>kylebush</dc:creator>
      <dc:date>2015-07-16T01:48:53Z</dc:date>
    </item>
    <item>
      <title>Re: Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29650#M6546</link>
      <description>&lt;P&gt;Just in case: did you try running the following statement?&lt;/P&gt;&lt;PRE&gt;INVALIDATE METADATA;&lt;/PRE&gt;</description>
      <pubDate>Thu, 16 Jul 2015 04:02:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29650#M6546</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2015-07-16T04:02:00Z</dc:date>
    </item>
    <item>
      <title>Re: Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29658#M6547</link>
      <description>&lt;P&gt;My apologies, but I am losing track of the steps you followed that produced a good or bad outcome with COMPUTE STATS and COMPUTE INCREMENTAL STATS.&lt;/P&gt;&lt;P&gt;Would you be able to list a series of steps that reproduces the problem (on a non-empty partition)?&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jul 2015 07:05:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29658#M6547</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2015-07-16T07:05:13Z</dc:date>
    </item>
    <item>
      <title>Re: Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29665#M6548</link>
      <description>&lt;P&gt;Here are the steps.&lt;/P&gt;&lt;P&gt;Using beeline:&lt;/P&gt;&lt;PRE&gt;set hive.exec.compress.output=true;
set avro.output.codec=snappy;

CREATE EXTERNAL TABLE my_table
PARTITIONED BY (year smallint, month tinyint, day tinyint, hour tinyint)
STORED AS AVRO
LOCATION 'hdfs://localhost:8020/my_data'
TBLPROPERTIES ('avro.schema.url'='hdfs://localhost:8020/my_table.avsc');

ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (year=2015, month=7, day=15, hour=0);&lt;/PRE&gt;&lt;P&gt;Then, using impala-shell:&lt;/P&gt;&lt;PRE&gt;INVALIDATE METADATA my_table;
REFRESH my_table;

COMPUTE INCREMENTAL STATS my_table;
+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 1 partition(s) and 46 column(s). |
+------------------------------------------+&lt;/PRE&gt;&lt;P&gt;However, if I compute incremental stats directly on the partition, the partition stats are not updated:&lt;/P&gt;&lt;PRE&gt;DROP INCREMENTAL STATS my_table PARTITION (year=2015, month=7, day=15, hour=0);
COMPUTE INCREMENTAL STATS my_table PARTITION (year=2015, month=7, day=15, hour=0);
+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 0 partition(s) and 46 column(s). |
+------------------------------------------+&lt;/PRE&gt;</description>
      <pubDate>Thu, 16 Jul 2015 12:29:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29665#M6548</guid>
      <dc:creator>kylebush</dc:creator>
      <dc:date>2015-07-16T12:29:01Z</dc:date>
    </item>
    <item>
      <title>Re: Impala won't update stats on Hive Avro table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29718#M6549</link>
      <description>&lt;P&gt;Thanks for the update. I can reproduce the issue, but only when the target partition is empty. As soon as I add some data, COMPUTE INCREMENTAL STATS works as expected.&lt;/P&gt;&lt;P&gt;So I still suspect you may be hitting an edge case with an empty partition.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jul 2015 07:11:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-won-t-update-stats-on-Hive-Avro-table/m-p/29718#M6549</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2015-07-17T07:11:07Z</dc:date>
    </item>
  </channel>
</rss>

