<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Cannot generate stats for partitioned Hive table in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175971#M61284</link>
    <description>&lt;P&gt;Ah, I see. Thank you &lt;A rel="user" href="https://community.cloudera.com/users/222/deepesh.html" nodeid="222"&gt;@Deepesh&lt;/A&gt;. I was able to confirm viewing the statistics of interest when specifying a specific partition. I have a follow up question based on that.&lt;/P&gt;&lt;P&gt;Is there a single command to view the stats summed for all partitions or would I need to execute the above for each partition in the table and total the values myself?&lt;/P&gt;</description>
    <pubDate>Thu, 18 May 2017 23:51:10 GMT</pubDate>
    <dc:creator>john3</dc:creator>
    <dc:date>2017-05-18T23:51:10Z</dc:date>
    <item>
      <title>Cannot generate stats for partitioned Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175967#M61280</link>
      <description>&lt;P&gt;
	I am attempting to perform an ANALYZE on a partitioned table to generate statistics for numRows and totalSize. For a non-partitioned table I get the results I am looking for but for a dynamic partitioned table it does not provide the information I am seeking. As a newbie to Hive, I assume I am doing something wrong.&lt;/P&gt;&lt;P&gt;
	We are running Hive 1.2.1.2.5.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;
	Test Case 1:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;CREATE TABLE `test_table` (`date` DATE,`name` VARCHAR(255));

INSERT INTO `test_table` VALUES
('2016-12-30','john'),('2016-12-30','sarah'),
('2016-12-31','bill'),
('2017-01-01','mary'),('2017-01-01','pete');

ANALYZE TABLE `test_table` COMPUTE STATISTICS;

SHOW tblproperties `test_table`;
&lt;/PRE&gt;&lt;P&gt;The results are as I would expect:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;prpt_name&lt;/TD&gt;&lt;TD&gt;prpt_value&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;COLUMN_STATS_ACCURATE&lt;/TD&gt;&lt;TD&gt;{"BASIC_STATS":"true"}&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;numFiles&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;numRows&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;rawDataSize&lt;/TD&gt;&lt;TD&gt;76&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;totalSize&lt;/TD&gt;&lt;TD&gt;81&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;transient_lastDdlTime&lt;/TD&gt;&lt;TD&gt;1494980403&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;STRONG&gt;Test Case 2:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;CREATE TABLE `test_table` (`name` VARCHAR(255)) PARTITIONED BY (`date` DATE);

INSERT INTO `test_table` PARTITION (`date`='2016-12-30')
VALUES ('john'),('sarah');

INSERT INTO `test_table` PARTITION (`date`='2016-12-31')
VALUES ('bill');

INSERT INTO `test_table` PARTITION (`date`='2017-01-01')
VALUES ('mary'),('pete');

ANALYZE TABLE `test_table` PARTITION(`date`) COMPUTE STATISTICS;

SHOW tblproperties `test_table`;
&lt;/PRE&gt;&lt;P&gt;However, I do not see any of the statistics I would expect to see:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;prpt_name&lt;/TD&gt;&lt;TD&gt;prpt_value&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;transient_lastDdlTime&lt;/TD&gt;&lt;TD&gt;1494978354&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;Is there something I am doing wrong with the table creation or with the analyze statement?&lt;/P&gt;&lt;P&gt;EDIT: Per suggestion, I did test using the following analyze statement as well with the same results:&lt;/P&gt;&lt;PRE&gt;ANALYZE TABLE `test_table`
PARTITION(`date`='2016-12-30',`date`='2016-12-31',`date`='2017-01-01')
COMPUTE STATISTICS;&lt;/PRE&gt;</description>
      <pubDate>Wed, 17 May 2017 09:48:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175967#M61280</guid>
      <dc:creator>john3</dc:creator>
      <dc:date>2017-05-17T09:48:08Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot generate stats for partitioned Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175968#M61281</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18369/john3.html" nodeid="18369"&gt;@John Glorioso&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Can you try ANALYZE TABLE test_table PARTITION('2016-12-30') COMPUTE STATISTICS; and check the stats again ?&lt;/P&gt;</description>
      <pubDate>Wed, 17 May 2017 11:24:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175968#M61281</guid>
      <dc:creator>srai1</dc:creator>
      <dc:date>2017-05-17T11:24:19Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot generate stats for partitioned Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175969#M61282</link>
      <description>&lt;P&gt;Invalid syntax:&lt;/P&gt;
&lt;PRE&gt;org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:35 cannot recognize input near ''2016-12-30'' ')' 'COMPUTE' in analyze statement&amp;lt;br&amp;gt;
&lt;/PRE&gt;&lt;P&gt;
	According to the Hive documentation the partition column and name need to be specified if you are analyzing a particular partition. Specifying the column name only can be used to analyze all partitions.&lt;A href="https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE"&gt;https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 May 2017 11:55:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175969#M61282</guid>
      <dc:creator>john3</dc:creator>
      <dc:date>2017-05-17T11:55:08Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot generate stats for partitioned Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175970#M61283</link>
      <description>&lt;P&gt;The stats for partitioned table are available per partition, you can do desc formatted, example:&lt;/P&gt;&lt;PRE&gt;hive&amp;gt; desc formatted `test_table` partition(`date`='2016-12-30');
...
Partition Parameters:
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}
	numFiles            	1
	numRows             	2
	rawDataSize         	9
	totalSize           	11
	transient_lastDdlTime	1495086721
&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 May 2017 13:07:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175970#M61283</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2017-05-18T13:07:10Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot generate stats for partitioned Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175971#M61284</link>
      <description>&lt;P&gt;Ah, I see. Thank you &lt;A rel="user" href="https://community.cloudera.com/users/222/deepesh.html" nodeid="222"&gt;@Deepesh&lt;/A&gt;. I was able to confirm viewing the statistics of interest when specifying a specific partition. I have a follow up question based on that.&lt;/P&gt;&lt;P&gt;Is there a single command to view the stats summed for all partitions or would I need to execute the above for each partition in the table and total the values myself?&lt;/P&gt;</description>
      <pubDate>Thu, 18 May 2017 23:51:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175971#M61284</guid>
      <dc:creator>john3</dc:creator>
      <dc:date>2017-05-18T23:51:10Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot generate stats for partitioned Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175972#M61285</link>
      <description>&lt;P&gt;Overall information is available across partitions only if you are on HDP-2.6 and using Hive2 which i see you are currently not from your description (1.2.1.2.5). If you are using hive 2 the information would be something like:&lt;/P&gt;&lt;PRE&gt;beeline&amp;gt; desc formatted `test_table`;
| Table Parameters: |                       |                            |
|                   | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
|                   | numFiles              | 3                          |
|                   | numRows               | 5                          |
|                   | rawDataSize           | 21                         |
|                   | totalSize             | 26                         |
|                   | transient_lastDdlTime | 1495086467                 |&lt;/PRE&gt;&lt;P&gt;In case of Hive 1 unfortunately you will have to run the command per partition and aggregate manually.&lt;/P&gt;</description>
      <pubDate>Fri, 19 May 2017 02:57:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/175972#M61285</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2017-05-19T02:57:16Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot generate stats for partitioned Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/278059#M61286</link>
      <description>&lt;P&gt;Using the smae example od data, I've got something like that in my result, trying to have details of one or more than one partition with (desc formatted operation)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="bad_analyze.PNG" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/24695i6A122BE076D33CA9/image-size/large?v=v2&amp;amp;px=999" role="button" title="bad_analyze.PNG" alt="bad_analyze.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The problem here si that each time I've got only information about one and only one partition, even if I'm specifying two or three.&amp;nbsp;&lt;BR /&gt;Do you know please how HIVE behave in this case ? Is it a bug ?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Sep 2019 16:12:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-generate-stats-for-partitioned-Hive-table/m-p/278059#M61286</guid>
      <dc:creator>Xena</dc:creator>
      <dc:date>2019-09-23T16:12:07Z</dc:date>
    </item>
  </channel>
</rss>

