<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Counting rows in multiple partitions in Hive query in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Counting-rows-in-multiple-partitions-in-Hive-query/m-p/97713#M11192</link>
    <description>&lt;P&gt;A few months ago I asked a similar question and was pointed to this issue:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HIVE-11937" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-11937&lt;/A&gt;&lt;/P&gt;&lt;P&gt;So I don't think Hive 0.14 can use the stats for the kind of query you want to run. It may be possible in a later Hive version.&lt;/P&gt;&lt;P&gt;A possible workaround would be to get the names of all the partitions in that table and have a script (in Python or bash, or a Java program) generate a query for each partition. I'm not sure it works, but you might give it a try.&lt;/P&gt;</description>
    <pubDate>Tue, 01 Dec 2015 22:34:08 GMT</pubDate>
    <dc:creator>sluangsay</dc:creator>
    <dc:date>2015-12-01T22:34:08Z</dc:date>
    <item>
      <title>Counting rows in multiple partitions in Hive query</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Counting-rows-in-multiple-partitions-in-Hive-query/m-p/97712#M11191</link>
      <description>&lt;P&gt;For a partitioned Hive table (stored as ORC), I can count the rows in a single partition very quickly with a query like this, presumably because Hive gets the count directly from table statistics:&lt;/P&gt;&lt;PRE&gt;select count(*) from db.table where partition_date = '12-01-2015'&lt;/PRE&gt;&lt;P&gt;How can I get counts from multiple partitions just as quickly? A query like this launches a full Tez job and takes a couple dozen seconds to run, depending on the date range I choose:&lt;/P&gt;&lt;PRE&gt;select partition_date, count(*) from db.table where partition_date &amp;gt;= '11-01-2015' group by partition_date&lt;/PRE&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;I am running Hive 0.14, if that is relevant.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Dec 2015 22:14:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Counting-rows-in-multiple-partitions-in-Hive-query/m-p/97712#M11191</guid>
      <dc:creator>Aaron_Dossett</dc:creator>
      <dc:date>2015-12-01T22:14:25Z</dc:date>
    </item>
    <item>
      <title>Re: Counting rows in multiple partitions in Hive query</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Counting-rows-in-multiple-partitions-in-Hive-query/m-p/97713#M11192</link>
      <description>&lt;P&gt;A few months ago I asked a similar question and was pointed to this issue:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HIVE-11937" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-11937&lt;/A&gt;&lt;/P&gt;&lt;P&gt;So I don't think Hive 0.14 can use the stats for the kind of query you want to run. It may be possible in a later Hive version.&lt;/P&gt;&lt;P&gt;A possible workaround would be to get the names of all the partitions in that table and have a script (in Python or bash, or a Java program) generate a query for each partition. I'm not sure it works, but you might give it a try.&lt;/P&gt;</description>
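      <content:encoded><![CDATA[
The per-partition workaround suggested in this reply could be sketched roughly as follows. This is only a minimal illustration, not something tested against Hive 0.14: the function names are hypothetical, the input is assumed to be partition names in the column=value form that SHOW PARTITIONS prints (for example partition_date=12-01-2015), and actually running the generated queries (for instance through the hive CLI or beeline) is left out. Whether each generated query is really answered from statistics is exactly the open question raised above.

```python
def parse_partition_spec(spec):
    """Split a partition name such as 'partition_date=12-01-2015'
    (or 'year=2015/month=12' for multi-level partitions) into
    a list of (column, value) pairs."""
    pairs = []
    for part in spec.strip().split("/"):
        column, _, value = part.partition("=")
        pairs.append((column, value))
    return pairs


def build_count_query(table, spec):
    """Build one single-partition COUNT(*) query.

    Assumption: partition values are plain strings that contain no
    quote characters, so no escaping is attempted here."""
    predicates = " AND ".join(
        "{} = '{}'".format(column, value)
        for column, value in parse_partition_spec(spec)
    )
    return "SELECT COUNT(*) FROM {} WHERE {}".format(table, predicates)


def build_count_queries(table, partition_specs):
    """Generate one query per non-empty partition name."""
    return [
        build_count_query(table, spec)
        for spec in partition_specs
        if spec.strip()
    ]


if __name__ == "__main__":
    # Hypothetical partition names, as SHOW PARTITIONS db.table might list them.
    names = ["partition_date=11-30-2015", "partition_date=12-01-2015"]
    for query in build_count_queries("db.table", names):
        print(query + ";")
```

Each generated statement has the same shape as the fast single-partition query in the question, so the script only needs to run them one by one and collect the results.
      ]]></content:encoded>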
      <pubDate>Tue, 01 Dec 2015 22:34:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Counting-rows-in-multiple-partitions-in-Hive-query/m-p/97713#M11192</guid>
      <dc:creator>sluangsay</dc:creator>
      <dc:date>2015-12-01T22:34:08Z</dc:date>
    </item>
  </channel>
</rss>

