Support Questions

Aaron_Dossett · ‎12-01-2015

For a partitioned Hive table (stored as ORC), I can count the rows in a partition very quickly with a query like this, presumably because Hive gets the count directly from table statistics:

select count(*) from db.table where partition_date = '12-01-2015'

How can I get counts from multiple partitions just as quickly? A query like this launches a full Tez job and takes a couple dozen seconds to run, depending on the date range I choose:

select partition_date, count(*) from db.table where partition_date >= '11-01-2015' group by partition_date

Thanks!

I am running Hive 0.14 if that is relevant.

sluangsay · ‎12-01-2015

A few months ago I asked a similar question and got this reply:

https://issues.apache.org/jira/browse/HIVE-11937

So, I don't think you can use the stats in Hive 0.14 for the kind of query you want to do. Maybe with the next Hive version.

A possible workaround would be to list all the partitions in that table and have a script (Python, bash, or a Java program) generate one query per partition. I'm not sure it will work, but you might give it a try.
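A rough Python sketch of that workaround, untested against a real cluster. It assumes the table has a single partition column, so that each line of SHOW PARTITIONS output looks like partition_date=12-01-2015. The function names are made up for illustration, and whether Hive 0.14 answers each generated query from statistics is exactly the open question here.

```python
def parse_show_partitions(lines, partition_column):
    """Pull partition values out of SHOW PARTITIONS output lines.

    Assumes single-level partitioning, i.e. lines of the form
    'partition_date=12-01-2015'. Other lines are skipped.
    """
    prefix = partition_column + "="
    values = []
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            values.append(line[len(prefix):])
    return values


def partition_count_queries(table, partition_column, partition_values):
    """Build one count(*) query per partition, in the same shape as the
    single-partition query from the question.

    Returns a list of (partition_value, query) pairs so the caller can
    label each count with its partition.
    """
    queries = []
    for value in partition_values:
        if "'" in value:
            # Keep the sketch simple: refuse values that would need escaping.
            raise ValueError("unsupported partition value: %r" % value)
        query = "select count(*) from {t} where {c} = '{v}'".format(
            t=table, c=partition_column, v=value
        )
        queries.append((value, query))
    return queries


if __name__ == "__main__":
    # Hypothetical SHOW PARTITIONS output, hard-coded for illustration.
    output = ["partition_date=11-30-2015", "partition_date=12-01-2015"]
    values = parse_show_partitions(output, "partition_date")
    for value, query in partition_count_queries("db.table", "partition_date", values):
        print(value, query)
```

Each generated query would then be submitted separately (for example through the Hive CLI or Beeline) and the results collected by the script.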

