Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Counting rows in multiple partitions in Hive query

Solved Go to solution

Counting rows in multiple partitions in Hive query

Contributor

For a partitioned Hive table (stored as ORC), I can count the rows in a partition very quickly with a query like this, presumably because Hive gets the count directly from table statistics:

select count(*) from db.table where partition_date = '12-01-2015'

How can I just as quickly get counts from multiple partitions? A query like this launches a full tez job and takes a couple dozen seconds to run depending on the date range I choose:

select partition_date, count(*) from db.table where partition_date >= '11-01-2015' group by partition_date

Thanks!

I am running Hive 0.14 if that is relevant.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Counting rows in multiple partitions in Hive query

Expert Contributor

Some few months ago I asked a similar question and I got that reply:

https://issues.apache.org/jira/browse/HIVE-11937

So, I don't think you can use the stats in Hive 0.14 for the kind of query you want to do. Maybe with the next Hive version.

A possible workaround would be to get the names of all your partitions in that table, and to have a script (in python, bash or a java program) that generates a query for each partition. Not sure it works but you might give it a try.

1 REPLY 1

Re: Counting rows in multiple partitions in Hive query

Expert Contributor

Some few months ago I asked a similar question and I got that reply:

https://issues.apache.org/jira/browse/HIVE-11937

So, I don't think you can use the stats in Hive 0.14 for the kind of query you want to do. Maybe with the next Hive version.

A possible workaround would be to get the names of all your partitions in that table, and to have a script (in python, bash or a java program) that generates a query for each partition. Not sure it works but you might give it a try.