Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

How to (Hive) Query to pull data under a same parition with different timestamps ? for example partition_dt=2017-01-03 10%3A20%3A13, partition_dt=2017-01-03 10%3A20%3A14, partition_dt=2017-01-03 10%3A20%3A15

Contributor
 
1 ACCEPTED SOLUTION

Contributor

@clukasik- Thanks for your reply.

Table is loaded 3 times today.I would like to know count for each partition. I tried to run a query "select count(*) from db.tablename where partition_dt="2017-01-03 10%3A20%3A13" , tried like keyword both did not give results.

Note: select count(*)f rom db.table where partition_dt ='2017-01-03' gives count for the day.

I Agree with your comment, but it is an existing application. I will make sure new design won't have too many partitions.

View solution in original post

4 REPLIES 4

Expert Contributor

To query across multiple partitions, you should not need to anything special, other than make sure your where clause is not forcing you into a specific folder. E.g. do not include partition_dt in your query.

One comment: Avoid too many partitions. Avoid partitioning that is too granular (unless you are pruning old data) as you will suffer performance problems. I recommend taking a look at this article for some best practices on Hive partitioning.

Contributor

@clukasik- Thanks for your reply.

Table is loaded 3 times today.I would like to know count for each partition. I tried to run a query "select count(*) from db.tablename where partition_dt="2017-01-03 10%3A20%3A13" , tried like keyword both did not give results.

Note: select count(*)f rom db.table where partition_dt ='2017-01-03' gives count for the day.

I Agree with your comment, but it is an existing application. I will make sure new design won't have too many partitions.

Expert Contributor

Would this work?

select partition_dt, count(*) from db.tablename group by partition_dt

Contributor

@clukasik - Thanks for the reply. I tried your query and it worked , here is my observations.

show partitions tablename;

partition_dt=2017-01-03 10%3A20%3A13 partition_dt=2017-01-03 10%3A20%3A14 partition_dt=2017-01-03 10%3A20%3A15

select patition_dt, count(1) from tablename group by partition_dt;

2017-01-03 10:20:13.0 | 2191 2017-01-03 10:20:14.0 | 7888 2017-01-03 10:20:15.0 | 1647

Note: %3A should be replaced with : while quering.

select count(1) from tablename where partition_dt="2017-01-03 10:20:13"; +-------+--+ | _c0 | +-------+--+ | 2191 | +-------+--+ 1 row selected (3.643 seconds)

select count(1) from tablename where partition_dt="2017-01-03 10:20:14"; +-------+--+ | _c0 | +-------+--+ | 7888 | +-------+--+ 1 row selected (0.101 seconds)

select count(1) from tablename where partition_dt="2017-01-03 10:20:15"; +-------+--+ | _c0 | +-------+--+ | 1647 | +-------+--+ 1 row selected (0.101 seconds)

Sampling i used below query.

select * from tablename where partition_dt="2017-01-03 10:20:15" limit 5;

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.