Support Questions

Find answers, ask questions, and share your expertise

How to (Hive) Query to pull data under a same parition with different timestamps ? for example partition_dt=2017-01-03 10%3A20%3A13, partition_dt=2017-01-03 10%3A20%3A14, partition_dt=2017-01-03 10%3A20%3A15

avatar
Contributor
 
1 ACCEPTED SOLUTION

avatar
Contributor

@clukasik- Thanks for your reply.

Table is loaded 3 times today.I would like to know count for each partition. I tried to run a query "select count(*) from db.tablename where partition_dt="2017-01-03 10%3A20%3A13" , tried like keyword both did not give results.

Note: select count(*)f rom db.table where partition_dt ='2017-01-03' gives count for the day.

I Agree with your comment, but it is an existing application. I will make sure new design won't have too many partitions.

View solution in original post

4 REPLIES 4

avatar
Super Collaborator

To query across multiple partitions, you should not need to anything special, other than make sure your where clause is not forcing you into a specific folder. E.g. do not include partition_dt in your query.

One comment: Avoid too many partitions. Avoid partitioning that is too granular (unless you are pruning old data) as you will suffer performance problems. I recommend taking a look at this article for some best practices on Hive partitioning.

avatar
Contributor

@clukasik- Thanks for your reply.

Table is loaded 3 times today.I would like to know count for each partition. I tried to run a query "select count(*) from db.tablename where partition_dt="2017-01-03 10%3A20%3A13" , tried like keyword both did not give results.

Note: select count(*)f rom db.table where partition_dt ='2017-01-03' gives count for the day.

I Agree with your comment, but it is an existing application. I will make sure new design won't have too many partitions.

avatar
Super Collaborator

Would this work?

select partition_dt, count(*) from db.tablename group by partition_dt

avatar
Contributor

@clukasik - Thanks for the reply. I tried your query and it worked , here is my observations.

show partitions tablename;

partition_dt=2017-01-03 10%3A20%3A13 partition_dt=2017-01-03 10%3A20%3A14 partition_dt=2017-01-03 10%3A20%3A15

select patition_dt, count(1) from tablename group by partition_dt;

2017-01-03 10:20:13.0 | 2191 2017-01-03 10:20:14.0 | 7888 2017-01-03 10:20:15.0 | 1647

Note: %3A should be replaced with : while quering.

select count(1) from tablename where partition_dt="2017-01-03 10:20:13"; +-------+--+ | _c0 | +-------+--+ | 2191 | +-------+--+ 1 row selected (3.643 seconds)

select count(1) from tablename where partition_dt="2017-01-03 10:20:14"; +-------+--+ | _c0 | +-------+--+ | 7888 | +-------+--+ 1 row selected (0.101 seconds)

select count(1) from tablename where partition_dt="2017-01-03 10:20:15"; +-------+--+ | _c0 | +-------+--+ | 1647 | +-------+--+ 1 row selected (0.101 seconds)

Sampling i used below query.

select * from tablename where partition_dt="2017-01-03 10:20:15" limit 5;