- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to (Hive) Query to pull data under a same parition with different timestamps ? for example partition_dt=2017-01-03 10%3A20%3A13, partition_dt=2017-01-03 10%3A20%3A14, partition_dt=2017-01-03 10%3A20%3A15
- Labels:
-
Apache Hive
Created ‎01-03-2017 08:56 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Created ‎01-04-2017 01:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@clukasik- Thanks for your reply.
Table is loaded 3 times today.I would like to know count for each partition. I tried to run a query "select count(*) from db.tablename where partition_dt="2017-01-03 10%3A20%3A13" , tried like keyword both did not give results.
Note: select count(*)f rom db.table where partition_dt ='2017-01-03' gives count for the day.
I Agree with your comment, but it is an existing application. I will make sure new design won't have too many partitions.
Created ‎01-03-2017 09:23 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To query across multiple partitions, you should not need to anything special, other than make sure your where clause is not forcing you into a specific folder. E.g. do not include partition_dt in your query.
One comment: Avoid too many partitions. Avoid partitioning that is too granular (unless you are pruning old data) as you will suffer performance problems. I recommend taking a look at this article for some best practices on Hive partitioning.
Created ‎01-04-2017 01:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@clukasik- Thanks for your reply.
Table is loaded 3 times today.I would like to know count for each partition. I tried to run a query "select count(*) from db.tablename where partition_dt="2017-01-03 10%3A20%3A13" , tried like keyword both did not give results.
Note: select count(*)f rom db.table where partition_dt ='2017-01-03' gives count for the day.
I Agree with your comment, but it is an existing application. I will make sure new design won't have too many partitions.
Created ‎01-04-2017 05:13 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Would this work?
select partition_dt, count(*) from db.tablename group by partition_dt
Created ‎01-04-2017 07:56 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@clukasik - Thanks for the reply. I tried your query and it worked , here is my observations.
show partitions tablename;
partition_dt=2017-01-03 10%3A20%3A13 partition_dt=2017-01-03 10%3A20%3A14 partition_dt=2017-01-03 10%3A20%3A15
select patition_dt, count(1) from tablename group by partition_dt;
2017-01-03 10:20:13.0 | 2191 2017-01-03 10:20:14.0 | 7888 2017-01-03 10:20:15.0 | 1647
Note: %3A should be replaced with : while quering.
select count(1) from tablename where partition_dt="2017-01-03 10:20:13"; +-------+--+ | _c0 | +-------+--+ | 2191 | +-------+--+ 1 row selected (3.643 seconds)
select count(1) from tablename where partition_dt="2017-01-03 10:20:14"; +-------+--+ | _c0 | +-------+--+ | 7888 | +-------+--+ 1 row selected (0.101 seconds)
select count(1) from tablename where partition_dt="2017-01-03 10:20:15"; +-------+--+ | _c0 | +-------+--+ | 1647 | +-------+--+ 1 row selected (0.101 seconds)
Sampling i used below query.
select * from tablename where partition_dt="2017-01-03 10:20:15" limit 5;
