Is it possible to sort data within a partitioned Hive table without bucketing?
The scenario is a very large set of time-series messages, and I just need all messages between times T1 and T2.
I would partition on a coarser date unit (maybe a day, or a half or quarter day) and sort by timestamp within each partition.
Using the partition key as the bucket key would be a hack (one bucket per partition?).
What about using a day as a partition key, and then bucketing by hour? Is that common practice?
There are no fixed bounds on the time difference between T1 and T2, although it will most likely be a matter of hours, resulting in a few million rows being returned.
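
To make the day/hour idea concrete, here is roughly what I have in mind. This is only a sketch: the table name `messages` and the columns `ts`, `hr`, `payload`, and `dt` are made up for illustration, and I am assuming `hr` would be derived from the timestamp at load time.

```sql
-- Hypothetical names throughout; a sketch of the layout, not tested DDL.
CREATE TABLE messages (
  ts      BIGINT,   -- message timestamp as epoch milliseconds (assumed)
  hr      TINYINT,  -- hour of day, 0-23, derived from ts when loading
  payload STRING
)
PARTITIONED BY (dt STRING)   -- one partition per day, e.g. '2013-01-15'
CLUSTERED BY (hr) SORTED BY (ts ASC) INTO 24 BUCKETS;

-- The kind of range query I need: the dt predicate limits the scan to one
-- day's partition, and the ts predicate picks the window inside it.
SELECT ts, payload
FROM messages
WHERE dt = '2013-01-15'
  AND ts BETWEEN 1358240400000 AND 1358251200000;  -- example T1 and T2
```

As far as I can tell, `SORTED BY` is only accepted in the DDL together with `CLUSTERED BY`, which is why I am asking whether the sort can be had without the buckets.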