
Possible to sort data within a partitioned Hive table without bucketing?

Is it possible to sort data within a partitioned Hive table without bucketing?

The scenario is a monster set of time-series messages. I just need all messages between times T1 and T2.

I would partition on a coarser date unit (a day, or perhaps a half or quarter day) and sort by timestamp within each partition.
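To make the proposed layout concrete, here is a minimal Python sketch that models it in memory rather than in Hive. Each day is a partition whose rows are kept sorted by timestamp, so a query for [T1, T2] only visits the days that overlap the range and binary-searches within each one. The names `build_partitions` and `query_range` are hypothetical, and the sketch says nothing about how Hive itself stores or scans files.

```python
import bisect
from collections import defaultdict
from datetime import timedelta

def build_partitions(messages):
    """Group (timestamp, payload) tuples into hypothetical per-day partitions,
    each sorted by timestamp. Returns {date: (sorted_timestamps, sorted_rows)}."""
    by_day = defaultdict(list)
    for ts, payload in messages:
        by_day[ts.date()].append((ts, payload))
    partitions = {}
    for day, rows in by_day.items():
        rows.sort(key=lambda r: r[0])  # "sort by timestamp within the partition"
        partitions[day] = ([ts for ts, _ in rows], rows)
    return partitions

def query_range(partitions, t1, t2):
    """Return all rows with t1 <= timestamp <= t2, in timestamp order."""
    out = []
    day = t1.date()
    # Partition pruning: only days overlapping [t1, t2] are visited.
    while day <= t2.date():
        if day in partitions:
            keys, rows = partitions[day]
            # Binary search inside the sorted partition instead of a full scan.
            lo = bisect.bisect_left(keys, t1)
            hi = bisect.bisect_right(keys, t2)
            out.extend(rows[lo:hi])
        day += timedelta(days=1)
    return out
```

Because days are visited in order and each partition is sorted, the result comes back in timestamp order even when the range spans midnight.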

Using the partition key as the bucket key would be a hack (one bucket per partition?).

What about using the day as the partition key and then bucketing by hour? Is that common practice?
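As a rough illustration of the day-partition/hour-bucket idea, the following Python sketch lists which (day, hour) cells a query for [T1, T2] would touch under that layout. The function name `cells_for_range` is hypothetical. Note that Hive assigns rows to buckets by hashing the bucketing column modulo the number of buckets, so this only models the intended layout conceptually; it does not show whether Hive can skip buckets for a range predicate.

```python
from datetime import timedelta

def cells_for_range(t1, t2):
    """List the hypothetical (day, hour) cells covering [t1, t2], assuming
    one partition per day and one bucket per hour of the day."""
    cells = []
    # Start at the top of the hour containing t1.
    cur = t1.replace(minute=0, second=0, microsecond=0)
    while cur <= t2:
        cells.append((cur.date().isoformat(), cur.hour))
        cur += timedelta(hours=1)
    return cells
```

For a range of a few hours this yields only a handful of cells, regardless of how many days the table holds.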

There is no fixed bound on the time difference between T1 and T2, although it will most likely be a matter of hours, returning a few million rows.
