The best approach to the thousands of small partitions

Re: The best approach to the thousands of small partitions

@Alena Melnikova Good to hear that you are happy with the results:)

Answers:

1. You can go as low as 1k. Choose a value that is balanced against the average number of rows you query.

2. I believe using the to_date function on the column will stop the ORC index from being used (I haven't tested that). Google "why function based index?"
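
To illustrate the reasoning behind answer 2, here is a minimal toy model in Python. It is not ORC's actual reader and says nothing definitive about what Hive does with to_date, which remains untested in this thread. It only shows why min/max statistics kept on the raw column values help a range predicate on that column but cannot be applied directly to a predicate on a function of the column. All names (Stride, build_strides, scan_raw_range, scan_with_function) and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stride:
    """A group of rows plus min/max statistics on the RAW column values."""
    rows: List[str]
    lo: str
    hi: str


def build_strides(values: List[str], stride_size: int) -> List[Stride]:
    # Split the column into fixed-size groups and record min/max per group.
    strides = []
    for i in range(0, len(values), stride_size):
        chunk = values[i:i + stride_size]
        strides.append(Stride(chunk, min(chunk), max(chunk)))
    return strides


def scan_raw_range(strides: List[Stride], lo: str, hi: str) -> Tuple[List[str], int]:
    # Predicate on the raw column: lo <= value < hi.
    # A stride whose min/max range cannot overlap [lo, hi) is skipped unread.
    hits, strides_read = [], 0
    for s in strides:
        if s.hi < lo or s.lo >= hi:
            continue
        strides_read += 1
        hits.extend(v for v in s.rows if lo <= v < hi)
    return hits, strides_read


def scan_with_function(strides: List[Stride], fn: Callable[[str], str],
                       target: str) -> Tuple[List[str], int]:
    # Predicate on fn(value) == target.
    # The statistics describe value, not fn(value), so this toy reader
    # (assumption: it knows nothing about fn) has to read every stride.
    hits, strides_read = [], 0
    for s in strides:
        strides_read += 1
        hits.extend(v for v in s.rows if fn(v) == target)
    return hits, strides_read


def to_date(ts: str) -> str:
    # Stand-in for a date-truncating function: keep the 'YYYY-MM-DD' part.
    return ts[:10]


# Ten days of hourly timestamps, sorted, one day per stride.
values = [f"2017-01-{d:02d} {h:02d}:00:00" for d in range(1, 11) for h in range(24)]
strides = build_strides(values, 24)

raw_hits, raw_read = scan_raw_range(
    strides, "2017-01-05 00:00:00", "2017-01-06 00:00:00")
fn_hits, fn_read = scan_with_function(strides, to_date, "2017-01-05")

print(len(raw_hits), raw_read)  # same rows, few strides read
print(len(fn_hits), fn_read)    # same rows, every stride read
```

In this sketch both scans return the same rows, but only the range predicate on the raw column lets the statistics prune anything. If the real engine behaves similarly, rewriting a to_date filter as a range on the underlying column would be the thing to test.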

Re: The best approach to the thousands of small partitions

got it, thanks!

Re: The best approach to the thousands of small partitions

Great job @Alena Melnikova! Nice work with the data and visualization. Really helpful, confirms some longstanding assumptions I've had.

Re: The best approach to the thousands of small partitions

Hey everyone,
I have a somewhat similar question, which I posted here:
https://community.hortonworks.com/questions/155681/how-to-defragment-hdfs-data.html

I would really appreciate any ideas.

cc @Lester Martin @Jagatheesh Ramakrishnan @rbiswas
