Wanted to know whether there's a best practice to the limit of partitions one Impala table can handle ?
If so, what's the penalties of going beyond that limit ?
the Impala Cookbook describes best practices around schema design:
We recommend that a table should have at most 100k partitions and that is already stretching it. Keeiping it around 10k is definitely desirable.
Having too many partitions can have several negative effects, including:
- Slow metadata loading and distribution time. You may even hit some hard limits (e.g. in the JVM) which make dealing wich such a table very difficult or nearly impossible.
- Excessive memory usage on all daemons due to the large metadata which is cached.
- Slow partition pruning time for certain queries if the index-based pruning is not feasible (e.g. for complex partition-pruning predicates)