Created 10-30-2015 02:42 PM
What is the maximum number of partitions allowed for a Hive table? E.g. 2k ... 10k?
Are there any performance implications we should consider as we get close to this number?
Created 10-30-2015 02:46 PM
Wes, current Hive versions with RDBMS metastore backend should be able to handle 10 000+ partitions.
For numerous reasons, the community is moving away from this design to leverage HBase for the metastore. Follow https://issues.apache.org/jira/browse/HIVE-9452 . Overall design document is available here: https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf
Created 10-30-2015 02:46 PM
Wes, current Hive versions with RDBMS metastore backend should be able to handle 10 000+ partitions.
For numerous reasons, the community is moving away from this design to leverage HBase for the metastore. Follow https://issues.apache.org/jira/browse/HIVE-9452 . Overall design document is available here: https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf
Created 10-30-2015 09:37 PM
@Andrew Grande Thanks for sharing the HBASE approach. Nice!!!
Created 10-30-2015 05:53 PM
What database are you using for Metastore? @Wes Floyd
Created 11-02-2015 08:16 AM
When working with a table of 1000 partitions and having the Hive concurrency enabled, I once ran into some problems. I don't know if it is still an issue (the problem appeared last year with Hive 0.13) but I think it can be worth mentioning it here:
Created 11-09-2015 05:03 PM
The performance implications mostly come at read time. If you have queries that read many (>2k) partitions you will see long (30+ sec) times to plan queries. As Andrew mentioned, the work on the HBase metastore should improve this.
Created 11-09-2015 06:18 PM
Thanks @gates@hortonworks.com for chimmig in .
Created 03-14-2016 09:44 PM
what if I only open less than 50 partitions out of 1M at any given time??