- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Maximum Hive Table Partitions allowed & recommended
- Labels:
-
Apache Hive
Created 10-30-2015 02:42 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What is the maximum number of partitions allowed for a Hive table? E.g. 2k ... 10k?
Are there any performance implications we should consider as we get close to this number?
Created 10-30-2015 02:46 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Wes, current Hive versions with RDBMS metastore backend should be able to handle 10 000+ partitions.
For numerous reasons, the community is moving away from this design to leverage HBase for the metastore. Follow https://issues.apache.org/jira/browse/HIVE-9452 . Overall design document is available here: https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf
Created 10-30-2015 02:46 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Wes, current Hive versions with RDBMS metastore backend should be able to handle 10 000+ partitions.
For numerous reasons, the community is moving away from this design to leverage HBase for the metastore. Follow https://issues.apache.org/jira/browse/HIVE-9452 . Overall design document is available here: https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf
Created 10-30-2015 09:37 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Andrew Grande Thanks for sharing the HBASE approach. Nice!!!
Created 10-30-2015 05:53 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What database are you using for Metastore? @Wes Floyd
Created 11-02-2015 08:16 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When working with a table of 1000 partitions and having the Hive concurrency enabled, I once ran into some problems. I don't know if it is still an issue (the problem appeared last year with Hive 0.13) but I think it can be worth mentioning it here:
Created 11-09-2015 05:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The performance implications mostly come at read time. If you have queries that read many (>2k) partitions you will see long (30+ sec) times to plan queries. As Andrew mentioned, the work on the HBase metastore should improve this.
Created 11-09-2015 06:18 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @gates@hortonworks.com for chimmig in .
Created 03-14-2016 09:44 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
what if I only open less than 50 partitions out of 1M at any given time??