Hive Partitioning - maximum for cluster
Labels: Apache Hive
Created on 06-27-2017 01:01 AM - edited 09-16-2022 04:50 AM
Hi,
Our cluster is on CDH 5.7. We want to know the maximum number of partitions we can create for a Hive table, and the maximum allowed for a cluster?
Best regards,
Olivier
Created 06-27-2017 07:19 AM
For the full cluster, the backend RDBMS hosting the metastore will dictate this to some extent; again, there is no hard limit. I have seen clusters with close to 10 million partitions across all tables. Granted, HMS, HS2, and Catalogd were not stable at that partition count: a single large query, a set of them, or a full table scan would bring them down each time. Your HMS heap will also need to be large. Hive now has settings to prevent full partition fetches and to limit the partition count per query (see the sketch below).
The Hive community is also moving HMS to be backed by HBase to address the scalability of partitions, tables, and databases.
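A minimal sketch of the "prevent full partition fetches" safeguard, assuming the Hive 1.x strict mode available in CDH 5.7; the table name `events` and partition column `dt` are hypothetical:

```sql
-- Strict mode rejects queries over partitioned tables that lack a
-- partition-column filter, preventing full partition fetches from HMS.
SET hive.mapred.mode=strict;

-- Fails in strict mode: no filter on the partition column `dt`.
-- SELECT COUNT(*) FROM events;

-- Allowed: the predicate on `dt` prunes to a bounded set of partitions.
SELECT COUNT(*) FROM events WHERE dt = '2017-06-27';
```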
Created 06-28-2017 05:04 AM
Thanks for your answer, mbigelow.
Created 06-28-2017 05:45 AM
Would you consider using bucketing instead of partitioning? It decomposes data sets into much more manageable parts, though whether it helps still depends on the use case: insertion may take some time, but it is intended for fast reads. A sketch follows below.
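A minimal sketch of that suggestion, assuming Hive 1.x semantics; the table names, columns, and bucket count are hypothetical examples, not recommendations:

```sql
-- Bucket by a high-cardinality column instead of partitioning on it:
-- the metastore then tracks one table layout rather than millions of
-- partitions, while the data is still split into a fixed number of files.
CREATE TABLE events_bucketed (
  event_id   BIGINT,
  user_id    BIGINT,
  event_time TIMESTAMP,
  payload    STRING
)
CLUSTERED BY (user_id) INTO 64 BUCKETS
STORED AS ORC;

-- On Hive 1.x (e.g. the version shipped with CDH 5.7), bucketing must
-- be enforced explicitly at insert time:
SET hive.enforce.bucketing=true;

INSERT INTO TABLE events_bucketed
SELECT event_id, user_id, event_time, payload
FROM events_raw;
```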
Created 07-05-2017 02:55 AM
@mbigelow I believe you can enforce a per-query partition limit with this parameter:
hive.metastore.limit.partition.request
Correct me if I am wrong.
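A minimal sketch of how that parameter is used, assuming a Hive version that includes it; the limit of 1000 and the `events` table are hypothetical examples:

```sql
-- Cap the number of partitions a single query may request from the
-- metastore; -1 (the default) disables the check.
SET hive.metastore.limit.partition.request=1000;

-- A query that would touch more than 1000 partitions of `events` now
-- fails fast instead of pulling all partition metadata from HMS.
SELECT COUNT(*) FROM events WHERE dt >= '2015-01-01';
```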
Created 03-09-2018 04:57 AM
For other people reading this in 2018 and beyond, NB https://issues.apache.org/jira/browse/HIVE-9452 and https://issues.apache.org/jira/browse/HIVE-17234. Essentially, AFAIK, development of the HBase-backed metastore has stalled.
