
Hive Partitioning - maximum for cluster

New Contributor

Hi,

 

Our cluster runs CDH 5.7. We would like to know the maximum number of partitions we can create for a single Hive table, and the maximum allowed for the whole cluster.

 

Best regards,

Olivier


5 REPLIES

Champion
I don't know of any hard limit. There are practical limitations, though: a table with 10k+ partitions will likely fail on operations that touch all partitions, such as 'drop table'. That is generally the soft cap on partitions per table.

For the full cluster, the backend RDBMS hosting the metastore will dictate this somewhat. Again, there is no hard limit. I have seen clusters with close to 10 million partitions across all tables. Granted, HMS, HS2, and CatalogD were not stable at that partition count; a single large query, or a set of them, or full table scans would bring them down each time. Your HMS heap will also need to be large. Hive now has settings to prevent full partition fetches and to limit the partition count per query.

The Hive community is moving HMS to be backed by HBase to address the scalability of partitions, tables, and databases.
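For reference, here is a minimal HiveQL sketch of the kinds of settings mentioned above. Property names and availability vary by Hive version (CDH 5.7 ships Hive 1.1), so treat the exact names and values here as assumptions to verify against your release:

```sql
-- Sketch only: confirm each property exists in your Hive release.

-- Reject metastore requests that would fetch more than N partitions
-- of one table in a single call (-1, the default, disables the check).
SET hive.metastore.limit.partition.request=10000;

-- Cap how many partitions of a single table one query may scan.
SET hive.limit.query.max.table.partition=10000;

-- Strict mode refuses queries on partitioned tables that lack a
-- partition-pruning predicate, preventing accidental full scans.
SET hive.mapred.mode=strict;

-- Bound dynamic-partition inserts, a common source of partition bloat.
SET hive.exec.max.dynamic.partitions=5000;
SET hive.exec.max.dynamic.partitions.pernode=1000;
```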

New Contributor

Thanks for your answer, mbigelow.

Champion

@mbigelow I believe you can enforce a partition count limit per query with this parameter:

hive.metastore.limit.partition.request

Correct me if I am wrong.
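To make that concrete, here is a hedged sketch of the parameter in use. As far as I know it was added via HIVE-13884 in Hive 2.2, so it may not exist on older releases; the events table and its dt partition column below are hypothetical:

```sql
-- Fail any single-table metastore request above 1,000 partitions.
SET hive.metastore.limit.partition.request=1000;

-- Assume events is partitioned by dt with several thousand partitions:
-- this unfiltered query needs metadata for all of them, so the
-- metastore rejects the request rather than fetching everything.
SELECT COUNT(*) FROM events;

-- With a pruning predicate the request stays under the cap and runs.
SELECT COUNT(*) FROM events WHERE dt = '2018-01-01';
```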

New Contributor

For other people reading this in 2018 and beyond, NB https://issues.apache.org/jira/browse/HIVE-9452 and https://issues.apache.org/jira/browse/HIVE-17234. Essentially, AFAIK, development of the HBase-backed metastore has stalled.

Champion

Would you consider using bucketing instead of partitioning? It decomposes data sets into much more manageable parts, though whether it fits still depends on the use case. Insertion may take more time, but bucketed tables are intended for fast reads.
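To illustrate, here is a minimal sketch of a bucketed table; the events_bucketed and events_staging tables, their columns, and the bucket count of 64 are hypothetical, not from this thread:

```sql
-- Coarse daily partitions plus buckets on a high-cardinality key:
-- 64 buckets split the data finely without adding metastore entries
-- the way 64 extra partitions per day would.
CREATE TABLE events_bucketed (
  user_id BIGINT,
  action  STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 64 BUCKETS
STORED AS ORC;

-- On Hive 1.x (CDH 5.x), make inserts honor the bucket spec.
SET hive.enforce.bucketing=true;

-- Writes do the clustering work up front (slower inserts) in
-- exchange for faster, bucket-prunable reads and joins.
INSERT INTO TABLE events_bucketed PARTITION (dt='2018-01-01')
SELECT user_id, action, ts
FROM events_staging
WHERE dt = '2018-01-01';
```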