Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive Partitioning - maximum for cluster

Solved Go to solution

Hive Partitioning - maximum for cluster

New Contributor

Hi,

 

Our cluster is in CDH 5.7. We want to know the maximum partitions we can create for a hive/table and maximum allowed for a cluster ?

 

Best regards,

Olivier

3 ACCEPTED SOLUTIONS

Accepted Solutions

Re: Hive Partitioning - maximum for cluster

Champion
I don't know of any hard limits. There are limitations as a table with 10k+ partitions will likely fail on operations against all partitions like 'drop table'. That is generally the soft cap on partitions per table.

For the full cluster, the backend RDBMS hosting the metastore will dictate this somewhat. Again there is no hard limit. I have seen some near 10 million partitions across all tables. Granted HMS, HS2, and CatalogD were not stable due to the large partitions count. A single or set of large queries or full table scans would bring them down each time. Your HMS heap will also be large. Hive does have settings now to prevent full partitions grabs or limiting the partition count per query.

The Hive community is moving HMS to be backed by HBase to address the scalability of partitions, tables, and databases.

Re: Hive Partitioning - maximum for cluster

New Contributor

Thank for your answer mbigelow

Highlighted

Re: Hive Partitioning - maximum for cluster

Champion

Would you consider using Bucketing instead of partitioning because it

decomposes data sets into much more manageable parts but still depends on the use case. 

insertion may take some time but intended for fast read .

 

5 REPLIES 5

Re: Hive Partitioning - maximum for cluster

Champion
I don't know of any hard limits. There are limitations as a table with 10k+ partitions will likely fail on operations against all partitions like 'drop table'. That is generally the soft cap on partitions per table.

For the full cluster, the backend RDBMS hosting the metastore will dictate this somewhat. Again there is no hard limit. I have seen some near 10 million partitions across all tables. Granted HMS, HS2, and CatalogD were not stable due to the large partitions count. A single or set of large queries or full table scans would bring them down each time. Your HMS heap will also be large. Hive does have settings now to prevent full partitions grabs or limiting the partition count per query.

The Hive community is moving HMS to be backed by HBase to address the scalability of partitions, tables, and databases.

Re: Hive Partitioning - maximum for cluster

New Contributor

Thank for your answer mbigelow

Re: Hive Partitioning - maximum for cluster

Champion

@mbigelow i belive you can enforce partition count per query with this parameter 

hive.metastore.limit.partition.request

correct me if I am wrong. 

Re: Hive Partitioning - maximum for cluster

New Contributor

For other people reading this in 2018 and beyond NB https://issues.apache.org/jira/browse/HIVE-9452 and https://issues.apache.org/jira/browse/HIVE-17234.  Essentially AFAIK development for an HBase backed metastore has stalled.

Highlighted

Re: Hive Partitioning - maximum for cluster

Champion

Would you consider using Bucketing instead of partitioning because it

decomposes data sets into much more manageable parts but still depends on the use case. 

insertion may take some time but intended for fast read .