New Contributor
Posts: 3
Registered: ‎10-10-2016

Hive Partitioning - maximum for cluster

Hi,

 

Our cluster is on CDH 5.7. We want to know the maximum number of partitions we can create for a Hive table, and the maximum allowed for a cluster?

 

Best regards,

Olivier

Posts: 642
Topics: 3
Kudos: 105
Solutions: 67
Registered: ‎08-16-2016

Re: Hive Partitioning - maximum for cluster

I don't know of any hard limits, but there are practical ones: a table with 10k+ partitions will likely fail on operations that act against all partitions at once, like 'drop table'. That is generally the soft cap on partitions per table.
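As a hypothetical illustration (the table and partition column are made up), one way to work within that soft cap is to drop partitions in ranges, so that no single metastore call has to touch all of them:

-- Check how many partitions the table actually has
SHOW PARTITIONS sales;

-- Drop partitions range by range instead of all at once
ALTER TABLE sales DROP IF EXISTS PARTITION (ds < '2015-01-01');
ALTER TABLE sales DROP IF EXISTS PARTITION (ds < '2016-01-01');

-- Once the partition count is down, the table itself drops cleanly
DROP TABLE sales;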

For the full cluster, the backend RDBMS hosting the metastore will dictate this somewhat. Again, there is no hard limit. I have seen clusters near 10 million partitions across all tables. Granted, HMS, HS2, and Catalogd were not stable at that partition count; a single large query, or a set of them, or a full table scan would bring them down each time. Your HMS heap will also need to be large. Hive does now have settings to prevent full partition grabs and to limit the partition count per query.
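To give a sense of those settings, here is a sketch. The property names are from the Hive documentation, but which of them exist depends on your Hive version (CDH 5.7 ships Hive 1.1), so treat this as something to verify rather than as gospel:

-- Reject queries whose partition filter still covers more than N partitions
SET hive.limit.query.max.table.partition=1000;

-- Cap how many partitions a single dynamic-partition insert may create
SET hive.exec.max.dynamic.partitions=5000;
SET hive.exec.max.dynamic.partitions.pernode=500;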

The Hive community is moving HMS to be backed by HBase to address the scalability of partitions, tables, and databases.
New Contributor
Posts: 3
Registered: ‎10-10-2016

Re: Hive Partitioning - maximum for cluster

Thanks for your answer, mbigelow.

Champion
Posts: 564
Registered: ‎05-16-2016

Re: Hive Partitioning - maximum for cluster

Would you consider using bucketing instead of partitioning? It decomposes data sets into much more manageable parts, though whether it fits still depends on the use case. Insertion may take some time, but bucketing is intended for fast reads.
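A minimal sketch of what that looks like, with hypothetical table and column names:

-- Bucket by a high-cardinality column instead of partitioning on it
CREATE TABLE events (
  user_id BIGINT,
  event_time TIMESTAMP,
  payload STRING
)
CLUSTERED BY (user_id) INTO 64 BUCKETS
STORED AS ORC;

-- On Hive 1.x, make inserts honor the bucket definition
SET hive.enforce.bucketing=true;
INSERT INTO TABLE events
SELECT user_id, event_time, payload FROM events_staging;

The bucket count is fixed at create time, which is why reads are fast and predictable while the insert pays the hashing cost up front.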

 

Champion
Posts: 564
Registered: ‎05-16-2016

Re: Hive Partitioning - maximum for cluster

@mbigelow I believe you can enforce a partition count per query with this parameter:

hive.metastore.limit.partition.request

Correct me if I am wrong.
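If it is available in your Hive build (it is a relatively recent addition upstream, so it may not exist on CDH 5.7), usage would look like this, with an arbitrary value:

-- Fail any query that requests more than 1000 partitions from the metastore
SET hive.metastore.limit.partition.request=1000;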
