New Contributor
Posts: 5
Registered: 02-26-2018

Impala Statestore and Catalog Service host resource sizing

Dear All,

I am in the process of setting up a CDH-based cluster (Installation Path B - Parcels/Packages) that includes Impala.

I would like to configure the Impala Statestore and Catalog Service appropriately (maybe even on a dedicated host); however, I cannot find any documentation or best practices regarding the resource needs of these services.

For example, I do not know how much memory or disk space I should reserve for these services. My understanding is that they should have a relatively small footprint compared to other big data components, but I am not sure I could make an estimate on my own.

Could someone please point me in the right direction?

Expert Contributor
Posts: 105
Registered: 07-17-2017

Re: Impala Statestore and Catalog Service host resource sizing

Hi @Peter

I think you should look at this link about Cluster Hosts and Role Assignments.

New Contributor
Posts: 5
Registered: 02-26-2018

Re: Impala Statestore and Catalog Service host resource sizing

As it turns out, the Impala documentation and a Cloudera slide deck on SlideShare do contain some hints for the Catalog Service host, which basically come down to this formula:


Catalog memory usage:
  • Metadata cache heap memory usage can be estimated as:
    (num of tables * 5 KB) + (num of partitions * 2 KB) + (num of files * 750 B) + (num of file blocks * 300 B) + sum(incremental col stats per table)
  • Incremental stats, for each table: num of columns * num of partitions * 400 B

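The formula above is easy to turn into a quick calculator. Below is a minimal Python sketch, assuming that "KB" means 1024 bytes and that the incremental stats term only applies to tables where incremental stats have actually been computed. `TableStats`, `catalog_heap_bytes` and `incremental_stats_bytes` are hypothetical helpers, not part of Impala or Cloudera Manager, and the per-object sizes are only the rough figures quoted above, so treat the result as an order-of-magnitude estimate rather than a guaranteed heap size.

```python
from dataclasses import dataclass
from typing import Iterable

KB = 1024  # assumption: "KB" in the formula taken as 1024 bytes


@dataclass
class TableStats:
    """Hypothetical per-table counts used by the sizing formula."""
    num_partitions: int
    num_files: int
    num_blocks: int
    num_columns: int
    has_incremental_stats: bool = False  # assumption: term applies only if computed


def incremental_stats_bytes(t: TableStats) -> int:
    """Incremental stats: num columns * num partitions * 400 B per table."""
    if not t.has_incremental_stats:
        return 0
    return t.num_columns * t.num_partitions * 400


def catalog_heap_bytes(tables: Iterable[TableStats]) -> int:
    """Sum the per-object costs quoted for the Catalog metadata cache."""
    total = 0
    for t in tables:
        total += 5 * KB                     # per-table overhead
        total += t.num_partitions * 2 * KB  # per-partition cost
        total += t.num_files * 750          # per-file cost
        total += t.num_blocks * 300         # per-block cost
        total += incremental_stats_bytes(t)
    return total


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real cluster.
    table = TableStats(num_partitions=1000, num_files=2000, num_blocks=2000,
                       num_columns=50, has_incremental_stats=True)
    estimate = catalog_heap_bytes([table] * 100)
    print(f"{estimate} bytes (~{estimate / 2**30:.2f} GiB)")
```

With these made-up numbers (100 tables, each with 1000 partitions, 2000 files, 2000 blocks and 50 columns with incremental stats) the estimate comes to roughly 2.4 GB, and the incremental stats term dominates it, which suggests that partition count times column count is the first thing to check when sizing the Catalog heap.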

At the same time, I haven't found any recommendation regarding the Statestore.

I'm wondering whether there is any option other than finding it by trial and error.