Impala Statestore and Catalog Service host resource sizing

New Contributor

Dear All,

 

I am in the process of setting up a CDH-based cluster (Installation Path B - Parcels/Packages), including Impala.

 

I would like to configure the Impala Statestore and Catalog Service appropriately (maybe even on a dedicated host). However, I cannot find any documentation or best practices regarding the resource needs of these services.

 

For example, I do not know how much memory or disk space I should reserve for these services. Based on my understanding, they should have a relatively small footprint compared to other big data components, but I am not sure I would be able to make an estimate on my own.

 

Could someone please point me in the right direction?

2 REPLIES

Re: Impala Statestore and Catalog Service host resource sizing

Expert Contributor

Hi @Peter

I think you should take a look at this link about Cluster Hosts and Role Assignments.

Re: Impala Statestore and Catalog Service host resource sizing

New Contributor

As it turns out, the Impala documentation and a Cloudera slide deck on SlideShare do contain some hints for the Catalog Service host, which basically boil down to this formula:

 

Catalog memory usage:
  • Metadata cache heap memory usage can be calculated as:
    num of tables * 5 KB + num of partitions * 2 KB + num of files * 750 B + num of file blocks * 300 B + sum(incremental col stats per table)
  • Incremental stats, for each table:
    num columns * num partitions * 400 B
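To make the formula easier to apply, here is a minimal Python sketch of it. The names `TableStats`, `incremental_stats_bytes` and `catalog_heap_bytes` are my own and not part of any Impala tool. I am also assuming that 1 KB means 1024 bytes and that the incremental stats term only applies to tables that actually use incremental stats. The numbers in the example are made up, and the result is only a rough estimate of the metadata cache, not the full catalogd heap.

```python
from dataclasses import dataclass
from typing import Iterable

KB = 1024  # assumption: the formula's "KB" is taken as 1024 bytes


@dataclass
class TableStats:
    """Per-table counts needed by the formula (hypothetical helper type)."""
    num_columns: int
    num_partitions: int
    num_files: int
    num_blocks: int
    incremental_stats: bool = False  # True if the table uses incremental stats


def incremental_stats_bytes(table: TableStats) -> int:
    """Incremental stats: num columns * num partitions * 400 B per table."""
    if not table.incremental_stats:
        return 0
    return table.num_columns * table.num_partitions * 400


def catalog_heap_bytes(tables: Iterable[TableStats]) -> int:
    """Estimated metadata cache heap usage, summed over all tables."""
    total = 0
    for t in tables:
        total += 5 * KB                      # per-table overhead
        total += t.num_partitions * 2 * KB   # per-partition overhead
        total += t.num_files * 750           # per-file overhead
        total += t.num_blocks * 300          # per-file-block overhead
        total += incremental_stats_bytes(t)  # incremental column stats
    return total


if __name__ == "__main__":
    # Made-up example: one large partitioned table and one small table.
    tables = [
        TableStats(num_columns=100, num_partitions=1000,
                   num_files=10_000, num_blocks=10_000,
                   incremental_stats=True),
        TableStats(num_columns=20, num_partitions=1,
                   num_files=5, num_blocks=5),
    ]
    estimate = catalog_heap_bytes(tables)
    print(f"Estimated metadata cache: {estimate / (1024 * 1024):.1f} MiB")
```

In this sketch the incremental stats term dominates as soon as a wide table has many partitions, so that is the part of the input worth getting right.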
At the same time, I haven't found any recommendation regarding the statestore.
 
I'm wondering whether there is any option other than working it out by trial and error.