Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Impala Slow CatalogD - StateStore Metadata Performance - What I did to resolve it

Impala Slow CatalogD - StateStore Metadata Performance - What I did to resolve it

Rising Star

Hi,

I thought it might be useful to post about a problem i've expereinced with the impala catalogd / statestore and how changing a setting has dramtaically improved performance 

Background
18 node cluster, impalaD on 13 nodes.
Load catalog in background = false
Average JVM usage for CatalogD is around 100GB
Tens of thousands of tables, with biggests tables containing over 30K partitions.
Not running anytype of load balancer for ImapalaD
REFRESH commands issued ever 5 minutes for hundreds of tables due to Kafka imports directly into HDFS


When creating a table using impala shell (or HUE) on a ImpalaD node, it would take upto 2 minutes for that table to be available to all the other impalaD instances (monitoed using ImpalaD WebUi's, catalog tab).  The same was true for loading of metadata across ImpalaD's after issueing REFRESH command.

There was nothing obvious within the logs as to why the delay.

After reading Cloduera's Documentation about Scalability for Impala i noticed the following text 

-statestore_num_update_threads
The number of threads inside the statestore dedicated to sending topic updates. You should not typically need to change this value.
Default: 10


After changing this to 50, this 2 minute delay has dropped down to sub 3 seconds for the majority of ImpalaD's.