Impala Slow CatalogD - StateStore Metadata Performance - What I did to resolve it

I thought it might be useful to post about a problem I've experienced with the Impala CatalogD / StateStore, and how changing one setting has dramatically improved performance.

18-node cluster, ImpalaD on 13 nodes.
Load catalog in background = false
Average JVM usage for CatalogD is around 100 GB
Tens of thousands of tables, with the biggest tables containing over 30K partitions.
Not running any type of load balancer for ImpalaD
REFRESH commands issued every 5 minutes for hundreds of tables due to Kafka imports directly into HDFS

When creating a table using impala-shell (or Hue) on one ImpalaD node, it would take up to 2 minutes for that table to become available to all the other ImpalaD instances (monitored using the ImpalaD web UIs, Catalog tab). The same was true for the loading of metadata across ImpalaDs after issuing a REFRESH command.
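
For anyone who wants to put a number on this delay rather than watching the Catalog tab by hand, here is a rough Python sketch of a polling loop. It is only an illustration, not what I actually ran: `measure_propagation` and the `table_visible` callback are made-up names, and how you check visibility is up to you (one option, as an assumption, is fetching each ImpalaD's debug web UI catalog page, by default on port 25000, and looking for the table name).

```python
import time
from typing import Callable, Dict, Iterable, Optional


def measure_propagation(
    hosts: Iterable[str],
    table: str,
    table_visible: Callable[[str, str], bool],  # hypothetical check: (host, table) -> bool
    timeout_s: float = 300.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[float]]:
    """Return seconds until `table` became visible on each host (None on timeout)."""
    start = clock()
    delays: Dict[str, Optional[float]] = {host: None for host in hosts}
    pending = set(delays)
    while pending and clock() - start < timeout_s:
        # sorted() makes a copy, so removing from `pending` inside the loop is safe
        for host in sorted(pending):
            if table_visible(host, table):
                delays[host] = clock() - start
                pending.discard(host)
        if pending:
            sleep(poll_interval_s)
    return delays
```

Start it right after the CREATE TABLE or REFRESH returns on the coordinator you issued it from; the slowest value in the result is the figure to compare before and after a change.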

There was nothing obvious in the logs to explain the delay.

After reading Cloudera's documentation about scalability for Impala, I noticed the following text for the statestore option statestore_num_update_threads:

The number of threads inside the statestore dedicated to sending topic updates. You should not typically need to change this value.
Default: 10

After changing this to 50, the 2-minute delay has dropped to under 3 seconds for the majority of ImpalaDs.
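
If you want to confirm the new value was actually picked up after restarting the StateStore, a small Python sketch like the one below may help. It rests on assumptions you should verify on your own cluster: that the setting is the statestored startup flag `statestore_num_update_threads`, that the StateStore debug web UI (default port 25010) serves its flags as JSON at `/varz?json`, and that the payload has a `flags` list whose entries carry `name` and `current` fields. The helper name `current_flag_value` is made up for this example.

```python
from typing import Any, Dict, Optional


def current_flag_value(varz: Dict[str, Any], flag_name: str) -> Optional[str]:
    """Return the current value of `flag_name` from a parsed /varz payload, or None."""
    # Assumed payload shape: {"flags": [{"name": ..., "current": ...}, ...]}
    for flag in varz.get("flags", []):
        if flag.get("name") == flag_name:
            return str(flag.get("current"))
    return None


# Possible usage (hypothetical host name; not executed here):
#   import json, urllib.request
#   with urllib.request.urlopen("http://statestore-host:25010/varz?json") as resp:
#       payload = json.load(resp)
#   print(current_flag_value(payload, "statestore_num_update_threads"))
```

If the function returns None, either the flag name or the assumed payload shape does not match your Impala version, so check the /varz page in a browser instead.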