In the past we had an issue when catalogd ran out of memory. Lots of DDL statements would not finish / queries would crash.
We want to avoid running into a catalogd out of memory situation again and would like to monitor that.
We would already be happy if someone has an idea for any one of those questions.
Did you try increasing the Catalog heap size? https://www.cloudera.com/documentation/enterprise/latest/topics/impala_scalability.html#scalability_... has some basic info.
The metrics you should be looking at are the JVM heap metrics - all of the catalog data is stored on the JVM heap. You can look at jvm.heap.current-usage-bytes compared to the configured heap size or jvm.heap.max-usage-bytes. You need to be careful interpreting it though, because it's the JVM and memory is garbage collected - it isn't necessarily bad if the heap fills up, so long as the next GC is able to clean up garbage. The "bad" pattern that would warn you that the heap is close to capacity is when the heap is repeatedly filling up and each GC only brings the heap size down a little bit.
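To make that "bad" pattern concrete, here is a rough Python sketch that works on a list of periodic jvm.heap.current-usage-bytes samples you have already collected. The function names, the 0.8 floor ratio and the "last 3 troughs" rule are assumptions for illustration only, not anything Impala or CM provides, and local minima in the samples are just an approximation of the heap size right after a GC.

```python
from typing import List


def gc_troughs(samples: List[int]) -> List[int]:
    """Return local minima of the usage series.

    A local minimum is used here as a rough proxy for the heap size
    right after a garbage collection (an assumption of this sketch).
    """
    troughs = []
    for prev, cur, nxt in zip(samples, samples[1:], samples[2:]):
        if cur < prev and cur <= nxt:
            troughs.append(cur)
    return troughs


def heap_near_capacity(
    samples: List[int],
    max_heap_bytes: int,
    floor_ratio: float = 0.8,  # hypothetical threshold, tune for your cluster
    min_troughs: int = 3,      # hypothetical: how many recent GCs to look at
) -> bool:
    """True if the most recent post-GC troughs all stay close to the max heap.

    A heap that fills up and then drops far down again is healthy; a heap
    whose troughs stay high means each GC reclaims only a little memory.
    """
    recent = gc_troughs(samples)[-min_troughs:]
    if len(recent) < min_troughs:
        return False  # not enough GC cycles observed to judge
    return all(t >= floor_ratio * max_heap_bytes for t in recent)


# Sawtooth that drops far down after each GC: fine.
healthy = [20, 60, 95, 25, 70, 96, 22, 65, 94, 24, 50]
# Heap keeps filling up and each GC only frees a little: worrying.
unhealthy = [90, 98, 88, 97, 89, 99, 90, 98, 91, 99]

print(heap_near_capacity(healthy, max_heap_bytes=100))
print(heap_near_capacity(unhealthy, max_heap_bytes=100))
```

The sampling interval matters: if you sample much less often than GCs happen, the troughs will be missed, so treat the result as a hint rather than a definitive signal.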
There are a bunch of improvements in CDH5.16 and CDH6.1 to reduce the size of incremental stats and to, optionally, automatically invalidate tables when the heap is filling up - see https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_new_in_cdh_516.html#me...
Thanks for your reply.
The improvements you have linked to look quite interesting to us.
For monitoring we will probably go with an average of jvm_total_current_usage_bytes over time. That way we can extract that info out of Impala quite easily into our monitoring system as well.
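A minimal sketch of such an average, assuming the metric values have already been pulled out of Impala as plain numbers (how they are fetched is left out here). The class name and the window size are hypothetical, and a moving average over the last N samples is only one way to do "an average over time".

```python
from collections import deque


class RollingAverage:
    """Moving average over the last `window` samples."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # deque with maxlen drops the oldest sample automatically
        self._values = deque(maxlen=window)

    def add(self, value: float) -> float:
        """Record a new sample and return the current average."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


avg = RollingAverage(window=3)
for sample in [10, 20, 30, 40]:
    current = avg.add(sample)
print(current)
```

Keep in mind that an average smooths out the sawtooth caused by garbage collection, so a slowly rising average is the thing to watch rather than any single high value.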
FYI, you can create a trigger on CM:
1) Go to CM -> Clusters -> Impala service -> Status;
2) Click the button "Create Trigger";
3) Enter the name of the new trigger;
4) Enter "impala_catalogserver_jvm_heap_current_usage_bytes" in the Metric box;
5) Choose METRIC CONDITIONS, ACTION, Metric Evaluation Window etc;
6) Click the button "Create Trigger" to save it.
Then the action will be triggered if the condition is met. For example, you can see an alert if the current JVM usage is higher than the threshold. I hope it can help you monitor the memory usage.