Cluster utilization metrics / time series table metrics

We are trying to identify Cloudera-specific cluster utilization metrics that can serve as good KPIs for predicting future workloads, managing users, etc. We have a large number of data scientists in place, with heavy utilization of Impala, Spark and YARN processes. Has a Data Science CoE already done an analysis like this with Cloudera?

Some examples could be CPU utilization, YARN/Impala utilization, long-running Impala jobs, etc., each combined with a prediction model.
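To make "prediction model" concrete, here is a minimal sketch of the simplest thing we have in mind: fitting a straight-line trend to utilization samples and extrapolating it. The sample values and the names `fit_linear_trend`, `forecast` and `daily_cpu_pct` are made up for illustration; they are not taken from any Cloudera API, and a real model would likely need to handle seasonality and bursts.

```python
from statistics import mean


def fit_linear_trend(values):
    """Ordinary least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast(values, steps_ahead):
    """Extrapolate the fitted line steps_ahead samples past the last one."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical daily average cluster CPU utilization in percent (not real data).
daily_cpu_pct = [52.0, 54.0, 56.0, 58.0, 60.0]

# Projected utilization one week after the last sample.
print(forecast(daily_cpu_pct, 7))
```

The same approach would apply to any per-day or per-hour series, for example YARN memory usage or the count of long-running Impala jobs, assuming the samples can be exported at a fixed interval.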

We explored Cloudera Workload XM a little, but it doesn’t provide any predictive analysis of cluster utilization.

Has anyone tried the Time Series Table Metrics option and drawn any conclusions from it? Any tips or suggestions?