Cluster utilization metrics / time series table metrics

We are trying to identify Cloudera-specific cluster utilization metrics that could serve as good KPIs for predicting future workloads and managing users. We have a large number of data scientists, and utilization of Impala, Spark, and YARN is heavy. Has the DataScience CoE done any such analysis with Cloudera?

Examples might include CPU utilization, YARN and Impala utilization, and long-running Impala jobs, combined with a prediction model.
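
As a rough illustration of the kind of prediction model meant here, the sketch below fits an ordinary least-squares linear trend to evenly spaced utilization samples (for example, daily average cluster CPU %) and estimates how many days remain until a capacity threshold is crossed. It is a minimal sketch, assuming one sample per day with no gaps; the function names are hypothetical, and a real model would also need to handle seasonality such as weekday/weekend or month-end batch loads.

```python
def fit_linear_trend(values):
    """Ordinary least-squares fit of values[i] = intercept + slope * i."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2  # mean of the indices 0..n-1
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(values, steps_ahead):
    """Extend the fitted trend steps_ahead samples past the last observation."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps_ahead)]


def days_until_threshold(values, threshold):
    """Samples (days, under our assumption) from the last observation until the
    trend reaches threshold; None if utilization is flat or falling."""
    intercept, slope = fit_linear_trend(values)
    if slope <= 0:
        return None
    crossing_index = (threshold - intercept) / slope
    return max(0.0, crossing_index - (len(values) - 1))
```

For example, with five hypothetical daily samples `[40, 42, 44, 46, 48]`, `forecast(samples, 2)` returns `[50.0, 52.0]` and `days_until_threshold(samples, 80)` returns `16.0`.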

We have briefly explored Cloudera Workload XM, but it does not provide any predictive analysis of cluster utilization.

Has anyone tried the Time Series Table Metrics option and drawn conclusions from it? Any tips or suggestions?
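
One way to work with the same data outside the Cloudera Manager UI is to send the tsquery that backs a time series chart to the Cloudera Manager REST API's `timeseries` endpoint and feed the returned values into a forecasting model. The sketch below only builds the request URL and flattens a parsed JSON response; it does not make the HTTP call or handle authentication. The API version (`v19`), port 7180, the `cpu_percent_across_hosts` metric name, and the response layout (`items` → `timeSeries` → `data` → `value`) are assumptions to verify against the API documentation for your CM release.

```python
from urllib.parse import urlencode


def build_timeseries_url(cm_host, tsquery, start, end,
                         api_version="v19", port=7180, rollup="DAILY"):
    """Build a Cloudera Manager timeseries request URL (assumed endpoint layout)."""
    params = {
        "query": tsquery,                # tsquery text, as used by CM charts
        "from": start,                   # ISO 8601 start of the window
        "to": end,                       # ISO 8601 end of the window
        "desiredRollup": rollup,         # aggregation level, e.g. HOURLY or DAILY
        "mustUseDesiredRollup": "true",  # ask CM not to fall back to another rollup
    }
    return f"http://{cm_host}:{port}/api/{api_version}/timeseries?{urlencode(params)}"


def extract_values(response):
    """Collect every data point's 'value' from a parsed timeseries JSON response."""
    values = []
    for item in response.get("items", []):
        for series in item.get("timeSeries", []):
            for point in series.get("data", []):
                values.append(point["value"])
    return values


# Hypothetical usage: daily cluster-wide CPU for a three-week window.
url = build_timeseries_url(
    "cm.example.com",
    "select cpu_percent_across_hosts where category = CLUSTER",
    "2018-09-01",
    "2018-09-20",
)
```

Daily rollups keep the series short and smooth enough for a simple trend model; switching `rollup` to `HOURLY` would expose intraday peaks, such as long-running Impala queries during working hours.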