Created on 05-07-2026 07:42 AM
Have you ever hit "Submit" on a Spark job and felt that nagging urge to double the executor memory just in case?
You’re not alone. As developers, our primary KPI is successful delivery. If a job fails at 3:00 AM because of an OutOfMemory error, that’s a direct hit to our productivity. To sleep better at night, we adopt a "Safety First" strategy: we request the maximum possible CPU and memory we think the job could ever need, or simply the maximum the cluster policy allows.
But here’s the reality: those "just in case" resources usually sit idle. They are reserved, billed, and locked away from other workloads, but never actually used. When you multiply this "resource hoarding" across thousands of jobs and multi-tenant clusters, you aren't just over budget; you're creating massive scheduling bottlenecks.
At Cloudera, we realized that developers don't over-provision because they want to be wasteful; they do it because they lack visibility. You can't optimize what you can't measure. That’s why we built the Resource Efficiency Analysis feature in Cloudera Observability. We wanted to close the gap between what you asked for and what your Spark executors actually consumed.
In this post, we’re going to dive into how we calculate Spark resource wastage metrics under the hood.
To fix resource wastage, we first had to define it. In Cloudera Observability, we use a deceptively simple formula:
Resource Wastage = Requested Resources - Used Resources
To make this actionable for platform teams and "FinOps" enthusiasts, we standardize these across all engines into two universal units: Core-Millis and GB-Millis. By attaching a dollar value to these units in your chargeback settings, we can turn idle CPU cycles directly into a "Potential Savings" figure.
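To make the formula concrete, here is a minimal sketch of the wastage calculation in those two units. The function names, chargeback rates, and job figures are illustrative assumptions, not Cloudera Observability's actual API:

```python
# Sketch of Resource Wastage = Requested Resources - Used Resources,
# expressed in the two universal units: Core-Millis and GB-Millis.
# All names and rates below are illustrative, not the product's API.

def wastage(requested: float, used: float) -> float:
    """Wastage is the gap between what was reserved and what was consumed."""
    return max(requested - used, 0.0)

# A hypothetical job that held 4 cores and 16 GB for 10 minutes (600,000 ms)...
duration_ms = 600_000
core_millis_requested = 4 * duration_ms     # 2,400,000 core-millis reserved
gb_millis_requested = 16 * duration_ms      # 9,600,000 GB-millis reserved

# ...but effectively used only 1.5 cores and 6 GB on average.
core_millis_used = 1.5 * duration_ms
gb_millis_used = 6 * duration_ms

core_millis_wasted = wastage(core_millis_requested, core_millis_used)
gb_millis_wasted = wastage(gb_millis_requested, gb_millis_used)

# Attaching a (hypothetical) dollar rate per unit, as in chargeback
# settings, turns idle cycles into a "Potential Savings" figure.
COST_PER_CORE_MILLI = 1e-8
COST_PER_GB_MILLI = 5e-9
potential_savings = (core_millis_wasted * COST_PER_CORE_MILLI
                     + gb_millis_wasted * COST_PER_GB_MILLI)
```

Standardizing on millisecond-granularity units keeps the math engine-agnostic: any workload that can report a duration and a footprint can be folded into the same savings figure.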
But how do we get these numbers accurately? Let’s look under the hood.
Since Spark is a distributed engine with long-lived executors, calculating "Allocation" is more than just looking at one number.
We calculate the total resource "rent" by summing the durations of every executor and multiplying by their individual footprints.
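The per-executor "rent" summation can be sketched as follows. The `ExecutorLifetime` record is a hypothetical stand-in; in practice this data would come from executor add/remove events in the Spark event log:

```python
# Minimal sketch of the "rent" calculation: each executor contributes its
# own lifetime multiplied by its own footprint. The record type below is
# illustrative, not an actual Spark or Cloudera Observability structure.

from dataclasses import dataclass

@dataclass
class ExecutorLifetime:
    start_ms: int
    end_ms: int
    cores: int
    memory_gb: float

def total_allocation(executors):
    """Sum each executor's duration x footprint into core-millis and GB-millis."""
    core_millis = 0
    gb_millis = 0.0
    for e in executors:
        duration = e.end_ms - e.start_ms
        core_millis += e.cores * duration
        gb_millis += e.memory_gb * duration
    return core_millis, gb_millis

# Two executors with different lifetimes (e.g. under dynamic allocation):
execs = [
    ExecutorLifetime(start_ms=0,       end_ms=600_000, cores=4, memory_gb=8),
    ExecutorLifetime(start_ms=120_000, end_ms=480_000, cores=4, memory_gb=8),
]
cores_rented, gb_rented = total_allocation(execs)
```

Summing per executor rather than multiplying one duration by one executor count matters precisely because executors are long-lived and, with dynamic allocation, come and go at different times.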
CPU Usage Calculation
For CPU usage, we aggregate task-level metrics like ExecutorRunTime, ExecutorDeserializeTime, and ResultSerializationTime across all tasks to calculate the actual utilization of the CPUs allocated to executors.
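The aggregation itself is a straightforward sum over tasks. This sketch assumes the task metrics have already been parsed out of the Spark event log into dictionaries; the field names mirror Spark's TaskMetrics, but the records here are illustrative:

```python
# Hedged sketch: estimate CPU actually consumed by summing the task-level
# timing metrics (all in milliseconds) across every task in the application.
# Task records are illustrative; real ones come from the Spark event log.

CPU_TIME_FIELDS = (
    "executorRunTime",          # time the task spent running on the executor
    "executorDeserializeTime",  # time spent deserializing the task
    "resultSerializationTime",  # time spent serializing the result
)

def used_core_millis(task_metrics):
    """Sum the CPU-relevant timing fields over all tasks."""
    return sum(
        sum(task.get(field, 0) for field in CPU_TIME_FIELDS)
        for task in task_metrics
    )

tasks = [
    {"executorRunTime": 50_000, "executorDeserializeTime": 500,
     "resultSerializationTime": 20},
    {"executorRunTime": 42_000, "executorDeserializeTime": 450,
     "resultSerializationTime": 15},
]
cpu_used = used_core_millis(tasks)  # compare against allocated core-millis
```

Comparing this figure against the allocated core-millis from the previous step yields the CPU side of the wastage formula.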
The Logic For Peak Memory Computation
We don't just look at the JVM heap. If you're using PySpark or off-heap memory, a simple heap analysis would give you a dangerously low recommendation.
We analyze the Resident Set Size (RSS) from the process tree. This captures the total physical memory footprint of the executor, including the JVM, Python sidecars, and the overhead. By basing our recommendation on the actual peak footprint of the entire process, we ensure the value we give you is grounded in physical reality.
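The peak-footprint logic can be sketched as below. It assumes periodic RSS samples of the whole process tree are already available (in the product this comes from process-level monitoring, not this exact code); the sample data is illustrative:

```python
# Minimal sketch of peak memory from the process tree: each sample maps a
# process in the executor's tree (JVM, Python workers, etc.) to its RSS in
# bytes, and the peak of the summed footprints drives the recommendation.
# Sample values below are illustrative.

def peak_tree_rss(samples):
    """Return the largest total physical footprint seen across all samples."""
    return max(sum(tree.values()) for tree in samples)

samples = [
    {"jvm": 6_000_000_000, "pyworker1": 900_000_000},    # early in the job
    {"jvm": 6_500_000_000, "pyworker1": 1_200_000_000,
     "pyworker2": 800_000_000},                          # the peak moment
    {"jvm": 6_100_000_000, "pyworker1": 1_000_000_000},  # winding down
]
peak_bytes = peak_tree_rss(samples)  # total footprint, not just JVM heap
```

Note how the second sample, where the PySpark workers are busiest, sets the peak: a heap-only view would have missed roughly 2 GB of real physical memory.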
Identifying "wastage" is a great first step, but let’s be honest: as an engineer, you don't just want to know you're wasting 40GB of RAM; you want the exact config changes that fix it so you can move on with your day.
We realized that for our recommendations to be useful, they had to be prescriptive. We shifted from "you're using too much" to "change this value to X." Here is how we built that recommendation engine.
The biggest fear in down-sizing any job is the dreaded OutOfMemory (OOM) error. If our tool recommends a value that causes your job to crash, we’ve failed.
To prevent this, we built a Safety Factor into our logic. We don't just match your peak usage; we add a 20% buffer (1.2x multiplier).
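The safety-factor logic reduces to a one-liner. Rounding up to whole gigabytes is an assumption made here for readability, not necessarily what the product does:

```python
# Sketch of the Safety Factor: recommend the peak physical footprint times
# a 1.2x buffer, rounded up. Rounding to whole GB is an illustrative choice.

import math

SAFETY_FACTOR = 1.2  # the 20% buffer described above

def recommended_executor_memory_gb(peak_rss_gb: float) -> int:
    """Peak process-tree footprint plus a 20% buffer, rounded up to whole GB."""
    return math.ceil(peak_rss_gb * SAFETY_FACTOR)

recommended_executor_memory_gb(3.3)  # a 3.3 GB peak yields a 4 GB recommendation
```

The buffer absorbs run-to-run variance: a job whose peak drifts a little above its last observed footprint still fits inside the recommended value instead of hitting an OOM.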
A good mentor knows when not to give advice. Our recommendation engine includes several "anti-patterns" where we intentionally withhold a recommendation rather than risk a bad one.
Imagine you have a Spark job configured with 8GB per executor, but its actual peak footprint (plus the 20% buffer) justifies only 4GB. Lower the setting accordingly and everyone wins: you get a faster-scheduling job (due to lower resource demand), the cluster gets 4GB back per executor for other users, and the CFO stays happy. It’s a win-win-win.
Currently, our recommendations are based on the most recent run. But we know that data volumes fluctuate. Maybe your "Monday morning" run is 5x larger than your "Tuesday" run.
Our next evolution is Historical Trend Analysis. We are working on analyzing past runs to provide a recommendation that is "seasonally aware," ensuring your settings are robust enough for month-end processing while remaining efficient for daily tasks.