Created 01-31-2026 09:59 PM
Hi everyone.
Just want to check: as of the latest CML version, is there any support for using GPUs with Spark in CML?
I have seen:
1. GPUs used for Spark in CDE, and
2. GPUs used in general Python projects (using torch, tensorflow, RAPIDS' cudf/cuml) in CML,
but I can't find anything for Spark in CML.
If there is no such support, is there a specific reason why it cannot be done in CML while it can be done for general Spark on Kubernetes?
Thanks for any info...
Created on 02-02-2026 08:53 AM - edited 02-02-2026 08:54 AM
Hello @backtohome,
As far as I know, we do support GPUs for Spark workloads on CML.
The documentation mentions this:
Autoscaling: Cloudera AI also supports native cloud autoscaling via Kubernetes. When clusters do not have the required capacity to run workloads, they can automatically scale up additional nodes. Administrators can configure auto-scaling upper limits, which determine how large a compute cluster can grow. Since compute costs increase as cluster size increases, having a way to configure upper limits gives administrators a method to stay within a budget. Autoscaling policies can also account for heterogeneous node types such as GPU nodes.
https://docs.cloudera.com/machine-learning/1.5.5/spark/topics/ml-apache-spark-overview.html
You have to configure them by following this doc:
https://docs.cloudera.com/machine-learning/1.5.5/gpu/topics/ml-gpu.html
If you do not have GPUs configured on CML, the UI will not show you the options, like this:
Created 02-02-2026 04:56 PM
Thanks for the reply @vafs; I can't be sure that was the definitive answer though, because it combines answers from two different sources:
- the first one is about supporting Spark (nothing about GPU mentioned).
- the second one is about GPU (nothing about Spark mentioned).
And when I tried Spark code that works on YARN+GPU, with slight modifications to fit CML, it just didn't go well. I'm not sure if I did something wrong; that's why I'm looking for a definitive answer, ideally with a GitHub example like what Cloudera has provided for PyTorch and TensorFlow on CML. Hence me raising this question.
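For context, the kind of configuration I was carrying over from the YARN+GPU job is roughly the following sketch. The spark.rapids.* and GPU resource settings assume the RAPIDS Accelerator for Apache Spark jar is on the executor classpath, which a stock CML runtime may not provide, so treat this as illustrative rather than a confirmed-working CML setup:

```python
# Sketch of RAPIDS Accelerator settings ported from a YARN+GPU job.
# Assumes the rapids-4-spark jar is available to the executors (an
# assumption on CML); plugin and resource keys are standard Spark 3.x.
rapids_conf = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
    # One GPU per executor, shared across four concurrent tasks:
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.resource.gpu.amount": "0.25",
}

# Applying it would look like this (requires pyspark and a GPU-backed
# cluster, so left commented out here):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("gpu-spark-test")
# for key, value in rapids_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()

print(len(rapids_conf))
```

Whether any of these settings take effect inside a CML session is exactly the open question here.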
Somehow, I kind of remember seeing somewhere that in CDE it is only a Technical Preview, but in CML it is not yet available; I can't seem to find that page again, though :).
Created 02-02-2026 07:14 PM
Seems like the message was there all along when we choose the runtime, e.g. JupyterLab -> Python xx -> Edition: Nvidia GPU, and enable Spark; the message appears:
"Spark is not compatible with the selected Edition. If you enable Spark for the session, it can be used independently but it will not be accelerated"
I didn't see the warning message before because we have only allowed our own customized runtime, which didn't display this warning message.