Created on 03-31-2025 06:54 PM - edited on 04-02-2025 10:34 PM by VidyaSargur
Deploying Dask CUDA clusters on Kubernetes for distributed GPU workloads can require significant time, effort, and money, especially at enterprise scale. You would have to set up a Kubernetes cluster with GPU support, manage Docker images, and configure Dask CUDA workers via complex deployment files.
Cloudera AI simplifies distributed compute use cases in the context of Machine Learning and AI. In this article, you will learn how to quickly deploy a Dask CUDA cluster in Cloudera AI using the cmlextensions library.
Dask CUDA clusters enable scalable parallel computing on NVIDIA GPUs by leveraging Dask, a flexible parallel computing framework, in conjunction with the power of CUDA for GPU acceleration. These clusters allow users to distribute computation across multiple GPUs, significantly speeding up data processing and machine learning workflows, particularly for tasks involving large datasets or complex algorithms.
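To make this concrete: Dask splits a large array into chunks and records a task graph that can be executed in parallel across workers. The sketch below illustrates this programming model on CPU only (no GPUs or dask-cuda required), assuming only the Dask library is installed locally; on a Dask CUDA cluster the same graph would be distributed across GPU workers.

```python
import dask.array as da

# Build a 1000x1000 array lazily, split into 250x250 chunks (16 chunks);
# nothing is computed yet, only a task graph is recorded
x = da.random.random((1000, 1000), chunks=(250, 250))

# Define a reduction over all chunks; still lazy
total = (x * 2).sum()

# .compute() executes the task graph, here on the local threaded scheduler;
# on a Dask CUDA cluster the chunks would be processed by GPU workers
result = total.compute()
print(result)  # roughly 1,000,000 (each element of x * 2 averages ~1.0)
```

Each chunk can be reduced independently, which is what lets Dask scale the same code from a laptop to a multi-GPU cluster.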
Cloudera AI is a suite of artificial intelligence and machine learning solutions designed to help organizations harness the power of their data to drive innovation and optimize decision-making. Built on top of the Cloudera Data Platform, it enables businesses to seamlessly integrate AI and ML models into their existing data workflows, providing advanced analytics capabilities.
Enterprise Data Scientists and Engineers utilize Cloudera AI to launch distributed CPU and GPU sessions with frameworks such as TensorFlow, PyTorch, Spark, and Dask. In this context, Cloudera AI simplifies the installation, configuration, and management of dependencies by providing out-of-the-box, customizable Runtimes. These facilitate the deployment of ML workflows and ensure consistency in model execution, from development to production.
The cmlextensions library is an open-source package maintained by Cloudera AI developers that wraps the CAI Workers SDK to simplify the deployment of distributed CPU and GPU sessions. With cmlextensions, CAI developers can deploy Dask CUDA clusters at scale.
To reproduce this example, first create a CAI project and clone the repository located at this GitHub URL:
https://github.com/pdefusco/cmlextensions
Launch a CML Session with the following Resource Profile:
Editor: PBJ Workbench
Kernel: Python 3.10
Edition: Nvidia GPU
Version: 2025.01 or above
Spark Runtime Add-On: disabled
Resource Profile: 4 vCPU / 16 GiB Memory / 0 GPU
In the session, install the Dask and CUDA requirements by running the ```install_requirements.py``` script.
Then, install the CML extensions package:
pip install git+https://github.com/cloudera/cmlextensions.git
Deploy a Dask CUDA cluster with two worker pods, each with two GPUs, by running the following code.
Shortly after running this, you should notice the Dask Scheduler and Workers on the right side of the screen.
from src.cmlextensions.dask_cuda_cluster.dask_cuda_cluster import DaskCudaCluster
cluster = DaskCudaCluster(
    num_workers=2,
    worker_cpu=4,
    nvidia_gpu=2,
    worker_memory=12,
    scheduler_cpu=4,
    scheduler_memory=12
)
cluster.init()
Connect to the cluster via the Client constructor. On the right side of the screen, notice that the cluster has started and the client has connected successfully.
from dask.distributed import Client
client = Client(cluster.get_client_url())
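You can also confirm the connection programmatically, since the Client exposes scheduler metadata. The snippet below is a sketch that uses dask.distributed's LocalCluster as an in-process stand-in (no GPUs needed); against the cluster deployed above, you would construct the Client from cluster.get_client_url() exactly as shown earlier, and the same API applies.

```python
from dask.distributed import Client, LocalCluster

# In-process stand-in cluster (threads, no GPUs); with cmlextensions you
# would instead pass cluster.get_client_url() to the Client constructor
local = LocalCluster(n_workers=2, processes=False, dashboard_address=None)
client = Client(local)

# scheduler_info() reports the workers currently registered with the scheduler
info = client.scheduler_info()
print(len(info["workers"]))  # 2 workers registered

client.close()
local.close()
```

If the worker count does not match what you requested from DaskCudaCluster, the worker pods may still be starting up.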
Perform some basic data manipulations:
import dask.array as da
# Create a Dask array from a nested Python list, split into 2x2 chunks
x = da.from_array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2))
# Build a lazy elementwise computation on the Dask array
y = (x + 1) * 2
# Submit the computation to the cluster for execution
future = client.submit(y.compute)
# Wait for the computation to complete and retrieve the result
result = future.result()
print(result)
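For reference, the same elementwise computation can be verified locally without a cluster: when no distributed Client is connected, compute() falls back to Dask's default local scheduler. A minimal sketch:

```python
import dask.array as da

# Same array and computation as above, run on the local scheduler
x = da.from_array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2))
y = (x + 1) * 2

# No Client connected, so compute() uses the local threaded scheduler
result = y.compute()
print(result)
# [[ 4  6  8]
#  [10 12 14]
#  [16 18 20]]
```

Running the cluster-backed version should return the same result; the difference is only where the chunks are processed.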
Monitor your work in the Dask Dashboard. The URL is provided in the output when the Dask Cluster is started.
import os
print("https://"+os.environ["CDSW_ENGINE_ID"]+"."+os.environ["CDSW_DOMAIN"])
In this article, you learned how to easily deploy a distributed GPU Dask CUDA cluster on Kubernetes in Cloudera AI in just a few steps. For more information, blogs, and documentation, refer to the Cloudera AI and Dask documentation sites.