Data scientists on CML Workspaces have access to GPUs to accelerate their machine learning projects and reduce the time it takes to build and train predictive models. NVIDIA GPU nodes are available for administrators to configure for CML Workspaces in both AWS and Azure.
CML now supports adding new GPU nodes to existing CML Workspaces created without GPUs, so data scientists can access GPU acceleration without having to recreate CML Workspaces. Administrators can also replace GPU nodes in CML Workspaces to switch to the latest generation GPUs.
With these new capabilities, it's easier for administrators to manage GPU nodes in CML Workspaces and enable data scientists to use the newest generation of GPUs.
The Data Discovery and Visualization experience ships with preconfigured Data Connections, a database browser, interactive SQL editor, drag-and-drop Visual Dashboarding, and Connection Snippets. These new capabilities speed up the development process by cutting down the time spent finding, exploring, understanding, and accessing the data.
Data Scientists need to fully understand their data in order to analyze it properly, build models, and power ML use cases. To reduce the time to insights, CML ships all tools required to integrate these tools to reduce the friction between the different steps and to speed up the development process for data science teams.
These new capabilities are built on top of Cloudera Data Visualization, giving state-of-the-art visual capabilities at the hand of Data Scientists. To get started, you can step into any Project in a CML May or newer Workspace and hit the Data tab.
You can read more about the new capabilities in the documentation here.
The ML Runtimes 2022.04-1 Release includes a technical preview version of the new workbench architecture, the PBJ (Powered by Jupyter) Workbench. In the previous Workbench editor, the code and output shown in the console (the right-hand pane in the image below) were passed to and from Python, R, or Scala via a Cloudera-specific, custom messaging protocol. In the PBJ Workbench, on the other hand, the code and output are now passed to and from the target language runtime via the Jupyter messaging protocol. They are handled inside the runtime container by a Jupyter kernel and rendered in your browser by JupyterLab’s client code.
This may seem like a subtle change, but it will provide CML users with some major benefits. First, the behavior of user code and third-party libraries on CML will be more consistent with its behavior in Jupyter-based environments. That means that a wider variety of rich visualization libraries will work out of the box, and in cases where rich visualization libraries do not work, error messages in the CML console and the browser console will be easier to google. Likewise, dependency conflicts between kernel code and user code will be rarer, and when they do occur they will be easier for customers to diagnose and fix. To give you a taste of what this higher degree of consistency is like, note that Python 3’s input() function now works. Go ahead and try it out!
Second, customers will no longer need to build runtime images starting from Cloudera base images and will no longer need to restrict themselves to languages and versions that Cloudera has packaged. Any combination of base image, target language, and language version can be used with the PBJ Workbench as long as a Jupyter kernel is available for that combination.
You can try it out by running a PBJ Workbench Python session using a CML April or newer Workspace. The look and feel of the workbench will be more or less unchanged. Under the hood, however, the way that code and outputs are rendered and passed between the web app and the Python interpreter have been re-engineered to better align with the Jupyter ecosystem.
The Technical Preview documentation is available here.
Cloudera Machine Learning’s new APIv2 provides all CML users with the ability to programmatically create, read, update and delete projects and workloads, including jobs, models and applications. This means that users can automate creation and setup of projects, or trigger actions such as retraining or deploying a new version of a model as part of the project lifecycle, all from within the product or from an external scheduling or CI/CD tool, using the Python client library or HTTPS REST API.
Administrators can customize the Cloudera provided ML Runtimes to support Data Scientists’ specific use-cases. They can install additional OS packages, Python and R libraries, third-party drivers to enable connecting to external data stores, or even a new editor to be used. CML now enables registering these custom Runtimes and making them available for Data Scientists to use in their projects.
Created on 07-15-202103:34 AM - edited 07-15-202103:39 AM
Cloudera ML Runtimes are the default and recommended solution for running user workloads. New Projects will be created with ML Runtimes configured by default and we recommend migrating existing Projects to use ML Runtimes. Legacy Engines are deprecated and will be removed in a future release but workloads running on them remain fully supported.