What's New @ Cloudera

Find the latest Cloudera product news

We are excited to announce a new "Add Data" action on CML’s Data Connections that allows users to easily upload data into CDP. This new capability simplifies the process of bringing data to CDP and enables Data Scientists to directly ingest their data without depending on administrators or data engineers.

 



We are excited to announce the launch of Custom Connection Support in Cloudera Machine Learning (CML) that enables data scientists to seamlessly connect to external data stores like legacy on-prem databases (Oracle, MSSQL, MySQL), serverless cloud databases (Redshift, Snowflake, SAP HANA Cloud, BigQuery), APIs, and specialized data stores (Neo4j), all from within CML. This feature helps data scientists discover all of their data independently, without worrying about implementation and connectivity details, unlocking their machine learning use cases from the get-go.

 



ML Runtimes are the containerized execution environments for all workloads in CML, including interactive sessions, scheduled jobs, deployed models, and applications. Beginning February 2023, we are thrilled to release our ML Runtimes as open source. By open sourcing ML Runtimes, we enable greater flexibility, transparency, and collaboration opportunities for our customers and partners. This will also allow CML to be extended in step with the accelerating innovation in the Machine Learning and AI ecosystems.


We're excited to announce the launch of the Tech Preview of Cloudera Model Registry, a centralized repository for storing, managing, and deploying machine learning models and their associated metadata. This powerful new tool is designed to simplify the MLOps process, making it easier for organizations to develop, deploy, and maintain machine learning models in a production environment.


PBJ Workbench Runtimes are now generally available (GA). They have been rebased on Jupyter for ecosystem compatibility and openness.


CML Experiments have been rebuilt, leveraging the MLflow ecosystem to complement CML’s existing strengths in model development and deployment. CML now ships the MLflow SDK and an integrated visual experience for tracking and comparing experiments with flexible visuals.


As businesses continue to adopt open lakehouses built with Apache Iceberg on CDP, data scientists need easy access to these new table formats so that they don’t spend their time figuring out connection dependencies and configurations.

 

Cloudera Machine Learning’s Data Connections and Snippets simplify data access in CDP. Data scientists can use the cml.data library to access a Data Lake via Spark or to query their Virtual Warehouse with Hive or Impala. With recent improvements to the cml.data library, CML Snippets now fully support the Iceberg table format for all Spark, Hive, and Impala data connections.


To learn more about Data Connections and Snippets, read the following article:
https://blog.cloudera.com/one-line-away-from-your-data/
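
As a rough illustration of the cml.data workflow described above, the sketch below builds a preview query and runs it through a connection object. It assumes the connection exposes a get_pandas_dataframe(query) method, as the generated CML Snippets typically do; the helper names (build_preview_query, preview_table) and the connection name in the comments are hypothetical, and the snippet generated in your own Workspace is the authoritative version.

```python
import re
from typing import Any

# Plain SQL identifiers only; anything else is rejected rather than quoted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_preview_query(database: str, table: str, limit: int = 10) -> str:
    """Return a SELECT statement that previews the first rows of a table."""
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    return f"SELECT * FROM {database}.{table} LIMIT {int(limit)}"


def preview_table(conn: Any, database: str, table: str, limit: int = 10) -> Any:
    """Run the preview query on a connection object.

    Assumption: `conn` has a get_pandas_dataframe(query) method. Inside a
    CML session it would typically be obtained like this (the connection
    name is hypothetical):

        import cml.data_v1 as cmldata
        conn = cmldata.get_connection("default-impala")
    """
    return conn.get_pandas_dataframe(build_preview_query(database, table, limit))


print(build_preview_query("sales", "orders", 5))
```

Because the query is plain SQL, the same helper should apply whether the underlying table uses Iceberg or another format, provided the chosen connection supports it.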


Cloudera Machine Learning now provides a built-in dashboard for monitoring technical metrics relating to deployed CML Models, such as request throughput, latency, and resource consumption.


CML's Backup and Restore feature is now generally available in the public cloud on AWS. Administrators can back up their CML Workspaces to ensure business continuity in case of failures and outages.


CML gives Administrators new controls to disable certain ML Runtime variants or specific versions.


Data scientists on CML Workspaces have access to GPUs to accelerate their machine learning projects and reduce the time it takes to build and train predictive models. NVIDIA GPU nodes are available for administrators to configure for CML Workspaces in both AWS and Azure.

 

CML now supports adding new GPU nodes to existing CML Workspaces created without GPUs, so data scientists can access GPU acceleration without having to recreate CML Workspaces. Administrators can also replace GPU nodes in CML Workspaces to switch to the latest generation GPUs.


With these new capabilities, it's easier for administrators to manage GPU nodes in CML Workspaces and enable data scientists to use the newest generation of GPUs. 

The Data Discovery and Visualization experience ships with preconfigured Data Connections, a database browser, interactive SQL editor, drag-and-drop Visual Dashboarding, and Connection Snippets. These new capabilities speed up the development process by cutting down the time spent finding, exploring, understanding, and accessing the data.

 

Data Scientists need to fully understand their data in order to analyze it properly, build models, and power ML use cases. To reduce time to insight, CML ships all of the required tools and integrates them, reducing the friction between the different steps and speeding up the development process for data science teams.

 


 

These new capabilities are built on top of Cloudera Data Visualization, putting state-of-the-art visual capabilities in the hands of Data Scientists. To get started, open any Project in a CML Workspace running the May release or newer and select the Data tab.

 

You can read more about the new capabilities in the documentation here.

The ML Runtimes 2022.04-1 Release includes a technical preview version of the new workbench architecture, the PBJ (Powered by Jupyter) Workbench. In the previous Workbench editor, the code and output shown in the console (the right-hand pane in the image below) were passed to and from Python, R, or Scala via a Cloudera-specific, custom messaging protocol. In the PBJ Workbench, on the other hand, the code and output are now passed to and from the target language runtime via the Jupyter messaging protocol. They are handled inside the runtime container by a Jupyter kernel and rendered in your browser by JupyterLab’s client code.
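
To make the Jupyter messaging protocol mentioned above more concrete, here is a minimal sketch of the kind of message a front end sends when it asks a kernel to run code. The field names follow the public Jupyter messaging specification for an execute_request; the function name, the session value, and the username are hypothetical, and nothing here reflects how CML itself builds or transports these messages.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str, username: str = "cml-user") -> dict:
    """Build an execute_request message in the shape the Jupyter protocol defines."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,      # unique per message
            "session": session,              # unique per client session
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",                # protocol version
        },
        "parent_header": {},                 # empty: this message starts a new exchange
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": True,             # lets the kernel ask the client for input
            "stop_on_error": True,
        },
        "buffers": [],
    }


message = make_execute_request("print('hello')", session=uuid.uuid4().hex)
print(json.dumps(message["content"], indent=2))
```

The kernel answers on other channels with messages such as execute_reply and stream, each carrying the request's header as its parent_header so that the client can match output to the code that produced it.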

 

This may seem like a subtle change, but it will provide CML users with some major benefits. First, the behavior of user code and third-party libraries on CML will be more consistent with its behavior in Jupyter-based environments. That means that a wider variety of rich visualization libraries will work out of the box, and in cases where rich visualization libraries do not work, error messages in the CML console and the browser console will be easier to google. Likewise, dependency conflicts between kernel code and user code will be rarer, and when they do occur they will be easier for customers to diagnose and fix. To give you a taste of what this higher degree of consistency is like, note that Python 3’s input() function now works. Go ahead and try it out!

 

Second, customers will no longer need to build runtime images starting from Cloudera base images and will no longer need to restrict themselves to languages and versions that Cloudera has packaged. Any combination of base image, target language, and language version can be used with the PBJ Workbench as long as a Jupyter kernel is available for that combination.
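
In Jupyter terms, a kernel being "available" usually means that a kernel spec (a kernel.json file) is installed in the image. The sketch below builds the contents of such a file for the standard Python kernel. The argv, display_name, and language keys come from the Jupyter kernel spec format; the helper name is hypothetical, and how the PBJ Workbench discovers kernels in a custom image is not covered here, so check the runtime documentation for the actual requirements.

```python
import json
from typing import Sequence


def make_kernel_spec(display_name: str, language: str, argv: Sequence[str]) -> dict:
    """Return the contents of a Jupyter kernel.json file as a dict."""
    # Jupyter substitutes {connection_file} with the path of the file that
    # tells the kernel which ports and signing key to use.
    if "{connection_file}" not in argv:
        raise ValueError("argv must contain the {connection_file} placeholder")
    return {
        "argv": list(argv),
        "display_name": display_name,
        "language": language,
    }


python_spec = make_kernel_spec(
    display_name="Python 3 (ipykernel)",
    language="python",
    argv=["python3", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
)
print(json.dumps(python_spec, indent=2))
```

A kernel for another language would follow the same pattern with a different launch command in argv.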

 

peter_ableda_1-1653487296917.png

 

You can try it out by running a PBJ Workbench Python session in a CML Workspace running the April release or newer. The look and feel of the workbench will be more or less unchanged. Under the hood, however, the way that code and outputs are rendered and passed between the web app and the Python interpreter has been re-engineered to better align with the Jupyter ecosystem.

 

The Technical Preview documentation is available here.

With CML's multi-version Spark support, CML users can now run different versions of Spark side by side, even within a single project. 


Cloudera Machine Learning’s new APIv2 provides all CML users with the ability to programmatically create, read, update and delete projects and workloads, including jobs, models and applications. This means that users can automate creation and setup of projects, or trigger actions such as retraining or deploying a new version of a model as part of the project lifecycle, all from within the product or from an external scheduling or CI/CD tool, using the Python client library or HTTPS REST API.
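
As a sketch of what a call over the HTTPS REST API might look like from an external tool, the example below prepares (but does not send) a request that lists projects. The /api/v2/projects path and the bearer-token Authorization header are assumptions to verify against your Workspace's API documentation; the helper name, host name, and key shown are hypothetical.

```python
import urllib.request


def build_list_projects_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare a GET request for the project list without sending it."""
    # Assumed endpoint path; confirm it in your Workspace's API documentation.
    url = base_url.rstrip("/") + "/api/v2/projects"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_list_projects_request("https://ml.example.com/", "MY_API_KEY")
print(request.get_method(), request.full_url)

# To actually send it from a CI/CD job or scheduler:
#   with urllib.request.urlopen(request) as response:
#       projects = json.load(response)   # requires `import json`
```

Keeping request construction separate from sending makes the call easy to test and to reuse for other resources such as jobs, models, and applications.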

