Member since
07-09-2015
68
Posts
24
Kudos Received
12
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 8934 | 11-23-2018 03:38 AM |
| | 2304 | 10-07-2018 11:44 PM |
| | 2942 | 09-24-2018 12:09 AM |
| | 4651 | 09-13-2018 02:27 AM |
| | 2978 | 09-12-2018 02:27 AM |
10-20-2023
02:41 PM
1 Kudo
We're excited to announce the General Availability (GA) of the Cloudera Model Registry, your centralized hub for storing, managing, and deploying machine learning models and their associated metadata. This robust tool simplifies the MLOps process, enabling organizations to develop, deploy, and maintain machine learning models effortlessly in a production environment. Cloudera's Model Registry addresses the challenges of fragmentation and lack of visibility in MLOps workflows. Serving as the single source of truth for model versions and lineage, it streamlines workflows, enhancing the traceability and reproducibility of model development. The Model Registry is now officially accessible in CML Public Cloud. To harness the full potential of General Availability (GA), upgrade your CML Workspaces and deploy your new Model Registry! You can find more information about the new Model Registry in our documentation.
04-26-2023
11:50 PM
3 Kudos
Data ingestion is a common task in the data science workflow, which often involves coordinating with multiple teams. With the "Add Data" action on CDP Data Connections, data scientists can now easily upload data into CDP Data Stores such as Impala or Hive Virtual Warehouses to manage and govern data at scale. This means that data scientists can focus on analyzing and working with their own data rather than dealing with the complexities of data ingestion. To get started with this feature, users can simply open the “Data” tab in their CML Project and click on the "Add Data" action on the CDP Data Connection they wish to use, and follow the prompts to upload their data into a CDP Data Store. In addition to simplifying the data ingestion process, the "Add Data" action also provides users with several options for customizing the data import. These options include selecting the database and table name for the data, as well as selecting the column delimiter and locale. Users can also change the column names and types during the import process, giving them greater flexibility in how they want to land their data. These options make it easier for data scientists to import their data into CDP in a way that is customized to their specific needs, reducing the time and effort required to prepare their data for analysis. For more information about the "Add Data" action on CDP Data Connections, users can refer to Cloudera's documentation.
04-26-2023
11:47 PM
1 Kudo
We are excited to announce the launch of Custom Connection Support in Cloudera Machine Learning (CML), which enables data scientists to seamlessly connect to external data stores from within CML. This feature helps data scientists discover all of their data independently, without worrying about implementation and connectivity details, unlocking their machine learning use cases from the get-go. Many users struggle with accessing data from various sources, such as legacy on-prem databases (Oracle, MSSQL, MySQL), serverless cloud databases (Redshift, Snowflake, SAP HANA Cloud, BigQuery), APIs, and specialized data stores (Neo4j). Until now, CDP administrators had to either set up data replication or ingestion pipelines to make data discoverable and accessible within CDP, or work directly with data scientists to provide the necessary endpoints, configurations, and authentication details so that connections could be set up manually. This process not only delayed machine learning use cases but also burdened CDP administrators with additional work. We believe that data scientists should be able to focus on solving complex business problems without being hindered by data accessibility challenges. Our new Custom Data Connections feature addresses these obstacles and facilitates access to external data stores. Data scientists can now concentrate on what they do best and start working on their machine learning projects as soon as they gain access to CML, without the need to wait for IT teams to transfer data to CDP or request assistance. Moreover, Custom Connection Support opens up new possibilities by enabling use cases such as processing graph data from graph databases, enriching data via APIs, and directly working with data that has been classified as 'unmovable' in legacy databases. In summary, Custom Connection Support in CML will bring greater efficiency and flexibility to data scientists and organizations alike.
To learn more about Custom Connection Support in CML, check out our documentation and example connections.
02-23-2023
09:42 AM
2 Kudos
Machine learning and data science libraries and frameworks have grown at an exponential pace, alongside algorithmic advances ranging from the introduction and evolution of neural networks to Transformer libraries. To keep up with this innovation, our customers have long asked for a pluggable architecture in which users can hand-select their libraries while still working within the ML platform that CML provides, from data exploration to model operations. Open source ML Runtimes afford that extensibility to our partners and customers. They can extend and create purpose-built runtimes for data science teams and projects. With our prior investment in the PBJ (Powered by Jupyter) architecture, we can now rely on open source, community-supported protocols and release a new family of our ML Runtimes to better align with the Jupyter ecosystem. With this rebuilt infrastructure, customers and partners will no longer need to build runtime images starting from Cloudera base images. They will no longer need to restrict themselves to languages and versions that Cloudera has packaged. Any combination of the base image, target language, and language version can be used. By releasing the PBJ ML Runtimes as open source, we can provide more transparency and detail to our customers regarding the environment they are working in. The Dockerfiles used to build the container images act as detailed documentation for customers to understand their working environment fully. Additionally, the open-sourced ML Runtimes serve as a blueprint for creating custom Runtimes: building Runtimes on a custom OS, using a custom kernel, or integrating existing ML container images with CML. You can access the first release of PBJ ML Runtimes in our public GitHub repository (https://github.com/cloudera/ml-runtimes). To learn more about Cloudera ML Runtimes, please visit our documentation.
02-14-2023
11:05 AM
1 Kudo
We're excited to announce the launch of the Tech Preview of Cloudera Model Registry, a centralized repository for storing, managing, and deploying machine learning models and their associated metadata. This powerful new tool is designed to simplify the MLOps process, making it easier for organizations to develop, deploy, and maintain machine learning models in a production environment. The Model Registry solves the problem of fragmentation and lack of visibility in MLOps workflows. It provides a single source of truth for model versions and lineage, enabling organizations to streamline their workflows and enhance the traceability and reproducibility of model development. The new Model Registry is now available in CML Private Cloud 1.5 and in CML Public Cloud. To get started with the Tech Preview version, reach out to us! You can learn more about the new Model Registry in our documentation.
12-09-2022
08:55 AM
1 Kudo
The ML Runtimes 2022.11-1 Release includes the GA version of the new workbench architecture, the PBJ (Powered by Jupyter) Workbench. In the previous Workbench editor, the code and output shown in the console (the right-hand pane in the image below) were passed to and from Python, R, or Scala via a Cloudera-specific, custom messaging protocol. In the PBJ Workbench, on the other hand, the code and output are now passed to and from the target language runtime via the Jupyter messaging protocol. They are handled inside the runtime container by a Jupyter kernel and rendered in your browser by JupyterLab’s client code. This may seem like a subtle change, but it will provide CML users with some major benefits. First, the behavior of user code and third-party libraries on CML will be more consistent with its behavior in Jupyter-based environments. That means that a wider variety of rich visualization libraries will work out of the box, and in cases where rich visualization libraries do not work, error messages in the CML console and the browser console will be easier to google. Likewise, dependency conflicts between kernel code and user code will be rarer, and when they do occur, they will be easier for customers to diagnose and fix. To give you a taste of what this higher degree of consistency is like, note that Python 3’s input() function now works. Go ahead and try it out! Second, customers will no longer need to build runtime images starting from Cloudera base images and will no longer need to restrict themselves to languages and versions that Cloudera has packaged. Any combination of base image, target language, and language version can be used with the PBJ Workbench as long as a Jupyter kernel is available for that combination. You can try it out by running a PBJ Workbench Python session using a CML November or newer Workspace. The look and feel of the workbench will be more or less unchanged. 
Under the hood, however, the way that code and outputs are rendered and passed between the web app and the Python interpreter has been re-engineered to better align with the Jupyter ecosystem. To learn how to construct PBJ ML Runtimes, follow the documentation.
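To make the protocol change more concrete, here is a minimal sketch of the kind of `execute_request` message a Jupyter client sends to a kernel, built with the Python standard library only. The field names follow the public Jupyter messaging specification; the helper name `build_execute_request`, the `cml-user` username, and the session handling are illustrative assumptions, and this is not CML's own client code.

```python
import json
import uuid
from datetime import datetime, timezone


def build_execute_request(code, session_id, username="cml-user"):
    """Return a dict shaped like a Jupyter `execute_request` message.

    Illustrative only: a real client also signs and frames the message
    before sending it to the kernel.
    """
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": username,  # hypothetical value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": True,  # lets the kernel ask the client for input
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},
        "metadata": {},
        "content": content,
    }


message = build_execute_request("name = input('Who are you? ')", uuid.uuid4().hex)
print(json.dumps(message["content"], indent=2))
```

The `allow_stdin` flag is the part of the protocol that lets a kernel send an input request back to the front end, which is the mechanism that makes calls such as `input()` usable in Jupyter-based clients.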
11-30-2022
11:46 PM
1 Kudo
CML Experiments have been rebuilt, leveraging the MLflow ecosystem to complement CML’s existing strengths in model development and deployment. CML now ships the mlflow SDK and an integrated visual experience that enables experiment tracking and comparison via flexible visuals. Model development is an iterative process that involves many “experiments” to determine the right combination of data, algorithm, and hyperparameters to maximize accuracy. Multiple versions of the “model” are created as the iterative process continues after the initial deployment. To efficiently develop models while addressing the technical and governance needs for traceability and repeatability, data scientists need to be able to track these different experiments and model versions and see which experiments produced which models.
CML Experiment Tracking through MLflow API
CML’s experiment tracking features allow you to use the MLflow client library for logging parameters, code versions, metrics, and output files when running your machine learning code. The MLflow library is available in CML Sessions without you having to install it. CML also provides a native UI for later visualizing the experiment results. You can learn more about how Cloudera enables data scientists to deliver AI Applications faster here.
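As a rough sketch of what logging parameters and metrics with the MLflow client library can look like, the example below expands a small hyperparameter space into individual runs and logs each one. The helper names `hyperparameter_grid` and `log_run`, the experiment name, and the parameter values are hypothetical; `mlflow` is imported lazily inside `log_run` on the assumption that it is only available in an environment such as a CML Session.

```python
from itertools import product


def hyperparameter_grid(space):
    """Expand {"name": [values, ...]} into one parameter dict per run."""
    names = sorted(space)
    return [
        dict(zip(names, combo))
        for combo in product(*(space[name] for name in names))
    ]


def log_run(experiment_name, params, metrics):
    """Log one experiment run through the MLflow tracking API."""
    import mlflow  # assumed to be preinstalled, as in a CML Session

    mlflow.set_experiment(experiment_name)
    with mlflow.start_run():
        mlflow.log_params(params)    # e.g. {"max_depth": 4}
        mlflow.log_metrics(metrics)  # e.g. {"accuracy": 0.91}


grid = hyperparameter_grid({"max_depth": [2, 4], "learning_rate": [0.1, 0.01]})
print(len(grid), "candidate runs")

# Example usage inside a session (not executed here):
# for params in grid:
#     accuracy = train_and_score(params)  # hypothetical user-defined function
#     log_run("churn-model", params, {"accuracy": accuracy})
```

Keeping the grid expansion separate from the logging call makes it easy to see which parameter combination produced which run when comparing results later.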
10-04-2022
09:04 AM
As businesses continue to adopt and build open lakehouses with Apache Iceberg on CDP, data scientists need easy access to these new table formats, so they don’t spend their time figuring out connection dependencies and configurations. Cloudera Machine Learning’s Data Connection and Snippet support simplifies data access in CDP. Data scientists can use the cml.data library to gain access to a Data Lake via Spark or query their Virtual Warehouse with Hive or Impala. With recent improvements to the cml.data library, CML Snippets now fully support the Iceberg table format for all Spark, Hive, and Impala data connections. To learn more about Data Connections and Snippets, read the following article: https://blog.cloudera.com/one-line-away-from-your-data/
09-30-2022
06:25 AM
Cloudera Machine Learning now provides a built-in dashboard for monitoring technical metrics relating to deployed CML Models, such as request throughput, latency, and resource consumption. When machine learning models are deployed in production, it’s essential to know whether the model is successfully providing responses to all queries within the required timeframe and to be able to investigate and find the root cause for any failed responses or other downstream issues. Further, it can be challenging to know ahead of time the resource requirements for the model, such as the number of replicas and the amount of memory and CPU allocated to each replica. This can make it difficult to find the balance between the risk of underprovisioning resources, leading to slow responses or timed-out requests, and the risk of wasting resources that could be used for other workloads. The new monitoring features for CML Models provide observability that makes these challenges much easier to manage, allowing ML Engineers to be confident that their Model is right-sized and performing within SLAs. The dashboard is available to any end user with access to the Model, and allows users to view these technical metrics over custom time windows, either aggregated or per replica, making it easier for developers to understand the resource needs of their Models and monitor the health of production deployments. To view the dashboard, select the Monitoring tab of the deployed Model.
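As a rough illustration of the difference between the aggregated and per-replica views, the sketch below summarizes a handful of request records with the Python standard library. The record layout (`replica`, `latency_ms`), the sample values, and the function name `summarize` are all hypothetical; the CML dashboard computes its own metrics, and this is not its implementation.

```python
from collections import defaultdict
from statistics import mean


def summarize(requests, window_seconds):
    """Return throughput and latency stats, overall and per replica.

    Assumes `requests` contains at least one record.
    """
    def stats(latencies):
        return {
            "requests_per_second": len(latencies) / window_seconds,
            "mean_latency_ms": mean(latencies),
            "max_latency_ms": max(latencies),
        }

    # Group latencies by the replica that served each request.
    by_replica = defaultdict(list)
    for record in requests:
        by_replica[record["replica"]].append(record["latency_ms"])

    all_latencies = [record["latency_ms"] for record in requests]
    return {
        "aggregated": stats(all_latencies),
        "per_replica": {
            name: stats(values) for name, values in sorted(by_replica.items())
        },
    }


# Made-up sample data covering a 10-second window.
sample = [
    {"replica": "replica-0", "latency_ms": 40},
    {"replica": "replica-0", "latency_ms": 60},
    {"replica": "replica-1", "latency_ms": 200},
    {"replica": "replica-1", "latency_ms": 100},
]
summary = summarize(sample, window_seconds=10)
print(summary["aggregated"])
print(summary["per_replica"]["replica-1"])
```

In this made-up sample the aggregated mean hides the fact that one replica is much slower than the other, which is the kind of imbalance a per-replica view is meant to surface.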
08-01-2022
12:29 AM
Over the last few quarters, more and more of our customers have deployed their production workloads on Cloudera Machine Learning. Some of them rely on CML for predictive maintenance, while others predict churn or detect fraudulent transactions. What they have in common is that the ML workloads running on CML are critical to their business’s success. The new CML Backup and Restore capability gives you an extra layer of protection by giving you the ability to resume operations in a timely manner following an outage or crisis. Now, administrators can take on-demand backups of CML workspaces before cluster operations like an upgrade and do periodic backups during off-peak hours. Backed-up CML workspaces can be restored into a new CML workspace in the same or a new CDP environment, and all project artifacts like deployed models and applications will be recovered. The Backup and Restore capabilities are available on AWS today, and we are planning to roll out the same capabilities on Azure in the future. To learn more about these new capabilities, visit our documentation.