Member since
07-09-2015
70
Posts
29
Kudos Received
12
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
12471 | 11-23-2018 03:38 AM | |
2925 | 10-07-2018 11:44 PM | |
3625 | 09-24-2018 12:09 AM | |
5819 | 09-13-2018 02:27 AM | |
3937 | 09-12-2018 02:27 AM |
10-31-2024
07:20 AM
2 Kudos
We’re pleased to announce that Cloudera Copilot for Cloudera AI Workbench is now generally available, bringing AI-powered productivity to the data science and machine learning development workflow. Cloudera Copilot is designed to accelerate development by generating code snippets, offering real-time assistance with troubleshooting, and enabling collaboration by answering questions about the project. These capabilities streamline the workflow from data exploration to model training, helping data scientists work more efficiently and reducing time spent on routine coding and debugging. Copilots have two main architectural components: the tooling and the LLM model. The tooling integrates with the developer’s IDE to deliver intelligent assistance in the coding environment, offering features like code generation, inline suggestions, and contextual guidance to streamline development. The model serves as the brain, powering these capabilities by interpreting user input and generating responses based on trained knowledge. In Cloudera Copilot, the tooling is embedded directly within Cloudera ML Runtimes with JupyterLab, while the model is configurable by administrators to fit organizational needs. Deploying the LLM with Cloudera AI Inference service ensures a completely private setup where no data—such as code and IP—leaves the developer’s environment, making it ideal for even the most sensitive projects. To get started with Cloudera Copilot, administrators simply need to configure the LLM model to power the assistant hosted on Cloudera AI Inference service or Amazon Bedrock. Cloudera Copilot is available in JupyterLab environments running ML Runtime 2024.10.1 or later. To learn more, you can visit the Cloudera Copilot documentation.
... View more
Labels:
10-08-2024
08:10 AM
3 Kudos
We are excited to announce the General Availability of the Cloudera AI Inference service, with NVIDIA NIM microservices in the Public Cloud. This new service enables enterprises to rapidly deploy and scale traditional, Large Language Models (LLM) and generative AI models to power AI applications. Designed with enterprise-grade security and performance optimizations, this service helps businesses unlock the full potential of AI with increased flexibility, speed, and control. As enterprises rapidly move from AI experimentation to production, the need for scalable, high-performance infrastructure becomes critical. The Cloudera AI Inference service directly addresses these needs, providing a seamless environment for deploying advanced AI models, such as LLaMA 3.1, Mistral, and Mixtral, with the ability to handle real-time inference workloads at scale. By leveraging NVIDIA NIM and high-performance GPUs enterprises can achieve up to 36x faster model inference, drastically reducing decision-making time. Additionally, the service ensures enterprise-grade security by running models within the customer's Virtual Private Cloud (VPC), ensuring they maintain complete control and privacy over sensitive data. The Cloudera AI Inference service is an essential tool for any enterprise looking to harness the power of generative AI at scale without compromising on privacy, performance, or control. Getting started with the Cloudera AI Inference service is simple. Begin by exploring the Model Hub, where you can select from a curated list of top-performing LLMs and deploy production-ready models with just a few clicks. Once a model is deployed, you can interact with the model endpoints using the OpenAI API and library and integrate with your AI application. For more information on how to get started, explore the Cloudera AI Inference documentation.
... View more
Labels:
04-18-2024
01:20 AM
Great article! However there are some complications when attempting this in a Kerberised cluster. When following the guide to the t' we get an error already in these lines: SparkContext.setSystemProperty('spark.executor.cores', '4')
SparkContext.setSystemProperty('spark.executor.memory', '8g') Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
at ...
at ...
at ...
Caused by: java.lang.IllegalArgumentException: KrbException: Cannot locate default realm
at ...
at ...
... 12 more Are there any particular steps regarding this matter?
... View more
10-20-2023
02:41 PM
1 Kudo
We're excited to announce the General Availability (GA) of the Cloudera Model Registry, your centralized hub for storing, managing, and deploying machine learning models and their associated metadata. This robust tool simplifies the MLOps process, enabling organizations to develop, deploy, and maintain machine learning models effortlessly in a production environment. Cloudera's Model Registry addresses the challenges of fragmentation and lack of visibility in MLOps workflows. Serving as the single source of truth for model versions and lineage, it streamlines workflows, enhancing the traceability and reproducibility of model development. The Model Registry is now officially accessible in CML Public Cloud. To harness the full potential of General Availability (GA), upgrade your CML Workspaces and deploy your new Model Registry! You can find more information about the new Model Registry in our documentation.
... View more
Labels:
04-26-2023
11:50 PM
3 Kudos
Data ingestion is a common task in the data science workflow, which often involves coordinating with multiple teams. With the "Add Data" action on CDP Data Connections, data scientists can now easily upload data into CDP Data Stores such as Impala or Hive Virtual Warehouses to manage and govern data at scale. This means that data scientists can focus on analyzing and working with their own data rather than dealing with the complexities of data ingestion. To get started with this feature, users can simply open the “Data” tab in their CML Project and click on the "Add Data" action on the CDP Data Connection they wish to use, and follow the prompts to upload their data into a CDP Data Store. In addition to simplifying the data ingestion process, the "Add Data" action also provides users with several options for customizing the data import. These options include selecting the database and table name for the data, as well as selecting the column delimiter and locale. Users can also change the column names and types during the import process, giving them greater flexibility in how they want to land their data. These options make it easier for data scientists to import their data into CDP in a way that is customized to their specific needs, reducing the time and effort required to prepare their data for analysis. For more information about the "Add Data" action on CDP Data Connections, users can refer to Cloudera's documentation.
... View more
Labels:
04-26-2023
11:47 PM
1 Kudo
We are excited to announce the launch of Custom Connection Support in Cloudera Machine Learning (CML) that enables data scientists to seamlessly connect to external data stores from within CML. This feature helps data scientists discover all of their data independently, without worrying about implementation and connectivity details, unlocking their machine learning use cases from the get-go. Many users struggle with accessing data from various sources, such as legacy on-prem databases (Oracle, MSSQL, MySQL), serverless cloud databases (Redshift, Snowflake, SAP HANA Cloud, BigQuery), APIs, and specialized data stores (Neo4j). Until now, CDP administrators set up data replication or ingestion pipelines, to make data discoverable and accessible within CDP or work directly with data scientists to provide the necessary endpoints, configurations, and authentication details to set up connections manually. This process not only delayed machine learning use cases but also burdened CDP administrators with additional work. We believe that data scientists should be able to focus on solving complex business problems without being hindered by data accessibility challenges. Our new Custom Data Connections feature effectively addresses the aforementioned obstacles and facilitates effortless access to all possible external data stores. Data scientists can now concentrate on what they do best and start working on their machine learning projects as soon as they gain access to CML, without the need to wait for IT teams to transfer data to CDP or request assistance. Moreover, Custom Connection Support opens up a realm of new possibilities by enabling use cases such as processing graph data from graph databases, enriching data via APIs, and directly working with data that has been classified as 'unmovable' in legacy databases. In summary, Custom Connection Support in CML will bring unparalleled efficiency and flexibility to data scientists and organizations alike. To learn more about Custom Connection Support in CML, check out our documentation and example connections.
... View more
Labels:
02-23-2023
09:42 AM
2 Kudos
Machine Learning and data science libraries and frameworks have grown at an exponential pace along with algorithmic advancements with the introduction and evolution of Neural Networks to Transformer libraries. To keep up with this innovation, our customers have always asked for a pluggable architecture where the libraries can be chosen and hand-selected by their users, yet works within the ML platform that is enabled by CML from data exploration to model operations. Open source ML Runtimes afford that extensibility to our partners and customers. They can extend and create purpose-built runtimes for data science teams and projects. With our prior investment in the PBJ (Powered by Jupyter) architecture, we can now rely on open source, community-supported protocols and release a new family of our ML Runtimes to better align with the Jupyter ecosystem. With this rebuilt infrastructure, customers and partners will no longer need to build runtime images starting from Cloudera base images. They will no longer need to restrict themselves to languages and versions that Cloudera has packaged. Any combination of the base image, target language, and language version can be used. By releasing the PBJ ML Runtimes as open source, we can provide more transparency and detail to our customers regarding the environment they are working in. The Dockerfiles used to build the container images act as detailed documentation for customers to understand their working environment fully. Additionally, the open-sourced ML Runtimes serve as a blueprint to create custom Runtimes, supporting building Runtimes on a custom OS, using a custom kernel, or integrating their existing ML container images with CML. You can access the first release of PBJ ML Runtimes in our public GitHub repository: https://github.com/cloudera/ml-runtimes To learn more about Cloudera ML Runtimes, please visit our documentation.
... View more
Labels:
02-14-2023
11:05 AM
1 Kudo
We're excited to announce the launch of the Tech Preview of Cloudera Model Registry, a centralized repository for storing, managing, and deploying machine learning models and their associated metadata. This powerful new tool is designed to simplify the MLOps process, making it easier for organizations to develop, deploy, and maintain machine learning models in a production environment. The Model Registry solves the problem of fragmentation and lack of visibility in MLOps workflows. It provides a single source of truth for model versions and lineage, enabling organizations to streamline their workflows and enhance the traceability and reproducibility of model development. The new Model Registry is now available in CML Private Cloud 1.5 and in CML Public Cloud. To get started with the Tech Preview version, reach out to us! You can learn more about the new Model Registry in our documentation.
... View more
Labels:
12-09-2022
08:55 AM
1 Kudo
The ML Runtimes 2022.11-1 Release includes the GA version of the new workbench architecture, the PBJ (Powered by Jupyter) Workbench. In the previous Workbench editor, the code and output shown in the console (the right-hand pane in the image below) were passed to and from Python, R, or Scala via a Cloudera-specific, custom messaging protocol. In the PBJ Workbench, on the other hand, the code and output are now passed to and from the target language runtime via the Jupyter messaging protocol. They are handled inside the runtime container by a Jupyter kernel and rendered in your browser by JupyterLab’s client code. This may seem like a subtle change, but it will provide CML users with some major benefits. First, the behavior of user code and third-party libraries on CML will be more consistent with its behavior in Jupyter-based environments. That means that a wider variety of rich visualization libraries will work out of the box, and in cases where rich visualization libraries do not work, error messages in the CML console and the browser console will be easier to google. Likewise, dependency conflicts between kernel code and user code will be rarer, and when they do occur, they will be easier for customers to diagnose and fix. To give you a taste of what this higher degree of consistency is like, note that Python 3’s input() function now works. Go ahead and try it out! Second, customers will no longer need to build runtime images starting from Cloudera base images and will no longer need to restrict themselves to languages and versions that Cloudera has packaged. Any combination of base image, target language, and language version can be used with the PBJ Workbench as long as a Jupyter kernel is available for that combination. You can try it out by running a PBJ Workbench Python session using a CML November or newer Workspace. The look and feel of the workbench will be more or less unchanged. Under the hood, however, the way that code and outputs are rendered and passed between the web app and the Python interpreter have been re-engineered to better align with the Jupyter ecosystem. To learn how to construct BPJ ML Runtimes, follow the documentation.
... View more
Labels:
11-30-2022
11:46 PM
1 Kudo
CML Experiments have been rebuilt, leveraging the MLflow ecosystem to complement CML’s existing strengths in model development and deployment. CML now ships the mlflow SDK and an integrated visual experience that enables experiment tracking and comparison via flexible visuals. Model development is an iterative process that involves many “experiments” to determine the right combination of data, algorithm, and hyperparameters to maximize accuracy. Multiple versions of the “model” are created as the iterative process continues after the initial deployment. To efficiently develop models while addressing the technical and governance needs for traceability and repeatability, data scientists need to be able to track these different experiments and model versions and see which experiments produced which models. CML Experiment Tracking through MLflow API CML’s experiment tracking features allow you to use the MLflow client library for logging parameters, code versions, metrics, and output files when running your machine learning code. The MLflow library is available in CML Sessions without you having to install it. CML also provides a native UI for later visualizing the experiment results. You can learn more about how Cloudera enables data scientists to deliver AI Applications faster here.
... View more
Labels: