What's New @ Cloudera

Find the latest Cloudera product news

Accelerate Your AI with the Cloudera AI Inference Service with NVIDIA NIM


We are excited to announce the General Availability of the Cloudera AI Inference service with NVIDIA NIM microservices in the Public Cloud. This new service enables enterprises to rapidly deploy and scale traditional machine learning models, large language models (LLMs), and generative AI models to power AI applications. Designed with enterprise-grade security and performance optimizations, this service helps businesses unlock the full potential of AI with increased flexibility, speed, and control.

As enterprises rapidly move from AI experimentation to production, the need for scalable, high-performance infrastructure becomes critical. The Cloudera AI Inference service directly addresses these needs, providing a seamless environment for deploying advanced AI models, such as LLaMA 3.1, Mistral, and Mixtral, with the ability to handle real-time inference workloads at scale. By leveraging NVIDIA NIM and high-performance GPUs, enterprises can achieve up to 36x faster model inference, drastically reducing decision-making time. Additionally, the service provides enterprise-grade security by running models within the customer's Virtual Private Cloud (VPC), so organizations maintain complete control and privacy over sensitive data. The Cloudera AI Inference service is an essential tool for any enterprise looking to harness the power of generative AI at scale without compromising on privacy, performance, or control.


Getting started with the Cloudera AI Inference service is simple. Begin by exploring the Model Hub, where you can select from a curated list of top-performing LLMs and deploy production-ready models with just a few clicks. Once a model is deployed, you can interact with its endpoint through an OpenAI-compatible API, using the standard OpenAI client library to integrate it into your AI applications.
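As a rough sketch of what such an integration looks like, the snippet below assembles an OpenAI-style chat-completions request against a deployed model endpoint. The endpoint URL, model name, and token are hypothetical placeholders, not values from the service; substitute the endpoint details and credentials from your own deployment (the OpenAI Python client can likewise be pointed at the endpoint via its `base_url` setting).

```python
# Minimal sketch of calling a deployed model endpoint through an
# OpenAI-compatible chat-completions API using only the standard library.
# ENDPOINT, API_TOKEN, and the model name are hypothetical placeholders.
import json
import urllib.request

ENDPOINT = "https://your-inference-endpoint.example.com/v1/chat/completions"  # placeholder
API_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder; use your deployment's credentials

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completions POST request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("meta/llama-3.1-8b-instruct", "Summarize our Q3 sales trends.")
# Sending the request requires a live endpoint:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint speaks the same wire format as the OpenAI API, existing applications built on that API can typically be repointed at the deployed model with only a URL and credential change.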

For more information on how to get started, explore the Cloudera AI Inference documentation.