Running Cloudera Data Platform on Red Hat OpenShift Virtualization

Cloudera Employee

As enterprises modernize their on-premises data infrastructure, the combination of Cloudera Data Platform and Red Hat OpenShift Virtualization (OCP-V) is emerging as a transformative architecture.
This joint engineering work demonstrates:

  • Production-grade deployment of Cloudera Base on Premises (a.k.a. Cloudera Private Cloud Base) on OCP-V
  • Successful validation at 100+ OCP-V virtualized nodes
  • End-to-end integration with Data Services (CDW, CDE, CAI)
  • Enterprise-grade security with TLS, Kerberos, LDAP, Ranger, and Knox

This document brings together the complete deployment workflow, an architecture overview, a real-world use case deployed for functional validation, the validation results, and key outcomes. It is intended as a practical blueprint for field, engineering, and customer teams.

Why OpenShift Virtualization for Cloudera?

This converged architecture offers the following benefits:

  • Unified, Cloud-Native Platform: OpenShift Virtualization enables Cloudera Base VMs and containerized workloads to run together on a single Kubernetes platform, eliminating split operational models and simplifying lifecycle management.
  • Elastic, Automated, and Secure Operations: Dynamic resource scaling, rapid VM provisioning, built-in high availability, strong security controls, and GitOps/Ansible-driven automation address the rigidity and operational overhead of legacy VM and bare-metal environments.
  • Proven at Scale: This architecture is validated through large-scale testing, integrating a 100+ node OpenShift Virtualization–based Cloudera Base cluster with a dedicated 7-node bare-metal Data Services cluster, demonstrating robustness and real-world readiness.

The Technical Stack at a Glance

Red Hat and Cloudera components:

Component                                                                | Key Version   | Role
-------------------------------------------------------------------------|---------------|-----------------------------------
Red Hat OpenShift Container Platform, Red Hat OpenShift Virtualization   | 4.17.42       | Virtualization & Containerization
Red Hat Enterprise Linux (RHEL)                                          | 9.5           | Operating system
Cloudera Manager                                                         | 7.13.1 CHF4+  | Centralized cluster management
Cloudera Base on Premises                                                | 7.3.1.400 SP2 | Cloudera’s core data runtime
Cloudera Data Services on Premises                                       | 1.5.5         | Platform for Data Services

Cloudera Base - OpenShift Cluster - 100+ Bare Metal Nodes 

OpenShift Virtualization VMs were created across the 100+ worker-node cluster, with node affinity/anti-affinity rules assigned so that exactly one VM runs on each bare-metal worker node.
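One way to express the one-VM-per-node placement is a hostname-level pod anti-affinity rule on each VM. The sketch below builds such a KubeVirt VirtualMachine manifest as a Python dictionary. It is an illustration under stated assumptions, not the manifest used in this validation: the VM name, the `app` label value, and the CPU and memory figures are hypothetical, and disks, volumes, and networks are omitted for brevity.

```python
import json


def build_vm_manifest(name: str, group_label: str = "cloudera-base") -> dict:
    """Return a KubeVirt VirtualMachine manifest with hostname-level anti-affinity.

    All VMs share the same (hypothetical) label; the anti-affinity rule then
    prevents the scheduler from placing two such VMs on the same worker node.
    """
    labels = {"app": group_label}  # hypothetical label shared by all Cloudera Base VMs
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "runStrategy": "Always",
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # Hard rule: never co-locate two VMs carrying this label.
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    "domain": {
                        # Placeholder sizing, not taken from the validation.
                        "cpu": {"cores": 16},
                        "resources": {"requests": {"memory": "64Gi"}},
                        "devices": {},  # disks and interfaces omitted in this sketch
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON; it could be piped to `oc apply -f -` once completed.
    print(json.dumps(build_vm_manifest("cdp-base-001"), indent=2))
```

A required (rather than preferred) anti-affinity rule is used here because the goal is a strict one-to-one mapping. With more VMs than eligible worker nodes, the surplus VMs would stay unscheduled instead of doubling up on a node.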

[Figure: OpenShift Virtualization for Cloudera Base]

Cloudera Data Services - OpenShift Cluster - 7 Bare Metal Nodes 

[Figure: OpenShift for Cloudera Data Services]

Data Services Testing At Scale

End-to-end functionality was validated across the major Data Services:

  • CDW (Cloudera Data Warehouse): Created Virtual Warehouses (Hive/Impala) and successfully ran sample CRUD queries via the Hue interface.
  • CDE (Cloudera Data Engineering): Set up Virtual Clusters (Spark 3.5.1), added Hadoop Authentication with a keytab, and executed multiple sample Spark jobs.
  • CAI (Cloudera AI): Deployed the AI Workbench, configured Hadoop Authentication, tested session creation with sample project templates, and deployed Cloudera Agent Studio.

Enterprise Security

A full security suite was implemented so that the cluster meets enterprise compliance and operational needs:

  • Identity & Access: FreeIPA (v4.12.2) served as the identity provider for Kerberos and LDAP, securing principals and simplifying user management.
  • Data Protection: Ranger provided granular access control, and AutoTLS (self-signed) provided encryption in transit.
  • Gateways: Knox and Atlas were configured, with access secured either directly or via the Knox SSO gateway.

Real-World Proof: The Use Case

A Bank Branch Performance Analytics workload was executed to exercise the full data lifecycle: ingestion, processing, and serving. This involved:

  1. Ingesting data into HDFS/Ozone.
  2. Processing and transforming the data using CDE Spark jobs.
  3. Querying and reporting from data materialized in CDW (Hive/Impala).
  4. Visualizing the final metrics with a data visualization app.
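To make the processing step concrete, the following sketch shows the kind of per-branch aggregation such a workload might compute. It is a simplified stand-in written in plain Python so that it runs anywhere. In the validated pipeline this step was performed by CDE Spark jobs, and the record fields (`branch_id`, `amount`) and the metrics chosen here are hypothetical, not taken from the actual use case.

```python
from collections import defaultdict


def branch_metrics(transactions):
    """Aggregate transaction count, total amount, and average amount per branch.

    `transactions` is an iterable of dicts with the hypothetical keys
    `branch_id` (str) and `amount` (float).
    """
    totals = defaultdict(lambda: {"txn_count": 0, "total_amount": 0.0})
    for txn in transactions:
        agg = totals[txn["branch_id"]]
        agg["txn_count"] += 1
        agg["total_amount"] += txn["amount"]
    # Derive the average once all records for a branch have been seen.
    return {
        branch: {**agg, "avg_amount": agg["total_amount"] / agg["txn_count"]}
        for branch, agg in totals.items()
    }


if __name__ == "__main__":
    # Illustrative records only; not data from the validation.
    sample = [
        {"branch_id": "B001", "amount": 120.0},
        {"branch_id": "B001", "amount": 80.0},
        {"branch_id": "B002", "amount": 50.0},
    ]
    for branch, metrics in sorted(branch_metrics(sample).items()):
        print(branch, metrics)
```

In a Spark job the same logic would typically be a group-by on the branch key followed by count, sum, and average aggregations, with the result written to a table that CDW (Hive/Impala) can query.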


Together, Cloudera and Red Hat enable a future-ready on-premises analytics platform that combines the flexibility of Kubernetes with the power of enterprise data services. This proven, large-scale deployment shows how organizations can confidently modernize their data infrastructure while maintaining security, control, and operational excellence.