
Running Cloudera Data Platform on Red Hat OpenShift Virtualization

Cloudera Employee

As enterprises modernize their on-premises data infrastructure, the combination of Cloudera Data Platform and Red Hat OpenShift Virtualization (OCP-V) is emerging as a transformative architecture.
This joint engineering work demonstrates:

  • Production-grade deployment of Cloudera on-premises Base (a.k.a. Cloudera Private Cloud Base) on OCP-V
  • Successful validation at 100+ OCP-V virtualized nodes
  • End-to-end integration with Data Services (CDW, CDE, CAI)
  • Enterprise-grade security with TLS, Kerberos, LDAP, Ranger, and Knox

This post brings together the deployment workflow, an architecture overview, a real-world use case deployed for functional validation, and the validation results and key outcomes. It is intended as a practical blueprint for field, engineering, and customer teams.

Why OpenShift Virtualization for Cloudera?

This converged architecture offers the following benefits:

  • Unified, Cloud-Native Platform: OpenShift Virtualization enables Cloudera Base VMs and containerized workloads to run together on a single Kubernetes platform, eliminating split operational models and simplifying lifecycle management.
  • Elastic, Automated, and Secure Operations: Dynamic resource scaling, rapid VM provisioning, built-in high availability, strong security controls, and GitOps/Ansible-driven automation address the rigidity and operational overhead of legacy VM and bare-metal environments.
  • Proven at Scale: The architecture was validated through large-scale testing that integrated a 100+ node OpenShift Virtualization-based Cloudera Base cluster with a dedicated 7-node bare-metal Data Services cluster, demonstrating robustness and real-world readiness.

The Technical Stack at a Glance

Red Hat and Cloudera components:

  • Red Hat OpenShift Container Platform and Red Hat OpenShift Virtualization 4.17.42: virtualization and containerization
  • Red Hat Enterprise Linux (RHEL) 9.5: operating system
  • Cloudera Manager 7.13.1 CHF4+: centralized cluster management
  • Cloudera Base on Premises 7.3.1.400 SP2: Cloudera's core data runtime
  • Cloudera Data Services on Premises 1.5.5: platform for Data Services

Cloudera Base - OpenShift Cluster - 100+ Bare Metal Nodes 

OpenShift Virtualization VMs were created across the 100+ worker-node cluster, with node affinity and anti-affinity rules assigned so that exactly one VM runs on each bare-metal worker node.

[Figure: OpenShift Virtualization for Cloudera Base]
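The one-VM-per-node placement described above can be expressed with standard Kubernetes affinity rules on a KubeVirt VirtualMachine. The following is a minimal sketch, not the manifest used in this validation: the namespace `cdp-base`, the label `app: cdp-base-worker`, the VM and DataVolume names, and the CPU/memory sizing are all hypothetical. It builds the manifests as Python dictionaries and prints them as a single `v1` `List` that could be piped to `oc apply -f -`.

```python
import json

# Hypothetical values for illustration only; not taken from the validated deployment.
NAMESPACE = "cdp-base"
VM_LABEL = {"app": "cdp-base-worker"}


def worker_vm(index: int, cores: int = 16, memory: str = "128Gi") -> dict:
    """Build a KubeVirt VirtualMachine manifest for one Cloudera Base worker VM."""
    name = f"cdp-worker-{index:03d}"
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "runStrategy": "Always",
            "template": {
                # Labels on the VMI template are propagated to the virt-launcher pod,
                # which is what the pod anti-affinity rule below matches against.
                "metadata": {"labels": dict(VM_LABEL)},
                "spec": {
                    "affinity": {
                        # Only schedule onto OpenShift worker nodes.
                        "nodeAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": {
                                "nodeSelectorTerms": [{
                                    "matchExpressions": [{
                                        "key": "node-role.kubernetes.io/worker",
                                        "operator": "Exists",
                                    }]
                                }]
                            }
                        },
                        # Hard rule: never place two worker VMs on the same host.
                        "podAntiAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": [{
                                "labelSelector": {"matchLabels": dict(VM_LABEL)},
                                "topologyKey": "kubernetes.io/hostname",
                            }]
                        },
                    },
                    "domain": {
                        "cpu": {"cores": cores},
                        "resources": {"requests": {"memory": memory}},
                        "devices": {
                            "disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}]
                        },
                    },
                    "volumes": [{
                        "name": "rootdisk",
                        # Hypothetical DataVolume holding a RHEL 9.5 boot image.
                        "dataVolume": {"name": f"{name}-root"},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # One VM per bare-metal worker; 100 is illustrative.
    vms = [worker_vm(i) for i in range(1, 101)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": vms}, indent=2))
```

Using the required (hard) form of anti-affinity means a VM stays Pending rather than sharing a host when no free worker is available; `preferredDuringSchedulingIgnoredDuringExecution` would relax this at the cost of possible co-location.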

Cloudera Data Services - OpenShift Cluster - 7 Bare Metal Nodes 

[Figure: OpenShift for Cloudera Data Services]

Data Services Testing At Scale

End-to-end functionality was validated across the major Data Services:

  • CDW (Cloudera Data Warehouse): Created Hive and Impala Virtual Warehouses and successfully ran sample CRUD queries through the Hue interface.
  • CDE (Cloudera Data Engineering): Set up Virtual Clusters (Spark 3.5.1), configured Hadoop authentication with a keytab, and ran multiple sample Spark jobs.
  • CAI (Cloudera AI): Deployed the AI Workbench, configured Hadoop authentication, tested session creation with sample project templates, and deployed Cloudera Agent Studio.

Enterprise Security

A full security suite was implemented so that the cluster meets enterprise compliance and operational requirements:

  • Identity & Access: FreeIPA (v4.12.2) served as the identity provider for Kerberos and LDAP, securing principals and simplifying user management.
  • Data Protection: Ranger provided fine-grained access control, and Auto-TLS (self-signed certificates) provided encryption in transit.
  • Gateways: Knox and Atlas were configured, with access secured either directly or via the Knox SSO gateway.
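Ranger policies can also be managed programmatically, which fits the GitOps/Ansible-driven automation mentioned earlier. As a hedged sketch, the snippet below builds a policy that grants one group read-only (`select`) access to a single database and prepares a POST to Ranger Admin's public REST endpoint `/service/public/v2/api/policy`. The host, port 6182, the Ranger service name `cm_hive`, the database `bank_analytics`, the group `branch_analysts`, and the use of HTTP Basic authentication are all assumptions; a Kerberized Ranger Admin may require SPNEGO instead.

```python
import base64
import json
import urllib.request

# Assumed values for illustration; adjust to the actual cluster.
RANGER_URL = "https://ranger.example.com:6182"  # hypothetical host, TLS port assumed
HIVE_SERVICE = "cm_hive"  # assumed name of the Hive (Hadoop SQL) service in Ranger


def read_only_policy(database: str, group: str) -> dict:
    """Ranger policy granting `group` SELECT on every table and column of `database`."""
    return {
        "service": HIVE_SERVICE,
        "name": f"{database}-read-{group}",
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": ["*"]},
            "column": {"values": ["*"]},
        },
        "policyItems": [{
            "groups": [group],
            "accesses": [{"type": "select", "isAllowed": True}],
        }],
    }


def build_request(policy: dict, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST that creates the policy in Ranger Admin."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{RANGER_URL}/service/public/v2/api/policy",
        data=json.dumps(policy).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send it, the request could be passed to `urllib.request.urlopen(req, context=ctx)`, where `ctx` is an `ssl.SSLContext` that trusts the cluster's self-signed Auto-TLS CA.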

Real-World Proof: The Use Case

A Bank Branch Performance Analytics workload was executed to exercise the full data lifecycle (ingestion, processing, and serving). The workflow involved:

  1. Ingesting data into HDFS/Ozone.
  2. Processing and transforming the data using CDE Spark jobs.
  3. Querying and reporting from data materialized in CDW (Hive/Impala).
  4. Visualizing the final metrics with a data visualization app.
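The use case's Spark code and data schema are not included in this post. Purely as an illustration of step 2, the sketch below shows the kind of per-branch aggregation such a job might perform, written in plain Python so it runs anywhere; the `Txn` record, its fields, and the metric names are hypothetical. In a CDE job the same logic would typically be a Spark `groupBy("branch_id").agg(...)` over the ingested data, with the result written to a table that CDW then queries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical raw transaction record ingested into HDFS/Ozone."""
    branch_id: str
    kind: str  # "deposit" or "withdrawal"; other types are ignored
    amount: float


def branch_metrics(txns):
    """Aggregate transactions into per-branch KPIs, like GROUP BY branch_id."""
    acc = defaultdict(lambda: {"deposits": 0.0, "withdrawals": 0.0, "txn_count": 0})
    for t in txns:
        # Skip unknown transaction types instead of miscounting them.
        if t.kind not in ("deposit", "withdrawal"):
            continue
        row = acc[t.branch_id]
        row["deposits" if t.kind == "deposit" else "withdrawals"] += t.amount
        row["txn_count"] += 1
    # Derive net flow per branch and return branches in a stable order.
    return {
        branch: {**m, "net_flow": m["deposits"] - m["withdrawals"]}
        for branch, m in sorted(acc.items())
    }
```

For example, a deposit of 500.0 and a withdrawal of 200.0 at branch `B01` yield a `net_flow` of 300.0 and a `txn_count` of 2 for that branch.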

Full Use Case Details

Together, Cloudera and Red Hat enable a future-ready on-premises analytics platform that combines the flexibility of Kubernetes with the power of enterprise data services. This large-scale deployment shows how organizations can confidently modernize their data infrastructure while maintaining security, control, and operational excellence.