As enterprises modernize their on-premises data infrastructure, the combination of Cloudera Data Platform and Red Hat OpenShift Virtualization (OCP-V) is emerging as a transformative architecture.
This joint engineering work demonstrates:
- Production-grade deployment of Cloudera on-premises Base (a.k.a. Cloudera Private Cloud Base) on OCP-V
- Successful validation at 100+ OCP-V virtualized nodes
- End-to-end integration with Data Services (CDW, CDE, CAI)
- Enterprise-grade security with TLS, Kerberos, LDAP, Ranger, and Knox
This document brings together the complete deployment workflow, an architecture overview, a real-world use case deployed for functional validation, and the validation results and key outcomes. It is intended as a practical blueprint for field, engineering, and customer teams.
Why OpenShift Virtualization for Cloudera?
This converged architecture encompasses the following:
- Unified, Cloud-Native Platform: OpenShift Virtualization enables Cloudera Base VMs and containerized workloads to run together on a single Kubernetes platform, eliminating split operational models and simplifying lifecycle management.
- Elastic, Automated, and Secure Operations: Dynamic resource scaling, rapid VM provisioning, built-in high availability, strong security controls, and GitOps/Ansible-driven automation address the rigidity and operational overhead of legacy VM and bare-metal environments.
- Proven at Scale: This architecture is validated through large-scale testing, integrating a 100+ node OpenShift Virtualization–based Cloudera Base cluster with a dedicated 7-node bare-metal Data Services cluster, demonstrating robustness and real-world readiness.
The Technical Stack at a Glance
Red Hat and Cloudera components:
| Component | Key Version | Role |
| --- | --- | --- |
| Red Hat OpenShift Container Platform, Red Hat OpenShift Virtualization | 4.17.42 | Virtualization & Containerization |
| Red Hat Enterprise Linux (RHEL) | 9.5 | Operating system |
| Cloudera Manager | 7.13.1 CHF4+ | Centralized cluster management |
| Cloudera Base on Premises | 7.3.1.400 SP2 | Cloudera’s core data runtime |
| Cloudera Data Services on Premises | 1.5.5 | Platform for Data Services |
Cloudera Base - OpenShift Cluster - 100+ Bare Metal Nodes
OpenShift Virtualization VMs were created across the 100+ worker-node cluster, with node affinity and anti-affinity rules assigned so that one VM runs on each bare-metal worker node.
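The one-VM-per-node placement described above can be approximated with a pod anti-affinity rule keyed on the node hostname. The sketch below builds such a KubeVirt `VirtualMachine` manifest as a Python dictionary. It is an illustration under assumptions, not the configuration used in the validated cluster: the VM name, the `app: cloudera-base` label, and the choice of a hard (`required...`) anti-affinity rule are all hypothetical, and disks, volumes, networks, and CPU/memory sizing are deliberately left out.

```python
import json


def build_vm_manifest(name: str, group_label: str = "cloudera-base") -> dict:
    """Return a skeleton KubeVirt VirtualMachine manifest as a dict.

    The anti-affinity rule asks the scheduler not to place two VMs that
    carry the same label on the same node (topologyKey = hostname).
    The label value and VM name are hypothetical placeholders.
    """
    labels = {"app": group_label}
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "runStrategy": "Always",
            "template": {
                # These labels end up on the VM's launcher pod, which is
                # what the anti-affinity selector below matches against.
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    # Sizing, disks, and interfaces are intentionally
                    # omitted; a real VM definition needs them.
                    "domain": {"devices": {}},
                },
            },
        },
    }


# Example: render one manifest as JSON for review.
manifest = build_vm_manifest("cloudera-base-001")
rendered = json.dumps(manifest, indent=2)
```

Because every VM shares the same label, the rule prevents any two of them from landing on the same worker. A node affinity rule or node selector could be added alongside it to restrict the VMs to a specific pool of workers.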

Cloudera Data Services - OpenShift Cluster - 7 Bare Metal Nodes

Data Services Testing At Scale
End-to-end functionality was validated across the major Data Services:
- CDW (Cloudera Data Warehouse): Created Virtual Warehouses (Hive / Impala) and ran successful sample CRUD queries via the Hue interface.
- CDE (Cloudera Data Engineering): Set up Virtual Clusters (Spark 3.5.1), added Hadoop authentication with a keytab, and executed multiple sample Spark jobs.
- CAI (Cloudera AI): Successfully deployed the AI Workbench, configured Hadoop authentication, tested session creation with sample project templates, and deployed Cloudera Agent Studio.
Enterprise Security
The deployment implemented a full security suite so that the cluster meets enterprise compliance and operational needs:
- Identity & Access: Leveraged FreeIPA (v4.12.2) as the Identity Provider for Kerberos and LDAP, securing principals and simplifying user management.
- Data Protection: Used Ranger for granular access control and AutoTLS (self-signed certificates) for encryption in transit.
- Gateways: Knox and Atlas were configured, with access secured either directly or via the Knox SSO gateway.
Real-World Proof: The Use Case
A Bank Branch Performance Analytics workload was executed to exercise the full data lifecycle: ingestion, processing, and serving. This involved:
- Ingesting data into HDFS/Ozone.
- Processing and transforming the data using CDE Spark jobs.
- Querying and reporting from data materialized in CDW (Hive/Impala).
- Visualizing the final metrics with a data visualization app.
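To make the processing step more concrete, the sketch below shows the kind of per-branch aggregation such a workload might perform. It is written in plain Python so that it runs anywhere. The validated workload used CDE Spark jobs, and its actual schema and metrics are not described here, so the record fields (`branch_id`, `amount`) and the sample values are purely hypothetical.

```python
from collections import defaultdict


def summarize_branches(transactions):
    """Aggregate hypothetical transaction records into per-branch metrics.

    Each input record is a dict with 'branch_id' and 'amount' keys.
    Returns a list of rows sorted by total amount, highest first.
    """
    totals = defaultdict(lambda: {"txn_count": 0, "total_amount": 0.0})
    for txn in transactions:
        entry = totals[txn["branch_id"]]
        entry["txn_count"] += 1
        entry["total_amount"] += txn["amount"]

    rows = [{"branch_id": branch, **metrics} for branch, metrics in totals.items()]
    # Rank branches so the serving layer can report top performers first.
    return sorted(rows, key=lambda row: row["total_amount"], reverse=True)


# Illustrative input only; not data from the validated use case.
sample = [
    {"branch_id": "B001", "amount": 120.0},
    {"branch_id": "B002", "amount": 75.5},
    {"branch_id": "B001", "amount": 30.0},
]
report = summarize_branches(sample)
```

In a Spark job, the same logic would typically be expressed as a group-by on the branch key followed by count and sum aggregations, with the result written to a table that the Hive or Impala Virtual Warehouse can query.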
Full Use Case Details
Together, Cloudera and Red Hat enable a future-ready on-premises analytics platform that combines the flexibility of Kubernetes with the power of enterprise data services. This proven, large-scale deployment shows how organizations can confidently modernize their data infrastructure while maintaining security, control, and operational excellence.