01-08-2026
03:13 AM
As enterprises modernize their on-premises data infrastructure, the combination of Cloudera Data Platform and Red Hat OpenShift Virtualization (OCP-V) is emerging as a transformative architecture. This joint engineering work demonstrates:

- Production-grade deployment of Cloudera on-premises Base (a.k.a. Cloudera Private Cloud Base) on OCP-V
- Successful validation at 100+ OCP-V virtualized nodes
- End-to-end integration with Data Services (CDW, CDE, CAI)
- Enterprise-grade security with TLS, Kerberos, LDAP, Ranger, and Knox

This document brings together the complete deployment workflow, an architecture overview, a real-world use case deployed for functional validation, and the validation results and key outcomes. It is intended as a practical blueprint for field, engineering, and customer teams.

Why OpenShift Virtualization for Cloudera?

This converged architecture offers the following:

- Unified, cloud-native platform: OpenShift Virtualization lets Cloudera Base VMs and containerized workloads run side by side on a single Kubernetes platform, eliminating split operational models and simplifying lifecycle management.
- Elastic, automated, and secure operations: Dynamic resource scaling, rapid VM provisioning, built-in high availability, strong security controls, and GitOps/Ansible-driven automation address the rigidity and operational overhead of legacy VM and bare-metal environments.
- Proven at scale: The architecture was validated through large-scale testing that integrated a 100+ node OpenShift Virtualization-based Cloudera Base cluster with a dedicated 7-node bare-metal Data Services cluster, demonstrating robustness and real-world readiness.
The Technical Stack at a Glance

Red Hat and Cloudera components:

| Component | Key Version | Role |
|---|---|---|
| Red Hat OpenShift Container Platform, Red Hat OpenShift Virtualization | 4.17.42 | Virtualization and containerization |
| Red Hat Enterprise Linux (RHEL) | 9.5 | Operating system |
| Cloudera Manager | 7.13.1 CHF4+ | Centralized cluster management |
| Cloudera Base on Premises | 7.3.1.400 SP2 | Cloudera's core data runtime |
| Cloudera Data Services on Premises | 1.5.5 | Platform for Data Services |

Cluster layout:

- Cloudera Base: OpenShift cluster with 100+ bare-metal nodes. OpenShift Virtualization VMs were created across the 100+ worker nodes, with node affinity/anti-affinity rules assigned so that one VM runs on each bare-metal worker node.
- Cloudera Data Services: separate OpenShift cluster with 7 bare-metal nodes.

Data Services Testing at Scale

End-to-end functionality was validated across the major Data Services:

- CDW (Cloudera Data Warehouse): Created Hive and Impala Virtual Warehouses and ran sample CRUD queries successfully through the Hue interface.
- CDE (Cloudera Data Engineering): Set up Virtual Clusters (Spark 3.5.1), configured Hadoop authentication with a keytab, and executed multiple sample Spark jobs.
- CAI (Cloudera AI): Deployed the AI Workbench, configured Hadoop authentication, tested session creation with sample project templates, and deployed Cloudera Agent Studio.

Enterprise Security

A full security suite was implemented so that the cluster meets enterprise compliance and operational requirements:

- Identity and access: FreeIPA (v4.12.2) served as the identity provider for Kerberos and LDAP, securing principals and simplifying user management.
- Data protection: Ranger provided fine-grained access control, and Auto-TLS (self-signed certificates) provided encryption in transit.
- Gateways: Knox and Atlas were configured, with access secured either directly or through the Knox SSO gateway.

Real-World Proof: The Use Case

A Bank Branch Performance Analytics workload was executed to exercise the full data lifecycle: ingestion, processing, and serving.
This involved:

- Ingesting data into HDFS/Ozone.
- Processing and transforming the data with CDE Spark jobs.
- Querying and reporting on data materialized in CDW (Hive/Impala).
- Visualizing the final metrics with a data visualization app.

Full Use Case Details

Together, Cloudera and Red Hat enable a future-ready on-premises analytics platform that combines the flexibility of Kubernetes with the power of enterprise data services. This large-scale deployment shows how organizations can modernize their data infrastructure with confidence while maintaining security, control, and operational excellence.
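The one-VM-per-bare-metal-worker placement mentioned in the cluster layout is commonly expressed in KubeVirt with a node affinity rule (schedule only on worker nodes) plus a required pod anti-affinity rule keyed on `kubernetes.io/hostname`. The Python sketch below builds a minimal `VirtualMachine` manifest as a dictionary and checks a VM-to-node placement map. It is an illustration under stated assumptions, not the configuration used in this deployment: the VM names, the `app: cloudera-base` label, the CPU/memory sizing, and the helper functions `cloudera_vm_manifest` and `is_one_vm_per_node` are all hypothetical, and the boot disk, volumes, and networks a real VM needs are omitted.

```python
import json
from collections import Counter

# Hypothetical label shared by every Cloudera Base VM (assumption, not from the source).
VM_LABEL = {"app": "cloudera-base"}


def cloudera_vm_manifest(name: str, cores: int = 16, memory: str = "64Gi") -> dict:
    """Build a minimal KubeVirt VirtualMachine manifest with hypothetical sizing.

    Boot disk, volumes, and networks are omitted for brevity; a real VM needs them.
    """
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "labels": dict(VM_LABEL)},
        "spec": {
            "runStrategy": "Always",
            "template": {
                # Template labels propagate to the virt-launcher pod, which is
                # what the pod anti-affinity rule below matches against.
                "metadata": {"labels": dict(VM_LABEL)},
                "spec": {
                    "affinity": {
                        # Node affinity: only schedule onto nodes labeled as workers.
                        "nodeAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": {
                                "nodeSelectorTerms": [{
                                    "matchExpressions": [{
                                        "key": "node-role.kubernetes.io/worker",
                                        "operator": "Exists",
                                    }]
                                }]
                            }
                        },
                        # Pod anti-affinity: never place two Cloudera Base VMs
                        # on the same host (topology domain = node hostname).
                        "podAntiAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": [{
                                "labelSelector": {"matchLabels": dict(VM_LABEL)},
                                "topologyKey": "kubernetes.io/hostname",
                            }]
                        },
                    },
                    "domain": {
                        "cpu": {"cores": cores},
                        "resources": {"requests": {"memory": memory}},
                        "devices": {},
                    },
                },
            },
        },
    }


def is_one_vm_per_node(placement: dict[str, str], workers: list[str]) -> bool:
    """Return True if every worker node hosts exactly one VM.

    `placement` maps VM name -> node name, e.g. collected from each
    VirtualMachineInstance's `status.nodeName` (collection step not shown).
    """
    counts = Counter(placement.values())
    return set(counts) == set(workers) and all(c == 1 for c in counts.values())


if __name__ == "__main__":
    manifests = [cloudera_vm_manifest(f"cdp-base-{i:03d}") for i in range(3)]
    print(json.dumps(manifests[0], indent=2))
```

Because the anti-affinity rule is "required" rather than "preferred", a VM that cannot find a free worker node stays unscheduled instead of doubling up on a host, which is the intended trade-off for a strict one-VM-per-node layout.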