Created on 07-21-2020 11:12 AM - edited 11-20-2020 07:22 AM
This article contains Questions & Answers on Cloudera Data Platform (CDP) - private or public cloud.
CLOUDERA DATA PLATFORM
Which clouds is CDP Public Cloud supported on?
AWS, Azure and, soon, GCP.
What’s the difference between Cloudera’s product and cloud providers’ offerings?
Cloudera Data Platform (CDP) is a Platform-as-a-Service (PaaS) that is cloud infrastructure agnostic and easily portable between multiple cloud providers, including private solutions such as OpenShift. CDP is both hybrid and multi-cloud from the ground up, which means one platform can serve all data lifecycle use cases, independent of location or cloud, with a unified security and governance model.
How do CDP experiences compare to solutions from other cloud service providers?
CDP has an SDX layer that stores all policies and metadata for security and governance. This preservation of state is the big differentiating factor, especially when running transient workloads and a variety of experiences. The SDX layer is present across the entire data lifecycle.
Is CDP a completely separate platform or is it merged with AWS?
CDP is its own platform that can run in the public cloud, such as AWS, Azure, and, soon, GCP, as well as in a private cloud running on Red Hat OpenShift.
Can CDP run in a Kubernetes environment? Can it be deployed both on-prem and in the cloud?
Yes. The CDP experiences run on the cloud providers’ Kubernetes offerings (e.g. EKS and AKS) and also on Red Hat OpenShift on-prem.
Does CDP support auto-scaling?
Yes. All our experiences support auto-scaling specific to their workloads. For example, Cloudera Data Warehouse auto-scales for concurrency: as more users run queries on the cluster, it scales up to support the load, a pattern that is common with data warehouses.
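To make the idea of scaling for concurrency concrete, here is a minimal sketch in Python. It is not the algorithm Cloudera Data Warehouse uses; the function name, its parameters, and the per-node query capacity are all hypothetical and chosen only to illustrate how a node count could follow the number of concurrent queries within fixed bounds.

```python
import math


def desired_nodes(running_queries, queued_queries, queries_per_node,
                  min_nodes=1, max_nodes=10):
    """Return a node count for the current query load (illustrative only).

    All names and defaults are assumptions, not CDW settings.
    """
    if queries_per_node <= 0:
        raise ValueError("queries_per_node must be positive")
    # Total demand is everything running plus everything waiting.
    demand = running_queries + queued_queries
    # Round up so a partially filled node still counts as a full node.
    needed = math.ceil(demand / queries_per_node)
    # Never go below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, needed))
```

With 8 running and 4 queued queries at 4 queries per node, the sketch asks for 3 nodes; with no queries at all it falls back to the minimum.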
Are the autoscaling headroom and the other configurations of CDP restricted per team?
CDP has multiple levels of privileges that correspond to the different abstractions inside CDP. Today the ability to allocate isolated resources and experiences in an environment requires the “Environment Admin” role; the ability to access these environments requires an “Environment User” grant for the particular environment. The ability to scale and tune resource usage (headroom, autoscale parameters) for individual experiences can be managed by users granted the admin role for each particular service (e.g. “data warehouse admins” for CDW, and “ML admins” for CML). We’ll also be adding finer-grained role-based access controls for CDP services and CRUD operations, with roles such as Data Hub Admin and Data Lake Admin.
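The privilege levels described above can be pictured as a mapping from roles to permitted actions, checked per scope. The Python sketch below is only an illustration under assumptions: the role identifiers, action names, and the `Grant` type are hypothetical and are not CDP's actual role names or API.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    role: str   # hypothetical role identifier
    scope: str  # environment name or service name the grant applies to


# Hypothetical mapping of roles to the actions they permit.
ROLE_ACTIONS = {
    "EnvironmentAdmin": {"allocate"},         # allocate resources and experiences
    "EnvironmentUser": {"access"},            # access a particular environment
    "ServiceAdmin": {"tune_autoscaling"},     # tune headroom and autoscale settings
}


def is_allowed(grants, action, scope):
    """Return True if any grant for this scope permits the action."""
    return any(
        g.scope == scope and action in ROLE_ACTIONS.get(g.role, set())
        for g in grants
    )
```

For example, a user holding `Grant("EnvironmentUser", "prod")` and `Grant("ServiceAdmin", "CDW")` could access the "prod" environment and tune autoscaling for CDW, but could not allocate resources in "prod" or tune CML.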
Is CDP serverless?
The CDP Management Console and control plane run as a service in a Cloudera account. The control plane talks to your VPC and cloud account to provision machines for its SDX data lake cluster and for the workloads that run on it. Data Hubs use virtual machines and effectively provide a cluster-as-a-service. The cloud-optimized experiences such as CDW and CML give CDP the ability to control the resources provided for workloads and give data users a serverless computing experience.
How do I migrate data from CDH or HDP to CDP?
Use CDP Replication Manager (RM) to replicate all your data, metadata, and Sentry permissions or Apache Ranger policies into the cloud. That is, RM will automatically move all your workloads into the cloud.
You showed Kafka replication from a data center to the public cloud. Can I also set up replication going both ways, including from the cloud to the data center?
Yes, you can!
Do you support moving data between the on-prem and cloud versions of CDP?
Yes, NiFi is one of the options for moving data back and forth between on-prem and cloud versions of the platform in a secure and resilient way.
Is CDP programmatic? Could I create a Spark cluster with an API?
Yes. CDP has a command-line interface (CLI) that can be used to create and destroy Data Hubs and workloads, as well as to interact with the control plane for user management and automation. It was designed to enable modern CI/CD workflows with “configuration as code”.
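As a rough illustration of “configuration as code”, the Python sketch below builds CDP CLI command lines for creating and deleting a Data Hub without running them. The subcommands and flags shown (`create-aws-cluster`, `delete-cluster`, `--cluster-name`, `--environment-name`, `--cluster-template-name`) are assumptions that should be checked against the help output of your installed CLI version; the helper function names and example values are hypothetical.

```python
def datahub_create_command(cluster_name, environment_name, template_name):
    """Build (but do not run) a CLI command to create a Data Hub on AWS.

    Flag names are assumptions; verify them with the CLI's built-in help.
    """
    return [
        "cdp", "datahub", "create-aws-cluster",
        "--cluster-name", cluster_name,
        "--environment-name", environment_name,
        "--cluster-template-name", template_name,
    ]


def datahub_delete_command(cluster_name):
    """Build (but do not run) a CLI command to delete a Data Hub."""
    return ["cdp", "datahub", "delete-cluster", "--cluster-name", cluster_name]
```

Keeping the arguments in a list means that, once the flags are verified, the command can be handed to `subprocess.run` from a CI/CD job without any shell quoting concerns.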
Can we have different versions of, say, Kafka in CDP?
Yes. Customers can create clusters (specifically using CDP Data Hub) with any version of the Cloudera Runtime. While there are constraints on versions depending on the version used for the Data Lake providing the authorization, audit, lineage and data catalog for the workload, we are working on supporting mixed workloads even within one environment.
What would be the performance impact of having data in S3 compared to in HDFS?
Our benchmarks indicate very similar performance characteristics between HDFS and cloud storage. We have implemented several caching improvements across the components to optimize for cloud storage usage.
Where is HDFS in the public cloud experiences?
In the public cloud, CDP uses native cloud storage (S3, ADLS) to store the data. While HDFS is used internally in the CDP Data Lake (SDX) cluster by Hadoop services, HDFS is not used to store end-user data in CDP.
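In practice, workloads address cloud storage through URIs rather than HDFS paths. The Python sketch below shows the general shape of such URIs, assuming the Hadoop `s3a` connector for S3 and the `abfs` connector for ADLS Gen2; the helper names, bucket, container, and account values are hypothetical.

```python
def s3a_uri(bucket, path):
    """Build an S3 URI in the form used by the Hadoop s3a connector."""
    return f"s3a://{bucket}/{path.lstrip('/')}"


def abfs_uri(container, account, path):
    """Build an ADLS Gen2 URI in the form used by the Hadoop abfs connector."""
    return f"abfs://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"
```

For example, `s3a_uri("my-bucket", "/warehouse/sales")` yields `s3a://my-bucket/warehouse/sales`, which a Spark or Hive job would use where an on-prem job would have used an `hdfs://` path.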
Are there data encryption and decryption options in CDP?
Yes. CDP offers encryption and decryption of data at rest as well as data in motion.
Do groups of ID brokers require direct access to an ID provider?
We’re moving to a federation model. You can federate your users to CDP from any identity provider that is SAML 2.0 compliant.
When will new experiences, such as the recently announced Cloudera Data Engineering on Public Cloud, be available on Private Cloud?
It is part of our roadmap to have complete parity in the experiences available across both CDP Public Cloud and CDP Private Cloud. More details on exact timelines will be coming soon to our customers.
What is the plan to enable Private Cloud on top of other Kubernetes platforms?
CDP is meant to be a cloud-agnostic platform, across both Public Cloud and Private Cloud. Our roadmap contains many improvements to the dependencies on the underlying Kubernetes platform, including supporting additional distributions.
How does CDP differ on Private Cloud compared to Public Cloud?
From an end-user perspective, the two are very similar, and that’s intentional. Part of our hybrid value proposition relies on offering a consistent experience in both public and private cloud environments. The differences appear mainly at the platform and infrastructure admin level. More details can be found in our public docs.
Can I connect a single instance of Workload Manager to multiple environments or clusters?
Yes, you can connect multiple environments or clusters to a single instance of Workload Manager.
Does Workload Manager support all CDP deployments?
Yes, Workload Manager supports all CDP deployments.
Which engines does Workload Manager support?
Workload Manager supports all the key Cloudera engines, including Apache Hive, MapReduce, Impala, and Spark.
How many visual types does CDP Data Visualization provide?
CDP Data Visualization offers 34 visual types out of the box, and the ability to add custom extensions as needed.
What is a visual? Is it a dashboard or an app?
Both. CDP Data Visualization enables intuitive drag-and-drop dashboarding and no-code custom application creation that can be published and shared everywhere.
See this blog on CDP Data Visualization for more info.