Community Articles

How to create a CDP environment in AWS with minimal requirements

pvidal · 11-13-2019

Introduction

Cloudera Data Platform (CDP) for public cloud has an amazing admin UI that walks you through thorough wizards for setting up environments, data lakes, Data Hub clusters, and experiences.

Details of the AWS prerequisites can be found in the official CDP documentation, but why use the easy button when you can leverage the AWS and CDP CLIs and do it the hard way?

My thoughts exactly.

Here is the TL;DR: go to my GitHub repository and run the scripts as instructed.

AWS minimal requirements for a CDP environment

Even when working through the CLI, CDP can automate much of an environment's configuration, including:

- Network (VPCs, routing, gateways, etc.)
- Security groups
- FreeIPA instance(s)

However, you will need to set up the following minimal set of elements yourself to get an environment ready:

In AWS
- Public/private key pair (not automated/demonstrated here)
- S3 bucket
- AWS roles
- AWS policies
In CDP
- Credential (not automated/demonstrated here)

To better understand how the roles, policies, and bucket interact, you can refer to this diagram:

In this tutorial, however, I'm not going to worry about any roles other than datalake_admin_role and ranger_audit_role (the other ones are optional).

Automation scripts

Step 1: Prerequisites

AWS

Install and configure the AWS CLI with your account: link

Create a public/private key pair: link
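If you want to script these two AWS steps as well (the repository's scripts leave the key pair to you), a minimal bash sketch could look like the following. It assumes a configured AWS CLI; `check_aws_cli`, `create_key_pair`, the `run` helper and the `DRY_RUN` switch are hypothetical names used only for illustration. `aws sts get-caller-identity` confirms which account your credentials resolve to, and `aws ec2 create-key-pair` returns the private key material, which the sketch saves to `<key name>.pem`. EC2 key pairs are regional, so create the pair in the same region you plan to pass to `cdp_create_env.sh`.

```shell
#!/usr/bin/env bash
# Sketch: verify the AWS CLI and create an EC2 key pair.
# With DRY_RUN=1, AWS commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Confirm the AWS CLI is installed and its credentials resolve to an identity.
check_aws_cli() {
  if [ "${DRY_RUN:-0}" != 1 ] && ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH" >&2
    return 1
  fi
  run aws sts get-caller-identity
}

# Create a key pair in the given region and save the private key locally.
create_key_pair() {
  local key_name="$1" region="$2"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    run aws ec2 create-key-pair --key-name "$key_name" --region "$region" \
      --query KeyMaterial --output text
    return
  fi
  # Write the private key to <key_name>.pem; remove the file if the call fails.
  if ! aws ec2 create-key-pair --key-name "$key_name" --region "$region" \
      --query KeyMaterial --output text > "${key_name}.pem"; then
    rm -f "${key_name}.pem"
    return 1
  fi
  # Make the private key readable only by you.
  chmod 400 "${key_name}.pem"
}
```

For example, `check_aws_cli && create_key_pair my-cdp-key us-west-2` (the key name is a placeholder). Running the functions with `DRY_RUN=1` first lets you review the exact commands before anything touches your account.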

CDP

Create a CDP credential for AWS: link

Install and configure the CDP CLI: link
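As a sketch of the CDP side from the command line, assuming bash, Python 3 with pip, and `jq` on your PATH: the CDP CLI is published on PyPI as `cdpcli`, `cdp configure` (run once, interactively) stores the API access key ID and private key you generate in the CDP console, and `cdp iam get-user` is a quick way to confirm that configuration works. The function names, the `run` helper, the `DRY_RUN` switch, and the `.credentials[].credentialName` path into the `list-credentials` JSON are assumptions for illustration; check them against the CDP CLI version you install.

```shell
#!/usr/bin/env bash
# Sketch: hypothetical helpers around the CDP CLI.
# With DRY_RUN=1, commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install the CDP CLI from PyPI for the current user.
install_cdp_cli() {
  run python3 -m pip install --user cdpcli
}

# Confirm that `cdp configure` stored working API keys.
verify_cdp_cli() {
  run cdp iam get-user
}

# Print the name of every CDP credential visible to you; one of them is
# what you would pass as the <credential> argument of cdp_create_env.sh.
list_credential_names() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    run cdp environments list-credentials
  else
    cdp environments list-credentials | jq -r '.credentials[].credentialName'
  fi
}
```

A typical sequence would be `install_cdp_cli`, then `cdp configure` by hand, then `verify_cdp_cli && list_credential_names`.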

Local computer

Clone my GitHub repository:

git clone https://github.com/paulvid/cdp_create_env_aws.git

Step 2: Running the scripts

Create AWS S3 bucket:

aws_create_bucket.sh <base_dir> <prefix> <region>
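The script's internals aren't reproduced here, but a plain AWS CLI equivalent of this step might look like the bash sketch below. The bucket name is up to you (a hypothetical `<prefix>-cdp-bucket`, for instance); S3 bucket names must be globally unique, lowercase, and DNS-compatible. Outside `us-east-1`, S3 requires the region to be repeated as a `LocationConstraint`, while `us-east-1` rejects an explicit constraint, so the function branches on the region. `create_bucket`, `run` and `DRY_RUN` are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: create the S3 bucket used by the CDP environment.
# With DRY_RUN=1, commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_bucket() {
  local bucket="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 is the default location and rejects an explicit constraint.
    run aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    # Every other region must be named as the bucket's location constraint.
    run aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi
}
```

For example, `create_bucket my-prefix-cdp-bucket us-west-2`, where the bucket name is a placeholder.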

Purge AWS policies and roles (optional):

aws_purge_roles_policies.sh <base_dir> <prefix>
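Purging is handy when you want to start over with the same prefix. IAM refuses to delete a role that still has managed policies attached, and refuses to delete a managed policy that is still attached anywhere, so the order matters: detach, delete the roles, then delete the policies. The bash sketch below assumes you already know the role names and policy ARNs involved; `purge_role`, `delete_policies`, `run` and `DRY_RUN` are hypothetical names. It does not handle inline policies, instance profiles, or non-default policy versions, each of which also blocks deletion.

```shell
#!/usr/bin/env bash
# Sketch: remove roles and managed policies in an order IAM accepts.
# With DRY_RUN=1, commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: purge_role <role_name> [policy_arn...]
# Detach each given managed policy from the role, then delete the role.
purge_role() {
  local role="$1"
  shift
  local arn
  for arn in "$@"; do
    run aws iam detach-role-policy --role-name "$role" --policy-arn "$arn"
  done
  run aws iam delete-role --role-name "$role"
}

# Usage: delete_policies <policy_arn...>
# Call this only after the policies are detached from every role.
delete_policies() {
  local arn
  for arn in "$@"; do
    run aws iam delete-policy --policy-arn "$arn"
  done
}
```

If the same policy is attached to several roles, run `purge_role` for all of them before calling `delete_policies`.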

Create AWS policies:

aws_create_policies.sh <base_dir> <prefix>
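Creating a managed policy comes down to one `aws iam create-policy` call per JSON policy document; the `file://` prefix tells the AWS CLI to read the document from disk. A bash sketch, assuming your policy documents sit as `*.json` files in a single directory (a hypothetical layout, not necessarily the repository's) and that each policy is named `<prefix>-<file name>`, a hypothetical convention that keeps policies from different runs apart. `create_policies`, `run` and `DRY_RUN` are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: create one IAM managed policy per JSON document in a directory.
# With DRY_RUN=1, commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: create_policies <policy_dir> <prefix>
create_policies() {
  local policy_dir="$1" prefix="$2"
  local doc name
  for doc in "$policy_dir"/*.json; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$doc" ] || continue
    name="${prefix}-$(basename "$doc" .json)"
    run aws iam create-policy --policy-name "$name" \
      --policy-document "file://$doc"
  done
}
```

For example, `create_policies ./policies my-prefix`, where both arguments are placeholders.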

Create AWS roles:

aws_create_roles.sh <base_dir> <prefix> <bucket>
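Each role amounts to an `aws iam create-role` call with a trust policy (which principals may assume the role) plus one `aws iam attach-role-policy` call per managed policy. Attaching requires the full policy ARN, which for a policy created with the default path `/` has the form `arn:aws:iam::<account id>:policy/<name>`; `aws sts get-caller-identity --query Account --output text` returns the account ID. In the bash sketch below, the role name, trust document path and policy names are placeholders rather than the repository's actual values, and `create_role`, `run` and `DRY_RUN` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create an IAM role and attach existing managed policies to it.
# With DRY_RUN=1, commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: create_role <account_id> <role_name> <trust_policy.json> [policy_name...]
create_role() {
  local account_id="$1" role="$2" trust_doc="$3"
  shift 3
  local policy
  run aws iam create-role --role-name "$role" \
    --assume-role-policy-document "file://$trust_doc"
  for policy in "$@"; do
    # ARN form for a managed policy created with the default path "/".
    run aws iam attach-role-policy --role-name "$role" \
      --policy-arn "arn:aws:iam::${account_id}:policy/${policy}"
  done
}
```

For example, `create_role "$(aws sts get-caller-identity --query Account --output text)" my-prefix-datalake-admin-role trust.json my-prefix-datalake-admin-policy`, with every name a placeholder.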

Create CDP environment:

cdp_create_env.sh <base_dir> <prefix> <credential> <region> <key>
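For reference, a direct CDP CLI call that creates an AWS environment and lets CDP build the network from a CIDR range might look like the bash sketch below. The flag names follow the current CDP CLI reference for `cdp environments create-aws-environment`; they may differ in other CLI versions and are not necessarily what `cdp_create_env.sh` passes. Every value (environment name, S3 log location, instance profile ARN, the `10.10.0.0/16` network range, and the wide-open `0.0.0.0/0` default access range, which you would want to narrow outside a sandbox) is a hypothetical placeholder, and `create_environment`, `run` and `DRY_RUN` are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: create a CDP environment on AWS with the CDP CLI.
# With DRY_RUN=1, the command is printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: create_environment <env_name> <credential> <region> <key_name> \
#          <log_location> <instance_profile_arn> [access_cidr]
create_environment() {
  local env_name="$1" credential="$2" region="$3" key_name="$4"
  local log_location="$5" instance_profile_arn="$6"
  local access_cidr="${7:-0.0.0.0/0}"   # wide open by default: narrow it
  run cdp environments create-aws-environment \
    --environment-name "$env_name" \
    --credential-name "$credential" \
    --region "$region" \
    --security-access "cidr=$access_cidr" \
    --authentication "publicKeyId=$key_name" \
    --log-storage "storageLocationBase=$log_location,instanceProfile=$instance_profile_arn" \
    --network-cidr 10.10.0.0/16   # CDP creates the VPC and subnets in this range
}
```

Passing a network CIDR instead of an existing VPC is what lets CDP create the networking pieces listed at the top of this article for you.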

Step 3: Check periodically until the environment status is AVAILABLE

cdp_describe_env.sh <prefix>
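If you'd rather not re-run the check by hand, a small polling loop can do it. This bash sketch assumes `jq` is installed, that `cdp environments describe-environment` returns JSON with the status at `.environment.status`, and that failure states end in `_FAILED` (for example `CREATE_FAILED`). It takes the environment name directly, since how `cdp_describe_env.sh` derives the name from `<prefix>` isn't shown here. `get_env_status` and `wait_for_available` are hypothetical names, and the defaults (check every 60 seconds, at most 60 times) are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll a CDP environment until it reports AVAILABLE.

# Print the environment's current status (needs the CDP CLI and jq).
get_env_status() {
  cdp environments describe-environment --environment-name "$1" \
    | jq -r '.environment.status'
}

# Usage: wait_for_available <env_name> [interval_seconds] [max_attempts]
# Returns 0 on AVAILABLE, 1 on a *_FAILED status or when attempts run out.
wait_for_available() {
  local env_name="$1" interval="${2:-60}" max_attempts="${3:-60}"
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$(get_env_status "$env_name")
    echo "[$attempt/$max_attempts] $env_name: $status"
    case "$status" in
      AVAILABLE) return 0 ;;
      *_FAILED) return 1 ;;
    esac
    # Don't sleep after the final check.
    if ((attempt < max_attempts)); then
      sleep "$interval"
    fi
  done
  return 1
}
```

For example, `wait_for_available my-env && echo "environment ready"`, where `my-env` is a placeholder. Keeping the status lookup in its own function makes it easy to swap in a different query if your CLI version returns a different JSON shape.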

Conclusion

Obviously, this is just the beginning.

I plan on publishing much more, including articles on creating data lakes and Data Hub clusters. Stay tuned!
