Created on 11-13-2019 04:21 PM
Introduction
Cloudera Data Platform (CDP) for public cloud has an amazing admin UI whose wizards walk you through setting up environments, data lakes, data hub clusters, and experiences.
Details of the AWS prerequisites can be found in the official CDP documentation, but why use the easy button when you can leverage the AWS and CDP CLIs and do it the hard way?
My thoughts exactly.
Here is the TL;DR: go to my GitHub repository and run the scripts as instructed.
AWS minimal requirements for CDP environment
Even through the CLI, much of an environment's configuration can be automated, including:
- Network (VPCs, routing, gateways, etc.)
- Security Groups
- FreeIPA instance(s)
However, you still need to set up this minimal set of elements to get an environment ready:
- In AWS
  - Public/private keys (not automated/demonstrated here)
  - S3 bucket
  - AWS roles
  - AWS policies
- In CDP
  - Credential (not automated/demonstrated here)
To better understand how roles, policies and bucket interact, you can refer to this diagram:
In this tutorial, however, I'm not going to worry about any roles other than the datalake_admin_role and ranger_audit_role (the other ones are optional).
Automation scripts
Step 1: Pre-Requisites
AWS
Install and configure the AWS cli with your account: link
Create a public/private key pair: link
CDP
Create CDP credential for AWS: link
Install and configure CDP CLI: link
Local computer
Clone my GitHub repository:
git clone https://github.com/paulvid/cdp_create_env_aws.git
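Before moving on, it can help to confirm that the command-line tools from this step are actually on your PATH. The following is a minimal sketch, not part of the repository: `check_tools` is a hypothetical helper name, and it only checks that each tool exists, not that it is configured with valid credentials.

```shell
#!/bin/sh
# check_tools: hypothetical helper (not part of the cdp_create_env_aws repo).
# Prints one "missing: <tool>" line for every tool not found on PATH
# and returns non-zero if at least one tool is missing.
check_tools() (
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  exit "$missing"
)

# Example call for the tools this tutorial relies on:
# check_tools aws cdp git || echo "Install the missing tools before continuing."
```

The function body runs in a subshell, so its variables do not leak into your session.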
Step 2: Running the scripts
aws_create_bucket.sh <base_dir> <prefix> <region>
aws_purge_roles_policies.sh <base_dir> <prefix>
aws_create_policies.sh <base_dir> <prefix>
aws_create_roles.sh <base_dir> <prefix> <bucket>
cdp_create_env.sh <base_dir> <prefix> <credential> <region> <key>
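The five scripts above are meant to run in that order, so a small wrapper can chain them and stop at the first failure. This is only a sketch under several assumptions: it is run from inside the cloned repository, `<base_dir>` is the path to that clone, and the bucket, credential, and key names are whatever you created earlier. `run_step` and `create_environment` are hypothetical names, and `DRY_RUN` defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# run_step: print the command when DRY_RUN=1 (the default), otherwise run it.
run_step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# create_environment: hypothetical wrapper that runs the repository scripts
# in the order listed in this step, stopping at the first failing script.
# Arguments: <base_dir> <prefix> <region> <bucket> <credential> <key>
create_environment() (
  base_dir=$1; prefix=$2; region=$3; bucket=$4; credential=$5; key=$6
  run_step ./aws_create_bucket.sh "$base_dir" "$prefix" "$region" &&
  run_step ./aws_purge_roles_policies.sh "$base_dir" "$prefix" &&
  run_step ./aws_create_policies.sh "$base_dir" "$prefix" &&
  run_step ./aws_create_roles.sh "$base_dir" "$prefix" "$bucket" &&
  run_step ./cdp_create_env.sh "$base_dir" "$prefix" "$credential" "$region" "$key"
)

# Example with placeholder values (all of them are assumptions):
# create_environment "$PWD" myprefix us-east-1 my-bucket my-credential my-key
```

Review the printed commands first, then set `DRY_RUN=0` to execute them for real.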
Step 3: Verify periodically until environment status is AVAILABLE
cdp_describe_env.sh <prefix>
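Rather than re-running the check by hand, the "verify periodically" part can be sketched as a small polling loop. This assumes that the output of the status command contains the literal status string (for example AVAILABLE); I have not confirmed the exact output format of `cdp_describe_env.sh`, so adjust the match if it differs. `wait_until_available` is a hypothetical helper name.

```shell
#!/bin/sh
# wait_until_available: hypothetical helper that re-runs a status command
# until its output contains the standalone word AVAILABLE.
# Usage: wait_until_available <max_attempts> <sleep_seconds> <command> [args...]
wait_until_available() (
  max_attempts=$1; interval=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Assumption: the status appears verbatim in the command's output.
    # The pattern avoids matching longer tokens such as UNAVAILABLE.
    if "$@" | grep -Eq '(^|[^A-Z_])AVAILABLE([^A-Z_]|$)'; then
      echo "environment is AVAILABLE (attempt $attempt)"
      exit 0
    fi
    echo "attempt $attempt/$max_attempts: not available yet"
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempts" >&2
  exit 1
)

# Example: check every 30 seconds, up to 60 times (placeholder values):
# wait_until_available 60 30 ./cdp_describe_env.sh myprefix
```

The loop does not distinguish a failed environment from one that is still being created; it simply gives up after the attempt limit, so check the status output yourself if it times out.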
Conclusion
Obviously, this is just getting us started.
I plan on publishing much more about creating data lakes, data hub clusters, and other topics. Stay tuned!