Created on 11-14-2019 07:48 AM, edited on 11-17-2019 11:14 PM by ask_bill_brooks
Introduction
Continuing my previous article on creating a CDP AWS environment and a CDP data lake, this tutorial teaches you how to automate the creation of a simple data engineering Data Hub cluster. You'll notice that once a data lake is set up, launching Data Hub clusters is very easy!
The generated cluster has the following properties:
- Template: CDP 1.1 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie
- Nodes:
  - 1 Master m5.2xlarge
  - 3 Workers m5.2xlarge
Here is the TL;DR: go to my GitHub and run the scripts as instructed.
Automation scripts
Step 1: Create Data Hub Cluster
cdp_create_dh_de.sh <prefix>
Step 2: Check periodically until the cluster status is AVAILABLE
cdp_describe_dh_de.sh <prefix>
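Step 2 means rerunning the describe script until the cluster is ready. If you'd rather not poll by hand, a small wait loop can do it for you. The following is a minimal sketch and is not part of the scripts in my repository: `wait_for_status` is a hypothetical helper, and it assumes the describe script prints the status word (for example `AVAILABLE`) somewhere in its output. Check what your copy of the script actually prints before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper (not part of the original automation scripts).
# Reruns a status command until its output contains the wanted status as a
# whole word, or gives up after a maximum number of attempts.
#
# Usage: wait_for_status <wanted_status> <max_attempts> <sleep_seconds> <command...>
wait_for_status() {
  local wanted="$1" max_attempts="$2" interval="$3"
  shift 3
  local attempt output
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # Capture stdout and stderr; a failing command just counts as "not ready yet".
    output="$("$@" 2>&1)" || true
    # -w matches the status as a whole word, so a longer status name that merely
    # contains the wanted word is not mistaken for it.
    if grep -qw -- "$wanted" <<< "$output"; then
      echo "Status $wanted reached after $attempt attempt(s)"
      return 0
    fi
    echo "Attempt $attempt/$max_attempts: status is not $wanted yet"
    # Only sleep if another attempt will follow.
    if (( attempt < max_attempts )); then
      sleep "$interval"
    fi
  done
  echo "Gave up waiting for status $wanted" >&2
  return 1
}

# Example (assumes the describe script output includes the cluster status):
# wait_for_status AVAILABLE 60 30 ./cdp_describe_dh_de.sh <prefix>
```

The attempt count and interval in the example are arbitrary; tune them to how long your clusters usually take to provision.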
That's it!