
Introduction

Continuing from my previous articles on creating a CDP AWS environment and a CDP data lake, this tutorial shows you how to automate the creation of a simple data engineering Data Hub cluster. You'll notice that once a data lake is set up, launching Data Hub clusters is very easy!

 

The cluster generated has the following properties:

  • Template: CDP 1.1 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie
  • Nodes:
    • 1 Master m5.2xlarge
    • 3 Workers m5.2xlarge

 

Here is the TL;DR: go to my GitHub and run the scripts as instructed.

 

Automation scripts

Step 1: Create Data Hub Cluster

cdp_create_dh_de.sh <prefix>

 

Step 2: Check periodically until the cluster status is AVAILABLE

cdp_describe_dh_de.sh <prefix> 
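If you would rather not re-run the describe script by hand, the periodic check can be wrapped in a small polling loop. The following is only a sketch: the helper name wait_for_status is hypothetical (it is not part of the scripts in my repository), and it assumes that the status command prints text containing the word AVAILABLE once the cluster is ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original scripts): run a status
# command repeatedly until its output contains the wanted status word.
# Usage: wait_for_status <wanted> <interval_seconds> <max_attempts> <command...>
wait_for_status() {
  local wanted="$1" interval="$2" max_attempts="$3"
  shift 3
  local attempt output
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Capture stdout and stderr; a failing command just counts as "not ready".
    output="$("$@" 2>&1)" || true
    # -w matches whole words only, so UNAVAILABLE does not count as AVAILABLE.
    if grep -qw -- "$wanted" <<<"$output"; then
      echo "Status $wanted reached after $attempt attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "Gave up after $max_attempts attempts" >&2
  return 1
}
```

For example, wait_for_status AVAILABLE 60 60 ./cdp_describe_dh_de.sh <prefix> would check once a minute for up to an hour. The interval and attempt count are arbitrary choices, and the script path assumes you run it from the directory that contains the scripts.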

 

That's it!
