

Continuing my previous articles on creating a CDP AWS environment and a CDP data lake, this tutorial shows you how to automate the creation of a simple data engineering data hub cluster. You'll notice that once a data lake is set up, launching data hub clusters is very easy!


The cluster generated has the following properties:

  • Template: CDP 1.1 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie
  • Nodes:
    • 1 Master m5.2xlarge
    • 3 Workers m5.2xlarge


Here is the TL;DR: go to my GitHub and run the scripts as instructed.


Automation scripts

Step 1: Create Data Hub Cluster <prefix> 


Step 2: Verify periodically until cluster status is AVAILABLE <prefix> 
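For readers who want to see what the periodic check in Step 2 could look like, here is a minimal Python sketch. It is not the script from my GitHub. It assumes the CDP CLI is installed and configured, that `cdp datahub describe-cluster --cluster-name <name>` returns JSON with the status under `cluster.status`, and that any status containing `FAILED` means the creation did not succeed. The function names, the polling interval and the timeout are all hypothetical choices made for illustration.

```python
import json
import subprocess
import time


def cdp_cluster_status(cluster_name):
    """Ask the CDP CLI for the current status of a data hub cluster.

    Assumption: the CLI prints JSON shaped like {"cluster": {"status": "..."}}.
    """
    result = subprocess.run(
        ["cdp", "datahub", "describe-cluster", "--cluster-name", cluster_name],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)["cluster"]["status"]


def wait_until_available(get_status, poll_seconds=60, timeout_seconds=3600,
                         sleep=time.sleep):
    """Poll get_status() until it returns AVAILABLE.

    get_status is any zero-argument callable returning a status string,
    which keeps the loop independent of the CLI. Raises RuntimeError on a
    status containing FAILED (an assumed convention) and TimeoutError once
    the accumulated waiting time reaches timeout_seconds.
    """
    waited = 0
    while True:
        status = get_status()
        if status == "AVAILABLE":
            return status
        if "FAILED" in status:
            raise RuntimeError(f"cluster entered status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

A call such as `wait_until_available(lambda: cdp_cluster_status("my-cluster"))` would then block until the cluster is ready, where `my-cluster` is a placeholder for whatever name the creation step gave the cluster. Passing the status lookup in as a callable is a deliberate choice: the loop can be exercised with a fake status source before pointing it at a real environment.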


That's it!
