This article will demonstrate how to rapidly launch a Spark cluster on AWS via CloudBreak.

The prerequisites are documented here. Once you have an AWS account and credentials, launching a Spark cluster is simple.

CloudBreak is your command-and-control UI for rapidly launching clusters on AWS, Azure, GCP, and on premises. Once the UI is up, add your AWS credentials:


  • Select AWS as your cloud provider


  • Select the method for authentication.
    • Key or Role. I prefer role, but both work well. Click the help button and follow the directions on how to set up auth for either method.


  • Now that credentials have been set up, cluster creation may begin. Click "Clusters" at the top left, then click "Create Cluster" at the top right


  • Select Advanced on top left
  • Select Credential: Your AWS Credentials
  • Cluster Name: Name your cluster
  • Region: AWS Region
  • Platform Version: HDP 3.0
  • Cluster Type: To run data science and ETL workloads, select the HDP 3.0 Data Science blueprint
  • Click Next


  • Choose Image Type: Select Base Image
  • Choose Image: Select Redhat from the drop-down list


  • This step presents options for selecting AWS instance types. If you are doing this for the first time, the defaults are fine. Click Next


  • Select the VPC this cluster will be deployed to. If a VPC has not been pre-created, CloudBreak will create one. Click Next


  • Clusters launched on AWS can access data stored in S3. Instructions on enabling S3 access are here.
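
Once S3 access is enabled, a Spark job on the cluster can read from a bucket through the s3a:// scheme used by Hadoop's S3A connector. The following is a minimal sketch, not a definitive implementation: the bucket name, object key, and the helper names s3a_uri and count_rows are hypothetical, and it assumes the cluster nodes already hold S3 permissions and that the object is a CSV file with a header row.

```python
def s3a_uri(bucket, key):
    """Build an s3a:// URI for an object in a bucket."""
    return "s3a://{}/{}".format(bucket, key.lstrip("/"))


def count_rows(bucket, key):
    """Count the rows of a CSV object stored in S3 (assumes a header row)."""
    # Imported lazily so s3a_uri can be used where Spark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-read-sketch").getOrCreate()
    # Assumption: the nodes can already reach the bucket, so no keys are set here.
    df = spark.read.csv(s3a_uri(bucket, key), header=True)
    return df.count()
```

For example, count_rows("my-bucket", "data/events.csv") would be called from a job submitted to the cluster; both arguments are placeholders.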


  • Recipes are actions performed on nodes before and/or after cluster install. If custom actions are not required, click Next
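
To make the idea of a recipe concrete, here is a hedged sketch of a script that could run on each node after cluster install to add Python packages. It assumes the recipe is supplied as a Python script and that pip is available on the image. The package list and the DRY_RUN switch are illustrative choices, not part of CloudBreak.

```python
#!/usr/bin/env python
import subprocess
import sys

# Hypothetical package list; replace with whatever the workload needs.
PACKAGES = ["numpy", "pandas"]

# Safety switch for this sketch: when True, the command is printed, not run.
DRY_RUN = True


def pip_install_command(packages):
    """Return the argv list that installs packages with this interpreter's pip."""
    return [sys.executable, "-m", "pip", "install"] + list(packages)


def main():
    command = pip_install_command(PACKAGES)
    if DRY_RUN:
        print("Would run: " + " ".join(command))
        return
    # check_call raises CalledProcessError on failure, so the script exits non-zero.
    subprocess.check_call(command)


if __name__ == "__main__":
    main()
```

Setting DRY_RUN to False makes the script perform the install; keeping the default lets the script be checked safely on a workstation first.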


  • The next option is to configure auth and the metadata database. For those just beginning, click Next.
  • Knox is highly recommended; however, if you are running this for the first time, disable it.


  • Select an AWS security group (SG). If an SG has not been pre-created, CloudBreak will create one.


  • Lastly, enter a password for the admin user and an SSH key. The SSH key is required if you want to SSH into the nodes.


The cluster may take 5-15 minutes to deploy. Once the cluster is up, the Ambari URL will be available. Enjoy!
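
Rather than refreshing the page while waiting, the Ambari REST API can be polled until it answers. The sketch below uses only the Python standard library and queries Ambari's /api/v1/clusters endpoint with basic auth. The function names, the 15-minute timeout, and the 30-second interval are assumptions. It also assumes the Ambari URL presents a certificate the client trusts, which may not hold for a fresh deployment.

```python
import base64
import json
import time
import urllib.error
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    raw = "{}:{}".format(user, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def clusters_endpoint(ambari_url):
    """Append the cluster listing path to the Ambari base URL."""
    return ambari_url.rstrip("/") + "/api/v1/clusters"


def wait_for_ambari(ambari_url, user, password, timeout_s=900, interval_s=30):
    """Poll Ambari until it responds, then return the cluster names it reports."""
    request = urllib.request.Request(
        clusters_endpoint(ambari_url),
        headers={"Authorization": basic_auth_header(user, password)},
    )
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                body = json.loads(response.read().decode("utf-8"))
            return [item["Clusters"]["cluster_name"] for item in body.get("items", [])]
        except (urllib.error.URLError, OSError, ValueError):
            # Not reachable or not ready yet (this also covers HTTP errors); retry.
            time.sleep(interval_s)
    raise TimeoutError("Ambari did not respond within {} seconds".format(timeout_s))
```

A call such as wait_for_ambari(ambari_url, "admin", password) would use the Ambari URL shown in the CloudBreak UI and the admin password chosen in the last step of the wizard.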