This article will demonstrate how to rapidly launch a Spark cluster on AWS via CloudBreak.

The prerequisites are documented here. Once you have an AWS account and credentials, launching a Spark cluster is simple.

CloudBreak is your command-and-control UI for rapidly launching clusters on AWS, Azure, GCP, and on-prem. Once the UI is up, add your AWS credentials.

  • Select AWS as your cloud provider

  • Select the method for authentication.
    • Key or Role. I prefer role, but both work well. Click the help button and follow the directions to set up authentication for either method.
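
For the role option, the help instructions walk you through creating an IAM role that CloudBreak is allowed to assume. As a rough illustration of what a trust policy for such a role generally looks like in AWS, the sketch below builds one as a Python dictionary. The function name, the account ID, and the external ID are placeholders introduced here, not values from this article; use the values given on the CloudBreak help page.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return a generic AWS IAM trust policy that lets another account assume a role.

    Both arguments are placeholders for illustration; the real values come
    from the CloudBreak help instructions for role-based authentication.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is trusted to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Require the caller to present the agreed external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def policy_as_json(policy: dict) -> str:
    """Serialize the policy so it can be pasted into the AWS console."""
    return json.dumps(policy, indent=2)
```

Calling `policy_as_json(build_trust_policy("123456789012", "example-external-id"))` produces JSON that can be compared against what the help page asks for.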

  • Now that credentials have been set up, cluster creation may begin. Click "Clusters" at the top left, then click "Create Cluster" at the top right.

  • Select Advanced at the top left
  • Select Credential: Your AWS Credentials
  • Cluster Name: Name your cluster
  • Region: AWS Region
  • Platform Version: HDP 3.0
  • Cluster Type: To run data science and ETL workloads, select the HDP 3.0 Data Science blueprint
  • Click Next
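
If you expect to repeat this setup, it can help to write the selections above down before clicking through the wizard. The sketch below is one way to do that. `ClusterChoices` and `missing_fields` are hypothetical names made up for this example, the structure is not a CloudBreak file format or API payload, and the sample values are examples only.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class ClusterChoices:
    """Hypothetical record of the first wizard page (not a CloudBreak format)."""

    credential: str
    cluster_name: str
    region: str
    platform_version: str = "HDP 3.0"
    cluster_type: str = "HDP 3.0 Data Science blueprint"


def missing_fields(choices: ClusterChoices) -> List[str]:
    """Return the names of any fields that were left blank."""
    return [name for name, value in asdict(choices).items() if not value.strip()]


# Example values only; substitute your own credential name, cluster name, and region.
example = ClusterChoices(
    credential="my-aws-credential",
    cluster_name="spark-demo",
    region="us-east-1",
)
```

Running `missing_fields(example)` returns an empty list when every field has been filled in.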

  • Choose Image Type: Select Base Image
  • Choose Image: Select Red Hat from the drop-down list

  • This page presents options for selecting AWS instance types. If you are doing this for the first time, the defaults are fine. Click Next.

  • Select the VPC this cluster will be deployed to. If a VPC has not been pre-created, CloudBreak will create one. Click Next.

  • Clusters launched on AWS can access data stored in S3. Instructions for enabling S3 access are here.
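
Once S3 access is enabled, Spark jobs on Hadoop-based clusters typically refer to S3 data through `s3a://` paths. The helper below is a minimal sketch of building such a path. The function name, bucket name, and object key are assumptions made up for illustration.

```python
def s3a_uri(bucket: str, key: str = "") -> str:
    """Build an s3a:// URI from a bucket name and an optional object key."""
    bucket = bucket.strip().strip("/")
    if not bucket:
        raise ValueError("bucket name must not be empty")
    # Drop leading slashes so the URI does not contain a double slash.
    key = key.lstrip("/")
    return f"s3a://{bucket}/{key}" if key else f"s3a://{bucket}"


# Hypothetical bucket and key, for illustration only.
example_path = s3a_uri("my-bucket", "/data/events.csv")
```

In a PySpark session on the cluster, a path like `example_path` could then be passed to a reader such as `spark.read.csv(...)`, assuming the cluster has been granted access to that bucket.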

  • Recipes are actions performed on nodes before and/or after cluster install. If custom actions are not required, click Next.

  • The next option is to configure authentication and the metadata database. For those just beginning, click Next.
  • Knox is highly recommended; however, if you are running this for the first time, disable it.

  • Select an AWS security group (SG). If an SG has not been pre-created, CloudBreak will create one.

  • Lastly, enter a password for the admin user and provide an SSH key. The SSH key is required if you want to SSH into the nodes.

The cluster may take 5-15 minutes to deploy. Once the cluster is up, the Ambari URL will be available. Enjoy!
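
Rather than refreshing the page while you wait, you could poll the Ambari URL from a script. The sketch below uses only the Python standard library. The function names are made up for this example, and the defaults (30 attempts, 30 seconds apart) are an assumption chosen to cover roughly the 15-minute upper bound mentioned above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def url_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` gets any HTTP response back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example with 401 or 403), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or TLS verification error.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Example usage, with the URL copied from the CloudBreak cluster page:
#   wait_until_ready(lambda: url_responds("<your Ambari URL>"))
```

Note that `url_responds` treats a failed TLS certificate check as "not ready", so a URL served with a self-signed certificate would need a different probe.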