Created on 10-09-2018 06:10 PM - edited 08-17-2019 06:11 AM
This article will demonstrate how to rapidly launch a Spark cluster on AWS via CloudBreak.
The prerequisites are documented here. Once you have an AWS account and credentials, launching a Spark cluster is simple.
CloudBreak is your command and control center UI for rapidly launching clusters on AWS, Azure, GCP, and on-prem. Once the UI is up, add your AWS credentials.
Select AWS as your cloud provider
Select the method for authentication: key or role. I prefer role, but both work well. Click on the help button and follow the directions on how to set up auth for either method.
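For the role-based method, the IAM role needs a trust policy that allows another AWS principal to assume it. The following is a minimal sketch of the general shape of such a policy, built in Python. The account ID and external ID are placeholders, not values from this article; use the ones shown in the CloudBreak help panel, which remains the authority on the exact setup.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy letting `trusted_account_id` assume the role."""
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID guards against unintended cross-account use.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute those shown in the CloudBreak help panel.
    policy = build_trust_policy("123456789012", "example-external-id")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the trust relationship editor when creating the role in the AWS console.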
Now that credentials have been set up, cluster creation may begin. Click on "Clusters" at the top left and then click on "Create Cluster" at the top right.
Select Advanced at the top left
Select Credential: Your AWS Credentials
Cluster Name: Name your cluster
Region: AWS Region
Platform Version: HDP 3.0
Cluster Type: To run data science and ETL workloads, select the HDP 3.0 Data Science blueprint
Click Next
Choose Image Type: Select Base Image
Choose Image: Select Redhat from the drop-down list
The next screen presents options for selecting AWS instance types. If doing this for the first time, the defaults are fine. Click Next
Select the VPC this cluster will be deployed to. If a VPC has not been pre-created, CloudBreak will create one. Click Next
Clusters launched on AWS can access data stored in S3. Instructions on enabling S3 access are here.
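Once S3 access is enabled, Spark jobs typically address objects through the s3a:// scheme provided by Hadoop's S3A connector. A small sketch of building such a path follows; the bucket and key names are hypothetical, and the commented PySpark lines assume a session named `spark` already exists on the cluster.

```python
def s3a_uri(bucket: str, key: str = "") -> str:
    """Build an s3a:// URI for Hadoop's S3A connector."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without slashes")
    key = key.lstrip("/")  # avoid a double slash after the bucket name
    return f"s3a://{bucket}/{key}" if key else f"s3a://{bucket}"


if __name__ == "__main__":
    # Hypothetical bucket and object key.
    path = s3a_uri("my-example-bucket", "raw/events.csv")
    print(path)
    # On the cluster, inside a pyspark session:
    #   df = spark.read.csv(path, header=True)
    #   df.show(5)
```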
Recipes are actions performed on nodes before and/or after cluster install. If custom actions are not required, click Next.
The next option is to configure authentication and the metadata database. For those just beginning, click Next.
Knox is highly recommended; however, if running for the first time, disable it.
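To give a feel for what a recipe can look like, here is an illustrative script that records when and where it ran. It is a sketch only: it assumes the node image provides a Python 3 interpreter and that the recipe is uploaded as a script; the output directory is an arbitrary, hypothetical location.

```python
#!/usr/bin/env python3
"""Illustrative recipe: record when and where the recipe ran."""
import datetime
import os
import socket


def write_marker(directory: str) -> str:
    """Write a small marker file into `directory` and return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "recipe-ran.txt")
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    with open(path, "w") as fh:
        # One line: hostname followed by a UTC timestamp.
        fh.write(f"{socket.gethostname()} {stamp}\n")
    return path


if __name__ == "__main__":
    # /tmp/recipe-demo is a hypothetical location chosen for this example.
    print(write_marker("/tmp/recipe-demo"))
```

A real recipe would replace the marker file with the actual work, such as installing a package or adjusting a configuration file.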
Select an AWS security group (SG). If an SG has not been pre-created, CloudBreak will create one.
Lastly, enter a password for the admin user and an SSH key. The SSH key is required only if you want to SSH into the nodes.
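If you prefer to pre-create the security group yourself, the ingress rules can be described as plain data first. The sketch below builds rules in the shape that boto3's EC2 client accepts as `IpPermissions`. The ports (22 for SSH, 443 for HTTPS) and the CIDR range are illustrative assumptions; the ports a cluster actually needs depend on the CloudBreak version and on whether Knox is enabled.

```python
import ipaddress


def ingress_rules(cidr: str, ports: list) -> list:
    """Build TCP ingress rules (IPv4 only) for the given CIDR and ports."""
    # Raises ValueError if the CIDR is malformed or has host bits set.
    network = ipaddress.IPv4Network(cidr)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network)}],
        }
        for port in sorted(set(ports))  # de-duplicate and order the ports
    ]


if __name__ == "__main__":
    # Example values only: a documentation CIDR range and two assumed ports.
    rules = ingress_rules("203.0.113.0/24", [22, 443])
    for rule in rules:
        print(rule)
    # With boto3 installed and credentials configured, the rules could be applied as:
    #   import boto3
    #   ec2 = boto3.client("ec2")
    #   ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=rules)
```

Restricting the CIDR to your own network rather than 0.0.0.0/0 keeps the cluster endpoints off the open internet.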
The cluster may take 5-15 minutes to deploy. Once the cluster is up, the Ambari URL will be available. Enjoy!
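Rather than refreshing the page while the cluster deploys, a small polling loop can wait for the Ambari URL to start answering. This is a generic sketch, not part of CloudBreak: the URL is a placeholder, and the probe treats any HTTP response (including a login challenge) as "up". An HTTPS endpoint with a self-signed certificate will fail this simple probe unless the certificate is trusted locally.

```python
import time
import urllib.error
import urllib.request


def url_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server is up, even if it rejects the request
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, timeout, TLS error


def wait_until(check, attempts: int = 60, delay: float = 30.0, sleep=time.sleep) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    ambari_url = "http://ambari.example.invalid:8080"  # placeholder URL
    # Three quick attempts for demonstration; the defaults wait up to about 30 minutes.
    ready = wait_until(lambda: url_responds(ambari_url), attempts=3, delay=1.0)
    print("Ambari is reachable" if ready else "Gave up waiting for Ambari")
```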