Hortonworks Data Cloud for AWS (HDCloud for AWS) allows you to create on-demand ephemeral Hadoop clusters on AWS.

In this tutorial, we will set up Hortonworks Data Cloud on AWS 1.16 (released in June 2017), including:

  1. Meeting the prerequisites
  2. Subscribing to HDCloud services on AWS Marketplace
  3. Launching the cloud controller using the simple template
  4. Exploring AWS resources created
  5. Accessing the cloud controller UI
  6. Creating a cluster
  7. Working with the cloud dashboard
  8. Opening additional ports
  9. Cleaning up to avoid further charges

This tutorial assumes no prior experience with AWS. Still, if you run into any issues, refer to the Troubleshooting documentation.

Meet the Prerequisites

1. Set up an AWS account: In order to launch HDCloud on AWS, you need to have an AWS account. You can set one up at https://aws.amazon.com/. Creating an AWS account is free, but you need to add a credit card that will be charged once you start running AWS services. Alternatively, you may want to contact your IT department to find out if your company has an account to which you can be added.

2. Select an AWS region: Next, decide in which region you would like to launch the cloud controller and clusters. The following regions are supported:

  • US East (N. Virginia)
  • US West (Oregon)
  • EU Central (Frankfurt)
  • EU West (Dublin)
  • Asia Pacific (Tokyo)

You may want to pick the region that is nearest to your location - unless you have other constraints. For example, if you have data in Amazon S3 that you will later want to access from your cluster, you will want your clusters to be in the same region as your Amazon S3 data.
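
If you later script anything against AWS, you will need region codes rather than the display names listed above. Here is a minimal sketch that maps one to the other. The codes are the standard AWS identifiers for these locations; the names `SUPPORTED_REGIONS` and `is_supported_region` are hypothetical and used only for illustration.

```python
# Display names from the list above, mapped to standard AWS region codes.
# The dictionary and function names are hypothetical, for illustration only.
SUPPORTED_REGIONS = {
    "US East (N. Virginia)": "us-east-1",
    "US West (Oregon)": "us-west-2",
    "EU Central (Frankfurt)": "eu-central-1",
    "EU West (Dublin)": "eu-west-1",
    "Asia Pacific (Tokyo)": "ap-northeast-1",
}


def is_supported_region(region_code: str) -> bool:
    """Return True if region_code is one of the regions listed in this tutorial."""
    return region_code in SUPPORTED_REGIONS.values()


if __name__ == "__main__":
    # Example: check the region that holds your Amazon S3 data before you commit to it.
    print(is_supported_region("eu-central-1"))
```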

3. Create an SSH key pair: Once you’ve decided which region to use, you need to create an SSH key pair in that region. To do that:

  1. Navigate to the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. Check the region listed in the top right corner to make sure that you are in the correct region.
  3. In the left pane, find NETWORK AND SECURITY and click on Key Pairs.
  4. Click on Create Key Pair to create a new key pair.
  5. Your private key file will be automatically downloaded onto your computer. Make sure to save it in a secure location. You will need it to SSH to the cluster nodes. You may want to change access settings for the file using chmod 400 my-key-pair.pem.
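
If you prefer to tighten the key file permissions from a script, the sketch below does the same thing as chmod 400 by using Python's standard library. It assumes a Linux or Mac system (Windows handles file permissions differently), and the function name `restrict_key_permissions` is hypothetical.

```python
import os
import stat


def restrict_key_permissions(key_path: str) -> str:
    """Make the private key readable by its owner only (equivalent to chmod 400)."""
    os.chmod(key_path, stat.S_IRUSR)  # owner read bit only, no write or execute
    # Read the permission bits back so the caller can confirm the change.
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    return oct(mode)
```

For example, `restrict_key_permissions("my-key-pair.pem")` would be called once, right after the key file is downloaded.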

Now that you've met all the prerequisites, you can subscribe to HDCloud services on AWS Marketplace. Let's get started!

Subscribe to HDCloud Services

In order to use the product, you need to subscribe to two AWS Marketplace services. You can access them by searching https://aws.amazon.com/marketplace/ or by clicking on these links:

16415-screen-shot-2017-06-15-at-124109-pm.png

For each of the services, you need to:

1. Open the listing.

2. Click CONTINUE.

3. Click ACCEPT SOFTWARE TERMS.

This will add these two services to Your Software. You are all set to launch the cloud controller!

Launch the Cloud Controller

1. Navigate to the Hortonworks Data Cloud - Controller Service listing page:

16419-screen-shot-2017-06-15-at-124035-pm.png

2. The only setting that you need to review and change is the Region, which needs to be the same as the region that you chose in the prerequisites.

3. Click on Launch with CloudFormation Console and you will be redirected to the Create stack form in the CloudFormation console.

4. On the Select Template page, your template link is already provided, so just click Next.

5. On the Specify Details page, provide the details required:

16421-screen-shot-2017-06-15-at-14824-pm.png

General Configuration

  • Stack name: If you want to, you can change it to a shorter name of your choice.
  • Controller Instance Type: I recommend that you keep the default. If you pick an instance type that is not powerful enough, you will run into issues.
  • Email Address and Admin Password: Provide a valid email address and create a password. Make sure to remember or write down these credentials, as you will later use them to log in to the cloud controller UI.

Security Configuration

  • SSH KeyName: This is the SSH key pair that you created as a prerequisite. If you can’t see it as an option in the dropdown, check the top right corner to make sure that you are using the same region.
  • Remote Access: This should be a range of IP addresses that can reach the cloud controller. You can use this tool http://www.ipaddressguide.com/cidr#range to calculate a valid CIDR range that includes your public IP address. Or, if you are just playing around, you can enter “0.0.0.0/0”, which will allow access from all IP addresses. Never use “0.0.0.0/0” for a production cluster or a long-running cluster.
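
If you would rather work out the Remote Access value yourself than use an online tool, Python's standard `ipaddress` module can do it. This is a minimal sketch: the helper names are hypothetical, and the addresses in the usage note come from ranges reserved for documentation, so replace them with your own public IP.

```python
import ipaddress


def single_host_cidr(public_ip: str) -> str:
    """Return the narrowest CIDR range, one that contains only public_ip."""
    address = ipaddress.ip_address(public_ip)
    prefix = 32 if address.version == 4 else 128
    return f"{address}/{prefix}"


def allows(cidr: str, public_ip: str) -> bool:
    """Return True if public_ip falls inside the CIDR range."""
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(public_ip) in network


def is_open_to_world(cidr: str) -> bool:
    """Return True for ranges such as 0.0.0.0/0 that admit every address."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0
```

For example, `single_host_cidr("203.0.113.25")` gives a range that admits only that one address, and `is_open_to_world` lets you catch “0.0.0.0/0” before you paste it into the form for a long-running cluster.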

The parameters under SmartSense Configuration are optional. Enter your SmartSense ID and opt in to SmartSense telemetry if you would like to use flex support.

6. After you've entered all required values, click Next.

7. (Optional step) On the Options page, under Advanced, you have an option to change the setting for Rollback on Failure. By default, this is set to Yes, which means that all of the AWS resources will be deleted if launching the stack fails, and you will avoid being charged for the resources. You can change the setting to No if you want to keep the resources for troubleshooting purposes in case of a failure.

8. Click Next.

9. Finally, on the Review page, review the information provided and check I acknowledge that AWS CloudFormation might create IAM resources, and then click CREATE.

10. Refresh the CloudFormation console. You will see the status of your stack as CREATE_IN_PROGRESS. If everything goes well, after about 15 minutes the status will change to CREATE_COMPLETE, at which point you will be able to proceed to the next step. Meanwhile, let's explore AWS dashboards.
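
Instead of refreshing the console by hand, you could poll the stack status from a script. The sketch below shows only the waiting logic and takes the status lookup as a function argument, so it runs without AWS access. The name `wait_for_stack` and its parameters are hypothetical. In a real script, `get_status` might wrap a CloudFormation `describe_stacks` call and return the stack's `StackStatus` value; that wiring is an assumption and is not shown here.

```python
import time
from typing import Callable


def wait_for_stack(get_status: Callable[[], str],
                   poll_seconds: float = 30.0,
                   timeout_seconds: float = 1800.0) -> str:
    """Poll get_status() until the stack leaves CREATE_IN_PROGRESS or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status != "CREATE_IN_PROGRESS":
            # CREATE_COMPLETE means success; anything else needs troubleshooting.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("stack is still CREATE_IN_PROGRESS")
        time.sleep(poll_seconds)
```

The default timeout of 30 minutes is an arbitrary choice that leaves some margin over the roughly 15 minutes mentioned above.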

Explore AWS Dashboards

1. While your cloud controller is being launched, you can click on the Events and Resources tabs to see what AWS resources are being launched on your behalf:

14342-simple-04.png

  • A new VPC, subnet, route table, and Internet gateway were created.
  • A new EC2 instance was created to run the cloud controller. Access rules were defined on the related security group.
  • New IAM roles were created.

2. Once the stack status changes to CREATE_COMPLETE, you can proceed to the next step. If the stack failed for some reason, refer to the Troubleshooting documentation.

Access the Cloud Controller UI

1. To access the cloud controller UI, select the stack that you launched earlier, click on Outputs, and click on the CloudURL:

14346-simple-06.png

2. Even though your browser will tell you that the connection is unsafe, proceed to the UI and log in with the credentials that you provided in the CloudFormation template.

14347-simple-06a.png

14348-simple-07.png

3. After logging in, you will see the dashboard:

14349-simple-08.png

Now that your cloud controller is up and running, you can create your first cluster.

Create a Cluster

1. On the dashboard, click on CREATE CLUSTER to display the form:

16422-create-cluster.png

2. The only parameters that you are required to enter are Cluster Name, password, and confirm password. All other fields are pre-populated and you can keep the defaults. Here is a brief explanation for each of the parameters:

  • Cluster Name: Enter a name for your cluster.
  • HDP Version: Select HDP version 2.5 or 2.6. For each version, a set of preconfigured cluster types is available. I'm going to keep the default version.
  • Cluster Type: Select from the configurations available for the HDP version that you selected. I'm going to keep the default cluster type.
  • Master Instance Type, Worker Instance Type, Compute Instance Type: I recommend that you keep the defaults. If you pick instance types that are not powerful enough, you will run into issues.
  • Worker Instance Count: This determines the number of worker nodes.
  • Compute Instance Count: This determines the number of compute nodes. If you check Use Spot Instances, spot instances will be used instead of on-demand instances.
  • SSH Key Name: Your SSH key name should be pre-populated.
  • Remote Access: Same as with the cloud controller, this must be a valid CIDR range that will allow you to connect to the cluster.
  • Cluster User: Enter credentials that you want to use for your cluster. These are different from the cloud controller credentials; you will use them to log in to the Ambari web UI.
  • Protected Gateway Access: I recommend that you keep the defaults. If you uncheck the two options that are pre-checked, you won’t be able to access Ambari web UI. Checking the third option will give you access to additional cluster UIs.

3. Optionally, in each of the sections you can click on SHOW ADVANCED OPTIONS to display additional options. For example:

  • In the advanced GENERAL CONFIGURATION section, you can specify the configuration properties using this JSON template:
    [
      {
        "configuration-type" : {
          "property-name" : "property-value",
          "property-name2" : "property-value"
        }
      },
      {
        "configuration-type2" : {
          "property-name" : "property-value"
        }
      }
    ]
  • If you need to install additional software, you can use the recipes option to upload "recipes", which are custom scripts that run before or after cluster deployment.
  • By default, a cluster is created in the same VPC as the cloud controller, and only a new subnet is created. You have an option to use a different VPC.
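
If you want to generate the configuration JSON for the advanced GENERAL CONFIGURATION section rather than type it by hand, here is a minimal sketch. It assumes that each list entry holds exactly one configuration type and that values are strings, as the template suggests. The function name `build_configurations` is hypothetical, and `core-site` with `fs.trash.interval` in the usage note is just an illustrative choice.

```python
import json


def build_configurations(settings: dict) -> str:
    """Render {config-type: {property: value}} as a list with one object per type."""
    entries = []
    for config_type, properties in settings.items():
        if not isinstance(properties, dict) or not properties:
            raise ValueError(f"{config_type} needs at least one property")
        # The template shows quoted values, so non-string values are converted.
        entries.append(
            {config_type: {name: str(value) for name, value in properties.items()}}
        )
    return json.dumps(entries, indent=2)
```

For example, `build_configurations({"core-site": {"fs.trash.interval": 4320}})` returns text that you can paste into the form.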

If you are interested in learning about these options, refer to the Create Cluster documentation.

4. Click on CREATE CLUSTER. You have an option to:

  • Receive an email notification when your cluster is ready
  • Save the cluster as a template
  • View CLI JSON that can be used for creating a cluster via HDCloud CLI

14351-simple-9a.png

5. Click on YES, CREATE A CLUSTER.

6. Now you will see a cluster tile appear on the dashboard:

16442-simple-10.png

7. Click on the tile to see the cluster details.

In the EVENT HISTORY log, you can see that a new stack is being launched in the CloudFormation console, then EC2 instances are started to run your cluster nodes, and an Ambari cluster is built. As you can see in the screenshot below, it took 15 minutes to build my 4-node cluster.

8. Once your cluster is ready, its status will change to RUNNING:

16443-simple-11.png

Congratulations! You've just created your first cluster!

Get Started with the Cloud Dashboard

Let’s explore a few shortcuts that you should be aware of when working with HDCloud.

1. Click on the icon to copy complete SSH information for a specific node:

16444-simple-12.png

If you are using a Mac, you can paste it into your terminal and - assuming that your private key is available on your computer - you should be able to access your cluster.

If you are using Windows and need to set up your SSH, refer to http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html.
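
If you want to assemble the SSH command yourself instead of copying it from the dashboard, here is a minimal sketch. The default user name `cloudbreak` is an assumption, so check it against the SSH information that you copied. The function name is hypothetical, and the IP address in the usage note is a documentation placeholder.

```python
import shlex


def build_ssh_command(key_path: str, host: str, user: str = "cloudbreak") -> str:
    """Return an ssh command line that uses the given private key file."""
    # shlex.join quotes arguments that need it, such as paths containing spaces.
    return shlex.join(["ssh", "-i", key_path, f"{user}@{host}"])
```

For example, `build_ssh_command("my-key-pair.pem", "203.0.113.25")` returns a command that you can paste into your terminal.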

2. Next, click on the Ambari Web link to open the Ambari Web UI in a browser:

16445-simple-14.png

3. Log in to the Ambari web UI using the credentials that you specified when creating your cluster. The default user is `admin`, so unless you changed the default, this is the username to log in with.

4. Click on CLUSTER ACTIONS > Resize and resize your cluster by adding one node:

16447-simple-15.png

5. You can also explore the tabs where your cluster settings are available:

14357-simple-15a.png

More Functionality

Click on the menu icon to see other capabilities available in the cloud controller UI:

16448-simple-16.png

CLUSTERS: This is where you are right now.

Open Additional Ports

(Adding this on Ali Bajwa's request) See this post: https://community.hortonworks.com/articles/77290/how-to-open-additional-ports-on-ec2-security-group....

Clean Up

1. Once you no longer need your cluster, you can terminate it by clicking on CLUSTER ACTIONS > TERMINATE:

16449-simple-17.png

This will delete all the EC2 instances that were used to run cluster nodes.

2. After deleting the cluster, you can delete the cloud controller. From the CloudFormation console, delete the stack corresponding to the cloud controller:

14360-simple-18.png

If you try deleting the cloud controller before terminating all the clusters associated with it, you will run into errors.

To avoid unnecessary charges to your AWS account, always make sure that the stacks corresponding to the cluster and the cloud controller were successfully deleted in the CloudFormation console and that the EC2 instances running the cloud controller and cluster nodes were deleted in the EC2 console.
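
The final check can also be scripted. This minimal sketch assumes that you already have a list of stack summaries, each a dictionary with `StackName` and `StackStatus` keys (the shape that CloudFormation's list-stacks output uses). Fetching that list is not shown, and both the function name and the stack names in the example are hypothetical.

```python
def stacks_not_deleted(stack_summaries: list) -> list:
    """Return the names of stacks whose status is anything other than DELETE_COMPLETE."""
    return [
        summary["StackName"]
        for summary in stack_summaries
        if summary["StackStatus"] != "DELETE_COMPLETE"
    ]


if __name__ == "__main__":
    # Hypothetical example data: one stack is gone, one failed to delete.
    example = [
        {"StackName": "my-cluster", "StackStatus": "DELETE_COMPLETE"},
        {"StackName": "my-controller", "StackStatus": "DELETE_FAILED"},
    ]
    print(stacks_not_deleted(example))
```

An empty result means that every stack in the list was deleted. Any name that comes back is worth a closer look in the CloudFormation and EC2 consoles.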

Troubleshooting

If you run into any issues, refer to the Troubleshooting documentation.

Next Steps

To learn more, refer to the HDCloud for AWS product documentation.

Related tutorials:

Feedback

Let us know if this was useful and how we can help you with HDCloud for AWS in the future. Feel free to leave a comment below with a suggestion for an HDCloud for AWS tutorial that you would like to see. Thanks!


Comments

Updated for the latest HDCloud version 1.14.1. Check it out!


Updated for the latest HDCloud version 1.14.4. No major changes, just updated screenshots and links. Check it out!


Updated for the latest HDCloud version 1.16. Check it out!