Member since: 01-07-2019
Posts: 217
Kudos Received: 135
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1930 | 12-09-2021 09:57 PM
 | 1864 | 10-15-2018 06:19 PM
 | 9240 | 10-10-2018 07:03 PM
 | 4025 | 07-24-2018 06:14 PM
 | 1478 | 07-06-2018 06:19 PM
12-10-2016
01:19 AM
1 Kudo
HDCloud for AWS 1.16 (released in June 2017) allows you to register a previously created RDS instance as a Hive or Druid metastore. In this tutorial, we will be:
- Launching an RDS instance and creating a database on it.
- Registering a database running on an RDS instance as a Hive metastore. Note that you can use these instructions for creating a Druid metastore.

Let's get started!

Prerequisites

As a prerequisite to this tutorial, you need to have an HDCloud cloud controller running on AWS. If you need help with the steps required to meet this prerequisite, refer to the following tutorial: How to set up Hortonworks Data Cloud (HDCloud) for AWS.

Launch an RDS Instance

1. Navigate to the RDS Dashboard at https://console.aws.amazon.com/rds.
2. In the top right corner, select the region in which you want to create your DB instance. Let's create the RDS instance in the same region in which you've launched the cloud controller.
3. In the RDS Dashboard navigation pane, click Instances, and then click Launch DB Instance to launch the Launch DB Instance Wizard.
4. In Step 1: Select Engine, select the PostgreSQL engine and click Select.
5. In Step 2: Production?, select Dev/Test and click Next Step.
6. In Step 3: Specify DB Details, enter:
- For Instance Specifications, you can use values similar to those in the screenshot. Make sure to use DB Engine Version 9.5.4 or later.
- For Settings, come up with an identifier, a username, and a password for your instance.

Click Next Step.

7. In Step 4: Configure Advanced Settings:
- In the Network & Security section, select the VPC where the RDS instance should be started. Select the same VPC in which your cloud controller is running. On the right, in the Connection Information, make sure that the Inbound access on the security group is set to "0.0.0.0/0".
- In the Database Options section, enter a Database Name. This field is not required, so it's easy to miss it. If you miss it, you will have to create the database manually.

8. Click Launch DB Instance.
9. Click on View Your DB Instances to get redirected to the RDS Dashboard. Keep this page open, as you will need to copy the RDS information and provide it in the Hive metastore registration form.
10. When your RDS instance is ready, proceed to the next step.

Congratulations! You've just launched an RDS instance and created a database on it. Let's register this database as a Hive metastore in your cloud controller.

Register a Hive Metastore

1. Log in to the cloud controller UI.
2. From the navigation menu, select SHARED SERVICES:
3. The list of registered Hive metastores is displayed.
4. Click REGISTER METASTORE and the registration form is displayed:
5. Enter the following parameters:
- Name: Enter the name to use when registering this metastore to the cloud controller. This is not the database name.
- HDP Version: Select the version of HDP that this metastore can be used with.
- JDBC Connection: Select the database type (PostgreSQL) and enter the RDS Endpoint (HOST:PORT/DB_NAME).
- Authentication: Enter the RDS connection username and password.

You can obtain these parameters from the RDS Dashboard:

6. Click Test connection to validate and test the RDS connection information. (If you want to check connectivity outside the UI first, see the sketch at the end of this post.)
7. Once your settings are validated and working, click REGISTER HIVE METASTORE to save the metastore. The metastore will now show up in the list of available metastores when creating a cluster.

Congratulations! You've just registered your RDS database as a Hive metastore.

Feedback

Let us know if this was useful and how we can help you with HDCloud for AWS in the future. Feel free to leave a comment below with a suggestion for an HDCloud for AWS tutorial that you would like to see. Thanks!
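Optionally, you can verify the RDS connection details outside the UI from any machine that can reach the instance. This is a minimal sketch assuming the psql client is installed; the endpoint, database name, and username below are placeholders, so substitute the values from your own RDS Dashboard:

# Check basic connectivity to the RDS endpoint (placeholder values)
psql -h mymetastore.abc123xyz.us-west-2.rds.amazonaws.com -p 5432 -U metastoreuser -d hivedb
# If you skipped the Database Name field when launching the instance, connect to the
# default "postgres" database and create the metastore database manually:
psql -h mymetastore.abc123xyz.us-west-2.rds.amazonaws.com -p 5432 -U metastoreuser -d postgres -c "CREATE DATABASE hivedb;"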
... View more
12-09-2016
06:23 PM
@yjiang This may be helpful: https://community.hortonworks.com/questions/70277/how-to-create-users-on-hdc-admin-ui.html
... View more
11-30-2016
07:27 PM
@milind pandit Starting and stopping the cloud controller is not supported. See http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/delete-controller/index.html
... View more
11-30-2016
06:38 PM
To get started with the HDCloud for AWS general availability version, visit http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/index.html
... View more
11-19-2016
01:09 AM
3 Kudos
This tutorial will help you get started with accessing data stored on Amazon S3 from a cluster created through Hortonworks Data Cloud for AWS 1.16 (released in June 2017). The tutorial assumes no prior experience with AWS.
In this tutorial:
- We will use DistCp to copy sample data from S3 to HDFS and from HDFS to S3.
- We will be using fs shell commands.
- We will be using the Landsat 8 data that AWS makes available in the s3://landsat-pds bucket in the US West (Oregon) region.
- We will also create a new S3 bucket to which we will copy data from HDFS.
- In general, when specifying a path to S3, we will follow this required convention: `s3a://bucket-name/directory/`.
Let's get started!
Prerequisites
Before starting this tutorial, your cloud controller needs to be running, and you must have a cluster running on AWS.
To set up the cloud controller and cluster, refer to the following tutorial:
How to set up Hortonworks Data Cloud for AWS.

Accessing HDFS in HDCloud for AWS
1. SSH to a cluster node.
You can copy the SSH information from the cloud controller UI:
2. In HDCloud clusters, after you SSH to a cluster node, the default user is cloudbreak. The cloudbreak user doesn’t have write access to HDFS, so let’s create a directory to which we will copy the data, and then let’s change the owner and permissions so that the cloudbreak user can write to the directory:
sudo -u hdfs hdfs dfs -mkdir /user/cloudbreak
sudo -u hdfs hdfs dfs -chown cloudbreak /user/cloudbreak
sudo -u hdfs hdfs dfs -chmod 700 /user/cloudbreak
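To confirm that the new directory has the expected owner and permissions before copying any data, you can list /user (an optional check, not part of the original steps):

# /user/cloudbreak should now be owned by the cloudbreak user with mode drwx------
hdfs dfs -ls /user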
Now you will be able to copy data to the newly created directory.

Copying from S3 to HDFS
We will copy the scene_list.gz file from a public S3 bucket called landsat-pds to HDFS:
1. First, let’s check if the scene_list.gz file that we are trying to copy exists in the S3 bucket:
hadoop fs -ls s3a://landsat-pds/scene_list.gz
2. You should see something similar to:
-rw-rw-rw- 1 cloudbreak 33410181 2016-11-18 17:16 s3a://landsat-pds/scene_list.gz
3. Now let’s copy scene_list.gz to your HDFS home directory (/user/cloudbreak) using the following command:
hadoop distcp s3a://landsat-pds/scene_list.gz .
4. You should see something similar to:
________________________________________________________
[cloudbreak@ip-10-0-1-208 ~]$ hadoop distcp s3a://landsat-pds/scene_list.gz .
16/11/18 22:00:50 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[s3a://landsat-pds/scene_list.gz], targetPath=, targetPathExists=true, filtersFile='null'}
16/11/18 22:00:51 INFO impl.TimelineClientImpl: Timeline service address: http://ip-10-0-1-208.ec2.internal:8188/ws/v1/timeline/
16/11/18 22:00:51 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-1-208.ec2.internal/10.0.1.208:8050
16/11/18 22:00:51 INFO client.AHSProxy: Connecting to Application History server at ip-10-0-1-208.ec2.internal/10.0.1.208:10200
16/11/18 22:00:53 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
16/11/18 22:00:53 INFO tools.SimpleCopyListing: Build file listing completed.
16/11/18 22:00:53 INFO tools.DistCp: Number of paths in the copy list: 1
16/11/18 22:00:53 INFO tools.DistCp: Number of paths in the copy list: 1
16/11/18 22:00:53 INFO impl.TimelineClientImpl: Timeline service address: http://ip-10-0-1-208.ec2.internal:8188/ws/v1/timeline/
16/11/18 22:00:53 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-1-208.ec2.internal/10.0.1.208:8050
16/11/18 22:00:53 INFO client.AHSProxy: Connecting to Application History server at ip-10-0-1-208.ec2.internal/10.0.1.208:10200
16/11/18 22:00:53 INFO mapreduce.JobSubmitter: number of splits:1
16/11/18 22:00:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479498757313_0009
16/11/18 22:00:54 INFO impl.YarnClientImpl: Submitted application application_1479498757313_0009
16/11/18 22:00:54 INFO mapreduce.Job: The url to track the job: http://ip-10-0-1-208.ec2.internal:8088/proxy/application_1479498757313_0009/
16/11/18 22:00:54 INFO tools.DistCp: DistCp job-id: job_1479498757313_0009
16/11/18 22:00:54 INFO mapreduce.Job: Running job: job_1479498757313_0009
16/11/18 22:01:01 INFO mapreduce.Job: Job job_1479498757313_0009 running in uber mode : false
16/11/18 22:01:01 INFO mapreduce.Job: map 0% reduce 0%
16/11/18 22:01:11 INFO mapreduce.Job: map 100% reduce 0%
16/11/18 22:01:11 INFO mapreduce.Job: Job job_1479498757313_0009 completed successfully
16/11/18 22:01:11 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=145318
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=349
HDFS: Number of bytes written=33410189
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
S3A: Number of bytes read=33410181
S3A: Number of bytes written=0
S3A: Number of read operations=3
S3A: Number of large read operations=0
S3A: Number of write operations=0
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=8309
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8309
Total vcore-milliseconds taken by all map tasks=8309
Total megabyte-milliseconds taken by all map tasks=8508416
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=121
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=54
CPU time spent (ms)=3520
Physical memory (bytes) snapshot=281440256
Virtual memory (bytes) snapshot=2137710592
Total committed heap usage (bytes)=351272960
File Input Format Counters
Bytes Read=228
File Output Format Counters
Bytes Written=8
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=33410181
BYTESEXPECTED=33410181
COPY=1
[cloudbreak@ip-10-0-1-208 ~]$
________________________________________________________
5. Now let’s check if the file that we copied is present in the cloudbreak directory:
hadoop fs -ls
6. You should see something similar to:
-rw-r--r-- 3 cloudbreak hdfs 33410181 2016-11-18 21:30 scene_list.gz
Congratulations! You’ve successfully copied the file from an S3 bucket to HDFS!

Creating an S3 Bucket

In this step, we will copy the scene_list.gz file from the cloudbreak directory to an S3 bucket. But before that, we need to create a new S3 bucket.
1. In your browser, navigate to the S3 Dashboard at https://console.aws.amazon.com/s3/home.
2. Click on Create Bucket and create a bucket:
For example, here I am creating a bucket called “domitest”. Since my cluster and source data are in the Oregon region, I am creating this bucket in that region.
3. Next, navigate to the bucket, and create a folder:
For example, here I am creating a folder called “demo”.
4. Now, from our cluster node, let’s check if the bucket and folder that we just created exist:
hadoop fs -ls s3a://domitest/
5. You should see something similar to:
Found 1 items
drwxrwxrwx - cloudbreak 0 2016-11-18 22:17 s3a://domitest/demo

Congratulations! You’ve successfully created an Amazon S3 bucket.
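As an aside, if you prefer the command line over the S3 console, the bucket and folder could also be created with the AWS CLI. This is only a sketch, assuming the CLI is installed and configured with your credentials; bucket names are globally unique, so replace "domitest" with your own bucket name:

# Create the bucket in the US West (Oregon) region
aws s3 mb s3://domitest --region us-west-2
# Create an empty "demo/" prefix to act as the folder
aws s3api put-object --bucket domitest --key demo/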
Copying from HDFS to S3

1. Now let’s copy the scene_list.gz file from HDFS to this newly created bucket:
hadoop distcp /user/cloudbreak/scene_list.gz s3a://domitest/demo
2. You should see something similar to:
______________________
[cloudbreak@ip-10-0-1-208 ~]$ hadoop distcp /user/cloudbreak/scene_list.gz s3a://domitest/demo
16/11/18 22:20:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/cloudbreak/scene_list.gz], targetPath=s3a://domitest/demo, targetPathExists=true, filtersFile='null'}
16/11/18 22:20:33 INFO impl.TimelineClientImpl: Timeline service address: http://ip-10-0-1-208.ec2.internal:8188/ws/v1/timeline/
16/11/18 22:20:33 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-1-208.ec2.internal/10.0.1.208:8050
16/11/18 22:20:33 INFO client.AHSProxy: Connecting to Application History server at ip-10-0-1-208.ec2.internal/10.0.1.208:10200
16/11/18 22:20:34 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
16/11/18 22:20:34 INFO tools.SimpleCopyListing: Build file listing completed.
16/11/18 22:20:34 INFO tools.DistCp: Number of paths in the copy list: 1
16/11/18 22:20:34 INFO tools.DistCp: Number of paths in the copy list: 1
16/11/18 22:20:34 INFO impl.TimelineClientImpl: Timeline service address: http://ip-10-0-1-208.ec2.internal:8188/ws/v1/timeline/
16/11/18 22:20:34 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-1-208.ec2.internal/10.0.1.208:8050
16/11/18 22:20:34 INFO client.AHSProxy: Connecting to Application History server at ip-10-0-1-208.ec2.internal/10.0.1.208:10200
16/11/18 22:20:34 INFO mapreduce.JobSubmitter: number of splits:1
16/11/18 22:20:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479498757313_0010
16/11/18 22:20:35 INFO impl.YarnClientImpl: Submitted application application_1479498757313_0010
16/11/18 22:20:35 INFO mapreduce.Job: The url to track the job: http://ip-10-0-1-208.ec2.internal:8088/proxy/application_1479498757313_0010/
16/11/18 22:20:35 INFO tools.DistCp: DistCp job-id: job_1479498757313_0010
16/11/18 22:20:35 INFO mapreduce.Job: Running job: job_1479498757313_0010
16/11/18 22:20:42 INFO mapreduce.Job: Job job_1479498757313_0010 running in uber mode : false
16/11/18 22:20:42 INFO mapreduce.Job: map 0% reduce 0%
16/11/18 22:20:53 INFO mapreduce.Job: map 100% reduce 0%
16/11/18 22:21:01 INFO mapreduce.Job: Job job_1479498757313_0010 completed successfully
16/11/18 22:21:01 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=145251
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=33410572
HDFS: Number of bytes written=8
HDFS: Number of read operations=10
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
S3A: Number of bytes read=0
S3A: Number of bytes written=33410181
S3A: Number of read operations=14
S3A: Number of large read operations=0
S3A: Number of write operations=4098
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=14695
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=14695
Total vcore-milliseconds taken by all map tasks=14695
Total megabyte-milliseconds taken by all map tasks=15047680
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=122
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=57
CPU time spent (ms)=4860
Physical memory (bytes) snapshot=280420352
Virtual memory (bytes) snapshot=2136977408
Total committed heap usage (bytes)=350748672
File Input Format Counters
Bytes Read=269
File Output Format Counters
Bytes Written=8
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=33410181
BYTESEXPECTED=33410181
COPY=1
______________________
3. Next, let’s check if the file that we copied is present in the demo folder of the S3 bucket:
hadoop fs -ls s3a://domitest/demo
4. You should see something similar to:
Found 1 items
-rw-rw-rw- 1 cloudbreak 33410181 2016-11-18 22:20 s3a://domitest/demo/scene_list.gz
5. You will also see the file on the S3 Dashboard:
Congratulations! You’ve successfully copied the file from HDFS to the S3 bucket!

Next Steps
1. Try creating another bucket. Using similar syntax, you can try copying files between two S3 buckets that you created.
2. If you want to copy more files, try adding -D fs.s3a.fast.upload=true and see how this accelerates the transfer (see the example command after this list). Click here for more information.
3. Try running more hadoop fs commands listed here.
4. Learn more about the landsat-pds bucket at https://pages.awscloud.com/public-data-sets-landsat.html.
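As an example of tip 2, the HDFS-to-S3 copy from this tutorial could be re-run with the fast upload option passed as a generic Hadoop option ("domitest" is the example bucket created above, so substitute your own):

# Enable the S3A fast upload path for the transfer
hadoop distcp -D fs.s3a.fast.upload=true /user/cloudbreak/scene_list.gz s3a://domitest/demo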
Cleaning Up

Any files stored on S3 or in HDFS add to your charges, so it’s good to get into the habit of deleting files you no longer need.
1. To delete the scene_list.gz file from HDFS, run:
hadoop fs -rm -skipTrash /user/cloudbreak/scene_list.gz
2. To delete the scene_list.gz file from the S3 bucket, run:
hadoop fs -rm -skipTrash s3a://domitest/demo/scene_list.gz
Or, you can delete it from the S3 Dashboard.

More Resources
Visit the Cloud Data Access documentation for more information on working with Amazon S3 buckets.
... View more
11-17-2016
12:45 AM
2 Kudos
In this tutorial, we will set up the Hortonworks Data Cloud for AWS 1.16 (released in June 2017) using the advanced template, which requires you to configure a VPC and an RDS instance and database prior to launching the cloud controller. We will go through the following steps:
- Subscribing to HDCloud services
- Launching an RDS instance
- Launching the cloud controller using the advanced template
- Accessing the cloud controller UI

This tutorial assumes no prior experience with AWS. If you encounter errors while performing the steps, refer to the Troubleshooting documentation. Let's get started!

Prerequisites

The prerequisites for this tutorial are:
1. You have an AWS account.
2. You know in which AWS region you want to launch.
3. You have an SSH key pair in the selected region.

If you need help with the steps required to meet these prerequisites, refer to the following post: How to set up Hortonworks Data Cloud (HDCloud) for AWS.

Subscribe to HDCloud Services

In order to use the product, you need to subscribe to two AWS Marketplace services. You can access them by searching https://aws.amazon.com/marketplace/ or by clicking on the following links:
- Hortonworks Data Cloud - Controller Service
- Hortonworks Data Cloud - HDP Services

For each of the services, you need to:
1. Open the listing.
2. Click CONTINUE.
3. Click ACCEPT SOFTWARE TERMS.

This will add these two services to Your Software. Now you can launch the cloud controller.

Launch an RDS Instance

As the advanced template requires you to enter information related to an existing RDS instance, you need to create a PostgreSQL RDS instance and a database prior to launching the cloud controller. Only the PostgreSQL RDS instance type is supported.

1. Navigate to the RDS Dashboard at https://console.aws.amazon.com/rds.
2. In the top right corner, select the region in which you want to create your DB instance. For simplicity, let’s create the RDS instance in the same region in which you will later launch the cloud controller.
3. In the RDS Dashboard navigation pane, click Instances, and then click Launch DB Instance to launch the Launch DB Instance Wizard.
4. In Step 1: Select Engine, select the PostgreSQL engine and click Select.
5. In Step 2: Production?, select Dev/Test and click Next Step.
6. In Step 3: Specify DB Details, enter:
- For Instance Specifications, you can use values similar to those in the screenshots. Make sure to use DB Engine Version 9.5.4 or later.
- For Settings, come up with an identifier, a username, and a password for your instance.

Click Next Step.

7. In Step 4: Configure Advanced Settings:
- In the Network & Security section, select the VPC where the RDS instance should be started. I am using the default VPC. On the right, in the Connection Information, make sure that the Inbound access on the security group is set to "0.0.0.0/0". You can change this setting later, but at this stage, the RDS instance must be accessible to the cloud controller that we will create in the next step.
- In the Database Options section, enter a Database Name. This field is not required, so it’s easy to miss it. If you miss it, you will have to create the database manually.

8. Click Launch DB Instance.
9. Click on View Your DB Instances to get redirected to the RDS Dashboard. Keep this page open, as you will need to copy the RDS information and provide it in the CloudFormation template.

While your RDS database is being created, you can get started with the next step, which is launching the cloud controller using the advanced template.

Launch Cloud Controller (Advanced Template)

1. Navigate to the Hortonworks Data Cloud - Controller Service listing page:
2. The only settings that you’ll want to change are:
- The Region, which needs to be the same as the region that you chose in the prerequisites and where you created the RDS instance.
- The Deployment Options, which need to be set to ADVANCED.

3. Click Launch with CloudFormation Console and you will be redirected to the Create stack form in the CloudFormation console.
4. On the Select Template page, your template link is already provided, so just click Next.
5. On the Specify Details page, provide the details required:

General Configuration
- Stack name: You can change this if you want to.
- Controller Instance Type: I recommend that you keep the default. If you pick an instance type that is not powerful enough, you will run into issues.
- Email Address and Admin Password: You will use these credentials to log in to the cloud controller UI.

Security Configuration
- SSH KeyName: This is your SSH key pair. If you can’t see it in the form, make sure that you are using the same region.
- Remote Access: This should be a range of IP addresses that can reach the cloud controller. If you are just playing around, you can enter "0.0.0.0/0", which will allow access to all; or you can use this tool http://www.ipaddressguide.com/cidr#range to calculate a valid CIDR range that includes your public IP address.

The parameters under SmartSense Configuration are optional. Enter your SmartSense ID and opt in to SmartSense telemetry if you would like to use flex support.

In order to obtain the remaining parameters, refer to your RDS Dashboard:

Network Configuration
- VPC ID: For simplicity, let’s use the same default VPC as the one used for the RDS instance.
- Subnet ID: Make sure to select a subnet that belongs to the chosen VPC.

RDS Configuration
- RDS Endpoint: You can copy this from the RDS Dashboard (or look it up with the AWS CLI, as sketched at the end of this post).
- RDS Username: You can copy this from the RDS Dashboard (Username).
- RDS Password: This is the password that you chose when creating your RDS instance.
- Database Name: You can copy this from the RDS Dashboard (DB Name). If the DB Name value is blank, you will have to create the database manually.

When done, click Next.

6. On the Options page, under Advanced, you have an option to change the setting for Rollback on Failure. By default, this is set to "Yes", which means that all of the AWS resources will be deleted if launching the stack fails, and you will avoid being charged for the resources. You can change the setting to "No" if, in case of a failure, you want to keep the resources for troubleshooting purposes. Click Next.
7. On the Review page, check I acknowledge that AWS CloudFormation might create IAM resources and then click CREATE.
8. Refresh the CloudFormation console. You will see the status of your stack as CREATE_IN_PROGRESS. If everything goes well, after about 15 minutes, the status will change to CREATE_COMPLETE, at which point you will be able to proceed to the next step. If you run into any issues, refer to the Troubleshooting documentation.

Get Started with the Cloud Controller UI

1. To access the cloud controller UI, select the stack, click on Outputs, and click on the CloudURL:
2. Even though your browser will tell you that the connection is unsafe, proceed to the UI and log in with the credentials that you provided earlier.
3. After logging in, you will get to the dashboard:

Now you can start creating clusters. Have fun!

Next Steps

To learn more, refer to the HDCloud for AWS product documentation. Let us know if this was useful and how we can help you in the future.
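If you have the AWS CLI configured, the RDS values above can also be pulled from the command line instead of the dashboard. This is a sketch using a hypothetical DB instance identifier of "hdcloud-rds"; the fields returned mirror what the RDS Dashboard shows:

# Print the endpoint address and port, master username, and database name
aws rds describe-db-instances --db-instance-identifier hdcloud-rds \
  --query "DBInstances[0].[Endpoint.Address,Endpoint.Port,MasterUsername,DBName]" \
  --output table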
... View more
11-16-2016
12:46 AM
12 Kudos
Hortonworks Data Cloud for AWS (HDCloud for AWS) allows you to create on-demand ephemeral Hadoop clusters on AWS. In this tutorial, we will set up Hortonworks Data Cloud on AWS 1.16 (released in June 2017), including:

- Meeting the prerequisites
- Subscribing to HDCloud services on AWS Marketplace
- Launching the cloud controller using the simple template
- Exploring AWS resources created
- Accessing the cloud controller UI
- Creating a cluster
- Working with the cloud dashboard
- Opening additional ports
- Cleaning up to avoid further charges

This tutorial assumes no prior experience with AWS. Still, if you run into any issues, refer to the Troubleshooting documentation.

Meet the Prerequisites

1. Set up an AWS account: In order to launch HDCloud on AWS, you need to have an AWS account. You can set one up at https://aws.amazon.com/. Creating an AWS account is free, but you need to add a credit card that will be charged once you start running AWS services. Alternatively, you may want to contact your IT department to find out if your company has an account to which you can be added.

2. Select an AWS region: Next, decide in which region you would like to launch the cloud controller and clusters. The following regions are supported:
- US East (N. Virginia)
- US West (Oregon)
- EU Central (Frankfurt)
- EU West (Dublin)
- Asia Pacific (Tokyo)
You may want to pick the region that is nearest to your location, unless you have other constraints. For example, if you have data in Amazon S3 that you will later want to access from your cluster, you will want your clusters to be in the same region as your Amazon S3 data.

3. Create an SSH key pair: Once you’ve decided which region to use, you need to create an SSH key pair in that region. To do that:
- Navigate to the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
- Check the region listed in the top right corner to make sure that you are in the correct region.
- In the left pane, find NETWORK AND SECURITY and click on Key Pairs.
- Click on Create Key Pair to create a new key pair. Your private key file will be automatically downloaded onto your computer. Make sure to save it in a secure location. You will need it to SSH to the cluster nodes. You may want to change access settings for the file using chmod 400 my-key-pair.pem.

Now that you've met all the prerequisites, you can subscribe to HDCloud services on AWS Marketplace. Let's get started!

Subscribe to HDCloud Services

In order to use the product, you need to subscribe to two AWS Marketplace services. You can access them by searching https://aws.amazon.com/marketplace/ or by clicking on these links:
- Hortonworks Data Cloud - Controller Service (allows you to launch HDCloud)
- Hortonworks Data Cloud - HDP Services (allows you to create clusters)

For each of the services, you need to:
1. Open the listing.
2. Click CONTINUE.
3. Click ACCEPT SOFTWARE TERMS.

This will add these two services to Your Software. You are all set to launch the cloud controller!

Launch the Cloud Controller

1. Navigate to the Hortonworks Data Cloud - Controller Service listing page:
2. The only setting that you need to review and change is the Region, which needs to be the same as the region that you chose in the prerequisites.
3. Click on Launch with CloudFormation Console and you will be redirected to the Create stack form in the CloudFormation console.
4. On the Select Template page, your template link is already provided, so just click Next.
5. On the Specify Details page, provide the details required:

General Configuration
- Stack name: If you want to, you can change it to a shorter name of your choice.
- Controller Instance Type: I recommend that you keep the default. If you pick an instance type that is not powerful enough, you will run into issues.
- Email Address and Admin Password: Provide a valid email address and create a password. Make sure to remember or write down these credentials, as you will later use them to log in to the cloud controller UI.

Security Configuration
- SSH KeyName: This is the SSH key pair that you created as a prerequisite. If you can’t see it as an option in the dropdown, check the top right corner to make sure that you are using the same region.
- Remote Access: This should be a range of IP addresses that can reach the cloud controller. You can use this tool http://www.ipaddressguide.com/cidr#range to calculate a valid CIDR range that includes your public IP address. Or, if you are just playing around, you can enter "0.0.0.0/0", which will allow access from all IP addresses. Never use "0.0.0.0/0" for a production cluster or a long-running cluster.

The parameters under SmartSense Configuration are optional. Enter your SmartSense ID and opt in to SmartSense telemetry if you would like to use flex support.

6. After you've entered all required values, click Next.
7. (Optional step) On the Options page, under Advanced, you have an option to change the setting for Rollback on Failure. By default, this is set to Yes, which means that all of the AWS resources will be deleted if launching the stack fails, and you will avoid being charged for the resources. You can change the setting to No if, in case of a failure, you want to keep the resources for troubleshooting purposes.
8. Click Next.
9. Finally, on the Review page, review the information provided and check I acknowledge that AWS CloudFormation might create IAM resources, and then click CREATE.
10. Refresh the CloudFormation console. You will see the status of your stack as CREATE_IN_PROGRESS. If everything goes well, after about 15 minutes the status will change to CREATE_COMPLETE, at which point you will be able to proceed to the next step. Meanwhile, let's explore AWS dashboards.

Explore AWS Dashboards

1. While your cloud controller is being launched, you can click on the Events and Resources tabs to see what AWS resources are being launched on your behalf:
- A new VPC, subnet, route table, and Internet gateway were created.
- A new EC2 instance was created to run the cloud controller.
- Access rules were defined on the related security group.
- New IAM roles were created.
2. Once the stack status changes to CREATE_COMPLETE, you can proceed to the next step. If the stack failed for some reason, refer to the Troubleshooting documentation.

Access the Cloud Controller UI

1. To access the cloud controller UI, select the stack that you launched earlier, click on Outputs, and click on the CloudURL:
2. Even though your browser will tell you that the connection is unsafe, proceed to the UI and log in with the credentials that you provided in the CloudFormation template.
3. After logging in, you will see the dashboard:

Now that your cloud controller is up and running, you can create your first cluster.

Create a Cluster

1. On the dashboard, click on CREATE CLUSTER to display the form:
2. The only parameters that you are required to enter are Cluster Name, password, and confirm password. All other fields are pre-populated and you can keep the defaults. Here is a brief explanation for each of the parameters:
- Cluster Name: Enter a name for your cluster.
- HDP Version: Select HDP version 2.5 or 2.6. For each version, a set of preconfigured cluster types is available. I'm going to keep the default version.
- Cluster Type: Select from the configurations available for the HDP version that you selected. I'm going to keep the default cluster type.
- Master Instance Type, Worker Instance Type, Compute Instance Type: I recommend that you keep the defaults. If you pick instance types that are not powerful enough, you will run into issues.
- Worker Instance Count: This determines the number of worker nodes.
- Compute Instance Count: This determines the number of compute nodes. If you check Use Spot Instances, spot instances will be used instead of on-demand instances.
- SSH Key Name: Your SSH key name should be pre-populated.
- Remote Access: Same as with the cloud controller, this must be a valid CIDR range that will allow you to connect to the cluster.
- Cluster User: Enter the credentials that you want to use for your cluster. These are different from the cloud controller credentials; you will use them to log in to the Ambari web UI.
- Protected Gateway Access: I recommend that you keep the defaults. If you uncheck the two options that are pre-checked, you won’t be able to access the Ambari web UI. Checking the third option will give you access to additional cluster UIs.

3. Optionally, in each of the sections you can click on SHOW ADVANCED OPTIONS to display additional options. For example:
- In the advanced GENERAL CONFIGURATION section, you can specify configuration properties using this JSON template:
[
  { "configuration-type" : { "property-name" : "property-value", "property-name2" : "property-value" } },
  { "configuration-type2" : { "property-name" : "property-value" } }
]
- If you need to install additional software, you can use the recipes option, which allows you to upload "recipes", custom scripts that run pre- or post-cluster deployment.
- By default, a cluster is created in the same VPC as the cloud controller, and just a new subnet is created. You have an option to use a different VPC.
If you are interested in learning about these options, refer to the Create Cluster documentation.
4. Click on CREATE CLUSTER. You have an option to:
- Receive an email notification when your cluster is ready
- Save the cluster as a template
- View CLI JSON that can be used for creating a cluster via the HDCloud CLI
5. Click on YES, CREATE A CLUSTER.
6. Now you will see a cluster tile appear on the dashboard:
7. Click on the tile to see the cluster details. In the EVENT HISTORY log, you can see that a new stack is being launched in the CloudFormation console, then EC2 instances are started to run your cluster nodes, and an Ambari cluster is built. As you can see in the screenshot below, it took 15 minutes to build my 4-node cluster.
8. Once your cluster is ready, its status will change to RUNNING:

Congratulations! You've just created your first cluster!

Get Started with the Cloud Dashboard

Let’s explore a few shortcuts that you should be aware of when working with HDCloud.
1. Click on the icon to copy complete SSH information for a specific node: If you are using a Mac, you can paste it into your terminal and, assuming that your private key is available on your computer, you should be able to access your cluster (see the SSH sketch at the end of this post). If you are using Windows and need to set up SSH, refer to http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html.
2. Next, click on the Ambari Web link to open the Ambari web UI in a browser:
3. Log in to the Ambari web UI using the credentials that you specified when creating your cluster. The default user was `admin`, so unless you changed the default, this user should log you in.
4. Click on CLUSTER ACTIONS > Resize and resize your cluster by adding one node:
5. You can also explore the tabs where your cluster settings are available:

More Functionality

Click on the menu icon to see other capabilities available in the cloud controller UI:
- CLUSTERS: This is where you are right now.
- CLUSTER TEMPLATES: When creating a cluster, you have an option to save a cluster template. The CLUSTER TEMPLATES page allows you to manage saved cluster templates.
- SHARED SERVICES: This page allows you to create and manage Hive and Druid metastores. To get started, refer to How to set up an RDS instance and database and register it as a metastore.
- HISTORY: This page allows you to generate a report including all clusters associated with this cloud controller.
- DOWNLOAD CLI: Allows you to download the HDC CLI that you can use to create more clusters. To get started, refer to How to use CLI to create and manage clusters on HDCloud for AWS.

Open Additional Ports

(Adding this on Ali Bajwa's request) See this post: https://community.hortonworks.com/articles/77290/how-to-open-additional-ports-on-ec2-security-group.html

Clean Up

1. Once you no longer need your cluster, you can terminate it by clicking on CLUSTER ACTIONS > TERMINATE. This will delete all the EC2 instances that were used to run the cluster nodes.
2. After deleting the cluster, you can delete the cloud controller. From the CloudFormation console, delete the stack corresponding to the cloud controller. If you try deleting the cloud controller before terminating all the clusters associated with it, you will run into errors.

To avoid unnecessary charges to your AWS account, always make sure that the stacks corresponding to the cluster and the cloud controller were successfully deleted in the CloudFormation console and that the EC2 instances running the cloud controller and cluster nodes were deleted in the EC2 console.

Troubleshooting

If you run into any issues, refer to the Troubleshooting documentation.

Next Steps

To learn more, refer to the HDCloud for AWS product documentation. Related tutorials:
- How to set up Hortonworks Data Cloud for AWS using the advanced template
- How to launch Hortonworks Data Cloud cloud controller via AWS CLI
- How to set up an RDS instance and database and register it as a metastore
- How to use CLI to create and manage cluster on HDCloud for AWS
- How to copy between a cluster (HDFS) and S3 buckets

Feedback

Let us know if this was useful and how we can help you with HDCloud for AWS in the future. Feel free to leave a comment below with a suggestion for an HDCloud for AWS tutorial that you would like to see. Thanks!
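As referenced in the Get Started with the Cloud Dashboard section above, connecting to a cluster node from a Mac or Linux terminal generally looks like the following sketch. The key file path and host are placeholders (each node's public DNS is shown in the cloud controller UI), and cloudbreak is the default SSH user on HDCloud cluster nodes:

# Restrict permissions on the private key downloaded as a prerequisite
chmod 400 ~/Downloads/my-key-pair.pem
# Connect to a cluster node as the default cloudbreak user (replace the host with your node's public DNS)
ssh -i ~/Downloads/my-key-pair.pem cloudbreak@ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com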
... View more
11-15-2016
09:47 PM
Update: HDCloud GA is here! Check out the official product docs to get started with HDCloud for AWS. Or see this post.
... View more
10-24-2016
08:29 PM
Update: TP #1.7 is the latest version as of today. Same link.
... View more
10-24-2016
08:19 PM
2 Kudos
Hi @Obaid Salikeen, you may also consider using Hortonworks Data Cloud (currently in the technical preview stage). See http://hortonworks.github.io/hdp-aws/.
... View more