Install CDH5 on EC2 without human interaction
Created on 09-01-2014 02:40 AM - edited 09-16-2022 02:06 AM
Hi all!
At the company where I work, we're currently using a four-node Amazon EMR cluster together with S3 for all our data warehousing and analysis needs. To save costs, the cluster is spun up each morning and torn down each evening automatically by a cron job running on another server.
We're using Impala extensively. Our data is copied from S3 to HDFS each morning after the cluster has been spun up.
I was looking at installing Hue to provide a nice interface for querying Impala. Then it occurred to me that it would probably be easier to move from EMR to EC2 and install CDH5 on there. Ideally we would use Cloudera Manager for monitoring the cluster while it's running.
The problem: is there a way to install CDH5, including Cloudera Manager, automatically on an EC2 cluster, without human interaction?
Created 09-02-2014 03:13 AM
manually and save the master and worker node images as custom AMIs. Use
those AMIs every morning to create a new cluster, then tear it down. When
you want to update CDH, just do it once manually and save new AMIs
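
The daily "launch from saved AMIs" step described above could be scripted roughly as follows. This is only a sketch: the AMI IDs, the instance type, the `cdh-role` tag, and the split into one master plus three workers are all assumptions for illustration, not something stated in this thread. The functions only build the parameters for an EC2 RunInstances request; they do not call AWS.

```python
# Sketch only: instance type, tag key, and node counts are hypothetical.

def launch_params(ami_id, count, role, instance_type="m3.xlarge"):
    """Build keyword arguments for an EC2 RunInstances call from a saved AMI."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        # Min and max are equal so the request launches exactly `count` nodes.
        "MinCount": count,
        "MaxCount": count,
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "cdh-role", "Value": role}],
            }
        ],
    }


def cluster_plan(master_ami, worker_ami, workers=3):
    """Return launch requests for one master and `workers` worker nodes."""
    return [
        launch_params(master_ami, 1, "master"),
        launch_params(worker_ami, workers, "worker"),
    ]
```

If you use boto3, each dictionary can be passed along as `boto3.client("ec2").run_instances(**params)` from the same cron job that currently starts the EMR cluster; the evening teardown would terminate the instances carrying the tag.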
Gautam Gopalakrishnan
Created 09-01-2014 04:14 PM
require a bit of programming from your side:
Read about Path B installation in this link:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Installat...
Then you can use the Cloudera Manager API to add hosts, services and roles
to your cluster
http://cloudera.github.io/cm_api/
Gautam Gopalakrishnan
Created 09-02-2014 02:12 AM
http://blog.cloudera.com/blog/2012/10/set-up-a-hadoophbase-cluster-on-ec2-in-about-an-hour/
Gautam Gopalakrishnan
Created 09-02-2014 02:18 AM
Thanks for the swift answer! I have looked at the API, and it seems you can't actually install packages through it, right? Does that mean the packages for all the services I want to enable have to be installed on all nodes before I add hosts, services and roles through the API?
Created 09-02-2014 02:42 AM
Yes the rpm/deb packages have to be installed already. Alternatively you
could use a mixture of the AWS API (to provision the hosts), then use the
Cloudera Manager API to provision the cluster (using the parcel deployment)
Gautam Gopalakrishnan
Created 09-02-2014 02:49 AM
@GautamG wrote:
Yes the rpm/deb packages have to be installed already. Alternatively you
could use a mixture of the AWS API (to provision the hosts), then use the
Cloudera Manager API to provision the cluster (using the parcel deployment)
Does the CM API support distributing parcels? If so, how would I go about that? I know how to provision EC2 instances using the AWS API, but I'm kind of in the dark on how to install CM and CDH on them 🙂
Regarding the Whirr option: it doesn't support YARN on EC2 with CM yet, right?
Created on 09-02-2014 02:55 AM - edited 09-02-2014 02:56 AM
Please refer to the Path B install link I provided earlier, which explains how you can automate the CM and CDH installation. Then refer to the CM API documentation (http://cloudera.github.io/cm_api/apidocs/v7/rest.html), specifically the /clusters/{clusterName}/parcels endpoints.
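
As a rough illustration of how the parcel endpoints fit together, here is a minimal sketch that builds the REST URLs and picks the next parcel command from the stage reported by the API. The base URL, cluster name, and parcel version in the usage note are hypothetical, and the stage and command names are written as I understand them from the v7 API docs, so please verify them against the reference above before relying on this.

```python
from urllib.parse import quote

# Assumed mapping from a parcel's reported stage to the command that moves it
# forward. Stages not listed (for example DOWNLOADING or ACTIVATED) need no
# command: the parcel is either mid-transition or already active.
NEXT_COMMAND = {
    "AVAILABLE_REMOTELY": "startDownload",
    "DOWNLOADED": "startDistribution",
    "DISTRIBUTED": "activate",
}


def parcel_url(base, cluster, product, version, command=None):
    """Build the URL for a parcel resource, or for a command on that parcel."""
    url = "{}/clusters/{}/parcels/products/{}/versions/{}".format(
        base.rstrip("/"),
        quote(cluster, safe=""),
        quote(product, safe=""),
        quote(version, safe=""),
    )
    if command is not None:
        url += "/commands/" + command
    return url


def next_command(stage):
    """Return the command to POST for this stage, or None to keep waiting."""
    return NEXT_COMMAND.get(stage)
```

A provisioning script would GET `parcel_url(base, cluster, "CDH", version)`, read the `stage` field, POST to `parcel_url(..., command=next_command(stage))` when a command is returned, and poll until the stage is `ACTIVATED`. Authentication and error handling are left out here.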
Gautam Gopalakrishnan
Created 09-02-2014 02:59 AM
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/c...
Gautam Gopalakrishnan
Created 09-02-2014 03:01 AM
That's interesting, because this page in the CDH5 documentation states:
Note: At present you can launch and run only an MapReduce cluster; YARN is not supported.
