Scalable HDP cluster on AWS


I want to set up an elastic cluster on AWS EC2 and install HDP on it. How can I do that, and what options are available?

I don't want to use AWS EMR. Is it possible to bring DataNodes with the HDP stack installed up and down automatically?

Any suggestions would be great.

1 ACCEPTED SOLUTION

@ARUNKUMAR RAMASAMY

On the Hortonworks platform we have Cloudbreak. It is open source.

http://sequenceiq.com/cloudbreak-docs/latest/

You can use it to launch clusters on Amazon, Azure, and Google Cloud.

It needs one host to install the Cloudbreak software on; Cloudbreak then spins up the cluster nodes for you.

One thing you have to understand: if you have data in HDFS, it is not easy to bring nodes down. Before a DataNode can be removed safely, HDFS has to re-replicate the blocks it holds onto the remaining nodes, and that takes time.

An elastic cluster works well when it is backed by detached storage, such as Azure Blob storage.

Note that scaling up is not an issue; scaling down is where you will have a rough time :).
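The standard HDFS way to remove a DataNode without losing data is decommissioning: add the host to the file that `dfs.hosts.exclude` points to, then run `hdfs dfsadmin -refreshNodes` so the NameNode starts re-replicating that node's blocks. A minimal sketch of that step, assuming `dfs.hosts.exclude` is already configured; the exclude-file path and the helper names are hypothetical, and it only prints what it would do unless `dry_run=False`:

```python
import subprocess
from pathlib import Path

# HYPOTHETICAL path: it must match the file that dfs.hosts.exclude
# points to in your hdfs-site.xml.
EXCLUDE_FILE = Path("/etc/hadoop/conf/dfs.exclude")

# Tell the NameNode to re-read its include/exclude files, then list the
# DataNodes that are still being decommissioned.
REFRESH_CMD = ["hdfs", "dfsadmin", "-refreshNodes"]
REPORT_CMD = ["hdfs", "dfsadmin", "-report", "-decommissioning"]


def merge_exclude_entries(existing_text: str, hosts: list[str]) -> str:
    """Return exclude-file contents with `hosts` appended, keeping order and skipping duplicates."""
    entries = [line.strip() for line in existing_text.splitlines() if line.strip()]
    for host in hosts:
        if host not in entries:
            entries.append(host)
    return "\n".join(entries) + "\n"


def decommission(hosts: list[str], exclude_file: Path = EXCLUDE_FILE,
                 dry_run: bool = True) -> list[list[str]]:
    """Add hosts to the exclude file and trigger decommissioning (printed only when dry_run)."""
    current = exclude_file.read_text() if exclude_file.exists() else ""
    new_text = merge_exclude_entries(current, hosts)
    commands = [REFRESH_CMD, REPORT_CMD]
    if dry_run:
        print(f"would write {exclude_file}:\n{new_text}", end="")
        for cmd in commands:
            print("would run:", " ".join(cmd))
        return commands
    exclude_file.write_text(new_text)
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return commands
```

Only terminate the EC2 instance once the NameNode reports the node as decommissioned; the re-replication in between is the slow part of scaling down.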

Thanks @Shivaji. If I use S3 for storage, it should be fine, right? In my use case we won't keep data on HDFS for long; it will be there only for processing. We also plan to run HBase in the same cluster, and we want the HBase part to be more static. Say we have a base cluster of 5 nodes running HDFS/YARN/HBase, and only HDFS and YARN on the elastic nodes. Would that be possible?

Also, is there a doc, tutorial, or URL I can refer to for setting up an elastic cluster using Cloudbreak and HDP?
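Cloudbreak provisions HDP clusters from Ambari blueprints, so the layout described above maps naturally onto blueprint host groups. A minimal sketch, assuming an illustrative stack version and hypothetical group and blueprint names (a deployable blueprint also needs client components, configurations, and a proper ZooKeeper quorum). The `elastic_runs_hdfs` flag switches the elastic group between the question's HDFS+YARN layout and a compute-only layout:

```python
import json

# ASSUMPTION: stack version is illustrative; use the HDP version your
# Cloudbreak/Ambari install actually offers.
STACK = {"stack_name": "HDP", "stack_version": "2.3"}


def components(names: list[str]) -> list[dict]:
    """Ambari blueprints list components as {"name": ...} objects."""
    return [{"name": n} for n in names]


def build_blueprint(core_nodes: int = 5, elastic_runs_hdfs: bool = True) -> dict:
    """Sketch of a three-group layout: master, static core (HBase), elastic workers."""
    master = ["NAMENODE", "SECONDARY_NAMENODE", "RESOURCEMANAGER",
              "HBASE_MASTER", "ZOOKEEPER_SERVER"]
    # Static nodes: HDFS + YARN + HBase, never scaled down.
    core = ["DATANODE", "NODEMANAGER", "HBASE_REGIONSERVER"]
    # Elastic nodes: YARN always; HDFS only if requested (as in the question).
    # Without DATANODE they hold no HDFS blocks, so removing them needs no
    # HDFS re-replication.
    elastic = ["NODEMANAGER"] + (["DATANODE"] if elastic_runs_hdfs else [])
    return {
        "Blueprints": {"blueprint_name": "hdp-elastic-sketch", **STACK},
        "host_groups": [
            {"name": "master", "cardinality": "1", "components": components(master)},
            {"name": "core", "cardinality": str(core_nodes), "components": components(core)},
            # No fixed cardinality: the host count for this group changes as you scale.
            {"name": "elastic", "components": components(elastic)},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_blueprint(), indent=2))
```

Keeping HBase RegionServers only in the static group means scaling the elastic group never moves HBase regions.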


Check out the video at http://hortonworks.com/hadoop/cloudbreak/.

If you use S3 you should be fine, except that you will not get stellar performance: it will be slower than HDFS on local storage.
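To read and write S3 from the cluster, Hadoop's S3A connector can take its credentials from core-site properties. A minimal sketch, assuming the `s3a://` connector is available in your HDP version and that keys come from environment variables rather than being hard-coded; it renders the properties in Hadoop's XML configuration format:

```python
import os
import xml.etree.ElementTree as ET


def s3a_properties(access_key: str, secret_key: str) -> dict[str, str]:
    """Core-site properties for the S3A connector (credentials passed in, not hard-coded)."""
    return {
        "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render properties as <configuration><property><name/><value/></property>...</configuration>."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholders are used when the environment variables are not set.
    props = s3a_properties(os.environ.get("AWS_ACCESS_KEY_ID", "<access-key>"),
                           os.environ.get("AWS_SECRET_ACCESS_KEY", "<secret-key>"))
    print(to_hadoop_xml(props))
```

Jobs would then use paths such as `s3a://<bucket>/<path>` for input and output, keeping HDFS on the elastic nodes only for transient processing data.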
