Pros and cons of using EMR vs Cloudbreak for launching a Hadoop cluster on AWS

Expert Contributor

Hi all,

I am new to HDP and Cloudbreak. I want to move some of our on-site Hadoop clusters and jobs to AWS. The two solutions I have come across are Cloudbreak and EMR, but I am not sure which one to use.

Which technology should I use for launching Hadoop jobs on AWS? Pros and cons of each approach would be really helpful (in terms of cost, ease of use, monitoring, metrics, latency, etc.). One cost optimization I am particularly interested in is launching the cluster whenever one or more jobs need to run, and killing the cluster or its nodes when there are no more jobs to execute.

Thanks

Obaid

1 ACCEPTED SOLUTION

Expert Contributor

Hi @Obaid Salikeen,

Pros of Cloudbreak:

  • Multiple cloud provider support (you can deploy clusters to different providers using the same interface)
  • You can use it even on a private cloud, e.g. OpenStack
  • Cloudbreak and HDP are open source
  • Cloudbreak installs Ambari, which you can use to monitor or customise your cluster after deployment (e.g. add new services)
  • It comes with a fully configured SaltStack, which you can use to manage your VMs, e.g. to apply security patches
  • More flexible, since you can create your own Blueprint that contains only the services you need
  • Cloudbreak supports autoscaling based on metrics gathered from Ambari (some of these metrics are very general, e.g. disk space; others are Hadoop-specific, e.g. pending YARN containers)
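
To make the autoscaling point concrete, here is a minimal sketch of a threshold-style scaling decision driven by pending YARN containers. It is not Cloudbreak's actual policy or API: the function name, the `containers_per_node` capacity figure, and the node limits are all hypothetical and chosen only for illustration.

```python
def desired_node_count(current_nodes, pending_containers,
                       containers_per_node=4, min_nodes=1, max_nodes=10):
    """Return a target worker-node count for a hypothetical scaling policy.

    Assumption: each worker can host `containers_per_node` YARN containers.
    """
    if pending_containers > 0:
        # Ceiling division: enough extra nodes to absorb the backlog.
        extra = -(-pending_containers // containers_per_node)
        target = current_nodes + extra
    else:
        # No backlog: shrink by one node at a time.
        target = current_nodes - 1
    # Clamp the result to the configured cluster size limits.
    return max(min_nodes, min(max_nodes, target))
```

In a real deployment the pending-container count would come from Ambari metrics, and the policy itself would be configured in Cloudbreak rather than hand-written.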

Cons of Cloudbreak:

  • You need one more instance to run Cloudbreak itself (though one Cloudbreak instance can manage multiple clusters)
  • Cloudbreak is a cluster management tool, so you cannot submit jobs through it. There is no equivalent of EMR steps
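
For readers unfamiliar with EMR steps, here is a minimal sketch of the launch, run, and terminate pattern asked about in the question. It only builds the keyword arguments for boto3's `run_job_flow` call and does not contact AWS. The release label, instance types, and the example S3 path are assumptions, and the role names are the AWS defaults, which may not exist in your account.

```python
def transient_emr_request(name, jar_args, instance_count=3):
    """Build keyword arguments for boto3's EMR `run_job_flow` call."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.36.0",  # assumed release; choose your own
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": instance_count,
            # False makes the cluster terminate once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "example-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": list(jar_args),
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role
    }


# Hypothetical job: the bucket and script path are placeholders.
request = transient_emr_request(
    "nightly-job", ["spark-submit", "s3://my-bucket/job.py"])
```

With boto3 installed and credentials configured, the request could be submitted with `boto3.client("emr").run_job_flow(**request)`. With Cloudbreak, the job submission and the cluster teardown would have to be scripted separately.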

Disclaimer: I am an engineer working on Cloudbreak.

Attila


6 REPLIES


Hi @Obaid Salikeen, you may also consider using Hortonworks Data Cloud (currently in technical preview). See http://hortonworks.github.io/hdp-aws/.

Expert Contributor

Thanks @Dominika B,

Thanks for sharing the link, seems interesting.

So I have a very basic question: Amazon EMR lets you launch and manage Hadoop and Spark clusters, so what would be the benefit of using Hortonworks Data Cloud instead of just using EMR?

Thanks

Obaid

Expert Contributor

Thanks a lot @Attila Kanto for the detailed response.

Let me ask another, cost-related question, since cost is an important factor in deciding which technology to use: how would you compare EMR with Cloudbreak (or Hortonworks Data Cloud) in terms of cost?

Obaid

Expert Contributor

Sorry, but I do not have such a comparison.

Attila

Expert Contributor

Sure, no problem.