Benchmark Cloudera, Hortonworks and MapR

Explorer

Hi,

I have to choose between Cloudera, Hortonworks and MapR, and I don't
know how to compare the performance of these distributions.

Any help?

Thanks in advance

4 REPLIES

Re: Benchmark Cloudera, Hortonworks and MapR

Master Collaborator
First, you'd have to define what you're trying to "benchmark". I don't
think these distributions differ much in speed; what differs is the set
of components they include around the core. Choosing on speed alone is
kind of like choosing a car solely by its max RPM: it may be important
to you, but it isn't the whole picture.

Re: Benchmark Cloudera, Hortonworks and MapR

Explorer
Thank you for your reply.
Actually, after choosing a distribution I have to work with Spark and
extract data from social networks.
So should I just test algorithms with Spark on each distribution?

Re: Benchmark Cloudera, Hortonworks and MapR (accepted solution)

Master Collaborator
Yes, I think that begins to narrow it down. I don't know that you're
going to find a big performance difference, since distributions will
generally ship the upstream project with only minimal modifications to
integrate it.

(That said, CDH does let you enable native acceleration for some
mathematical operations in Spark MLlib. I don't think other distros
enable this and ship the right libraries. It's possible that could
matter to your use case.)

I'd look at how recent the Spark distribution is. Cloudera ships Spark
1.5 in CDH 5.5; MapR is on 1.4 and Hortonworks on 1.3, with a beta
preview of 1.5 at the moment in both cases. We're already integrating
the nearly-released Spark 1.6 too.

Finally, if you're considering paying for support, I think it bears
evaluating how much each vendor invests in Spark. No investment means
no expertise and no real ability to fix your problems. At Cloudera, we
have a full-time team on Spark, including 4 committers (including me).
I think you'll find other vendors virtually non-existent in the Spark
community, but go see for yourself.
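
If you do still want to compare the distributions on your own workload,
a minimal sketch of a timing harness might look like the following. It
is plain Python and assumes nothing about any distribution: `run_job` is
a hypothetical placeholder for whatever submits your Spark job (for
example a function that shells out to `spark-submit`), and the function
and parameter names are made up for illustration.

```python
import statistics
import time


def benchmark(run_job, repeats=5, warmup=1):
    """Time a zero-argument callable and return per-run wall-clock seconds.

    run_job is a hypothetical placeholder: in practice it would submit the
    same Spark job, on the same data, to the cluster under test.
    """
    # Warm-up runs are executed but not recorded (start-up cost, caches).
    for _ in range(warmup):
        run_job()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_job()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Report median, min and max; the median is less sensitive
    to a single noisy run than the mean."""
    return {
        "median": statistics.median(timings),
        "min": min(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere; replace with a real job.
    def fake_job():
        sum(i * i for i in range(100_000))

    print(summarize(benchmark(fake_job, repeats=3)))
```

Run the same job on the same data and comparable hardware on each
cluster and compare the medians. As said above, I would not expect the
differences to be large.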


Re: Benchmark Cloudera, Hortonworks and MapR

Explorer
Is it possible to handle big data cleansing with Spark?