Benchmark Cloudera, Hortonworks and MapR

Explorer

Hi,

I have to choose between Cloudera, Hortonworks and MapR, and I don't know how to compare the performance of these distributions.

Any help?

Thanks in advance

4 REPLIES

Master Collaborator
First, you'd have to define what you're trying to "benchmark". I don't
think these distributions vary in speed; they include reasonably
different components around the core. That is, it's kind of like
choosing a car solely by its max RPM or something, even if that's
important to you.

Explorer
Thank you for your reply.
Actually, after choosing a distribution I have to work with Spark and
extract data from social networks.
So should I just test algorithms with Spark on each distribution?

Master Collaborator
Yes, I think that begins to narrow it down. I don't know that you're
going to find a big performance difference, since distributions will
generally ship the upstream project with only minimal modifications to
integrate it.

(That said, CDH does let you enable native acceleration for some
mathematical operations in Spark MLlib. I don't think other distros
enable this and ship the right libraries. It's possible that could
matter to your use case.)

I'd look at how recent the Spark distribution is. Cloudera ships Spark
1.5 in CDH 5.5; MapR is on 1.4 and Hortonworks on 1.3, with a beta
preview of 1.5 at the moment in both cases. We're already integrating
the nearly-released Spark 1.6 too.

Finally, if you're considering paying for support, I think it bears
evaluating how much each vendor invests in Spark. No investment means
no expertise and no real ability to fix your problems. At Cloudera, we
have a full-time team on Spark, including 4 committers (including me).
I think you'll find other vendors virtually non-existent in the Spark
community, but go see for yourself.
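
If you do want to time the same Spark workload on each distribution, a small harness along these lines may be enough to start with. This is only a sketch: `time_workload` and `sample_workload` are hypothetical names, not part of Spark or any distribution, and the workload here is a pure-Python stand-in. In practice the callable would presumably launch your own job (for example by invoking `spark-submit` on each cluster), which is an assumption about your setup.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_workload(workload: Callable[[], object],
                  runs: int = 5,
                  warmup: int = 1) -> Dict[str, float]:
    """Run `workload` repeatedly and summarize wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up runs are executed but not measured, so one-off start-up
    # costs (JVM launch, cold caches) do not dominate the comparison.
    for _ in range(warmup):
        workload()

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def sample_workload() -> int:
    # Hypothetical placeholder: replace with a call that runs the same
    # Spark job, on the same data, against the cluster under test.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(time_workload(sample_workload, runs=3))
```

Keep the job, the input data, and the cluster hardware identical across distributions, and compare the medians rather than single runs. Otherwise the numbers will mostly reflect configuration differences rather than the distributions themselves.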

Explorer
Is it possible to handle big data cleansing with Spark?