Benchmark Cloudera, hortonworks and MapR

tsunami20 — Wed, 16 Dec 2015 10:04:28 GMT

Hi,

I have to choose between Cloudera, Hortonworks and MapR.

I don't know how to compare the performance of these distributions.

Any help?

Thanks in advance

Re: Benchmark Cloudera, hortonworks and MapR

srowen — Wed, 16 Dec 2015 10:46:28 GMT

First, you'd have to define what you're trying to "benchmark". I don't
think these distributions vary in speed; they include reasonably
different components around the core. That is, it's kind of like
choosing a car solely by its max RPM or something, even if that's
important to you.

Re: Benchmark Cloudera, hortonworks and MapR

tsunami20 — Wed, 16 Dec 2015 11:03:28 GMT

Thank you for your reply,
Actually, after choosing a distribution I have to work with Spark and
extract data from social networks.
So should I just test algorithms with Spark on each distribution?

Re: Benchmark Cloudera, hortonworks and MapR

srowen — Wed, 16 Dec 2015 11:23:28 GMT

Yes, I think that begins to narrow it down. I don't know that you're
going to find a big performance difference, since distributions will
generally ship the upstream project with only minimal modifications to
integrate it.
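
If you do want to measure it, one rough approach is to run the identical
job on each cluster several times and compare the wall-clock timings. The
sketch below is only an illustration in plain Python: time_job, summarize
and placeholder_job are hypothetical names, and placeholder_job stands in
for whatever function submits your real Spark job and waits for it to
finish. It assumes the clusters have comparable hardware and
configuration; otherwise the numbers say little about the distribution
itself.

```python
import statistics
import time


def time_job(job, warmup=1, runs=5):
    """Call `job` repeatedly and return the wall-clock timings in seconds."""
    for _ in range(warmup):
        job()  # warm-up runs are discarded (caches, JVM start-up, etc.)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        job()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to a few comparable numbers."""
    return {
        "runs": len(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


def placeholder_job():
    # Hypothetical stand-in for the real workload, e.g. a function that
    # submits your Spark job and blocks until it completes.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(summarize(time_job(placeholder_job)))
```

Comparing medians rather than single runs helps smooth out noise from
other activity on the cluster.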

(That said, CDH does let you enable native acceleration for some
mathematical operations in Spark MLlib. I don't think other distros
enable this and ship the right libraries. It's possible that could
matter to your use case.)

I'd look at how recent the Spark distribution is. Cloudera ships Spark
1.5 in CDH 5.5; MapR is on 1.4 and Hortonworks on 1.3, with a beta
preview of 1.5 at the moment in both cases. We're already integrating
the nearly-released Spark 1.6 too.

Finally, if you're considering paying for support, I think it bears
evaluating how much each vendor invests in Spark. No investment means
no expertise and no real ability to fix your problems. At Cloudera, we
have a full-time team on Spark, including 4 committers (including me).
I think you'll find other vendors virtually non-existent in the Spark
community, but, go see for yourself.

Re: Benchmark Cloudera, hortonworks and MapR

tsunami20 — Thu, 17 Dec 2015 09:49:28 GMT

Is it possible to handle big data cleansing with Spark?