Benchmark Cloudera, Hortonworks and MapR
Explorer
Created ‎12-16-2015 02:04 AM
Hi,
I have to choose between Cloudera, Hortonworks and MapR,
and I don't know how to compare the performance of these distributions.
Any help?
Thanks in advance
1 ACCEPTED SOLUTION
Master Collaborator
Created ‎12-16-2015 03:23 AM
Yes, I think that begins to narrow it down. I don't know that you're
going to find a big performance difference, since distributions will
generally ship the upstream project with only minimal modifications to
integrate it.
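
If you do want to measure this yourself, a minimal sketch of a timing harness might look like the following. It assumes you run the identical job, on identical data and comparable hardware, on each distribution. The names `time_workload`, `summarize` and `sample_workload` are hypothetical, and `sample_workload` is only a stand-in for the Spark job you actually care about.

```python
import statistics
import time


def time_workload(workload, repeats=5, warmups=1):
    """Run `workload` several times and return wall-clock timings in seconds."""
    for _ in range(warmups):
        workload()  # discard warm-up runs (caches, lazy initialisation)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Report the median, which is less sensitive to one slow run than the mean."""
    return {
        "runs": len(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


def sample_workload():
    # Hypothetical stand-in: replace with a function that triggers
    # the Spark job you want to compare across distributions.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(summarize(time_workload(sample_workload)))
```

Repeating the run and comparing medians matters more than any single number, since cluster load and caching can easily swamp small differences between distributions.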
(That said, CDH does let you enable native acceleration for some
mathematical operations in Spark MLlib. I don't think other distros
enable this and ship the right libraries. It's possible that could
matter to your use case.)
I'd look at how recent the Spark distribution is. Cloudera ships Spark
1.5 in CDH 5.5; MapR is on 1.4 and Hortonworks on 1.3, with a beta
preview of 1.5 at the moment in both cases. We're already integrating
the nearly-released Spark 1.6 too.
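
To check this on a cluster you can read `sc.version` in the Spark shell. As a rough sketch, the snippet below ranks distributions by the Spark version they report. The helper name `spark_major_minor` is hypothetical, and the version numbers are only the ones quoted in this thread, so re-check them against your own clusters.

```python
import re


def spark_major_minor(version):
    """Extract (major, minor) as integers from a string such as '1.5' or '1.5.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


# Versions as quoted in this thread (December 2015); assumed, not re-verified.
reported = {"Cloudera CDH 5.5": "1.5", "MapR": "1.4", "Hortonworks": "1.3"}

# Sort on the parsed tuple so that, for example, 1.10 ranks above 1.9.
newest_first = sorted(
    reported, key=lambda name: spark_major_minor(reported[name]), reverse=True
)

if __name__ == "__main__":
    for name in newest_first:
        print(name, reported[name])
```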
Finally, if you're considering paying for support, I think it bears
evaluating how much each vendor invests in Spark. No investment means
no expertise and no real ability to fix your problems. At Cloudera, we
have a full-time team on Spark, with 4 committers (including me).
I think you'll find other vendors virtually non-existent in the Spark
community, but go see for yourself.
4 REPLIES
Master Collaborator
Created ‎12-16-2015 02:46 AM
First, you'd have to define what you're trying to "benchmark". I don't
think these distributions vary much in speed; what differs is the set of
components they include around the core. That is, it's kind of like
choosing a car solely by its max RPM, even if that's
important to you.
Explorer
Created ‎12-16-2015 03:03 AM
Thank you for your reply.
Actually, after choosing a distribution I have to work with Spark and
extract data from social networks.
So should I just test algorithms with Spark on each distribution?
Explorer
Created ‎12-17-2015 01:49 AM
Is it possible to handle big data cleansing with Spark?
