Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Difference between Spark MLlib/ML and H2O

Super Collaborator

Hi experts,

I'm curious about the differences between Spark MLlib/ML and H2O in terms of algorithm implementation, performance, and usability. Which one is better suited to which kinds of use cases?

Thanks a lot in advance.

1 ACCEPTED SOLUTION

Master Guru

You will have to run your algorithms on your cluster with your data to get a reasonable performance analysis.
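To act on that advice, one option is to time each library's training call on the same data and cluster. The sketch below is a minimal, library-agnostic timing harness, assuming you wrap each real training call (for example a Spark ML `estimator.fit(train_df)` or an H2O `estimator.train(x=..., y=..., training_frame=...)`) in a zero-argument function. The names `benchmark`, `compare`, and the placeholder workloads are hypothetical, for illustration only, and are not part of either library.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def benchmark(fit: Callable[[], object], repeats: int = 3, warmup: int = 1) -> Dict[str, float]:
    """Time a zero-argument training callable and summarize wall-clock seconds.

    `fit` is a hypothetical wrapper you write around the real training call,
    already bound to your own data.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    # Untimed warm-up runs absorb one-off costs such as data caching.
    for _ in range(warmup):
        fit()
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit()
        timings.append(time.perf_counter() - start)
    return {
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


def compare(candidates: Dict[str, Callable[[], object]], repeats: int = 3) -> List[Tuple[str, float]]:
    """Benchmark each named candidate; return (name, median seconds), fastest first."""
    results = [(name, benchmark(fit, repeats=repeats)["median_s"]) for name, fit in candidates.items()]
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder workloads only; replace them with closures that train on your real data.
    ranked = compare({
        "spark_ml": lambda: sum(i * i for i in range(200_000)),
        "h2o": lambda: sorted(range(100_000), reverse=True),
    })
    for name, median in ranked:
        print(f"{name}: {median:.4f} s (median)")
```

Wall-clock training time is only one axis; you would normally also compare model quality on a held-out set, since two implementations of the same algorithm can use different defaults.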

What language are you looking at?

The Python Spark interface is pretty clean.

H2O algorithms: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html

H2O has a few more algorithms than Spark MLlib.

Spark ML classification and regression algorithms: https://spark.apache.org/docs/latest/ml-classification-regression.html


2 REPLIES

Super Collaborator

Thanks @Timothy Spann for your answer; these links are really helpful. I used Python for Spark MLlib, so I will use it for H2O as well.