Spark MLBase (MLI and ML Optimizer) status

Does anyone know the status of the MLBase project? The project describes itself as follows:

"Implementing and consuming Machine Learning at scale are difficult tasks. MLbase is a platform addressing both issues, and consists of three components -- MLlib, MLI, ML Optimizer."

It seems that the project and its related resources have not been updated for the last two years:

http://www.mlbase.org/

http://www.cs.berkeley.edu/~ameet/mlbase_website/mlbase_website/download.html

https://github.com/amplab/MLI

http://ampcamp.berkeley.edu/wp-content/uploads/2013/07/amp_camp_8_30_13-1.pdf

1 ACCEPTED SOLUTION

I have no personal involvement with the project, but it may be that all the useful parts of MLBase have been absorbed into Spark ML, and no one chose to continue MLBase as a separate project. The MLBase project page itself says that its MLlib component is simply the Spark project's MLlib; see https://github.com/apache/spark/tree/master/mllib

The MLBase project page also says, "Many features in MLlib have been borrowed from ML Optimizer and MLI." That suggests that there was already a process of absorption happening in 2013, and perhaps after that there was insufficient motivation to continue developing ML Optimizer and MLI as separate components.

In support of this idea, it appears that https://github.com/amplab/MLI/tree/master/src/main/scala/ml is a subset of the contents of https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/ml

In a brief search I was not able to track down similar remnants of the "ML Optimizer" code. There are certainly optimizers throughout the Spark ML code, but they tend to be algorithm-specific, so there would not be much motivation for grouping them into a discrete component.

Hope this helps.

2 REPLIES

New Contributor

As a complement to Matt Foley's answer: concerning ML Optimizer, I think they meant either generic optimization algorithms such as gradient descent, which are available in the mllib.optimization package (see https://spark.apache.org/docs/2.3.0/mllib-optimization.html), or hyper-parameter optimization for ML algorithms. Hyper-parameter tuning using, for example, cross-validation and grid search is available in the Spark ML tuning package (see https://spark.apache.org/docs/2.2.0/ml-tuning.html).

However, if they meant automatic hyper-parameter optimization using, for example, Bayesian optimization, then I would like to know more about it...
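
For readers unfamiliar with the grid-search-plus-cross-validation tuning mentioned in this reply, here is a minimal sketch of the idea in plain Python. It is not Spark's implementation and does not use any Spark API. The toy model (1-D ridge regression without intercept), the synthetic data, and all function names are assumptions made up for illustration.

```python
import random


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge_1d(xs, ys, reg):
    """Closed-form 1-D ridge fit (no intercept): w = sum(x*y) / (sum(x*x) + reg)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + reg)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search_cv(xs, ys, reg_grid, k=3):
    """Score every candidate in reg_grid by k-fold CV; return (best, all scores)."""
    folds = k_fold_indices(len(xs), k)
    scores = {}
    for reg in reg_grid:
        fold_errors = []
        for held_out in folds:
            held = set(held_out)
            # Train on everything outside the held-out fold.
            train_x = [x for i, x in enumerate(xs) if i not in held]
            train_y = [y for i, y in enumerate(ys) if i not in held]
            w = fit_ridge_1d(train_x, train_y, reg)
            # Evaluate on the held-out fold only.
            fold_errors.append(
                mse(w, [xs[i] for i in held_out], [ys[i] for i in held_out])
            )
        scores[reg] = sum(fold_errors) / k
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data (hypothetical): y = 2x plus small Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]

reg_grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_reg, cv_scores = grid_search_cv(xs, ys, reg_grid)
print("best regularization:", best_reg)
```

The search is exhaustive: every grid value is trained and scored k times, which is why the cost grows quickly with the number of hyper-parameters. Bayesian optimization, as raised above, instead uses earlier scores to choose which candidates to try next.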