<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark MLBase (MLI and ML Optimizer) status in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-MLBase-MLI-and-ML-Optimizer-status/m-p/96783#M10327</link>
    <description>&lt;P&gt;As a complement to Matt Foley's answer: concerning ML Optimizer, I think they meant either generic optimization algorithms such as Gradient Descent, available in the mllib.optimization package (see &lt;A rel="noopener noreferrer" href="https://spark.apache.org/docs/2.3.0/mllib-optimization.html" target="_blank"&gt;https://spark.apache.org/docs/2.3.0/mllib-optimization.html&lt;/A&gt;), or hyper-parameter optimization for ML algorithms. Hyper-parameter tuning using, e.g., cross-validation and grid search is available in the Spark ML tuning package (see &lt;A rel="noopener noreferrer" href="https://spark.apache.org/docs/2.2.0/ml-tuning.html" target="_blank"&gt;https://spark.apache.org/docs/2.2.0/ml-tuning.html&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;However, if they meant automatic hyper-parameter optimization using, for example, Bayesian optimization, then I would like to know more about it...&lt;/P&gt;</description>
    <pubDate>Tue, 25 Jun 2019 00:26:49 GMT</pubDate>
    <dc:creator>forest</dc:creator>
    <dc:date>2019-06-25T00:26:49Z</dc:date>
    <item>
      <title>Spark MLBase (MLI and ML Optimizer) status</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-MLBase-MLI-and-ML-Optimizer-status/m-p/96781#M10325</link>
      <description>&lt;P&gt;Does anyone know the status of the MLBase project:&lt;/P&gt;&lt;P&gt;"Implementing and consuming Machine Learning at scale are difficult tasks. MLbase is a platform addressing both issues, and consists of three components -- &lt;EM&gt;MLlib, MLI, ML Optimizer&lt;/EM&gt;."&lt;/P&gt;&lt;P&gt;It seems that the project and related information have not been updated for the last 2 years:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://www.mlbase.org/"&gt;http://www.mlbase.org/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://www.cs.berkeley.edu/~ameet/mlbase_website/mlbase_website/download.html"&gt;http://www.cs.berkeley.edu/~ameet/mlbase_website/mlbase_website/download.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://github.com/amplab/MLI"&gt;https://github.com/amplab/MLI&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://ampcamp.berkeley.edu/wp-content/uploads/2013/07/amp_camp_8_30_13-1.pdf"&gt;http://ampcamp.berkeley.edu/wp-content/uploads/2013/07/amp_camp_8_30_13-1.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 13:30:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-MLBase-MLI-and-ML-Optimizer-status/m-p/96781#M10325</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2026-04-21T13:30:51Z</dc:date>
    </item>
    <item>
      <title>Re: Spark MLBase (MLI and ML Optimizer) status</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-MLBase-MLI-and-ML-Optimizer-status/m-p/96782#M10326</link>
      <description>&lt;P&gt;I don't know from personal involvement, but it may be that all useful parts of the MLBase project have been absorbed into Spark ML, and no one chose to continue MLBase as a separate project. The MLBase project page itself says that MLlib is just the Spark project's MLlib; see &lt;A href="https://github.com/apache/spark/tree/master/mllib" target="_blank"&gt;https://github.com/apache/spark/tree/master/mllib&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;The MLBase project page also says, "Many features in &lt;I&gt;MLlib&lt;/I&gt; have been borrowed from &lt;I&gt;ML Optimizer&lt;/I&gt; and &lt;I&gt;MLI&lt;/I&gt;." That suggests that there was already a process of absorption happening in 2013, and perhaps after that there was insufficient motivation to continue developing ML Optimizer and MLI as separate components.&lt;/P&gt;&lt;P&gt;In support of this idea, it appears that &lt;A href="https://github.com/amplab/MLI/tree/master/src/main/scala/ml" target="_blank"&gt;https://github.com/amplab/MLI/tree/master/src/main/scala/ml&lt;/A&gt; is a subset of the contents of &lt;A href="https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/ml" target="_blank"&gt;https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/ml&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;In a brief search, I was not able to similarly track down remnants of the "ML Optimizer" code, but there are certainly optimizers throughout the Spark ML code, and they tend to be algorithm-specific, so there wouldn't be much motivation for grouping them into a discrete component.&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2015 02:19:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-MLBase-MLI-and-ML-Optimizer-status/m-p/96782#M10326</guid>
      <dc:creator>mfoley</dc:creator>
      <dc:date>2015-11-11T02:19:25Z</dc:date>
    </item>
    <item>
      <title>Re: Spark MLBase (MLI and ML Optimizer) status</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-MLBase-MLI-and-ML-Optimizer-status/m-p/96783#M10327</link>
      <description>&lt;P&gt;As a complement to Matt Foley's answer: concerning ML Optimizer, I think they meant either generic optimization algorithms such as Gradient Descent, available in the mllib.optimization package (see &lt;A rel="noopener noreferrer" href="https://spark.apache.org/docs/2.3.0/mllib-optimization.html" target="_blank"&gt;https://spark.apache.org/docs/2.3.0/mllib-optimization.html&lt;/A&gt;), or hyper-parameter optimization for ML algorithms. Hyper-parameter tuning using, e.g., cross-validation and grid search is available in the Spark ML tuning package (see &lt;A rel="noopener noreferrer" href="https://spark.apache.org/docs/2.2.0/ml-tuning.html" target="_blank"&gt;https://spark.apache.org/docs/2.2.0/ml-tuning.html&lt;/A&gt;); a short sketch follows below.&lt;/P&gt;&lt;P&gt;However, if they meant automatic hyper-parameter optimization using, for example, Bayesian optimization, then I would like to know more about it...&lt;/P&gt;
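&lt;P&gt;To illustrate the grid-search option, here is a minimal, untested sketch using ParamGridBuilder and CrossValidator from the Spark ML tuning package; the &lt;EM&gt;training&lt;/EM&gt; DataFrame (with "label" and "features" columns) and the particular parameter values are just placeholders:&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Estimator whose hyper-parameters we want to tune.
val lr = new LogisticRegression()

// Grid of candidate hyper-parameter values to search over.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1, 1.0))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .build()

// 3-fold cross-validation; keeps the model with the best evaluator score.
val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// `training` is assumed to be a DataFrame of (label, features) rows.
val cvModel = cv.fit(training)&lt;/PRE&gt;</description>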
      <pubDate>Tue, 25 Jun 2019 00:26:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-MLBase-MLI-and-ML-Optimizer-status/m-p/96783#M10327</guid>
      <dc:creator>forest</dc:creator>
      <dc:date>2019-06-25T00:26:49Z</dc:date>
    </item>
  </channel>
</rss>

