Machine Learning Model Factory and Road To Production

anandi · ‎09-27-2017

A Machine Learning model learns from data. As new incremental data arrives, the model needs to be updated.

A Machine Learning Model factory ensures that once a model is deployed in production, continuous learning also happens on the incremental new data ingested in the Production environment.

As a deployed ML model's performance decays, a newly trained and serialized model needs to be deployed in its place. An A/B test between the deployed model and the newly trained model scores both, so that the performance of the deployed model can be evaluated against that of the incrementally trained model.
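The comparison between the deployed model and the newly trained model can be sketched as follows. This is a minimal illustration rather than a prescribed implementation: the models are toy threshold functions, `accuracy`, `ab_compare` and `min_gain` are hypothetical names, and both models are scored on the same labeled rows, whereas a real A/B test would typically split live traffic between them.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a "model" is any callable mapping one feature value to a 0/1 label.
Model = Callable[[float], int]


def accuracy(model: Model, rows: Sequence[Tuple[float, int]]) -> float:
    """Fraction of labeled rows the model predicts correctly."""
    if not rows:
        raise ValueError("need at least one labeled row")
    correct = sum(1 for x, label in rows if model(x) == label)
    return correct / len(rows)


def ab_compare(deployed: Model, candidate: Model,
               rows: Sequence[Tuple[float, int]],
               min_gain: float = 0.02) -> str:
    """Return 'candidate' only if it beats the deployed model by at least min_gain."""
    gain = accuracy(candidate, rows) - accuracy(deployed, rows)
    return "candidate" if gain >= min_gain else "deployed"


# Toy stand-ins for the deployed and the newly trained model (illustrative only).
def deployed_model(x: float) -> int:
    return int(x > 0.5)


def candidate_model(x: float) -> int:
    return int(x > 0.3)


# Recent labeled rows: (feature value, true label).
recent_labeled = [(0.2, 0), (0.35, 1), (0.4, 1), (0.6, 1), (0.1, 0), (0.9, 1)]
winner = ab_compare(deployed_model, candidate_model, recent_labeled)
print(winner)
```

Requiring a minimum gain before switching is one possible design choice; it avoids swapping models on differences that are likely to be noise.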

To build a Machine Learning Model factory, we first have to establish a robust road to production.

The foundation is to establish three environments: DEV, TEST and PROD.

1- DEV - A development environment where Data Scientists have their own data puddle to explore and profile the data, develop machine learning features from it, build the model, train and test it on a limited subset, and then commit the code to git to transport it to the next stages. To scale and tune the learning of the Machine Learning model, we also establish a DEV Validation environment, where the model is trained on as much historical data as possible and tuned.

2- TEST - The TEST environment is a pre-production environment where we run the machine learning models through integration tests and prepare the move of the Machine Learning model to production in two branches:

2a - model deployment: where the trained serialized Machine Learning model is deployed in the production environment

2b - continuous training: where the Machine Learning model is going through continuous training on incremental data

3- PROD - The Production environment is where live data is ingested. In production, a deployment server hosts the serialized trained model, and the deployed model exposes a REST API to deliver predictions on live data queries. The ML model code also runs in production, ingesting incremental live data and being continuously trained. The performance of both the deployed model and the continuously trained model is measured. If the deployed model shows decay in prediction performance, it is replaced with a newer serialized version of the continuously trained model. Model performance can be tracked by closing the loop with user feedback and counting True Positives, False Positives, True Negatives and False Negatives.
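As a rough sketch of the feedback loop described for PROD, the counts of True Positives, False Positives, True Negatives and False Negatives can be accumulated per model and turned into a swap decision. Everything named here (`ConfusionCounts`, `should_swap`, the `margin` value, and the use of F1 as the deciding metric) is an assumption made for illustration; the feedback pairs are invented.

```python
class ConfusionCounts:
    """Running TP/FP/TN/FN counts built from (prediction, user feedback) pairs."""

    def __init__(self) -> None:
        self.tp = 0
        self.fp = 0
        self.tn = 0
        self.fn = 0

    def record(self, predicted: bool, actual: bool) -> None:
        if predicted and actual:
            self.tp += 1
        elif predicted and not actual:
            self.fp += 1
        elif not predicted and actual:
            self.fn += 1
        else:
            self.tn += 1

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def should_swap(deployed: ConfusionCounts, retrained: ConfusionCounts,
                margin: float = 0.05) -> bool:
    """Swap when the continuously trained model's F1 exceeds the deployed one by margin."""
    return retrained.f1() - deployed.f1() > margin


# Invented example: true outcomes reported by users, and each model's predictions.
actuals = [True, False, True, False, True]
deployed_preds = [True, True, False, False, False]
retrained_preds = [True, False, True, False, False]

deployed_counts = ConfusionCounts()
retrained_counts = ConfusionCounts()
for actual, d_pred, r_pred in zip(actuals, deployed_preds, retrained_preds):
    deployed_counts.record(d_pred, actual)
    retrained_counts.record(r_pred, actual)

print(deployed_counts.f1(), retrained_counts.f1())
print(should_swap(deployed_counts, retrained_counts))
```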

This choreography of training and deploying machine learning models in production is the heart of the ML model factory.
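The two branches of this choreography, a serialized snapshot that is deployed and a model that keeps training on incremental data, can be illustrated with a toy example. The `RunningCentroidModel` class, its `partial_fit`, `to_json` and `from_json` methods, and the sample batches are all hypothetical; JSON is used here only as a simple stand-in for whatever serialization format a real deployment server expects.

```python
import json
from typing import Iterable, Tuple


class RunningCentroidModel:
    """Toy 1-D classifier: predicts the class (0 or 1) whose running mean is nearest."""

    def __init__(self) -> None:
        self.sums = [0.0, 0.0]    # per-class sum of feature values
        self.counts = [0, 0]      # per-class number of examples seen

    def partial_fit(self, batch: Iterable[Tuple[float, int]]) -> None:
        """Update the running statistics with a new incremental batch."""
        for x, label in batch:
            self.sums[label] += x
            self.counts[label] += 1

    def predict(self, x: float) -> int:
        if 0 in self.counts:
            raise ValueError("need at least one example per class")
        means = [s / c for s, c in zip(self.sums, self.counts)]
        return 0 if abs(x - means[0]) <= abs(x - means[1]) else 1

    def to_json(self) -> str:
        """Serialize the learned state so it can be shipped to a deployment server."""
        return json.dumps({"sums": self.sums, "counts": self.counts})

    @classmethod
    def from_json(cls, blob: str) -> "RunningCentroidModel":
        state = json.loads(blob)
        model = cls()
        model.sums = [float(s) for s in state["sums"]]
        model.counts = [int(c) for c in state["counts"]]
        return model


# Initial training on historical data, then a serialized snapshot for deployment.
training_model = RunningCentroidModel()
training_model.partial_fit([(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])
deployed = RunningCentroidModel.from_json(training_model.to_json())

# Continuous training: an incremental batch in which the data has drifted upward.
training_model.partial_fit([(6.0, 0), (7.0, 0), (12.0, 1), (13.0, 1)])
candidate = RunningCentroidModel.from_json(training_model.to_json())

# The stale snapshot and the continuously trained model now disagree on x = 6.0.
print(deployed.predict(6.0), candidate.predict(6.0))
```

In this sketch the deployed snapshot stays frozen until a new serialized version replaces it, while the training copy absorbs each incremental batch.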

The road to production depicts the journey of building Machine Learning models across the DEV, TEST and PROD environments.

[Figure: Machine Learning Model Factory, road to production (39583-machine-learning-model-factory-road-to-production.png)]
