The complexity of creating an end-to-end machine learning workflow is one of the biggest hurdles data scientists and machine learning engineers face. Running model.fit() is not the hard part. The hard part is ingesting data, getting it into the right format for model training, deploying the model so that it is accessible to other parts of the business, and running the applications that consume the model. Machine learning is useful when it is deployed as part of an end-to-end workflow.
We have been working on Applied Machine Learning Prototypes that will help you build a fully working machine learning example in CML. The Prototypes will include source data and walk through the following steps:
Ingest data into a useful place in CDP (e.g. a Hive Table)
Explore the data set
Create a plan to build a model
Train the model
Deploy the model
Build and deploy an application
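As a rough illustration of how these steps chain together, here is a minimal sketch in plain Python. It is not the prototype's code: the sample data, the column names (`tenure`, `monthly_charges`, `churned`), and the function names are all hypothetical, and a hand-written logistic regression stands in for whatever model the prototype trains. The planning and application steps are left out, and the "deploy" step simply wraps the model in a callable.

```python
import csv
import io
import math

# Hypothetical sample data; a real prototype would read from a table in CDP.
RAW_CSV = """tenure,monthly_charges,churned
1,85,1
3,90,1
5,75,1
8,95,1
40,30,0
50,45,0
60,25,0
55,50,0
"""

def ingest(raw_text):
    """Ingest: parse CSV text into (features, label) pairs."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        ([float(row["tenure"]), float(row["monthly_charges"])], int(row["churned"]))
        for row in reader
    ]

def explore(data):
    """Explore: report row count, churn rate, and per-feature maxima."""
    labels = [label for _, label in data]
    maxima = [max(features[i] for features, _ in data) for i in range(2)]
    return {"rows": len(data), "churn_rate": sum(labels) / len(labels), "maxima": maxima}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, maxima, epochs=500, learning_rate=0.5):
    """Train: fit a logistic regression with batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    # Scale each feature by its maximum so both are on a similar range.
    scaled = [([x / m for x, m in zip(f, maxima)], y) for f, y in data]
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in scaled:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / len(scaled) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(scaled)
    return weights, bias

def deploy(weights, bias, maxima):
    """Deploy: wrap the model in a callable, standing in for a model endpoint."""
    def predict(tenure, monthly_charges):
        x = [tenure / maxima[0], monthly_charges / maxima[1]]
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return predict

data = ingest(RAW_CSV)
summary = explore(data)
weights, bias = train(data, summary["maxima"])
predict_churn = deploy(weights, bias, summary["maxima"])
print(summary)
print(round(predict_churn(2, 90), 2), round(predict_churn(58, 30), 2))
```

Keeping each stage as its own function mirrors the way the prototype separates the steps, so any one stage can be swapped for your own data source or model without touching the others.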
Once you have deployed the template and all the CML artifacts that go with it, you can take it apart and work backward through it to map the process to your own data in your own environment.
The first Applied Machine Learning Prototype is now available - Churn. To get up and running with it, do the following:
Log in to your CML workspace and create a project using the following repo:
Enter this URL in the Git section of the Initial Setup:
This deploys the files into your CML instance, and the project will look like the following:
From here, follow the instructions in the README. If you just want to deploy the whole project and get the application up and running quickly, launch a new Workbench session:
Once the Workbench is open, open the file 8_build_project.py and run it:
When the script finishes running, your project will look like the following:
Go to the Applications tab and launch the application by clicking the blue arrow next to its name:
This will open the application in a new window. The initial view is a table of randomly selected rows from the dataset. It gives a global view of which features are most important to the predictor model. Red indicates increased importance for predicting that a customer will churn, and blue indicates increased importance for predicting that a customer will not.
Click on any single row to view a "local" interpreted model for that particular data point. Here, you can see how adjusting any one of the features changes that data point's churn prediction.
Changing the InternetService to DSL lowers the probability of churn. Note: This does not mean that changing the Internet Service to DSL causes the probability to go down. It is only what the model would predict for a customer with those data points.
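The what-if behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_churn_model`, its coefficients, the `tenure` feature, and the "Fiber optic" value are invented for the example and are not the prototype's model or its explanation method. Only the InternetService feature and the DSL value come from the walkthrough.

```python
import math

def toy_churn_model(customer):
    """Hypothetical scoring function with made-up coefficients."""
    service_effect = {"Fiber optic": 1.2, "DSL": 0.2, "No": -0.5}
    score = -1.0
    score += service_effect[customer["InternetService"]]
    score -= 0.03 * customer["tenure"]
    return 1.0 / (1.0 + math.exp(-score))

def what_if(model, customer, feature, new_value):
    """Return (baseline, modified) predictions after changing one feature."""
    modified = dict(customer)  # copy so the original record is left untouched
    modified[feature] = new_value
    return model(customer), model(modified)

customer = {"InternetService": "Fiber optic", "tenure": 5}
before, after = what_if(toy_churn_model, customer, "InternetService", "DSL")
print(f"before={before:.3f} after={after:.3f}")
```

The comparison only asks the model the same question about two different inputs. It says nothing about what would happen to a real customer who switched service, which is the same caveat as in the note above.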