Cloudera Employee

New in Cloudera Machine Learning (CML) 1.2: we’re excited to announce support for hosting persistent web-based applications and dashboards, built with frameworks like Flask, Dash and Shiny, to share analytics results and insights with business stakeholders.

 

Example Use case

 

Here’s a quick example of how it works based on a use-case involving New York City transportation data:

  1. Part 1 of the analysis compares workday trip data to help choose the appropriate mode of transportation (bike vs. car) for a given time of day and route (analysis based on this blog post by Todd W. Schneider).
  2. Part 2 of the analysis demonstrates outlier detection for trip durations, which might be interesting for internal audit purposes.

This example uses the most recent data from Citibike and NYC TLC for-hire vehicle (FHV) data.

 

Publishing the app

 

Two pre-trained machine learning models (simple sklearn logistic regression models) are served by CML and accessed by the application via an API. As our focus is mainly on hosting the application using CML, let’s jump straight to that. Here are step-by-step instructions on how to leverage the “app hosting” capability of CML.

 

  1. Provision Workspace: To run projects in an isolated workspace while still leveraging the benefits of CDSW, provision an on-demand Machine Learning workspace. (Screenshots: newWorkspace.png, provisionWorkspace.png)
  2. Launch Session and Create Project: Now that the workspace is provisioned, launch a session and create a new project. Before the app can be hosted on the web, the entire application code must be available in the CML workspace. Projects can be created from an existing template (e.g. R, Python, PySpark). CML allows building applications using the standard structures of the Flask and Dash frameworks. (Screenshot: projCreate.png)
  3. Once the project is created, verify that all the required files are present under Overview→Files in the left navigation sidebar. (Screenshot: proj.png)
  4. Configuring the application: Like most web services, our application uses a host address and a port to direct users to the application server. The application is configured to point to a specific host and port. (Screenshot: appConf.png)
  5. Run the application: Run the application file (in our case, “cdsw_app.py”). Check the output in the right-hand window and click the link to the app shown there. Voila! There’s our application, up and running! Let’s look at the application and both of its visualisations.
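
The host and port configuration from step 4 could look roughly like the sketch below. It is not the code from the original screenshot. It assumes that the platform hands the port to the application through an environment variable named CDSW_APP_PORT and that the app should listen on 127.0.0.1; both should be checked against the documentation for your CML version. The fallback port and the helper names resolve_bind_address and create_app are hypothetical.

```python
import os

# Assumption: CML passes the port via CDSW_APP_PORT and expects the app to
# listen on 127.0.0.1. Verify both against your CML version's documentation.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8080  # hypothetical fallback for runs outside CML


def resolve_bind_address(env=None):
    """Return the (host, port) pair the web server should bind to."""
    env = os.environ if env is None else env
    port = int(env.get("CDSW_APP_PORT", DEFAULT_PORT))
    return DEFAULT_HOST, port


def create_app():
    """Build a minimal Flask app (Flask is imported lazily, only when needed)."""
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "Application is up"

    return app


# To serve the app from a session or application script:
#   host, port = resolve_bind_address()
#   create_app().run(host=host, port=port)
```

Reading the port from the environment rather than hard-coding it keeps the same script usable both in an interactive session and as a hosted application.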

Monte Carlo Simulation Results 

nyc_viz1_small.jpg

This visualisation shows New York City segmented into taxi zones. For each zone the colour chart shows which transport mode is faster from a selected reference zone coloured red. The user can either click on each zone to explore new relationships or press SPACEBAR to run an animated loop for the selected zone, cycling through all time bins.

Outlier Prediction Model 

NYC_viz_2_small.jpg

The user can press T to launch this animation. Trips between Trader Joe’s store locations are replayed, and the models served via CML are queried to classify each trip as an outlier or not. Outliers are visualised using red tracks, while normal trips are visualised in blue (Citibike) and yellow (FHV).
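
As a rough illustration of how an application might query a served model and colour the resulting track, here is a minimal sketch. The endpoint URL, the access key, the trip field names and the JSON envelope with accessKey and request keys are all assumptions made for illustration; use the values and request format shown for your own deployed model. Only the colour mapping (red for outliers, blue for Citibike, yellow for FHV) comes from the description above.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the endpoint and access key
# shown for your own deployed model.
MODEL_ENDPOINT = "https://modelservice.example.com/model"
ACCESS_KEY = "your-model-access-key"


def build_model_request(trip, endpoint=MODEL_ENDPOINT, access_key=ACCESS_KEY):
    """Wrap trip features in a JSON POST request (envelope format is assumed)."""
    payload = {"accessKey": access_key, "request": trip}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_trip(trip, timeout=10):
    """Send one trip to the model endpoint and return the decoded JSON reply."""
    request = build_model_request(trip)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def track_colour(is_outlier, mode):
    """Map a classification to the track colours used in the visualisation."""
    if is_outlier:
        return "red"
    return {"citibike": "blue", "fhv": "yellow"}[mode]
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without a running model.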

 

This was a simple demonstration of CML’s new Analytical Applications feature. For a step-by-step how-to that includes building the application, see Building an Interactive Machine Learning Application with CML. You can also tune in to our weekly webinar series with technical experts to learn more about Cloudera's machine learning platform for enterprise data science teams. Each session will feature a product overview, including a live demo and Q&A, for end users, data scientists and administrators. For the webinar series in North America, click here. For the webinar series in Europe, Middle East and Africa, click here.
