Price Optimization with PyGurobi in Cloudera Machine Learning

VidyaSargur · ‎08-28-2024

Objective

Cloudera Machine Learning (CML) is a platform designed to help organizations build, deploy, and manage machine learning models at scale. It is part of Cloudera’s suite of enterprise data platforms and solutions, focusing on providing a robust environment for data scientists, analysts, and engineers to collaborate on end-to-end machine learning workflows.

PyGurobi is a Python interface for the Gurobi Optimizer, a powerful and widely used solver for mathematical optimization problems. Gurobi is known for its high performance in solving a variety of optimization problems, including linear programming (LP), quadratic programming (QP), mixed-integer programming (MIP), and others.

In this tutorial, you will use PyGurobi on CML to optimize product prices and maximize enterprise revenue.

Requirements

The following are required to reproduce this example:

CML Workspace in AWS, Azure, OCP, or ECS.
Basic knowledge of Python for machine learning, including scikit-learn, Spark, Iceberg, and XGBoost.
Basic familiarity with linear and nonlinear programming. If you are new to mathematical optimization, please visit this link for a quick introduction.

Step by Step Instructions

Supporting code for reproducing the tutorial can be found in this Git repository.

Launch a CML Session with the following runtime and resource profile:

Editor: JupyterLab
Kernel: Python 3.10
Edition: Standard
Version: 2024.05
Enable Spark: Spark 3.2 or above
Resource Profile: 2 CPU / 4 GB Mem / 0 GPU
Runtime Image: docker.repository.cloudera.com/cloudera/cdsw/ml-runtime-jupyterlab-python3.10-standard:2024.05.1-b8

Open the terminal and install the requirements:
```
pip3 install -r requirements.txt
```
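
Before running the notebooks, it can help to confirm that the install succeeded. The following is a minimal sketch and is not part of the tutorial repository. The package names in `ASSUMED_PACKAGES` are assumptions, since the contents of `requirements.txt` are not listed in this article; adjust them to match the file.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed package list; adjust it to match requirements.txt in the repository.
ASSUMED_PACKAGES = ["gurobipy", "mlflow", "pandas"]

if __name__ == "__main__":
    missing = missing_packages(ASSUMED_PACKAGES)
    print("Missing packages:", missing if missing else "none")
```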

Part 0: Data Generation

Run notebook ```00_datagen_iceberg_pyspark.ipynb``` and observe the following:

A Spark dataframe with 10000 synthetic product price transactions is created.
The P1 and P2 columns represent the prices for two products sold, and the N1 column represents the quantity of Product 1 sold.
The dataframe is stored as an Iceberg table.
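
As a rough illustration of what the data generation step involves, the sketch below builds synthetic rows in plain Python and then writes them through Spark. It is not the notebook's code: the price ranges, the demand formula, the function names, and the table name you pass in are all hypothetical. Only the column names P1, P2, and N1 come from the tutorial. The write step assumes a Spark session that is already configured with an Iceberg catalog, as is typically the case in a CML Session with Spark enabled.

```python
import random


def generate_transactions(n_rows=10000, seed=42):
    """Return (P1, P2, N1) tuples drawn from a made-up linear demand curve with noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        p1 = round(rng.uniform(300.0, 400.0), 2)  # hypothetical price range for Product 1
        p2 = round(rng.uniform(100.0, 360.0), 2)  # hypothetical price range for Product 2
        # Hypothetical demand: falls with the product's own price, rises with the competing price.
        n1 = 600.0 - 1.5 * p1 + 0.3 * p2 + rng.gauss(0.0, 10.0)
        rows.append((p1, p2, max(0, round(n1))))
    return rows


def write_to_iceberg(rows, table_name):
    """Store the rows as an Iceberg table; requires Spark with an Iceberg catalog."""
    # Imported lazily so that the generator above can run without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("datagen").getOrCreate()
    df = spark.createDataFrame(rows, schema=["P1", "P2", "N1"])
    df.writeTo(table_name).using("iceberg").createOrReplace()
    return df
```

Calling `write_to_iceberg(generate_transactions(), "<database>.<table>")` with a table name of your choice would create or replace that table.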

Part 1: Pricing Optimization with Gurobi

Run notebook ```01_price_optimization_with_competing_products.ipynb``` and observe the following:

An MLflow Experiment Context is created with the name "Price Optimization Experiment".
An initial regressor is built to predict prices using the data stored in the Iceberg table. The data is read using PandasOnSpark, which is included in the Spark Runtime AddOn by default.
A Price Optimization model is instantiated with an Objective Function and associated Constraints.
The model is then optimized on the data. Its outputs include an optimal price recommendation for each of the two products, the associated product quantities, and a revenue estimate. In this run, revenue is maximized at 70347.77 when prices are 400 and 300 for the two products, respectively.
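
To make the idea of an objective function with constraints more concrete, here is a minimal sketch of a two-product pricing model. It assumes that the tutorial's PyGurobi refers to Gurobi's `gurobipy` package, and that demand for each product is linear in both prices. The demand coefficients, price bounds, and function names are hypothetical and are not taken from the notebook, whose formulation may differ.

```python
def demand(p_own, p_other, coeffs):
    """Linear demand: intercept - own_slope * own price + cross_slope * competing price."""
    intercept, own_slope, cross_slope = coeffs
    return intercept - own_slope * p_own + cross_slope * p_other


def revenue(p1, p2, coeffs1, coeffs2):
    """Total revenue: price times quantity, summed over both products."""
    return p1 * demand(p1, p2, coeffs1) + p2 * demand(p2, p1, coeffs2)


def optimize_prices(coeffs1, coeffs2, bounds1, bounds2):
    """Maximize revenue over both prices with Gurobi (requires gurobipy and a license)."""
    import gurobipy as gp
    from gurobipy import GRB

    model = gp.Model("price_optimization")
    model.Params.OutputFlag = 0  # silence solver logging
    model.Params.NonConvex = 2   # the p1 * p2 cross term can make the objective nonconvex

    # Decision variables: one price per product, restricted to its allowed range.
    p1 = model.addVar(lb=bounds1[0], ub=bounds1[1], name="p1")
    p2 = model.addVar(lb=bounds2[0], ub=bounds2[1], name="p2")

    # Quantities are linear expressions of the two price variables.
    n1 = demand(p1, p2, coeffs1)
    n2 = demand(p2, p1, coeffs2)
    model.addConstr(n1 >= 0, name="nonnegative_n1")
    model.addConstr(n2 >= 0, name="nonnegative_n2")

    model.setObjective(p1 * n1 + p2 * n2, GRB.MAXIMIZE)
    model.optimize()
    if model.Status != GRB.OPTIMAL:
        raise RuntimeError(f"Solver ended with status {model.Status}")

    return {
        "optimal prices": [p1.X, p2.X],
        "optimal product quantities": [n1.getValue(), n2.getValue()],
        "total revenue": model.ObjVal,
    }
```

The `revenue` helper needs no solver, so it can be used to sanity check a solution by evaluating the objective at the recommended prices.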

Part 2: Deploy Optimization Model in an API Endpoint

Run notebook ```02_price_optimization_model_deployment.ipynb``` and observe the following:

CML APIv2 allows you to programmatically execute actions within CML Workspaces. In this example, the API is used to create a small Python interface to manage model deployments.
In particular, the interface is used to create a separate CML Project to host an API Endpoint. The API Endpoint allocates a dedicated container for the model and provides an entry point for prediction requests.
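
A model deployed behind a CML endpoint is typically exposed as a Python function that receives the request as a dictionary and returns a JSON-serializable result. The sketch below shows what such an entry point could look like for this tutorial's payload. It is an assumption-laden illustration, not the script in the repository: the function names and the `optimizer` callable are hypothetical, and only the payload keys `p[1]`, `p[2]`, and `n[1]` come from the sample request in this article.

```python
REQUIRED_KEYS = ("p[1]", "p[2]", "n[1]")


def validate_payload(args):
    """Check that all expected columns are present and have the same length."""
    missing = [key for key in REQUIRED_KEYS if key not in args]
    if missing:
        raise ValueError(f"Missing keys in request: {missing}")
    lengths = {len(args[key]) for key in REQUIRED_KEYS}
    if len(lengths) != 1:
        raise ValueError("All input lists must have the same length")
    return {key: list(args[key]) for key in REQUIRED_KEYS}


def predict(args, optimizer=None):
    """Hypothetical endpoint function: echo the input data and attach optimizer results."""
    data = validate_payload(args)
    result = {"data": data}
    if optimizer is not None:
        # The optimizer is any callable that returns a dict of results for the given data.
        result.update(optimizer(data))
    return result
```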

Navigate back to the CML workspace and notice that a new project named ```CML Project for Optimization Model``` has been created. Open it and confirm that a new Endpoint has been created in the Model Deployments section.

Open the model deployment and, once the deployment has completed, enter the following sample payload in the Test Request window. Observe the output response.

Test Input:

{"p[1]": [354,353,352,351,354,353,312,311,314,313,352,351], "p[2]": [110,120,320,220,101,100,101,260,355,140,300,299], "n[1]": [54,53,112,151,154,153,52,51,4,53,92,71]}

Sample Test Output:

{
  "model_deployment_crn": "crn:cdp:ml:us-west-1:558bc1d2-8867-4357-8524-311d51259233:workspace:f76bd7eb-adde-43eb-9bd9-e16ec2cb0238/c152a438-6449-465e-8685-e1cc0b9988fa",
  "prediction": {
    "data": {
      "n[1]": [54, 53, 112, 151, 154, 153, 52, 51, 4, 53, 92, 71],
      "p[1]": [354, 353, 352, 351, 354, 353, 312, 311, 314, 313, 352, 351],
      "p[2]": [110, 120, 320, 220, 101, 100, 101, 260, 355, 140, 300, 299]
    },
    "optimal prices": [400, 300],
    "optimal product quantities": [80, 120],
    "total revenue": 68032.83
  },
  "uuid": "e6700d88-f4e7-4705-988b-89e9c8092194"
}
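
The same request can also be sent from code rather than from the Test Request window. The sketch below uses only the Python standard library. The endpoint URL and access key are placeholders that you would copy from the model deployment page, and the `accessKey`/`request` body shape is the format CML model endpoints commonly expect. Some workspaces additionally require an API key in an Authorization header, which is omitted here. The helper names are hypothetical.

```python
import json
import urllib.request


def build_request_body(access_key, payload):
    """Serialize the JSON body: the model access key plus the request payload."""
    return json.dumps({"accessKey": access_key, "request": payload}).encode("utf-8")


def call_model(url, access_key, payload, timeout=60):
    """POST the payload to the model endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_request_body(access_key, payload),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(response):
    """Pull the optimization results out of a response shaped like the sample output."""
    prediction = response["prediction"]
    prices = prediction["optimal prices"]
    quantities = prediction["optimal product quantities"]
    return {
        "prices": prices,
        "quantities": quantities,
        "reported_revenue": prediction["total revenue"],
        # Price times quantity, summed over products, as a quick consistency check.
        "price_times_quantity": sum(p * q for p, q in zip(prices, quantities)),
    }
```

For the sample output above, the price-times-quantity check gives 400 × 80 + 300 × 120 = 68000, close to the reported total revenue of 68032.83. The small gap may reflect rounding of the displayed quantities, although the response itself does not say.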

Summary

In this tutorial, you used PyGurobi in Cloudera Machine Learning to maximize product revenue by identifying optimal prices and sales quantities for two products.

The PyGurobi library allows you to solve complex linear and nonlinear programming problems such as the one above. Cloudera on Cloud provides the tooling necessary to use libraries such as PyGurobi in an enterprise setting. With CML you can easily leverage Spark on Kubernetes, Runtime Add-Ons, Iceberg, Python, MLflow, and more to install and containerize workloads and machine learning models at scale, without any custom installations.
