Time for tutorial 1 of a series detailing how to go from AI to Edge!

Note: all code and files referenced in this tutorial can be found on my GitHub, here.


This tutorial is divided into the following sections:

  • Section 1: Create a custom Docker container running Jupyter for CDSW
  • Section 2: Automate Jupyter launch in a CDSW project
  • Section 3: Train and save a model reading the MNIST database

Section 1: Create a custom Docker container running Jupyter for CDSW

This is fairly straightforward to implement, as it is detailed in the official documentation.

Note: make sure that Docker is signed in with your Docker Hub username and password (not your email address); otherwise docker push will not work.

Step 1: Create a repository in Docker Hub

Go to Docker Hub and sign in with your account. Create a new repository as follows:


You should see something like this:


Step 2: Create a custom Dockerfile

Go to a folder on your computer and create this file (saving it as Dockerfile):

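# A FROM line naming the CDSW base engine image goes here
# (see the official documentation for the image matching your CDSW version)
RUN pip3 install --upgrade pip
RUN pip3 install keras
RUN pip3 install tensorflow
# The PyPI package is named scikit-learn; it is imported as sklearn
RUN pip3 install scikit-learn
RUN pip3 install jupyter
RUN pip3 install 'prompt-toolkit==1.0.15'
RUN pip3 install onnxruntime
RUN pip3 install keras2onnx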

Step 3: Build the container

Run the following command in the folder where the file has been saved:

docker build -t YOUR_USER/YOUR_REPO:YOUR_TAG . -f Dockerfile
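
Once the image is built, it can be worth confirming that the packages from the Dockerfile are importable inside it. The following is a minimal sketch, not part of the original tutorial: the module list is an assumption about the import names of the installed packages, and missing_modules is a hypothetical helper name.

```python
import importlib.util

# Assumed import names for the packages installed in the Dockerfile
# (for example, scikit-learn is imported as sklearn, Jupyter Notebook as notebook).
EXPECTED_MODULES = [
    "keras",
    "tensorflow",
    "sklearn",
    "notebook",
    "onnxruntime",
    "keras2onnx",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```

Running this script with python3 inside a container started from the image lists anything that failed to install, before the image is pushed and wired into CDSW.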

Step 4: Publish it to Docker Hub

Run the following command on your computer:

docker push YOUR_USER/YOUR_REPO:YOUR_TAG


Section 2: Automate Jupyter launch in a CDSW project

Step 1: Create a shell script to run Jupyter

In CDSW 1.5, you can't add a CMD or an ENTRYPOINT to your Dockerfile. Instead, add a .bashrc file to your CDSW project with the following code:

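# Count the lines of ps output that mention jupyter.
# The grep process itself always matches, so 1 means Jupyter is not running.
processes=`ps -ef | grep jupyter | wc -l`

if (( $processes == 2 )) ; then
    echo "Jupyter is already running!"
elif (( $processes == 1 )) ; then
    # An empty token disables token authentication
    jupyter notebook --no-browser --ip= --port=8080 --NotebookApp.token=
else
    echo "Invalid number of processes, relaunch your session!"
fi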
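
The process-count check in the script can look odd at first, because a count of 1 means "not running". The sketch below restates the same decision logic in Python to make that explicit. It is illustrative only: the function names and the sample ps lines are made up for this example and are not taken from a real session.

```python
def count_matching_lines(ps_output, needle="jupyter"):
    """Count lines containing the needle, as `grep jupyter | wc -l` would."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def decide(count):
    """Map the line count to the action taken by the shell script."""
    if count == 2:
        return "already running"
    if count == 1:
        # Only the grep process itself matched, so Jupyter is not running yet.
        return "launch"
    return "invalid"


# Hypothetical ps output: only the grep process mentions jupyter.
SAMPLE_NOT_RUNNING = "cdsw  101  1  0 10:00 pts/0  00:00:00 grep jupyter\n"

# Hypothetical ps output: one Jupyter process plus the grep process.
SAMPLE_RUNNING = (
    "cdsw   90  1  0 09:59 pts/0  00:00:02 python3 /usr/local/bin/jupyter-notebook\n"
    "cdsw  101  1  0 10:00 pts/0  00:00:00 grep jupyter\n"
)

if __name__ == "__main__":
    print(decide(count_matching_lines(SAMPLE_NOT_RUNNING)))
    print(decide(count_matching_lines(SAMPLE_RUNNING)))
```

Any other count, such as several Jupyter processes left over from earlier terminals, falls through to the "invalid" branch, which is why the script asks you to relaunch the session.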

Save this file to a GitHub repository.

Step 2: Add the custom engine to CDSW

In the CDSW configuration, use the Docker Hub image you created as your default engine:


Step 3: Create a project in CDSW with .bashrc

In CDSW, create a new project using the GitHub repository you just created:


Note: You can create a blank project and add the .bashrc file to it, but this automates it.

Step 4: Launch a CDSW session with Jupyter

In your project, open the workbench and launch a session with your custom engine. Open Terminal access and Jupyter will launch. You will then see Jupyter listed under the nine-dot menu, from which you can open it.
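
If Jupyter does not show up, one way to narrow down the problem is to check from the session whether anything is listening on the port used in .bashrc. This is a minimal sketch under stated assumptions: port 8080 comes from the script above, while the host 127.0.0.1 and the helper name port_is_open are choices made for this example.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8080 is the port passed to jupyter notebook in .bashrc.
    if port_is_open("127.0.0.1", 8080):
        print("Something is listening on port 8080.")
    else:
        print("Nothing is listening on port 8080; check the terminal output.")
```

A True result only shows that some process accepted the connection; it does not prove that the process is Jupyter.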


Section 3: Train and save a model reading the MNIST database

The model training is very well explained in the original Kaggle article that can be found here.

A revised version of this notebook can be found on my GitHub. The main addition to the notebook is saving the model in ONNX format:

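# Convert the trained Keras model into ONNX format with keras2onnx
import keras2onnx
import onnx

# Assumption: the model name is passed as the second argument,
# as in the keras2onnx README
onnx_model = keras2onnx.convert_keras(model, model.name)

# Write the converted model to disk
temp_model_file = 'model.onnx'
onnx.save_model(onnx_model, temp_model_file)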

After the notebook runs, you should see the model.onnx file created.
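
A quick way to confirm that the saved file is usable is to load it with onnxruntime, which the Dockerfile already installs, and push one dummy input through it. The sketch below is not from the original notebook. It assumes that the first model input takes float32 data, that symbolic dimensions such as the batch size can be set to 1, and that your onnxruntime release accepts the providers argument (older releases may need it removed). The helper names are hypothetical.

```python
import os


def resolve_shape(dims):
    """Replace symbolic or unknown dimensions (strings or None) with 1."""
    return [d if isinstance(d, int) else 1 for d in dims]


def run_dummy_inference(path="model.onnx"):
    """Load the ONNX file and run a single random input through it."""
    # Imported here so the shape helper stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    first_input = session.get_inputs()[0]
    shape = resolve_shape(first_input.shape)

    # Assumption: the model expects float32 input.
    dummy = np.random.rand(*shape).astype(np.float32)
    outputs = session.run(None, {first_input.name: dummy})
    return [out.shape for out in outputs]


if __name__ == "__main__":
    if os.path.exists("model.onnx"):
        print("Output shapes:", run_dummy_inference("model.onnx"))
    else:
        print("model.onnx not found; run the notebook first.")
```

If the session loads and returns output shapes, the export produced a model that onnxruntime can execute. This does not check prediction quality.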

Last update: 08-17-2019 02:18 PM