Community Articles

pvidal · 06-27-2019

Introduction

Let's jump into tutorial 2 from my AI to Edge series!

This tutorial details the creation of a NiFi flow that executes the ONNX model we trained in my last article.

More precisely, we will feed three handwritten digit images to the model and predict their values.

Note: as always, all code and files referenced in this tutorial can be found on my GitHub, here.

Agenda

The flow is divided into the following sections:

Section 1: Listening to a folder for new png files
Section 2: Resizing these images to 28x28 (size used to train our model)
Section 3: Converting these images to CSV (format used to train our model)
Section 4: Running our predictive model

Section 1: Listening to a folder for new png files

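Step 1: Set up a variable for the root folder

This will be useful when we deploy the flow to MiNiFi, because every path in the flow is built from this variable. In the NiFi variable registry (right-click the canvas and choose Variables), create the following:

Name: root_folder
Value: the location where you downloaded my GitHub repository, including the trailing slash (the paths below append NIFI/... directly to it)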
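Expression Language joins ${root_folder} to the text that follows it by plain concatenation. A missing trailing slash therefore produces a broken path. The sketch below is a hypothetical, heavily simplified stand-in for Expression Language: it only substitutes ${name} references, while the real engine also supports functions and escaping. The helper name resolve and the path /opt/repo/ are made up for illustration.

```python
import re

# Hypothetical, simplified model of NiFi Expression Language:
# only plain ${name} references are substituted, nothing else.
_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def resolve(expression: str, variables: dict) -> str:
    """Replace each ${name} with variables[name] by plain string concatenation."""
    return _REF.sub(lambda m: variables[m.group(1)], expression)


if __name__ == "__main__":
    expr = "${root_folder}NIFI/png/original/"
    # Hypothetical clone location, with and without the trailing slash
    print(resolve(expr, {"root_folder": "/opt/repo/"}))  # /opt/repo/NIFI/png/original/
    print(resolve(expr, {"root_folder": "/opt/repo"}))   # /opt/repoNIFI/png/original/
```

The second call shows the malformed directory you get when the value lacks its trailing slash.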

Step 2: List files in the folder

Create a ListFile processor and modify the following properties:

Input Directory: ${root_folder}NIFI/png/original/
File Filter: [^\.].*\.png (any file not starting with a dot whose name ends in .png)
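ListFile applies the File Filter as a Java regular expression to the whole filename. The snippet below uses Python's re.fullmatch as a stand-in to show which names a filter accepts. The filenames are made-up examples. An unescaped dot matches any character, so [^\.].*.png would also accept a name like digit_3xpng. Escaping it as \.png limits matches to a real .png extension.

```python
import re

UNESCAPED = r"[^\.].*.png"  # '.' before png matches any character
ESCAPED = r"[^\.].*\.png"   # no leading dot (hidden files), literal '.png' suffix


def accepted(pattern: str, filename: str) -> bool:
    """Whole-name match, similar to Java's Matcher.matches()."""
    return re.fullmatch(pattern, filename) is not None


if __name__ == "__main__":
    for name in ["digit_3.png", ".digit_3.png", "digit_3.jpg", "digit_3xpng"]:
        print(f"{name:14} unescaped={accepted(UNESCAPED, name)!s:5} "
              f"escaped={accepted(ESCAPED, name)}")
```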

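Step 3: Fetch files in the folder

Create a FetchFile processor with default properties. Its default File to Fetch value, ${absolute.path}/${filename}, points to the file that ListFile just listed.

Note: the List/Fetch pattern is very powerful because it lets us continuously watch the folder for new images without reprocessing the ones already handled. This works because ListFile is a stateful processor: it keeps track of what it has already listed. If you're unfamiliar with the concept, I encourage you to read about it on this community.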
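Here is a toy Python model that makes the idea of a stateful listing concrete. It is not how ListFile is implemented: NiFi persists its state through its own state management and handles more edge cases. This model only remembers the newest modification time it has emitted, and on each run it returns the files that are newer. The class name ToyLister is hypothetical.

```python
import os
import tempfile


class ToyLister:
    """Hypothetical, simplified stand-in for a stateful List processor."""

    def __init__(self, directory: str, suffix: str = ".png"):
        self.directory = directory
        self.suffix = suffix
        self.last_mtime = 0.0  # the "state": newest modification time already listed

    def list_new(self) -> list:
        new = []
        with os.scandir(self.directory) as entries:
            for entry in entries:
                if not entry.is_file() or not entry.name.endswith(self.suffix):
                    continue
                mtime = entry.stat().st_mtime
                if mtime > self.last_mtime:
                    new.append((mtime, entry.name))
        new.sort()
        if new:
            self.last_mtime = new[-1][0]  # remember the newest file emitted
        return [name for _, name in new]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        lister = ToyLister(folder)
        for name, mtime in [("one.png", 100), ("two.png", 200)]:
            path = os.path.join(folder, name)
            open(path, "wb").close()
            os.utime(path, (mtime, mtime))  # fixed timestamps for a predictable demo
        print(lister.list_new())  # ['one.png', 'two.png']
        print(lister.list_new())  # []: nothing new since the last run
```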

Section 2: Resizing these images to 28x28

Step 1: Resize Image

Create a ResizeImage processor and modify the following properties:

Image Width (in pixels): 28
Image Height (in pixels): 28
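ResizeImage does the scaling inside NiFi. The sketch below illustrates what shrinking an image to the model's 28x28 input means. It implements nearest-neighbour resampling on a grayscale image stored as a list of rows. This is a deliberately simple, swapped-in algorithm and not necessarily what ResizeImage uses. Outside NiFi, you would normally call Pillow's Image.resize((28, 28)) instead. In this sketch, a non-square source is stretched rather than cropped.

```python
def resize_nearest(pixels, new_w, new_h):
    """Nearest-neighbour resize of a grayscale image given as a list of rows."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


if __name__ == "__main__":
    # 56x56 white image (255) with a black square (0) in the middle
    big = [[0 if 14 <= x < 42 and 14 <= y < 42 else 255 for x in range(56)]
           for y in range(56)]
    small = resize_nearest(big, 28, 28)
    print(len(small), len(small[0]))   # 28 28
    print(small[14][14], small[0][0])  # 0 255
```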

Step 2: Set output attributes for resized images

Create an UpdateAttribute processor that defines the folder and filename of the resized images, by adding the following properties:

filedirectory: ${root_folder}NIFI/png/resized/
filename: resized_${filename}
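UpdateAttribute evaluates each property against the flowfile's incoming attributes. As a result, resized_${filename} is built from the original filename. The sketch below uses a hypothetical helper, update_attributes, which is a toy substitution and not NiFi's Expression Language engine. It shows the attributes that a flowfile for 3.png would carry afterwards, assuming root_folder is the made-up path /opt/repo/.

```python
import re

_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def update_attributes(attributes: dict, properties: dict, variables: dict) -> dict:
    """Toy UpdateAttribute: evaluate every property against the incoming attributes."""
    lookup = {**variables, **attributes}  # flowfile attributes take precedence over variables
    updated = dict(attributes)
    for name, expression in properties.items():
        updated[name] = _REF.sub(lambda m: lookup[m.group(1)], expression)
    return updated


if __name__ == "__main__":
    properties = {
        "filedirectory": "${root_folder}NIFI/png/resized/",
        "filename": "resized_${filename}",
    }
    print(update_attributes({"filename": "3.png"}, properties, {"root_folder": "/opt/repo/"}))
    # {'filename': 'resized_3.png', 'filedirectory': '/opt/repo/NIFI/png/resized/'}
```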

Section 3: Converting these images to CSV

Step 1: Save the resized image

The Python script in the next step reads the resized image from disk, so we first write it out. Create a PutFile processor and modify the following property to store the image in the resized folder:

Directory: ${filedirectory}

Step 2: Execute a Python script to convert images to CSV

In this step we will create an ExecuteStreamCommand processor that runs convertImg.sh, a Python script (despite the .sh extension, its shebang line runs it with python3). The script reads the resized image, converts it to grayscale, inverts each pixel value (255 minus the value) and writes the 784 pixels as a single CSV row, matching the input format of our model. Below is the script itself:

#!/usr/bin/env python3
# Convert a resized 28x28 image into a one-row CSV of inverted grayscale pixels.
# Usage: convertImg.sh <resized_image> <target_csv>

import sys

import pandas as pd
from PIL import Image

img_name = sys.argv[1]  # location of the resized image
csvFile = sys.argv[2]   # location of the target CSV

# Convert to 8-bit grayscale: one value per pixel, 0 = black, 255 = white
img = Image.open(img_name).convert('L')
if img.size != (28, 28):
    sys.exit('expected a 28x28 image, got %dx%d' % img.size)

rawData = img.load()

# Read the pixels row by row and invert them (255 - value) to match the model's input
data = [255 - rawData[x, y] for y in range(28) for x in range(28)]

# Column names pixel0 ... pixel783
columnNames = ['pixel' + str(i) for i in range(784)]
train_data = pd.DataFrame([data], columns=columnNames)

train_data.to_csv(csvFile, index=False)

# ExecuteStreamCommand turns standard output into the flowfile content
print(csvFile)

As you can see, it expects two arguments:

Location of the resized image (img_name = sys.argv[1])
Location of the target CSV (csvFile = sys.argv[2])
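At its core, the script performs a small transformation. It reads the 28x28 grayscale pixels row by row, inverts each value, and writes them as a single CSV row under the header pixel0 ... pixel783. The stdlib-only sketch below reproduces that layout without Pillow or pandas, so the format is easy to inspect. The helper pixels_to_csv is hypothetical, and it assumes the input is already 28 rows of 28 grayscale values between 0 and 255.

```python
import csv
import io


def pixels_to_csv(pixels) -> str:
    """Flatten a 28x28 grayscale image row by row, invert it, and return CSV text."""
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected 28 rows of 28 pixels")
    header = ["pixel" + str(i) for i in range(784)]
    values = [255 - value for row in pixels for value in row]  # row by row
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerow(values)
    return buffer.getvalue()


if __name__ == "__main__":
    page = [[255] * 28 for _ in range(28)]  # white paper, no ink
    page[0][1] = 0                          # one black pixel at row 0, column 1
    header_line, value_line = pixels_to_csv(page).splitlines()
    print(header_line[:20])  # pixel0,pixel1,pixel2
    print(value_line[:10])   # 0,255,0,0,
```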

Thus, modify the following properties in the ExecuteStreamCommand processor (the script must be executable, for example via chmod +x):

Command Arguments: ${root_folder}NIFI/png/resized/${filename};${root_folder}NIFI/csv/${filename}.csv
Command Path: ${root_folder}NIFI/convertImg.sh
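ExecuteStreamCommand splits Command Arguments on its Argument Delimiter property, which defaults to a semicolon. It then runs Command Path with those arguments, and the command's standard output becomes the new flowfile content. The sketch below mimics that behaviour with subprocess.run, which gives a rough way to test the scripts outside NiFi. The helper names are hypothetical. NiFi's own processor also streams the incoming flowfile content to the command's stdin, which this sketch does not do.

```python
import subprocess
import sys


def split_arguments(command_arguments: str, delimiter: str = ";") -> list:
    """Split a Command Arguments string on the delimiter (';' by default)."""
    return command_arguments.split(delimiter)


def run_like_execute_stream_command(command_path: str, command_arguments: str) -> str:
    """Run the command with the split arguments and return its standard output."""
    args = split_arguments(command_arguments)
    completed = subprocess.run([command_path, *args],
                               capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical resolved values for a flowfile named resized_3.png
    arguments = ("/opt/repo/NIFI/png/resized/resized_3.png;"
                 "/opt/repo/NIFI/csv/resized_3.png.csv")
    print(split_arguments(arguments))
    # Demo with the Python interpreter itself instead of convertImg.sh
    print(run_like_execute_stream_command(sys.executable, "-c;print('hello')"))
```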

Section 4: Running our predictive model

Step 1: Set input attributes for model execution

Create an UpdateAttribute processor that defines the locations of the CSV file and the ONNX model, by adding the following properties:

filename: ${root_folder}NIFI/csv/${filename}.csv
onnxModel: ${root_folder}NOTEBOOKS/model.onnx
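This UpdateAttribute overwrites filename with the full path of the CSV. It helps to trace how the name of a single image changes as it moves through the flow. The sketch below chains the expressions configured in each section for a hypothetical input file 3.png, assuming root_folder is the made-up path /opt/repo/. The function trace_paths is illustrative only.

```python
def trace_paths(original_name: str, root_folder: str) -> dict:
    """Follow one image through the expressions configured in each section."""
    resized_name = "resized_" + original_name  # Section 2, UpdateAttribute
    return {
        "listed": root_folder + "NIFI/png/original/" + original_name,  # ListFile
        "resized": root_folder + "NIFI/png/resized/" + resized_name,   # PutFile
        "csv": root_folder + "NIFI/csv/" + resized_name + ".csv",      # convertImg.sh
        "onnxModel": root_folder + "NOTEBOOKS/model.onnx",             # Section 4
    }


if __name__ == "__main__":
    for step, path in trace_paths("3.png", "/opt/repo/").items():
        print(f"{step:9} {path}")
```

The CSV keeps .png in its name (resized_3.png.csv). This is harmless because the same expression builds the path in both Section 3 and Section 4.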

Step 2: Use Python to run the model with onnxruntime

In this step we will create an ExecuteStreamCommand processor that runs runModel.sh, another Python script. The script reads the CSV version of the image, feeds it as input to the ONNX model created in the last tutorial, and prints the predicted digit. Below is the script itself:

#!/usr/bin/env python3
# Run the ONNX digit classifier on a one-row pixel CSV and print the predicted digit.
# Usage: runModel.sh <input_csv> <onnx_model>

import sys

import numpy
import onnxruntime as rt
import pandas as pd

test = pd.read_csv(sys.argv[1])

# Reshape each 784-pixel row into a 28x28x1 float32 tensor
X_test = test.values.astype(numpy.float32)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

session = rt.InferenceSession(sys.argv[2])

input_name = session.get_inputs()[0].name
label_name = session.get_outputs()[0].name
prediction = session.run([label_name], {input_name: X_test})[0]

# One score per digit (0-9): the predicted digit is the index of the highest score
number = int(numpy.argmax(prediction[0]))

# ExecuteStreamCommand turns standard output into the flowfile content
print(number)

As you can see, it expects two arguments:

Location of the CSV (test=pd.read_csv(sys.argv[1]))
Location of the ONNX model (session = rt.InferenceSession(sys.argv[2]))
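The model's output row is assumed to hold one score per digit, at indexes 0 through 9. The predicted digit is then the index of the largest score, which is what numpy.argmax(prediction[0]) returns. The plain-Python sketch below uses made-up output rows to compare taking the argmax with an exact-match test (score == 1.0 over range(0, 9)). The exact-match test misses probabilities that are not exactly 1.0, and it can never report a 9.

```python
def argmax(scores) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def exact_match(scores) -> int:
    """Index whose score is exactly 1.0 among indexes 0-8, else 0."""
    number = 0
    for i in range(0, 9):  # stops at 8: index 9 is never checked
        if scores[i] == 1.0:
            number = i
    return number


if __name__ == "__main__":
    # Made-up softmax outputs for ten classes (digits 0-9)
    confident_nine = [0.0] * 9 + [1.0]
    typical_seven = [0.01, 0.0, 0.02, 0.01, 0.0, 0.0, 0.0, 0.93, 0.01, 0.02]
    for scores in (confident_nine, typical_seven):
        print(exact_match(scores), argmax(scores))
    # 0 9
    # 0 7
```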

Thus, modify the following properties in the ExecuteStreamCommand processor (as before, the script must be executable):

Command Arguments: ${filename};${onnxModel}
Command Path: ${root_folder}NIFI/runModel.sh

Results

If you run the flow against the images in my GitHub repository, you will get three output flowfiles, each containing the digit predicted for one of the handwritten images.

Cloudera Community

Part 2: Nifi flow creation to parse new images and run the model

Tags: Apache MiNiFi, Apache NiFi, Cloudera Data Science Workbench (CDSW), Tensorflow
