Ingesting Apache MXNet Gluon Deep Learning Results Via MQTT and Apache NiFi


Summary:

This example uses a pre-trained model from Apache MXNet Gluon, called from Python 3 code, to classify a webcam image captured and processed with OpenCV. The Python script saves the image to disk and builds a JSON record containing the top predictions, their probabilities as percentages, and device information. This JSON data is then sent via MQTT to a broker, and Apache NiFi processes it.
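As a rough illustration of the JSON step, the sketch below builds a flat record whose keys match the columns of the gluon2 table and publishes it to the gluon2 topic. The function names build_payload and send_payload, the broker address, the percentage and timestamp formats, and the reading of "te" as elapsed time are all assumptions, not taken from the original script. The cpu, memory, and battery values are passed in because the article does not show how they were collected. Publishing assumes the paho-mqtt package is installed.

```python
import json
import shutil
import socket
import time
import uuid
from datetime import datetime


def build_payload(predictions, imgname, start_time,
                  cpu=0.0, memory=0.0, battery=0):
    """Build one flat record from five (label, probability) pairs.

    Hypothetical helper: field formats are guesses that fit the table's
    column types (STRING, INT, DOUBLE), not the original script's output.
    """
    if len(predictions) != 5:
        raise ValueError("expected exactly five (label, probability) pairs")
    end_time = time.time()
    row = {}
    for rank, (label, prob) in enumerate(predictions, start=1):
        row["top%d" % rank] = str(label)
        row["top%dpct" % rank] = "{:.1f}".format(prob * 100.0)
    usage = shutil.disk_usage("/")
    row.update({
        "imgname": imgname,
        "host": socket.gethostname(),
        "end": str(end_time),
        "te": str(end_time - start_time),  # assumed to mean elapsed seconds
        "battery": int(battery),
        "systemtime": datetime.now().strftime("%m/%d/%Y %H:%M:%S"),
        "cpu": float(cpu),
        "diskusage": "{:.1f} MB".format(usage.free / (1024 * 1024)),
        "memory": float(memory),
        "id": str(uuid.uuid4()),
    })
    return row


def send_payload(row, broker="localhost", topic="gluon2"):
    """Publish one record as a JSON string (broker address is a placeholder)."""
    # Imported lazily so build_payload works without paho-mqtt installed.
    import paho.mqtt.publish as publish
    publish.single(topic, payload=json.dumps(row), hostname=broker, port=1883)
```

Keeping the record flat, with one key per column, means NiFi can infer a simple schema from it without any reshaping.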

Example Image

78456-gluon-image-img.jpg


Source Code

SQL Table DDL

CREATE EXTERNAL TABLE IF NOT EXISTS gluon2 (top1pct STRING, top2pct STRING, top3pct STRING, 
top4pct STRING, top5pct STRING, top1 STRING, top2 STRING, top3 STRING, top4 STRING, 
top5 STRING, imgname STRING, host STRING, `end` STRING, te STRING, battery INT, 
systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING) 
STORED AS ORC
LOCATION '/gluon2'


Technologies: Python 3, Apache MXNet, Gluon, MQTT, Apache NiFi, OpenCV.


Based on http://gluon-crash-course.mxnet.io/predict.html
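The top1 to top5 fields and their percentages presumably come from ranking the model's class probabilities. The sketch below shows that ranking step in plain Python, as a stand-in for the MXNet calls in the real script. The function names softmax and top_k are illustrative, and the scores and labels are supplied by the caller rather than by a network.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]
```

Multiplying each returned probability by 100 gives a percentage suitable for the top1pct to top5pct fields.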


Apache NiFi Overview

78453-mxnetgluonsave.png

Steps

  1. ConsumeMQTT: Ingest the MQTT data that the Python script sends to the gluon2 topic.
  2. InferAvroSchema: Grab the schema once; after that you can remove this processor.
  3. RouteOnContent: Throw away errors.
  4. MergeRecord: Convert many JSON records into one large Apache Avro file.
  5. ConvertAvroToORC: Convert that Apache Avro file into an Apache ORC file.
  6. PutHDFS: Store the Apache ORC file in HDFS.

A side effect of the process is that it produces a SQL DDL statement to create a new table for this schema.
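To make the schema-to-DDL idea concrete, here is a minimal sketch that turns an Avro record schema into a Hive CREATE EXTERNAL TABLE statement. It is an illustration only, not NiFi's implementation: the function names hive_type and ddl_from_avro are hypothetical, only primitive Avro types and nullable unions are handled, and every column name is wrapped in backticks so that reserved words such as end are safe.

```python
import json

# Mapping from Avro primitive types to Hive column types.
AVRO_TO_HIVE = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BINARY",
}


def hive_type(avro_type):
    """Map one Avro field type to a Hive type (primitives and unions only)."""
    # A union such as ["null", "string"] marks a nullable field;
    # use its first non-null branch.
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        avro_type = non_null[0]
    return AVRO_TO_HIVE[avro_type]


def ddl_from_avro(schema_json, table, location):
    """Build a CREATE EXTERNAL TABLE statement from an Avro record schema."""
    schema = json.loads(schema_json)
    columns = ", ".join(
        "`{}` {}".format(field["name"], hive_type(field["type"]))
        for field in schema["fields"]
    )
    return ("CREATE EXTERNAL TABLE IF NOT EXISTS {} ({}) "
            "STORED AS ORC LOCATION '{}'").format(table, columns, location)
```

Calling ddl_from_avro with the inferred schema, the table name gluon2, and the location /gluon2 yields a statement of the same shape as the DDL shown earlier.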

Table Example

78455-tabledatagluon.png


Adding Parquet Output

https://cwiki.apache.org/confluence/display/Hive/Parquet

CREATE EXTERNAL TABLE gluon2_parquet (top1pct STRING, top2pct STRING, top3pct STRING, 
top4pct STRING, top5pct STRING, top1 STRING, top2 STRING, top3 STRING, top4 STRING, 
top5 STRING, imgname STRING, host STRING, `end` STRING, te STRING, battery INT, 
systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING) 
STORED AS PARQUET
LOCATION '/gluon2par'

select * from gluon2_parquet

Add the PutParquet Processor

78457-putparquet.png