Created on 06-15-2018 03:16 PM - edited 08-17-2019 07:09 AM
Ingesting Apache MXNet Gluon Deep Learning Results Via MQTT and Apache NiFi
Summary:
This example uses a pre-trained model in Apache MXNet Gluon, called from Python 3, to classify a webcam image captured and processed with OpenCV. The Python script saves the image to disk and builds a JSON record containing the top predictions, their probabilities as percentages, and device information. It then sends this JSON to an MQTT broker, and Apache NiFi consumes and processes it.
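As a rough illustration of the JSON step, here is a minimal sketch, not the code from the linked nifigluon2.py. The field names come from the gluon2 table in this post. The helper names `build_record` and `publish_record`, the meaning of `te` (elapsed seconds), the UUID-style `id`, and the timestamp format are all assumptions. The system metrics (battery, cpu, diskusage, memory) are left out. Publishing assumes the third-party paho-mqtt package and a broker on localhost.

```python
import json
import socket
import time
import uuid
from datetime import datetime


def build_record(predictions, imgname, start_time):
    """Build a flat dict using column names from the gluon2 table.

    predictions: list of (label, probability) pairs from the classifier.
    """
    # Keep the five most probable classes, highest first.
    top5 = sorted(predictions, key=lambda p: p[1], reverse=True)[:5]
    end_time = time.time()
    record = {
        "imgname": imgname,
        "host": socket.gethostname(),
        "end": str(end_time),
        "te": str(end_time - start_time),  # assumed: elapsed seconds
        "systemtime": datetime.now().strftime("%m/%d/%Y %H:%M:%S"),
        "id": str(uuid.uuid4()),  # assumed id format
    }
    for rank, (label, prob) in enumerate(top5, start=1):
        record["top%d" % rank] = label
        # Percentages are stored as strings, matching the STRING columns.
        record["top%dpct" % rank] = str(round(prob * 100, 2))
    return record


def publish_record(record, broker_host="localhost", topic="gluon2"):
    """Send the record as JSON to an MQTT broker (needs paho-mqtt installed)."""
    import paho.mqtt.publish as publish  # third-party: pip install paho-mqtt

    publish.single(topic, payload=json.dumps(record),
                   hostname=broker_host, port=1883)
```

A call such as `publish_record(build_record(preds, "image.jpg", started))` would send one message to the gluon2 topic that ConsumeMQTT reads from.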
Example Image
Source Code
- Schema: https://github.com/tspannhw/OpenSourceComputerVision/blob/master/gluon2.avsc
- Python Source: https://github.com/tspannhw/OpenSourceComputerVision/blob/master/nifigluon2.py
- Shell Script: https://github.com/tspannhw/OpenSourceComputerVision/blob/master/rungluon2.sh
SQL Table DDL
CREATE EXTERNAL TABLE IF NOT EXISTS gluon2 (top1pct STRING, top2pct STRING, top3pct STRING, top4pct STRING, top5pct STRING, top1 STRING, top2 STRING, top3 STRING, top4 STRING, top5 STRING, imgname STRING, host STRING, `end` STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING) STORED AS ORC LOCATION '/gluon2'
Technologies: Python 3, Apache MXNet, Gluon, MQTT, Apache NiFi, OpenCV.
Based on http://gluon-crash-course.mxnet.io/predict.html
Apache NiFi Overview
Steps
- ConsumeMQTT: Ingest MQTT data from gluon2 topic sent from Python
- InferAvroSchema: Run once to capture the schema, then you can remove this processor.
- RouteOnContent: Throw away errors
- MergeRecord: Convert many JSON records into one large Apache Avro file
- ConvertAvroToORC: Convert that Apache Avro file into an Apache ORC file
- PutHDFS: Store the Apache ORC file in HDFS.
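To picture what the filtering and merging steps do, here is a small plain-Python sketch. It is not how NiFi implements these processors. The post does not say what counts as an error, so the rule here is an assumption: drop payloads that are not JSON objects or that lack a few key fields. The names `REQUIRED_FIELDS` and `keep_valid` are hypothetical.

```python
import json

# Assumed minimum set of fields for a usable record.
REQUIRED_FIELDS = {"top1", "top1pct", "imgname", "host"}


def keep_valid(messages):
    """Parse raw MQTT payloads into one batch, dropping malformed ones."""
    records = []
    for raw in messages:
        try:
            record = json.loads(raw)
        except (ValueError, TypeError):
            continue  # not parseable as JSON: discard
        # Keep only JSON objects that carry every required field.
        if isinstance(record, dict) and REQUIRED_FIELDS <= record.keys():
            records.append(record)
    return records
```

The returned list plays the role of the merged batch that the flow then converts to Avro and ORC.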
A side effect of the process is that it produces a SQL DDL statement to create a new table for this schema.
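To show the shape of such a statement, here is a hedged sketch that builds a Hive DDL string from a list of column names and types. It only illustrates the idea and is not NiFi's own generator. The function `hive_ddl` and the `HIVE_RESERVED` set are hypothetical, and the set lists only the one reserved word relevant to this table. Reserved words such as `end` are quoted with backticks.

```python
# Illustrative subset of Hive reserved words that need backtick quoting.
HIVE_RESERVED = {"end"}


def hive_ddl(table, columns, location, fmt="ORC"):
    """Return a CREATE EXTERNAL TABLE statement for (name, type) columns."""
    cols = ", ".join(
        "%s %s" % ("`%s`" % name if name in HIVE_RESERVED else name, hive_type)
        for name, hive_type in columns
    )
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS %s (%s) STORED AS %s LOCATION '%s'"
        % (table, cols, fmt, location)
    )
```

Passing `fmt="PARQUET"` with a different location would produce the matching statement for a Parquet-backed table.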
Table Example
Created on 06-15-2018 04:11 PM - edited 08-17-2019 07:08 AM
Adding Parquet Output
https://cwiki.apache.org/confluence/display/Hive/Parquet
create external table gluon2_parquet (top1pct STRING, top2pct STRING, top3pct STRING, top4pct STRING, top5pct STRING, top1 STRING, top2 STRING, top3 STRING, top4 STRING, top5 STRING, imgname STRING, host STRING, `end` STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING) STORED AS PARQUET LOCATION '/gluon2par'
select * from gluon2_parquet
Add the PutParquet Processor