
Using Apache MXNet GluonCV with Apache NiFi for Deep Learning Computer Vision

Source: https://github.com/tspannhw/OpenSourceComputerVision/

Gluon and Apache MXNet have been great for deep learning, especially for newbies like me. It got even better: they added GluonCV, a deep learning toolkit that is easy to use and comes with a number of pre-trained models for common computer vision use cases. I took a simple, well-documented example and tweaked it to save the final image and send some JSON details via MQTT to Apache NiFi.

This may sound familiar: https://community.hortonworks.com/articles/198912/ingesting-apache-mxnet-gluon-deep-learning-results...

GluonCV makes this even easier! Let's check it out. Again, let's take a simple Python example, tweak it, run it via a shell script, and send the results over MQTT.

See: https://gluon-cv.mxnet.io/build/examples_detection/demo_ssd.html#sphx-glr-build-examples-detection-d...

Python Code: https://github.com/tspannhw/UsingGluonCV/tree/master
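The author's actual script lives at the link above. As a rough sketch of the message-building step only, the following assembles a dictionary with a subset of the fields seen in the example JSON later in this article and hands the serialized result to a publish callable. The function names `build_payload` and `send`, the topic name `gluoncv`, and the `publish(topic, message)` callable are all hypothetical; with a real MQTT client library you would pass that client's publish method instead of the list-appending stand-in used here.

```python
import json
import socket
import time
import uuid
from datetime import datetime, timezone


def build_payload(start, shape, image_dir="images"):
    """Build a JSON-ready dict resembling the article's example message.

    Only stdlib-derivable fields are filled in; cpu, memory, battery and
    diskusage are left out of this sketch.
    """
    end = time.time()
    # Unique id: UTC timestamp plus a random UUID (format assumed from the example JSON).
    uniqueid = "{}_{}".format(
        datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S"), uuid.uuid4()
    )
    return {
        "imgname": "{}/gluoncv_image_{}.jpg".format(image_dir, uniqueid),
        "host": socket.gethostname(),
        "shape": str(shape),
        "end": str(end),
        "te": str(end - start),  # elapsed seconds, sent as a string
        "systemtime": datetime.now().strftime("%m/%d/%Y %H:%M:%S"),
        "id": uniqueid,
    }


def send(payload, publish, topic="gluoncv"):
    """Serialize the payload and pass it to a publish(topic, message) callable."""
    message = json.dumps(payload)
    publish(topic, message)
    return message


# Stand-in for an MQTT client: collect published messages in a list.
sent = []
start = time.time()
payload = build_payload(start, (1, 3, 512, 910))
send(payload, lambda topic, message: sent.append((topic, message)))
```

Keeping the publish step behind a plain callable makes the payload logic testable without a broker.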


This is the Saved Annotated Figure

78466-gluonpic.jpg

A simple Apache NiFi flow to ingest MQTT data from the GluonCV example Python script and store it in Hive, Parquet, and HBase.

78467-gluoncvflow1.png

A simple flow:

  1. ConsumeMQTT
  2. InferAvroSchema
  3. RouteOnContent
  4. MergeRecord (convert batches of JSON to a single Avro file)
  5. ConvertAvroToORC
  6. PutHDFS
  7. PutParquet
  8. PutHBaseRecord

Again, Apache NiFi generates a schema for us by examining the data. There's a really cool project coming out of New Jersey that does advanced schema generation by looking at tables; I'll report on that later. We take the inferred schema, save it to the Schema Registry, and are ready to merge records. One thing you may want to do is make fields nullable by changing regular types from "type": "string" to "type": ["string","null"].
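If you would rather not hand-edit every field, the nullable-type change can be scripted. This is a minimal sketch, assuming the schema is a flat Avro record like the one in this article; the helper name `make_fields_nullable` is hypothetical, and it keeps the article's ordering of ["string","null"].

```python
import copy
import json


def make_fields_nullable(schema):
    """Return a copy of an Avro record schema in which every plain field type
    is wrapped in a union with "null", e.g. "string" -> ["string", "null"]."""
    result = copy.deepcopy(schema)
    for field in result["fields"]:
        # Only touch simple named types; leave existing unions untouched.
        if isinstance(field["type"], str) and field["type"] != "null":
            field["type"] = [field["type"], "null"]
    return result


# A trimmed-down version of the inferred schema, for illustration.
schema = {
    "type": "record",
    "name": "gluoncv",
    "fields": [
        {"name": "host", "type": "string"},
        {"name": "battery", "type": "int"},
        {"name": "cpu", "type": "double"},
    ],
}

nullable = make_fields_nullable(schema)
print(json.dumps(nullable, indent=1))
```

The function works on a deep copy, so the original inferred schema stays available for comparison.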

Schema

{
 "type": "record",
 "name": "gluoncv",
 "fields": [
  {
   "name": "imgname",
   "type": "string",
   "doc": "Type inferred from '\"images/gluoncv_image_20180615203319_6e0e5f0b-d2aa-4e94-b7e9-8bb7f29c9512.jpg\"'"
  },
  {
   "name": "host",
   "type": "string",
   "doc": "Type inferred from '\"HW13125.local\"'"
  },
  {
   "name": "shape",
   "type": "string",
   "doc": "Type inferred from '\"(1, 3, 512, 910)\"'"
  },
  {
   "name": "end",
   "type": "string",
   "doc": "Type inferred from '\"1529094800.88097\"'"
  },
  {
   "name": "te",
   "type": "string",
   "doc": "Type inferred from '\"2.4256367683410645\"'"
  },
  {
   "name": "battery",
   "type": "int",
   "doc": "Type inferred from '100'"
  },
  {
   "name": "systemtime",
   "type": "string",
   "doc": "Type inferred from '\"06/15/2018 16:33:20\"'"
  },
  {
   "name": "cpu",
   "type": "double",
   "doc": "Type inferred from '23.2'"
  },
  {
   "name": "diskusage",
   "type": "string",
   "doc": "Type inferred from '\"112000.8 MB\"'"
  },
  {
   "name": "memory",
   "type": "double",
   "doc": "Type inferred from '65.8'"
  },
  {
   "name": "id",
   "type": "string",
   "doc": "Type inferred from '\"20180615203319_6e0e5f0b-d2aa-4e94-b7e9-8bb7f29c9512\"'"
  }
 ]
}

Example JSON

{"imgname": "images/gluoncv_image_20180615203615_c83fed6f-2ec8-4841-97e3-40985f7859ad.jpg", "host": "HW13125.local", "shape": "(1, 3, 512, 910)", "end": "1529094976.237143", "te": "1.8907802104949951", "battery": 100, "systemtime": "06/15/2018 16:36:16", "cpu": 29.3, "diskusage": "112008.6 MB", "memory": 66.5, "id": "20180615203615_c83fed6f-2ec8-4841-97e3-40985f7859ad"}
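Before a message reaches NiFi, it can be handy to confirm that it lines up with the inferred schema. The sketch below checks the example message against the field names and types listed in the schema above. The helper `check_message` and the type mapping are illustrative assumptions, not part of NiFi or Avro tooling, and cover only the three primitive types this schema uses.

```python
import json

# Avro primitive type -> Python types accepted for it (only the types used here).
AVRO_TO_PYTHON = {"string": (str,), "int": (int,), "double": (float, int)}

# Field names and types copied from the inferred schema.
FIELDS = {
    "imgname": "string", "host": "string", "shape": "string", "end": "string",
    "te": "string", "battery": "int", "systemtime": "string", "cpu": "double",
    "diskusage": "string", "memory": "double", "id": "string",
}


def check_message(raw, fields=FIELDS):
    """Return a list of problems found in a JSON message; empty means it matches."""
    record = json.loads(raw)
    problems = []
    for name, avro_type in fields.items():
        if name not in record:
            problems.append("missing field: " + name)
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, AVRO_TO_PYTHON[avro_type]):
            problems.append("{} is not a valid {}".format(name, avro_type))
    return problems


example = (
    '{"imgname": "images/gluoncv_image_20180615203615_c83fed6f-2ec8-4841-97e3-40985f7859ad.jpg", '
    '"host": "HW13125.local", "shape": "(1, 3, 512, 910)", "end": "1529094976.237143", '
    '"te": "1.8907802104949951", "battery": 100, "systemtime": "06/15/2018 16:36:16", '
    '"cpu": 29.3, "diskusage": "112008.6 MB", "memory": 66.5, '
    '"id": "20180615203615_c83fed6f-2ec8-4841-97e3-40985f7859ad"}'
)

problems = check_message(example)
print(problems)
```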

Table Generated

CREATE EXTERNAL TABLE IF NOT EXISTS gluoncv (imgname STRING, host STRING, shape STRING, end STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING)

STORED AS ORC

LOCATION '/gluoncv'

Parquet Table

CREATE EXTERNAL TABLE gluoncv_parquet (imgname STRING, host STRING, shape STRING, end STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING)

STORED AS PARQUET

LOCATION '/gluoncvpar'
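Both table definitions follow mechanically from the schema: each Avro field becomes a column with the corresponding Hive type. The sketch below shows that mapping. It is an illustration only, not how NiFi produces its DDL; the function name `hive_ddl` and the type table are assumptions, limited to flat records with primitive field types.

```python
# Avro primitive type -> Hive column type (assumed mapping for this sketch).
AVRO_TO_HIVE = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}


def hive_ddl(schema, stored_as, location, table=None):
    """Build a CREATE EXTERNAL TABLE statement from a flat Avro record schema."""
    columns = ", ".join(
        "{} {}".format(field["name"], AVRO_TO_HIVE[field["type"]])
        for field in schema["fields"]
    )
    return "CREATE EXTERNAL TABLE IF NOT EXISTS {} ({}) STORED AS {} LOCATION '{}'".format(
        table or schema["name"], columns, stored_as, location
    )


# A trimmed-down version of the article's schema, for illustration.
schema = {
    "type": "record",
    "name": "gluoncv",
    "fields": [
        {"name": "imgname", "type": "string"},
        {"name": "battery", "type": "int"},
        {"name": "cpu", "type": "double"},
    ],
}

orc_ddl = hive_ddl(schema, "ORC", "/gluoncv")
parquet_ddl = hive_ddl(schema, "PARQUET", "/gluoncvpar", table="gluoncv_parquet")
print(orc_ddl)
print(parquet_ddl)
```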

Reference:

https://gluon-cv.mxnet.io/

https://gluon-cv.mxnet.io/build/examples_detection/index.html

https://medium.com/apache-mxnet/gluoncv-deep-learning-toolkit-for-computer-vision-9218a907e8da

Last update:
‎08-17-2019 07:09 AM