

62527-56692-dwsberling.jpg

This article is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time.

This example requires Apache NiFi 1.5 or newer.

This is part 2 of https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html

Our flow receives the JSON files from the server and does some minimal processing. We add some metadata fields and infer an Apache Avro schema from the JSON file (we only need to do this once during development; after that you can delete that processor from your flow). As you can see, I can easily push that data to HDFS as an Apache Parquet file.

This approach is for when you do not want to install Apache MXNet on your HDF, HDP or related nodes. You can instead install Apache MXNet plus MXNet Model Server (MMS) on a cloud or edge server and call it via HTTP from Apache NiFi for processing.
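
To make the HTTP call concrete outside of NiFi, here is a minimal Python sketch that builds a multipart POST for a predict service. It is only an illustration: the `/<model>/predict` path, the form field name `data`, and the host, port and model name in the comment are assumptions, not values taken from this flow, so check them against your own MMS installation.

```python
import uuid
import urllib.request


def build_predict_request(base_url, model, field, filename, image_bytes):
    """Build (but do not send) a multipart/form-data POST carrying one image.

    Assumption: the service exposes <base_url>/<model>/predict and reads the
    image from the form field named `field`. Verify both for your MMS version.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode(),
        b"Content-Type: image/jpeg\r\n\r\n",
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{model}/predict",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# To actually send it (hypothetical host, port and model name):
#   with open("street.jpg", "rb") as f:
#       req = build_predict_request("http://localhost:8080", "SSD",
#                                   "data", "street.jpg", f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing it at a real server.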

Local Apache NiFi Flow To Call Our SSD Predict and Squeeze Net Predict REST Services

62747-mxnetlocalingestmms.png

Cluster Receiving The Two Remote Ports

62749-mxnetstorageoverview.png


Server Apache NiFi Flow

62740-mxnetstreamtoparquet.png


Example Squeeze Net JSON Data Processed by Apache NiFi

62741-mmssqueezenetpredictionjson.png


Set the Schema and Mime Type

62742-squeezenetsettings.png

Storage Settings For Apache Parquet Files on HDFS

62748-mxnetparqueststorage.png


SSD MMS Logs

62743-mxnetmodelservermmsssdpredict.png

Squeeze Net MMS Logs

62744-mxnetmodelserverssdpredict.png


Schemas Used

62750-ssdsqueezenetschemas.png

An example prediction returned. As you can see, you get the coordinates for drawing a box.

62751-mmsssdpredict.png
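
As a rough sketch of how those coordinates could be consumed, the Python below parses a prediction document into labeled boxes. The sample values come from the schema comment embedded in the Parquet file shown later in this article. The ordering of the four numbers as left, top, right, bottom is an assumption, and the `Box` class and `parse_boxes` function are hypothetical helpers for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """One detection. Coordinate order (x1, y1, x2, y2) is assumed."""
    label: str
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1


def parse_boxes(payload: str) -> List[Box]:
    """Turn {"prediction": [[label, x1, y1, x2, y2], ...]} into Box objects."""
    doc = json.loads(payload)
    return [Box(*row) for row in doc["prediction"]]


sample = '{"prediction": [["person",385,329,466,498],["bicycle",96,386,274,498]]}'
for box in parse_boxes(sample):
    print(box.label, (box.x1, box.y1), box.width, box.height)
```

With the width and height in hand, any drawing library can render the rectangle on the source image.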


To Store Apache Parquet Files:

hdfs dfs -mkdir /ssdpredict

hdfs dfs -chmod 755 /ssdpredict

Inside one of the files stored by Apache NiFi in HDFS, as you can see, there is an embedded Apache Avro schema in JSON format, written by the Avro writer of parquet-mr version 1.8.2.

parquet.avro.schema
{"type":"record","name":"ssdpredict","fields":[{"name":"prediction","type":{"type":"array","items":{"type":"array","items":["string","int"]}},"doc":"Type inferred from '[[\"person\",385,329,466,498],[\"bicycle\",96,386,274,498]]'"}]}
writer.model.name
avro
parquet-mr version 1.8.2 (build c6522788629e590a53eb79874b95f6c3ff11f16c)
PAR1
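
To show where a schema of that shape comes from, here is a simplified Python sketch of inferring an Avro schema from a JSON document. It is not the algorithm Apache NiFi uses. It is a small illustration that happens to reproduce the `type` structure of the embedded schema above (without the `doc` attribute), and `infer_type` and `infer_record` are hypothetical helper names.

```python
import json


def infer_type(value):
    """Map a JSON value to an Avro type (simplified; nested objects unsupported)."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # Collect the distinct item types in first-seen order.
        item_types = []
        for item in value:
            item_type = infer_type(item)
            if item_type not in item_types:
                item_types.append(item_type)
        if not item_types:
            items = "null"          # empty array: nothing to infer from
        elif len(item_types) == 1:
            items = item_types[0]
        else:
            # Mixed items become a union. A real tool must also reject
            # unions that Avro forbids (e.g. two different array types).
            items = item_types
        return {"type": "array", "items": items}
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_record(name, obj):
    """Build a top-level Avro record schema from a flat JSON object."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": k, "type": infer_type(v)} for k, v in obj.items()],
    }


sample = json.loads(
    '{"prediction": [["person",385,329,466,498],["bicycle",96,386,274,498]]}'
)
print(json.dumps(infer_record("ssdpredict", sample)))
```

Each inner row mixes a string label with integer coordinates, which is why the items of the inner array come out as the union `["string","int"]`.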


Example File

-rw-r--r-- 3 nifi hdfs 688 2018-03-08 18:32 /ssdpredict/201801081202602.jpg.parquet.avro

Apache NiFi Flow File:

apache-mxnet-cluster-processing.xml


Reference:

http://parquet.apache.org/documentation/latest/


modelservermxnetstorage.png