Created on 03-08-2018 08:18 PM - edited 08-17-2019 08:32 AM
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time.
This example requires Apache NiFi 1.5 or newer.
The flow that receives the JSON files from the server does some minimal processing: we add a few metadata fields and infer an Avro schema from the JSON file (this only needs to be done once during development; after that you can delete that processor from your flow). From there we can easily push the data to HDFS as a Parquet file.
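To give a rough idea of what schema inference does here, the sketch below walks a sample JSON record and maps each field to an Avro type. This is illustrative only and greatly simplified; the real NiFi processor handles nested records, arrays, nullability, and much more.

```python
import json

# Simplified illustration of JSON-to-Avro schema inference (not NiFi's
# actual implementation): map each JSON field to a basic Avro type.
def avro_type(value):
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"

def infer_schema(record, name="inferred"):
    fields = [{"name": k, "type": avro_type(v)} for k, v in record.items()]
    return {"type": "record", "name": name, "fields": fields}

# Hypothetical sample record, just for demonstration
sample = json.loads('{"label": "person", "score": 0.97, "xmin": 385}')
print(json.dumps(infer_schema(sample)))
```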
This approach is for when you do not wish to install Apache MXNet on your HDF, HDP or related nodes. Instead, you can install Apache MXNet plus MMS (MXNet Model Server) on a cloud or edge server and call it via HTTP from Apache NiFi for processing.
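In NiFi this HTTP call is typically made with the InvokeHTTP processor. The sketch below only builds the equivalent POST request to show the shape of the call; the host, port, model name, and content type are assumptions that depend on how you deployed the model server, and MMS deployments commonly expect a multipart image upload rather than a raw body.

```python
import urllib.request

# Hypothetical MMS predict endpoint; adjust host, port, and model name
# to match your own deployment.
MMS_URL = "http://mms-server:8080/squeezenet/predict"

def build_predict_request(image_bytes):
    # NiFi's InvokeHTTP processor issues the equivalent of this POST;
    # here we only construct the request object, we do not send it.
    return urllib.request.Request(
        MMS_URL,
        data=image_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )

req = build_predict_request(b"\x00")  # placeholder bytes, not a real image
print(req.get_method(), req.full_url)
```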
Local Apache NiFi Flow To Call Our SSD Predict and Squeeze Net Predict REST Services
Cluster Receiving The Two Remote Ports
Server Apache NiFi Flow
Example Squeeze Net JSON Data Processed by Apache NiFi
Set the Schema and Mime Type
Storage Settings For Apache Parquet Files on HDFS
SSD MMS Logs
Squeeze Net MMS Logs
Schemas Used
An example prediction returned; as you can see, you get the coordinates for drawing a bounding box.
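A short sketch of consuming that prediction payload: parse the array of [label, xmin, ymin, xmax, ymax] entries (the format shown in the embedded schema below) and turn each into an (x, y, width, height) box ready for drawing. The sample values are taken from the article's schema example.

```python
import json

# Sample SSD prediction in the format shown in this article:
# [[label, xmin, ymin, xmax, ymax], ...]
raw = '{"prediction": [["person",385,329,466,498],["bicycle",96,386,274,498]]}'

def parse_boxes(payload):
    """Convert SSD corner coordinates into (label, (x, y, width, height))."""
    result = json.loads(payload)
    boxes = []
    for label, xmin, ymin, xmax, ymax in result["prediction"]:
        boxes.append((label, (xmin, ymin, xmax - xmin, ymax - ymin)))
    return boxes

boxes = parse_boxes(raw)
print(boxes)
# [('person', (385, 329, 81, 169)), ('bicycle', (96, 386, 178, 112))]
```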
To Store Apache Parquet Files:
hdfs dfs -mkdir /ssdpredict
hdfs dfs -chmod 755 /ssdpredict
Inside one of the files stored by Apache NiFi in HDFS; as you can see, there is an embedded Apache Avro schema in JSON format, written by parquet-mr version 1.8.2.
parquet.avro.schema = {"type":"record","name":"ssdpredict","fields":[{"name":"prediction","type":{"type":"array","items":{"type":"array","items":["string","int"]}},"doc":"Type inferred from '[[\"person\",385,329,466,498],[\"bicycle\",96,386,274,498]]'"}]}
writer.model.name = avro
created_by = parquet-mr version 1.8.2 (build c6522788629e590a53eb79874b95f6c3ff11f16c)
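Parquet files begin and end with the 4-byte magic PAR1, and the footer just before the trailing magic carries the metadata, including the embedded Avro schema shown above. A quick sanity check on a file pulled down from HDFS:

```python
import io

# Check the leading and trailing "PAR1" magic bytes of a Parquet file.
# Accepts any seekable binary file object (a real file or io.BytesIO).
def looks_like_parquet(f):
    f.seek(0)
    head = f.read(4)
    f.seek(-4, 2)  # seek to 4 bytes before the end
    tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"

# Demonstration with fake in-memory bytes, not a real Parquet file
print(looks_like_parquet(io.BytesIO(b"PAR1 not real data PAR1")))  # True
```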