Created on 12-29-2018 05:49 AM - edited 08-17-2019 05:03 AM
Implementing Streaming Machine Learning and Deep Learning In Production Part 1
After we have done our data exploration with Apache Zeppelin, Hortonworks Data Analytics Studio, and other data science notebooks and tools, we will start building iterations of ever-improving models that need to be used in live environments. These models will need to run at scale and score millions of records in real-time streams.
These models can come in various frameworks, versions, and types, and can require many different kinds of input data.
There are a number of things we need to think about when doing this.
Model Deployment Options
Requirements
Required Elements
Apache Spark - Apache NiFi
There are a number of options for running machine learning models in production via Apache NiFi. I have used the following methods.
Hadoop - YARN 3.1 - No Docker - No Spark
We can deploy Deep Learning Models and run classification (as well as training) on YARN natively.
Apache Kafka Streams
Kafka Streams has full integration with platform services, including Schema Registry, Ranger, and Ambari.
Apache NiFi Native Java Processors for Classification
We can use a custom processor in Java that runs as a native part of the dataflow.
Apache NiFi Integration with a Model Server Native to a Framework
Apache MXNet provides an open source model server with a full REST API that can easily be integrated with Apache NiFi.
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
Running the Apache MXNet model server is easy:
mxnet-model-server --models SSD=resnet50_ssd_model.model --service ssd_service.py --port 9998
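Once the server is up, NiFi (or any client) can score records over HTTP. A minimal sketch using only the Python standard library, assuming the server started with the command above exposes a predict endpoint at `/<model_name>/predict` on port 9998; the `street.jpg` filename and the raw-bytes payload are assumptions, so check your service handler (some expect multipart form data):

```python
# Sketch: POST an image to the MXNet model server's predict endpoint.
# Host, port, and model name come from the mxnet-model-server command
# above; the "/ssd/predict" path and payload format are assumptions.
import urllib.request


def build_predict_url(host: str, port: int, model_name: str) -> str:
    """Build the prediction endpoint URL for a named model."""
    return "http://{}:{}/{}/predict".format(host, port, model_name.lower())


def classify_image(url: str, image_path: str) -> bytes:
    """Send the image bytes and return the server's raw JSON response."""
    with open(image_path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    url = build_predict_url("localhost", 9998, "SSD")
    print(url)  # http://localhost:9998/ssd/predict
    # classify_image(url, "street.jpg")  # requires a running model server
```

In a NiFi flow, the same call is typically made with an InvokeHTTP processor pointed at that URL, keeping the model server as a separately scalable service.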
TensorFlow also has a model server that supports gRPC and REST.
https://www.tensorflow.org/serving/api_rest
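The REST side of TensorFlow Serving follows the URL and body shapes documented at the link above. A minimal stdlib-only sketch; the host, the conventional REST port 8501, and the model name `my_model` are assumptions for illustration:

```python
# Sketch of TensorFlow Serving's REST predict API. The URL shape
# /v1/models/<name>:predict and the {"instances": [...]} body follow the
# API reference linked above; host, port, and model name are assumptions.
import json
import urllib.request


def build_predict_request(host, port, model_name, instances):
    """Return the predict URL and JSON body for a batch of instances."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    body = json.dumps({"instances": instances})
    return url, body


def predict(host, port, model_name, instances):
    """POST the batch and return the parsed {"predictions": [...]} reply."""
    url, body = build_predict_request(host, port, model_name, instances)
    req = urllib.request.Request(
        url, data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    url, body = build_predict_request("localhost", 8501, "my_model", [[1.0, 2.0]])
    print(url)  # http://localhost:8501/v1/models/my_model:predict
    # predict("localhost", 8501, "my_model", [[1.0, 2.0]])  # needs a live server
```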
Hortonworks Streaming Analytics Manager (SAM)
SAM supports running machine learning models exported as PMML as part of a flow.
You can score the model in a fully graphical manner:
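What makes this work is that PMML is an XML standard, so a flow engine like SAM can load an exported model without the framework that trained it. A small sketch of inspecting a PMML export before wiring it into a flow, using only the Python standard library; the tiny inline document below is a hypothetical minimal example, not a real SAM export:

```python
# Sketch: peek inside a PMML export (an XML model-interchange format)
# to see the model name and input fields. The SAMPLE_PMML document is a
# hypothetical minimal example for illustration.
import xml.etree.ElementTree as ET

PMML_NS = "{http://www.dmg.org/PMML-4_3}"

SAMPLE_PMML = """<?xml version="1.0"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary numberOfFields="2">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="species" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="iris_tree" functionName="classification"/>
</PMML>"""


def describe_pmml(pmml_text):
    """Return (model name, list of data field names) from a PMML document."""
    root = ET.fromstring(pmml_text)
    model = root.find(PMML_NS + "TreeModel")
    fields = [f.get("name") for f in root.iter(PMML_NS + "DataField")]
    return model.get("modelName"), fields


if __name__ == "__main__":
    print(describe_pmml(SAMPLE_PMML))
    # ('iris_tree', ['sepal_length', 'species'])
```

The field names recovered this way are the ones you map to incoming stream fields when you drop the PMML processor into a SAM flow.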
Deep Work on Model Governance and Integration with Apache Atlas: