Created on 12-29-2018 05:49 AM - edited 08-17-2019 05:03 AM
Implementing Streaming Machine Learning and Deep Learning In Production Part 1
After we have done our data exploration with Apache Zeppelin, Hortonworks Data Analytics Studio and other data science notebooks and tools, we will start building iterations of ever-improving models that need to run in live environments. These models will need to run at scale and score millions of records in real-time streams.
They may be built with different frameworks and versions, be of different types, and require many different kinds of input data.
There are a number of things we need to think about when doing this.
Model Deployment Options
- Apache Spark
- Apache Storm (Hortonworks Streaming Analytics Manager - SAM)
- Apache Kafka Streams
- Apache NiFi
- YARN 3.1
- YARN Submarine
- TensorFlow Serving on YARN
- Cloudera Data Science Workbench
Requirements
- Classification
- REST API
- Security
- Automation
- Data Lineage
- Schema Versioning, REST API and Management
- Data Provenance
- Scripting
- Integration with Kafka
- Containerized Services
- Support Docker Containers running on YARN
- Support Dockerized Spark Jobs
- Model Registry
- Scalability
- Data Variety
- Data and Storage Format
- Flexibility
- Handling Media Types such as images, sound and video
Required Elements
- Apache NiFi 1.8.0
- Apache Kafka 2.0
- Apache Kafka Streams 2.0
- Apache Atlas 1.0.0
- Apache Ranger 1.2.0
- Apache Knox 1.0
- Hortonworks Streams Messaging Manager 1.2.0
- Hortonworks Schema Registry 0.5.2
- NiFi Registry 0.2.0
- Apache Hadoop 3.1
- Apache YARN 3.1+
- Apache HDFS or Amazon S3
- Apache Druid 0.12.1
- Apache HBase 2.0
Apache Spark - Apache NiFi
There are a number of options for running Machine Learning models in production via Apache NiFi. I have used these methods:
- Apache NiFi to Apache Spark Integration via Kafka and Spark Streaming
- Apache NiFi to Apache Spark Integration via Kafka and Spark Structured Streaming
- Apache NiFi to Apache Spark Integration via Apache Livy
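For the Livy option, the scoring job is submitted to Spark over Livy's REST API (`POST /batches`), which NiFi can call with an HTTP processor. Below is a minimal sketch, assuming Livy runs on its default port 8998 with CSRF protection possibly enabled; the host name, jar path and class name are hypothetical placeholders, not part of any published flow.

```python
import json
import urllib.request

# Hypothetical Livy endpoint; 8998 is Livy's default port.
LIVY_URL = "http://livy-host:8998"


def build_batch_request(file_path, class_name=None, args=(), conf=None):
    """Build the JSON body for Livy's POST /batches endpoint."""
    body = {"file": file_path, "args": list(args)}
    if class_name is not None:
        # Needed for a JVM jar; omitted for a PySpark script.
        body["className"] = class_name
    if conf:
        body["conf"] = dict(conf)  # extra Spark configuration properties
    return body


def submit_batch(body, livy_url=LIVY_URL):
    """POST the batch to Livy and return its JSON reply (contains 'id' and 'state')."""
    req = urllib.request.Request(
        livy_url + "/batches",
        data=json.dumps(body).encode("utf-8"),
        # X-Requested-By is only checked when Livy's CSRF protection is on.
        headers={"Content-Type": "application/json", "X-Requested-By": "nifi"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `build_batch_request("hdfs:///apps/ml/score.jar", "com.example.ScoreJob", ["--topic", "scored"])` produces a body that the same flow could post; polling `GET /batches/{id}` would then report the job's state.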
Hadoop - YARN 3.1 - No Docker - No Spark
We can deploy Deep Learning Models and run classification (as well as training) on YARN natively.
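In Hadoop 3.1, one way to do this is the YARN Services REST API, which accepts a JSON service specification at `/app/v1/services` on the ResourceManager. The sketch below builds a single-component spec; the ResourceManager address, the `classify.py` launch command and the user name are hypothetical, and passing the user as `?user.name=` assumes a cluster using simple (non-Kerberos) authentication.

```python
import json
import urllib.request

# Hypothetical ResourceManager web address (8088 is the usual RM web port).
RM_URL = "http://resourcemanager-host:8088"


def build_service_spec(name, launch_command, containers=1, cpus=1, memory_mb=2048):
    """Build a minimal YARN service spec with one component running natively (no Docker)."""
    return {
        "name": name,
        "version": "1.0",
        "components": [
            {
                "name": "classifier",
                "number_of_containers": containers,
                "launch_command": launch_command,
                # The spec expresses memory in MB as a string.
                "resource": {"cpus": cpus, "memory": str(memory_mb)},
            }
        ],
    }


def submit_service(spec, rm_url=RM_URL, user="ml-user"):
    """POST the spec to the YARN Services API (assumes simple auth via user.name)."""
    req = urllib.request.Request(
        f"{rm_url}/app/v1/services?user.name={user}",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Scaling the classifier up or down is then a matter of changing `number_of_containers` rather than rewriting the job.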
Apache Kafka Streams
Kafka Streams is fully integrated with platform services, including Schema Registry, Ranger and Ambari.
Apache NiFi Native Java Processors for Classification
We can use a custom processor in Java that runs as a native part of the dataflow.
- https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-...
- https://github.com/tspannhw/nifi-tensorflow-processor
- https://community.hortonworks.com/articles/229215/apache-nifi-processor-for-apache-mxnet-ssd-single....
- https://github.com/tspannhw/nifi-mxnetinference-processor
Apache NiFi Integration with a Model Server Native to a Framework
Apache MXNet has an open source model server with a full REST API that can easily be integrated with Apache NiFi.
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
Running the Apache MXNet Model Server is easy:
mxnet-model-server --models SSD=resnet50_ssd_model.model --service ssd_service.py --port 9998
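Once the server is up, a client (or NiFi's InvokeHTTP processor) posts an image to it. Here is a minimal Python sketch of such a call, assuming the older MXNet Model Server 0.x API, where predictions are served at `POST /<model>/predict` with the image in a multipart form field named `data`; newer releases changed these paths, so check the version you run.

```python
import json
import urllib.request
import uuid


def build_multipart(field, filename, payload, content_type="image/jpeg"):
    """Encode one file field as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_bytes, host="localhost", port=9998, model="SSD"):
    """Send an image to the model server started above and return the JSON result."""
    # Assumption: MMS 0.x endpoint layout, /<model name>/predict.
    body, ctype = build_multipart("data", "image.jpg", image_bytes)
    req = urllib.request.Request(
        f"http://{host}:{port}/{model}/predict",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The port and model name match the command line above (`--port 9998`, `SSD=...`).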
TensorFlow also has a model server that supports gRPC and REST.
https://www.tensorflow.org/serving/api_rest
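The REST side of TensorFlow Serving takes a JSON body with an `instances` list at `/v1/models/<name>:predict` and answers with a `predictions` list. A minimal sketch, assuming a model served under the hypothetical name `my_model` and the commonly used REST port 8501 (set with `--rest_api_port`):

```python
import json
import urllib.request


def build_predict_body(instances, signature_name=None):
    """Row-format ('instances') request body for the :predict endpoint."""
    body = {"instances": instances}
    if signature_name is not None:
        body["signature_name"] = signature_name
    return body


def parse_predictions(response_text):
    """Pull the 'predictions' list out of a row-format response."""
    return json.loads(response_text)["predictions"]


def predict(instances, model="my_model", host="localhost", port=8501):
    """POST instances to TensorFlow Serving and return its predictions."""
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    req = urllib.request.Request(
        url,
        data=json.dumps(build_predict_body(instances)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_predictions(resp.read().decode("utf-8"))
```

Because this is plain JSON over HTTP, the same request can be produced by a NiFi flow without custom code.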
Hortonworks Streaming Analytics Manager (SAM)
SAM supports running machine learning models exported as PMML as part of a flow.
You can score the model in a fully graphical manner.
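To show what "scoring a PMML model" means at the record level, here is a tiny Python sketch that evaluates a PMML `RegressionModel` (intercept plus `NumericPredictor` terms, each `coefficient * x^exponent`). It is only an illustration of the PMML format, not how SAM evaluates models; the model and field names are hypothetical, and it ignores categorical predictors, normalization and classification models.

```python
import xml.etree.ElementTree as ET

# Hypothetical model: y = 1.5 + 2.0*x1 + 0.5*x2^2
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header description="hypothetical example model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" exponent="2" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score_regression_pmml(pmml_text, record):
    """Score one record (dict of field -> value) against the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    # {*} matches any namespace (Python 3.8+), so PMML version URIs don't matter.
    table = root.find(".//{*}RegressionModel/{*}RegressionTable")
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    result = float(table.get("intercept", "0"))
    for pred in table.findall("{*}NumericPredictor"):
        x = float(record[pred.get("name")])
        exponent = int(pred.get("exponent", "1"))  # PMML default exponent is 1
        result += float(pred.get("coefficient")) * x ** exponent
    return result
```

For instance, `score_regression_pmml(SAMPLE_PMML, {"x1": 3.0, "x2": 2.0})` gives 1.5 + 6.0 + 2.0 = 9.5.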
Deep Work on Model Governance and Integration with Apache Atlas:
- Customizing Atlas (Part1): Model governance, traceability and registry
- Generalized Framework to Deploy Models and Integrate Apache Atlas for Model Governance
- Customizing Atlas (Part2): Deep source metadata & embedded entities
- Customizing Atlas (Part3): Lineage beyond Hadoop, including reports & emails
References:
- https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68140
- https://apachecon.dukecon.org/acna/2018/#/scheduledEvent/7058e0d4f5ab28836
- https://dataworkssummit.com/berlin-2018/session/iot-with-apache-mxnet-and-apache-nifi-and-minifi/
- https://dataworkssummit.com/berlin-2018/session/apache-deep-learning-101/
- https://dataworkssummit.com/san-jose-2018/session/open-source-computer-vision-with-tensorflow-apache...
- https://www.slideshare.net/bunkertor/apache-deep-learning-201-philly-open-source
- https://www.slideshare.net/bunkertor/running-apache-nifi-with-apache-spark-integration-options