
Implementing Streaming Machine Learning and Deep Learning In Production Part 1

After we have done our data exploration with Apache Zeppelin, Hortonworks Data Analytics Studio and other data science notebooks and tools, we will start building iterations of ever-improving models that need to be used in live environments. These models will need to run at scale and score millions of records in real-time streams.

These models can come from various frameworks, in different versions and types, and with many options for the data they require.

There are a number of things we need to think about when doing this.


Model Deployment Options

  • Apache Spark
  • Apache Storm (Hortonworks Streaming Analytics Manager - SAM)
  • Apache Kafka Streams
  • Apache NiFi
  • YARN 3.1
  • YARN Submarine
  • TensorFlow Serving on YARN
  • Cloudera Data Science Workbench


  • Classification
  • Security
  • Automation
  • Data Lineage
  • Schema Versioning, REST API and Management
  • Data Provenance
  • Scripting
  • Integration with Kafka
  • Containerized Services
  • Support Docker Containers running on YARN
  • Support Dockerized Spark Jobs
  • Model Registry
  • Scalability
  • Data Variety
  • Data and Storage Format
  • Flexibility
  • Handling Media Types such as images, sound and video

Required Elements

  • Apache NiFi 1.8.0
  • Apache Kafka 2.0
  • Apache Kafka Streams 2.0
  • Apache Atlas 1.0.0
  • Apache Ranger 1.2.0
  • Apache Knox 1.0
  • Hortonworks Streams Messaging Manager 1.2.0
  • Hortonworks Schema Registry 0.5.2
  • NiFi Registry 0.2.0
  • Apache Hadoop 3.1
  • Apache YARN 3.1+
  • Apache HDFS or Amazon S3
  • Apache Druid 0.12.1
  • Apache HBase 2.0

Apache Spark - Apache NiFi

There are a number of options for running Machine Learning models in production via Apache NiFi. I have used these methods.

  • Apache NiFi to Apache Spark Integration via Kafka and Spark Streaming
  • Apache NiFi to Apache Spark Integration via Kafka and Spark Structured Streaming
  • Apache NiFi to Apache Spark Integration via Apache Livy
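
For the Livy option, Spark work is submitted over Livy's REST API, so any HTTP client in the flow can trigger a job. As a minimal sketch, the Python below builds the request for Livy's POST /batches endpoint, which submits a Spark application. The host name, jar path, class name and arguments are hypothetical placeholders, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_livy_batch_request(livy_url, app_file, class_name, args):
    """Build (but do not send) a POST /batches request for Apache Livy."""
    payload = {
        "file": app_file,          # application jar, readable by the cluster
        "className": class_name,   # main class of the Spark application
        "args": list(args),        # command-line arguments for the application
    }
    return Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
livy_request = build_livy_batch_request(
    "http://livy.example.com:8998",
    "hdfs:///apps/scoring/model-scorer.jar",
    "com.example.ModelScorer",
    ["--topic", "sensor-events"],
)
print(livy_request.get_method(), livy_request.full_url)
print(livy_request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; Livy answers with a JSON description of the batch, including an id that can be polled for its state.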


Hadoop - YARN 3.1 - No Docker - No Spark

We can deploy Deep Learning Models and run classification (as well as training) on YARN natively.


Apache Kafka Streams

Kafka Streams has full integration with platform services, including Schema Registry, Ranger and Ambari.

Apache NiFi Native Java Processors for Classification

We can use a custom processor in Java that runs as a native part of the dataflow.

Apache NiFi Integration with a Model Server Native to a Framework

Apache MXNet has an open source model server that has a full REST API that can easily be integrated with Apache NiFi.

Running the Apache MXNet model server is easy:

mxnet-model-server --models SSD=resnet50_ssd_model.model --service --port 9998

TensorFlow also has a model server that supports gRPC and REST.
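
Because these model servers speak REST, a dataflow only needs to POST a JSON document to score a record. The sketch below builds a request for the TensorFlow Serving REST predict endpoint, which has the form /v1/models/<name>:predict and expects an "instances" list in the body. The host, the model name my_model and the input values are assumptions for illustration; 8501 is the port TensorFlow Serving commonly uses for REST. The request is built but not sent.

```python
import json
from urllib.request import Request


def build_tf_serving_predict_request(host, port, model_name, instances):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    # The row format of the REST API wraps the inputs in an "instances" list.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name and input values, for illustration only.
predict_request = build_tf_serving_predict_request(
    "localhost", 8501, "my_model", [[1.0, 2.0, 5.0]]
)
print(predict_request.get_method(), predict_request.full_url)
print(predict_request.data.decode("utf-8"))
```

The same URL and body can be used from any HTTP client in the flow; the response is a JSON document with a "predictions" field.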

Hortonworks Streaming Analytics Manager (SAM)

SAM supports running machine learning models exported as PMML as part of a flow.
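
To make concrete what scoring a PMML model involves, here is a minimal sketch that reads a tiny PMML regression model and applies it to one record. It is not how SAM implements scoring; the model, its field names (temperature, humidity, risk) and its coefficients are made up for illustration, and only the simplest RegressionTable with NumericPredictor terms is handled.

```python
import xml.etree.ElementTree as ET

# A made-up PMML model: risk = 1.5 + 2.0 * temperature - 0.5 * humidity
PMML_DOC = """<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3">
  <DataDictionary numberOfFields="3">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="humidity" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="humidity"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="temperature" coefficient="2.0"/>
      <NumericPredictor name="humidity" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}


def load_linear_model(pmml_text):
    """Extract the intercept and per-field coefficients from the PMML text."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model to one record (a dict of field name to value)."""
    return intercept + sum(
        coefficient * record[name] for name, coefficient in coefficients.items()
    )


intercept, coefficients = load_linear_model(PMML_DOC)
event = {"temperature": 20.0, "humidity": 40.0}
print(score(event, intercept, coefficients))
```

The point of PMML is that the model travels as a document: the streaming engine needs only a generic evaluator, and a retrained model can be swapped in without changing the flow.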

You can score the model in a fully graphical manner.

Deep Work on Model Governance and Integration with Apache Atlas:


Last update: 08-17-2019 05:03 AM