
Manufacturing IoT/Process Monitoring Demo


This article provides background on the types of use cases Manufacturing shop floors are implementing to maintain a competitive advantage in their respective marketplaces, and then walks through, step by step, how to build your own Manufacturing IoT - Process Monitoring Dashboard environment. For demo purposes, this dashboard will be limited to two fans on a single Manufacturing shop floor and a single cluster.

What’s Included

  • Instructions and pointers on how to:
    • Obtain and deploy HDP, HDF, and KEPServerEX
    • Setup KEPServerEX to access an OPC TCP UA Server
    • Setup KEPServerEX - IoT Gateway as a REST API Endpoint
  • A pre-built Nifi Flow (fans_demo_v7.xml)
    • Use either the Nifi “InvokeHTTP” or “GetFile” processor
    • Inbound Source can be an IoT Gateway REST API or Fan Event File Dump data
  • A Fan Events Data Tar file containing a dump of individual fan events (fans.tar.gz)
  • A Hive Create Table query (create_table_fans_demo7.sql)
  • Druid Kafka Indexer spec (supervisor-spec.json)
  • A curl start command for the Druid Kafka Indexer
  • A step by step walk through with sample commands and references

Note: References to IPs, ports, and hostnames in each step must be changed to match your lab.

Figure 1


Slide 1

Slide 2

Slide 3

Slide 4

Setup - Manufacturing IoT - Process Monitoring Dashboard

Step 1: Deploy your own KEPServerEX Server, HDP and HDF

Follow the Assumptions section in the Readme:

At this point, you should have a fully functioning HDP with HDF, and optionally, for Step 2 below, a KEPServerEX Server up and running.

Here is a view from Ambari of the cluster we will be using during this walkthrough:


Step 2 (Optional): Setup KEPServerEX - Connectivity and IoT Gateway

You may not have an IoT lab setup available, so we have made this step optional. It should also be noted that we chose KEPServerEX because it is already deployed on tens of thousands of Manufacturing shop floors today and has pre-built connectors to most major Manufacturing equipment in the field. The Operational Technology (OT) teams at Manufacturing shop floors are already experts in setting up industry-standard OPC TCP UA Servers. For more information on this topic, please see:

In our IoT Lab, we are using two special sensors attached to each of two fans on our shop floor. A sensor is a device used to measure a property, such as pressure, position, temperature, or acceleration, and respond with feedback. The sensors we have deployed are called accelerometers, and they measure vibration. We have labeled the sensors XAccel and YAccel, and our OT team has very carefully attached them to each fan (see Figure 1 above).

Note that we can extend this demo to support Predictive Maintenance use cases by having our data scientists write Advanced Pattern Recognition (APR) models that detect the slightest anomalies in vibration rate. These would trigger alerts to our Operations teams to investigate, potentially saving millions of dollars by avoiding unplanned outages. Of course, our IoT demo only focuses on a few fans, but you can imagine monitoring all your manufacturing equipment across multiple geographically dispersed manufacturing plants; the potential for savings is enormous. For this demo, we’ll focus on Phase 1 - our Process Monitoring Dashboard. We could easily inject an APR model and alerts into this demo as a next step.

Step 2a: Setup Connectivity to an OPC TCP UA Server

Above, you will see that to get started configuring KEPServerEX, all we need to do is click on the plus sign next to “Connectivity”, add “Fans”, and then open the “Property Editor” and add an Endpoint URL. In our lab, we were told by our OT team that the fans are connected to opc.tcp://10.1.175:49580. After we add this IP:port combination, we hit “OK” to continue.

Step 2b: Add Tags and Test the OPC TCP UA Connection

In the UI pictured below, we will expand “Fans” and then “Fans” again by clicking on the plus signs under “Project” : “Connectivity” and then click on “Device” and use the menu to add our six Tags as listed below in the right hand side of this page. Once the Tags are added, we will be able to use the OPC client to test if data is coming into our KEPServerEX connection. This client is below the “Menu” bar and to the right of the red X. It is a nice feature provided by the OPC Foundation as an open source option for testing and is included here.

Step 2c: Setup an IoT Gateway - REST Server

We can now expand the plus sign to the left of “IoT Gateway” under our project and add a REST Server. This requires us to use the dropdown menu, select “Fans.Fans” from available connected devices, and then manually add each Server Tag which will create a single event with a timestamp containing readings from both fans. We are collecting RPM, XAccel and YAccel in each event over time. See the UI below for an example of what this should look like when you are finished configuring this step.

At this point, if both of our fans on the shop floor are turned on, we should be able to go to any browser that has access to our IoT Gateway - REST Server and issue the following command to GET a single event:
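The original command here was shown only in a screenshot. As a sketch, a read request against the KEPServerEX IoT Gateway typically looks like the following; the hostname, the port (39320 is the shipped default for the REST Server), and the exact tag ids are assumptions, so substitute the values from your own lab.

```shell
# Sketch of a read request to the KEPServerEX IoT Gateway REST Server.
# Host, port, and tag ids below are assumptions -- match them to your lab.
KEP_HOST="kepserver-host"
IDS="Fans.Fans.Fan1RPM,Fans.Fans.Fan1XAccel,Fans.Fans.Fan1YAccel,Fans.Fans.Fan2RPM,Fans.Fans.Fan2XAccel,Fans.Fans.Fan2YAccel"
URL="http://${KEP_HOST}:39320/iotgateway/read?ids=${IDS}"
echo "$URL"    # open this URL in a browser, or:
# curl "$URL"
```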

The read results from the above GET will be a single event in the following JSON format. Note that we have an array of JSON objects returned for each event read.
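The screenshot of the response is not reproduced here. As an illustration of the shape only (the tag ids, values, and timestamps below are made up; the `s`/`r`/`v`/`t` fields are, to the best of our knowledge, the gateway's success flag, reason, value, and timestamp), a readResults array looks roughly like:

```json
{
  "readResults": [
    { "id": "Fans.Fans.Fan1RPM",    "s": true, "r": "", "v": 1175.250, "t": 1529944559000 },
    { "id": "Fans.Fans.Fan1XAccel", "s": true, "r": "", "v": 212.375,  "t": 1529944559000 },
    { "id": "Fans.Fans.Fan1YAccel", "s": true, "r": "", "v": 198.500,  "t": 1529944559000 },
    { "id": "Fans.Fans.Fan2RPM",    "s": true, "r": "", "v": 1181.000, "t": 1529944559000 },
    { "id": "Fans.Fans.Fan2XAccel", "s": true, "r": "", "v": 205.125,  "t": 1529944559000 },
    { "id": "Fans.Fans.Fan2YAccel", "s": true, "r": "", "v": 201.750,  "t": 1529944559000 }
  ]
}
```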











Once we have the above working, we can continue to the next step.

Step 3: Setup a fans_demo Kafka topic

In our HDP cluster, we find our Kafka bin directory located here: /usr/hdp/ We must create a “fans_demo” Kafka topic to be used by the rest of this demo. Here are some helpful example commands to get you going on this step.

./ --create --zookeeper c3n1m710p:2181,c9n1m710p:2181,c18n1m710p:2181 --replication-factor 1 --partitions 1 --topic fans_demo

./ --list --zookeeper c3n1m710p:2181,c9n1m710p:2181,c18n1m710p:2181

./ --zookeeper c3n1m710p:2181,c9n1m710p:2181,c18n1m710p:2181 --topic fans_demo --from-beginning

./ --bootstrap-server m510-16c:6667 --offset-json-file j.json
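The script names in the commands above appear truncated; on an HDP install, they most likely refer to kafka-topics.sh (create/list), kafka-console-consumer.sh (consume), and kafka-delete-records.sh (delete) respectively — verify against your own Kafka bin directory. The j.json file passed to the delete command is an offsets spec; a minimal sketch that truncates the whole fans_demo topic (assuming the single partition created above; offset -1 means “up to the high watermark”, i.e. delete everything) would be:

```json
{
  "partitions": [
    { "topic": "fans_demo", "partition": 0, "offset": -1 }
  ],
  "version": 1
}
```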

Step 4: Move Fan Events Data Dump Tar file to Nifi server

After downloading fans.tar.gz from , you will need to move it to your Nifi Server node. Here is an example of moving it from a Mac using iTerm to the Nifi Server node and then unpacking it with tar. A new directory called fans will be created in /tmp on your nifi_host. This will be used by the Nifi Flow - fans_demo_v7.xml once it is set up in Nifi in the next step.

#On your MAC

$ cd <directory where you moved fans.tar.gz>

$ scp fans.tar.gz root@<nifi_hostname>:/tmp

$ ssh root@<nifi_hostname>

#On the nifi_host

$ cd /tmp

$ tar -xvf fans.tar.gz

This step is not optional, even if you have access to an OT lab. This setup will allow you to do continuous testing starting with Step 5 below.

Step 5: Setup a continuous Fan Events Data Flow pipeline to HDFS

After downloading our Pre-Build Nifi Template (fans_demo_v7.xml) from , you will need to import this template using the Nifi UI:

The entire flow will be visible as a single processor group - “Monitor Fans”. Double-click on this processor group and you will then see the entire flow as shown in the Nifi UI below. Note that for this step, we are focusing our data flow testing on moving fan events into HDP, and more specifically, HDFS. This path is shown down the middle of the Nifi dataflow. Even though the diagram below shows all processors running, at this point all processors should be stopped. Do not turn them on at this time. We will turn them on in a testing fashion, following one data pipeline split at a time, until we are comfortable everything is working smoothly.

Again, before turning on any Nifi processors for testing, the following changes need to be made to the above “Monitor Fans” nifi data flow so that it will work in your environment:

  • Assuming you don’t have an OT lab, replace “Pull Fan Data (InvokeHTTP)” processor with the “GetFile” processor already present on the UI.
    • Note: The “GetFile” processor will pull from the /tmp/fans directory and remove all event files present. This is as designed: you don’t want to ingest the event data twice, because each event has a timestamp field associated with it. So, please don’t change the “Keep Source File” property in this processor to “true”.
  • Update the Configuration for “PutHDFS” to point to your cluster. And, if you have a preference on ownership of the files the processor writes to HDFS, change it here as well. Keep in mind, we will later query this file as the “hive” user: /user/nifi/fans_demo7
  • Update “Fan1 Kafka Producer (PutKafka)” and “Fan2 Kafka Producer (PutKafka)” Configurations so they point to your Kafka instance/host. For now, keep the “Split for Fan1 Producer (ReplaceText)” and “Split for Fan2 Producer (ReplaceText)” processors turned off until we are ready to test them in Step 7.

Now, in the Nifi UI, turn on only the “GetFile” processor by right clicking and selecting “start” from the dropdown menu. Then check that the events are flowing through to the first Nifi Queue properly. Each Nifi flow file should look similar to the “readResults” from Step 2c.

This can be tested after the event data has flowed into the first queue by clicking on the queue and using the “List Queue” menu option. You should be able to display, by default, 100 events as shown above in the Nifi Queue Listing UI.

At this point, you should be able to “View” the contents of a single read event, and validate the “readResults” match. Here is what you should see for each event in the Nifi UI:

If you get this far, and the “readResults” for each event match, you are ready to continue to the next step.

Note that this entire Nifi flow actually has branching pipelines from a single inbound Fans Event data stream to both Kafka and HDFS. In this step, we will only focus on testing moving the fan event data to HDFS (we will test the Nifi-to-Kafka pipeline split later). In testing our Nifi dataflow pipeline to HDFS, we only want to turn on the processors listed below in the Nifi UI. Before turning each one on, step through its configuration to gain valuable knowledge; then turn the processors on one at a time, inspecting each response or success flowfile and its associated attributes using the “List Queue” steps from the previous step.

You are ready to turn on the next Nifi processor in the “Nifi to HDFS” pipeline split at this time:

“Move JSON to Attributes (EvaluateJSONPath)”

Repeat inspecting the Queue and Event flowfiles. Review the changes to attributes. Then continue this process for the remainder of the Nifi processors leading to HDFS:

“Create CSV (AttributesToCSV)”


“Add EOL to Flowfile (ReplaceText)”


“Merge Content (MergeContent)”


“Change Batch Filename (UpdateAttribute)”




“Create Delay Attribute (UpdateAttribute)” > “3 Minute Wait (RouteOnAttribute)”

At this point, you should have data in HDFS. You can inspect this by running the following commands on an HDFS Client Node:

$ su nifi

$ hadoop fs -ls /user/nifi/fans_demo7

In Step 7, we will return to this flow and discuss testing the additional split data pipeline to Kafka.

Step 6: (Optional) Create a Hive Table - fans_demo7

You can use iTerm on your Mac to log in as root to an HDP node that has the Hive client installed, change to the hive user, and create the fans_demo7 table. If your starting cluster was one of the AMIs mentioned above, note that Hive is not installed by default and would need to be manually added (alternatively, this step could be skipped).

Here is an example session:

$ ssh root@<hive_node>

$ su hive

$ cd

$ vi create_table_fans_demo7.sql

# Copy the create external table statement below and paste it using vi “insert” mode,

# then hit “esc” and type “:wq” to save the file and exit vi.

$ hive

0: jdbc:hive2://<zk_node:2181,zk_node:2181> source create_table_fans_demo7.sql;

0: jdbc:hive2://<zk_node:2181,zk_node:2181> select * from fans_demo7 limit 20;

0: jdbc:hive2://<zk_node:2181,zk_node:2181> !quit


create external table if not exists fans_demo7 (fan_timestamp bigint, fan1rpm decimal(20,3), fan1xaccel decimal(20,3), fan1yaccel decimal(20,3), fan2rpm decimal(20,3), fan2xaccel decimal(20,3), fan2yaccel decimal(20,3)) row format delimited fields terminated by ',' lines terminated by '\n' location '/user/nifi/fans_demo7';

select * from fans_demo7 limit 20;

Example Output:

Now, you are able to use your favorite BI tool to access the fan event data. You are also able to run ML/DL model development cycles on HDP using Spark and TensorFlow with this same data. These models could then be deployed back out to the ingest cycle for Predictive Maintenance and other Manufacturing / Industry 4.0 use cases.
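Before wiring up a BI tool, a quick sanity check in the same beeline session can confirm the numbers look plausible. This is a sketch only; it assumes the fans_demo7 table created above:

```sql
-- Per-fan averages across all ingested events
-- (column names from the create external table statement above)
select avg(fan1rpm)    as fan1_rpm_avg,
       avg(fan2rpm)    as fan2_rpm_avg,
       avg(fan1xaccel) as fan1_x_avg,
       avg(fan2xaccel) as fan2_x_avg
from fans_demo7;
```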

Step 7: Test the Fan Events Data Flow pipeline split to Kafka

You are ready to turn on the next Nifi processors in the “Nifi to Kafka” pipeline split at this time. We have adjusted the flow processors on the screen so that you can easily see the split and data flowing to Kafka in the following UI:

Now, we will want to turn the following two processors on at the same time. You can do this by making sure they are the only processors highlighted on the Nifi UI and then using the “Operations” panel on the left hand side of the screen. Use the “>” button.

“Split for Fan1 Producer (ReplaceText)” and “Split for Fan2 Producer (ReplaceText)”

The reason we want to turn them on at the same time is we are splitting the original JSON Array into individual flowfiles for each of the two fans. This will lead to a much cleaner dashboard. Note that at this split, the same timestamp will flow to each of fan1 and fan2 flowfiles.

Don’t forget to repeat the queue and event flowfile inspection. Review the changes to each flowfile, including its attributes, so you can learn how Nifi works.

Now, let’s turn on the next two processors at the same time. Remember, you can do this by making sure they are the only processors highlighted in the Nifi UI and then using the “Operations” panel on the left-hand side of the screen. Use the “>” button.

“Fan1 Kafka Producer (PutKafka)” and “Fan2 Kafka Producer (PutKafka)”

At this point, you should have data in a single fans_demo topic in Kafka, and each flowfile has been split so that Fan1 and Fan2 data generate their own flowfile output. You can inspect the Kafka topic by running the following commands on a Kafka client node (replace hostnames with your own):

# listing all events in fans_demo topic

$ ./ --zookeeper c3n1m710p:2181,c9n1m710p:2181,c18n1m710p:2181 --topic fans_demo --from-beginning

# Deleting all events in fans_demo topic

$ ./ --bootstrap-server m510-16c:6667 --offset-json-file j.json

Step 7: Ingest Data from Kafka to Druid

Druid is an analytics data store designed for analytic (OLAP) queries on event data. It draws inspiration from Google’s Dremel, Google’s PowerDrill, and search infrastructure.

Pre-requisite: Before using it, you need to ensure "druid-kafka-indexing-service" is present in the druid.extensions.loadList property. You can confirm this by opening Ambari > Druid > Configs and searching for the druid.extensions.loadList property.
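If "druid-kafka-indexing-service" is missing, it must be added to the list and Druid restarted. For illustration only (the other entry shown here is an assumption — keep whatever extensions your cluster already lists), the property would look something like:

```
druid.extensions.loadList=["druid-kafka-indexing-service", "druid-hdfs-storage"]
```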

Step 7a: Start the Druid Kafka Indexer

You should be able to find this service under the following directory: /usr/hdp/

First, login to your Druid node and create the following supervisor-spec.json file in /tmp on your Druid node.

supervisor-spec.json (make sure to change the hostname)


{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "fans_demo7",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "fan_timestamp",
          "format": "millis"
        },
        "dimensionsSpec": {
          "dimensions": [],
          "dimensionExclusions": []
        }
      }
    },
    "metricsSpec": [
      { "name": "count", "type": "count" },
      { "name": "rpm_sum", "fieldName": "rpm", "type": "doubleSum" },
      { "name": "rpm_min", "fieldName": "rpm", "type": "doubleMin" },
      { "name": "rpm_max", "fieldName": "rpm", "type": "doubleMax" },
      { "name": "xaccel_sum", "fieldName": "xaccel", "type": "doubleSum" },
      { "name": "xaccel_min", "fieldName": "xaccel", "type": "doubleMin" },
      { "name": "xaccel_max", "fieldName": "xaccel", "type": "doubleMax" },
      { "name": "yaccel_sum", "fieldName": "yaccel", "type": "doubleSum" },
      { "name": "yaccel_min", "fieldName": "yaccel", "type": "doubleMin" },
      { "name": "yaccel_max", "fieldName": "yaccel", "type": "doubleMax" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "fans_demo",
    "consumerProperties": {
      "bootstrap.servers": "m510-16c:6667"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT10M"
  }
}



Then, to start the Druid Kafka Indexer, run the following command:

curl -X POST -H 'Content-Type: application/json' -d @/tmp/supervisor-spec.json http://c9n1m710p:8090/druid/indexer/v1/supervisor
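After submitting the spec, the Overlord's supervisor API can confirm it was accepted. A sketch, assuming the same Overlord host and port as the start command above:

```shell
# Endpoints for checking on the supervisor (host:port must match your Overlord).
OVERLORD="c9n1m710p:8090"
LIST_URL="http://${OVERLORD}/druid/indexer/v1/supervisor"
STATUS_URL="http://${OVERLORD}/druid/indexer/v1/supervisor/fans_demo7/status"
echo "$LIST_URL"     # curl this to list running supervisors
echo "$STATUS_URL"   # curl this to see the fans_demo7 supervisor's state
```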

Then, within Ambari, click on the “Druid” service on the left-hand side of the page and then use the “Druid - Coordinator Console” Quick Link on the right-hand side of the page. You should now see the following dataSource (fans_demo7) and Running Tasks (index_kafka_fans_demo7_……..). You can inspect the logs using the links on the right-hand side of the page. Be patient: we have the indexer writing to Druid every hour, so it will take about that long for the Druid data source to show up.

Step 7b: Inspect the Druid - Overlord Console

Within Ambari, click on the “Druid” service on the left-hand side of the page (if you are not already there) and then use the “Druid - Overlord Console” Quick Link on the right-hand side of the page. Again, be patient. Take a break and come back in 60 minutes. Then, you should be able to refresh this screen and see the fans_demo7 Druid datasource.

If you get this far, you are now ready to use Superset to create a real-time dashboard.

Step 8: Create a Manufacturing IoT - Process Monitoring Dashboard

Within Ambari, click on the “Superset” service on the left hand side of the page and then use the “Superset” Quick Link on the right hand side of the page.

Step 8a: Create Dashboard Slices

Use the “Charts” menu and follow the screenshots below to create the following real-time slices of fan event data:

Slice 1: Monitor Fan RPM Speed

Slice 2: X Accel Vibration Rate

Slice 3: Y Accel Vibration Rate

Slice 4: Fan Monitoring Table

Step 8b: Create Manufacturing Shop Floor Dashboard

Add all the slices in “Edit” mode.

Note that any vibration rate for X or Y Accels over 350 is concerning and should be investigated.
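That 350 threshold can also be checked outside the dashboard. Here is a sketch against the fans_demo7 Hive table from Step 6 (assuming the same threshold applies to both axes of both fans):

```sql
-- Events where any accelerometer reading exceeds the 350 threshold
select fan_timestamp,
       fan1xaccel, fan1yaccel,
       fan2xaccel, fan2yaccel
from fans_demo7
where fan1xaccel > 350 or fan1yaccel > 350
   or fan2xaccel > 350 or fan2yaccel > 350;
```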

I hope this lab is a valuable learning tool for your team! Over and Out!


Michael Ger, Hortonworks

General Manager, Manufacturing and Automotive

Provided Slides 1-3

Additional Material

Data Works Summit, June 2018

Title: An Introduction to Druid


Author: Nishant Bangarwa, Hortonworks - Software Engineer, Druid Committer, PMC