12-31-2018 05:08 PM - 5 Kudos
Manufacturing IoT/Process Monitoring Demo

Summary
You will be provided with background on the types of usage cases Manufacturing Shop Floors are implementing to maintain a competitive advantage in their respective marketplaces. We will then walk through, step by step, instructions on how to build your own Manufacturing IoT - Process Monitoring Dashboard environment. For demo purposes, this dashboard is limited to two fans on a single Manufacturing Shop Floor and a single cluster.

What's Included
• Instructions and pointers on how to:
  • Obtain and deploy HDP, HDF, and KEPServerEX
  • Set up KEPServerEX to access an OPC TCP UA Server
  • Set up the KEPServerEX IoT Gateway as a REST API endpoint
• A pre-built NiFi flow (fans_demo_v7.xml)
  • Use either the NiFi "InvokeHTTP" or "GetFile" processor
  • The inbound source can be an IoT Gateway REST API or a fan event file dump
• A fan events data tar file containing a dump of individual fan events (fans.tar.gz)
• A Hive create table query (create_table_fans_demo7.sql)
• A Druid Kafka Indexer spec (supervisor-spec.json)
• A CURL start command (start_druid_kafka_indexer.sh)
• A step-by-step walkthrough with sample commands and references
Note: References to IPs, ports, and hostnames in each step must be changed to match your lab.

Figure 1

Background
Slide 1  Slide 2  Slide 3  Slide 4

Setup - Manufacturing IoT - Process Monitoring Dashboard

Step 1: Deploy your own KEPServerEX Server, HDP and HDF
Follow the Assumptions section in the Readme: https://github.com/mlochbihler/manufacturing_iot_demo
At this point, you should have a fully functioning HDP with HDF and, optionally for Step 2 below, a KEPServerEX Server up and running. Here is a view from Ambari of the cluster we will be using during this walkthrough:

Step 2 (Optional): Setup KEPServerEX - Connectivity and IoT Gateway
You may not have an IoT lab setup available, so we have made this step optional. It should also be noted that we chose KEPServerEX because it is already deployed on tens of thousands of manufacturing shop floors today and has pre-built connectors to most major manufacturing equipment in the field. The Operational Technology (OT) teams on manufacturing shop floors are already experts in setting up industry-standard OPC TCP UA servers. For more information on this topic, please see: https://opcfoundation.org/
In our IoT lab, we are using two special sensors attached to each of two fans on our shop floor. A sensor is a device used to measure a property, such as pressure, position, temperature, or acceleration, and respond with feedback. The sensors we have deployed are called accelerometers, and they measure vibration. We have labeled the sensors XAccel and YAccel, and our OT team has very carefully attached them to each fan (see Figure 1 above).
Note that we can extend this demo to support Predictive Maintenance usage cases by having our data scientists write Advanced Pattern Recognition (APR) models that detect the slightest anomalies in vibration rate. These would trigger alerts to our Operations teams to investigate, potentially saving millions of dollars by avoiding unplanned outages.
Of course, our IoT demo focuses on only a few fans, but you can imagine monitoring all your manufacturing equipment across multiple geographically dispersed manufacturing plants. The potential for savings is enormous. For this demo, we'll focus on Phase 1 - our Process Monitoring Dashboard. We could easily inject an APR model and alerts into this demo as a next step.
Step 2a: Setup Connectivity to an OPC TCP UA Server
Above, you will see that to get started configuring KEPServerEX, all we need to do is click on the plus sign next to "Connectivity", add "Fans", then open the "Property Editor" and add an Endpoint URL. In our lab, we were told by our OT team that the fans are connected to opc.tcp://10.1.175:49580. After we add this ip:port combination, we hit "OK" to continue.

Step 2b: Setup Connectivity to an OPC TCP UA Server
In the UI pictured below, we will expand "Fans" and then "Fans" again by clicking on the plus signs under "Project" : "Connectivity", then click on "Device" and use the menu to add our six Tags as listed on the right-hand side of this page. Once the Tags are added, we can use the OPC client to test whether data is coming into our KEPServerEX connection. This client is below the "Menu" bar and to the right of the red X. It is a nice feature provided by the OPC Foundation as an open-source option for testing and is included here.

Step 2c: Setup an IoT Gateway - REST Server
We can now expand the plus sign to the left of "IoT Gateway" under our project and add a REST Server. This requires us to use the dropdown menu, select "Fans.Fans" from the available connected devices, and then manually add each Server Tag, which will create a single event with a timestamp containing readings from both fans. We are collecting RPM, XAccel, and YAccel in each event over time. See the UI below for an example of what this should look like when you have finished configuring this step.
At this point, if both of our fans are turned on on the shop floor, we should be able to go to any browser that has access to our IoT Gateway - REST Server and issue the following command to GET a single event:
http://10.1.10.29:39320/iotgateway/read?ids=Fan.Fans.Fan1RPM&ids=Fan.Fans.Fan1XAccel&ids=Fan.Fans.Fan1YAccel&ids=Fan.Fans.Fan2RPM&ids=Fan.Fans.Fan2XAccel&ids=Fan.Fans.Fan2YAccel
The read results from the above GET will be a single event in the following JSON format. Note that an array of JSON objects is returned for each event read.
{"readResults": [
  {"id":"Fan.Fans.Fan1RPM","s":true,"r":"","v":2168.6972766802269,"t":1542814217376},
  {"id":"Fan.Fans.Fan1XAccel","s":true,"r":"","v":141.83649160225542,"t":1542814217376},
  {"id":"Fan.Fans.Fan1YAccel","s":true,"r":"","v":306.66535907602884,"t":1542814217376},
  {"id":"Fan.Fans.Fan2RPM","s":true,"r":"","v":2175.153030248483,"t":1542814217376},
  {"id":"Fan.Fans.Fan2XAccel","s":true,"r":"","v":165.56492668383274,"t":1542814217376},
  {"id":"Fan.Fans.Fan2YAccel","s":true,"r":"","v":435.84022952593745,"t":1542814217376}
] }
Once we have the above working, we can continue to the next step.

Step 3: Setup a fans_demo Kafka topic
In our HDP cluster, we find our Kafka bin directory located here: /usr/hdp/3.0.1.0-187/kafka/bin
We must create a "fans_demo" Kafka topic to be used by the rest of this demo. Here are some helpful example commands to get you going on this step.
./kafka-topics.sh --create --zookeeper c3n1m710p:2181,c9n1m710p:2181,c18n1m710p:2181 --replication-factor 1 --partitions 1 --topic fans_demo
./kafka-topics.sh --list --zookeeper c3n1m710p:2181,c9n1m710p:2181,c18n1m710p:2181
./kafka-console-consumer.sh --zookeeper c3n1m710p:2181,c9n1m710p:2181,c18n1m710p:2181 --topic fans_demo --from-beginning
./kafka-delete-records.sh --bootstrap-server m510-16c:6667 --offset-json-file j.json

Step 4: Move the Fan Events Data Dump tar file to the NiFi server
After downloading fans.tar.gz from https://github.com/mlochbihler/manufacturing_iot_demo , you will need to move it to your NiFi Server node. Here is an example of moving it from a Mac using iTerm to the NiFi Server node and then unpacking it using tar. A new directory called fans will be created in /tmp on your nifi_host. It will be used by the NiFi flow fans_demo_v7.xml once that flow is set up in NiFi in the next step.
#On your Mac
$ cd <directory where you moved fans.tar.gz>
$ scp fans.tar.gz root@<nifi_hostname>:/tmp
$ ssh root@<nifi_hostname>
#On the nifi_host
$ cd /tmp
$ tar -xvf fans.tar.gz
This step is not listed as optional, even if you have access to an OT lab. It allows you to do continuous testing starting with Step 5 below.

Step 5: Setup a continuous Fan Events Data Flow pipeline to HDFS
After downloading our pre-built NiFi template (fans_demo_v7.xml) from https://github.com/mlochbihler/manufacturing_iot_demo , you will need to import the template using the NiFi UI:
The entire flow will be visible as a single processor group - "Monitor Fans". Double click on this processor group and you will then see the entire flow as shown in the NiFi UI below. Note that for this step, we are focusing our data flow testing on moving fan events into HDP, and more specifically, HDFS. This path runs down the middle of the NiFi dataflow. Even though the diagram below shows all processors running, at this point all processors should be stopped. Do not turn them on at this time. We will turn them on in a testing fashion, following one data pipeline split at a time, until we are comfortable that everything is working smoothly.
Before turning on any NiFi processors for testing, the following changes need to be made to the above "Monitor Fans" NiFi data flow so that it will work in your environment:
• Assuming you don't have an OT lab, replace the "Pull Fan Data (InvokeHTTP)" processor with the "GetFile" processor already present on the UI. Note: The "GetFile" processor will pull from the /tmp/fans directory and remove all event files present. This is by design: you don't want to populate the event data twice, because each event carries a timestamp field. So, please don't change the property in this processor that keeps the source files to "true".
• Update the configuration for "PutHDFS" to point to your cluster. If you have a preference on file ownership for the output flowfiles written to HDFS, change it here as well. Keep in mind that we will later query this file as the "hive" user: /user/nifi/fans_demo7
• Update the "Fan1 Kafka Producer (PutKafka)" and "Fan2 Kafka Producer (PutKafka)" configurations so they point to your Kafka instance/host.
• For now, keep the "Split for Fan1 Producer (ReplaceText)" and "Split for Fan2 Producer (ReplaceText)" processors turned off until we are ready to test them in Step 7.
Now, in the NiFi UI, turn on only the "GetFile" processor by right clicking and selecting "start" from the dropdown menu.
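If the queue stays empty after starting "GetFile", double-check that the event files from Step 4 actually landed in /tmp/fans on the nifi_host (remember the processor deletes each file as it reads it, so an empty directory after a successful run is expected). A minimal sketch, assuming the layout produced in Step 4, where each file holds one "readResults" event like the one shown in Step 2c:
#On the nifi_host
$ ls /tmp/fans | wc -l                                # count of remaining fan event files
$ head -c 400 "/tmp/fans/$(ls /tmp/fans | head -1)"   # peek at one event (only if files remain); expect a readResults JSON array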
Then check that the events are flowing through to the first NiFi queue properly. Each NiFi flowfile should look similar to the "readResults" from Step 2c. You can verify this after event data has flowed into the first queue by clicking on the queue and using the "List Queue" menu option. By default, you should be able to display 100 events, as shown above in the NiFi Queue Listing UI. At this point, you should be able to "View" the contents of a single read event and validate that the "readResults" match. Here is what you should see for each event in the NiFi UI:
If you get this far, and the "readResults" for each event match, you are ready to continue to the next step.
Note that this NiFi flow actually branches from a single inbound fan event data stream into pipelines to both Kafka and HDFS. In this step, we will only focus on testing the movement of fan event data to HDFS (we will test the NiFi-to-Kafka pipeline split later).
In testing our NiFi dataflow pipeline to HDFS, we now only want to turn on the processors listed below in the NiFi UI. Before turning each of them on, one at a time, step through each processor and review its configuration to gain valuable knowledge, then turn it on and inspect each response or success flowfile and its associated attributes using the "List Queue" steps from the previous step.
You are now ready to turn on the next NiFi processor in the "NiFi to HDFS" pipeline split:
"Move JSON to Attributes (EvaluateJSONPath)"
Repeat inspecting the queue and event flowfiles. Review the changes to attributes. Then continue this process for the remainder of the NiFi processors leading to HDFS:
"Create CSV (Attributes to CSV)"
V
"Add EOL to Flowfile (ReplaceText)"
V
"Merge Content (Merge Content)"
V
"Change Batch Filename (UpdateAttribute)"
V
"PutHDFS (PutHDFS)"
V ^
"Create Delay Attribute (UpdateAttribute)" > "3 Minute Wait (RouteOnAttribute)"
At this point, you should have data in HDFS. You can inspect this by running the following commands on an HDFS client node:
$ su nifi
$ hadoop fs -ls /user/nifi/fans_demo7
In Step 7, we will return to this flow and discuss testing the additional split data pipeline to Kafka.

Step 6: (Optional) Create a Hive Table - fans_demo7
Using iTerm on your Mac, log in as root to an HDP node that has the Hive client installed, change to the hive user, and create the fans_demo7 table. Note that if your starting cluster was one of the AMIs mentioned above, Hive is not installed by default and would need to be added manually (alternatively, this step can be skipped).
Here is an example session:
$ ssh root@<hive_node>
$ su hive
$ cd
$ vi create_table_fans_demo7.sql
#Copy the create external table statement below, use vi "insert" to paste it,
#then hit "esc" and :wq to create the file and exit vi.
$ hive
0: jdbc:hive2://<zk_node:2181,zk_node:2181> source create_table_fans_demo7.sql;
0: jdbc:hive2://<zk_node:2181,zk_node:2181> select * from fans_demo7 limit 20;
0: jdbc:hive2://<zk_node:2181,zk_node:2181> !quit

create_table_fans_demo7.sql
create external table if not exists fans_demo7
(fan_timestamp bigint,
 fan1rpm decimal(20,3),
 fan1xaccel decimal(20,3),
 fan1yaccel decimal(20,3),
 fan2rpm decimal(20,3),
 fan2xaccel decimal(20,3),
 fan2yaccel decimal(20,3))
row format delimited
fields terminated by ','
lines terminated by '\n'
location '/user/nifi/fans_demo7';

select * from fans_demo7 limit 20;

Example Output:
Now you are able to use your favorite BI tool to access the fan event data. You are also able to run ML/DL model development cycles on HDP using Spark and TensorFlow with this same data. These models could then be deployed back out to the ingest cycle for Predictive Maintenance and other Manufacturing / Industry 4.0 usage cases.

Step 7: Test the Fan Events Data Flow pipeline split to Kafka
You are now ready to turn on the next NiFi processors in the "NiFi to Kafka" pipeline split. We have adjusted the flow processors on the screen so that you can easily see the split and the data flowing to Kafka in the following UI:
Now we will want to turn the following two processors on at the same time. You can do this by making sure they are the only processors highlighted in the NiFi UI and then using the "Operations" panel on the left-hand side of the screen. Use the ">" button.
"Split for Fan1 Producer (ReplaceText)" and "Split for Fan2 Producer (ReplaceText)"
The reason we want to turn them on at the same time is that we are splitting the original JSON array into individual flowfiles for each of the two fans. This will lead to a much cleaner dashboard. Note that at this split, the same timestamp flows to both the fan1 and fan2 flowfiles.
Don't forget to repeat inspecting each queue and the event flowfiles. Review the changes to each flowfile, including its attributes, so you can learn how NiFi works.
Now, let's turn on the next two processors at the same time. Remember, you can do this by making sure they are the only processors highlighted in the NiFi UI and then using the "Operations" panel on the left-hand side of the screen. Use the ">" button.
"Fan1 Kafka Producer (PutKafka)" and "Fan2 Kafka Producer (PutKafka)"
At this point, you should have data in a single fans_demo topic in Kafka, and each flowfile has been split so that Fan1 and Fan2 data generate their own flowfile output. You can inspect the Kafka topic by running the following commands on a Kafka client node (replace hostnames with your own):
# Listing all events in the fans_demo topic
$ ./kafka-console-consumer.sh --zookeeper c3n1m710p:2181,c9n1m710p:2181,c18n1m710p:2181 --topic fans_demo --from-beginning
# Deleting all events in the fans_demo topic
$ ./kafka-delete-records.sh --bootstrap-server m510-16c:6667 --offset-json-file j.json

Step 7: Ingest Data from Kafka to Druid
Druid is an analytics data store designed for analytic (OLAP) queries on event data. It draws inspiration from Google's Dremel, Google's PowerDrill, and search infrastructure.
Pre-requisite: Before using it, you need to ensure "druid-kafka-indexing-service" is included in the druid.extensions.loadList property.
You can confirm this by opening Ambari > Druid > Configs and searching for the druid.extensions.loadList property.

Step 7a: Start the Druid Kafka Indexer
You should be able to find this service under the following directory: /usr/hdp/3.0.1.0-187/druid/
First, log in to your Druid node and create the following supervisor-spec.json file in /tmp on your Druid node.
supervisor-spec.json (make sure to change the hostname)
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "fans_demo7",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "fan_timestamp", "format": "millis" },
        "dimensionsSpec": { "dimensions": [], "dimensionExclusions": [ "timestamp", "value" ] }
      }
    },
    "metricsSpec": [
      { "name": "count", "type": "count" },
      { "name": "rpm_sum", "fieldName": "rpm", "type": "doubleSum" },
      { "name": "rpm_min", "fieldName": "rpm", "type": "doubleMin" },
      { "name": "rpm_max", "fieldName": "rpm", "type": "doubleMax" },
      { "name": "xaccel_sum", "fieldName": "xaccel", "type": "doubleSum" },
      { "name": "xaccel_min", "fieldName": "xaccel", "type": "doubleMin" },
      { "name": "xaccel_max", "fieldName": "xaccel", "type": "doubleMax" },
      { "name": "yaccel_sum", "fieldName": "yaccel", "type": "doubleSum" },
      { "name": "yaccel_min", "fieldName": "yaccel", "type": "doubleMin" },
      { "name": "yaccel_max", "fieldName": "yaccel", "type": "doubleMax" }
    ],
    "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR", "queryGranularity": "NONE" }
  },
  "tuningConfig": { "type": "kafka", "maxRowsPerSegment": 5000000 },
  "ioConfig": {
    "topic": "fans_demo",
    "consumerProperties": { "bootstrap.servers": "m510-16c:6667" },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT10M"
  }
}
Then, to start the Druid Kafka Indexer, run the following command:
curl -X POST -H 'Content-Type: application/json' -d @/tmp/supervisor-spec.json http://c9n1m710p:8090/druid/indexer/v1/supervisor
Then, within Ambari, click on the "Druid" service on the left-hand side of the page and use the "Druid - Coordinator Console" Quick Link on the right-hand side of the page. You should now see the fans_demo7 dataSource and Running Tasks (index_kafka_fans_demo7_……..). You can inspect the logs using the hot link URLs on the right-hand side of the page. Be patient; we have the indexer writing to Druid every hour, so it will take about that long for the Druid data source to show up.

Step 7b: Inspect the Druid - Overlord Console
Within Ambari, click on the "Druid" service on the left-hand side of the page (if you are not already there) and then use the "Druid - Overlord Console" Quick Link on the right-hand side of the page. Again, be patient. Take a break and come back in 60 minutes. Then you should be able to refresh this screen and see the fans_demo7 Druid datasource. If you get this far, you are now ready to use Superset to create a real-time dashboard.

Step 8: Create a Manufacturing IoT - Process Monitoring Dashboard
Within Ambari, click on the "Superset" service on the left-hand side of the page and then use the "Superset" Quick Link on the right-hand side of the page.

Step 8a: Create Dashboard Slices
Use the "Charts" menu and follow the screenshots below to create the following real-time slices of fan event data:
Slice 1: Monitor Fan RPM Speed
Slice 2: X Accel Vibration Rate
Slice 3: Y Accel Vibration Rate
Slice 4: Fan Monitoring Table

Step 8b: Create Manufacturing Shop Floor Dashboard
Add all the slices in "Edit" mode. Note that any vibration rate for X or Y Accel over 350 is concerning and should be investigated.
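If you also completed the optional Hive table in Step 6, a quick way to spot the concerning readings described above is to query for any X or Y Accel value over 350. This is a minimal sketch run with Beeline from a Hive client node; the table and column names come from create_table_fans_demo7.sql, but the ZooKeeper connect string is a placeholder you will need to adjust for your cluster:
$ su hive
$ beeline -u "jdbc:hive2://<zk_node>:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
    -e "select from_unixtime(cast(fan_timestamp/1000 as bigint)) as event_time,
               fan1xaccel, fan1yaccel, fan2xaccel, fan2yaccel
        from fans_demo7
        where fan1xaccel > 350 or fan1yaccel > 350
           or fan2xaccel > 350 or fan2yaccel > 350
        limit 20;"
# Any rows returned are events worth investigating on the shop floor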
I hope this lab is a valuable learning tool for your team! Over and out!

Reference
Michael Ger, Hortonworks General Manager, Manufacturing and Automotive - provided Slides 1-3

Additional Material
DataWorks Summit, June 2018
Title: An Introduction to Druid
URL: https://www.youtube.com/watch?v=JEhmHsN8jZI
Author: Nishant Bangarwa, Hortonworks - Software Engineer, Druid Committer, PMC
06-16-2016 05:35 AM
Same issue Mark reported on the HDP 2.4 Sandbox using sqoop import on a single table.
Example command:
sqoop import --connect jdbc:mysql://192.168.1.17:3306/test --username drice --password hadoop --table client --hive-table default.client --hive-import -m 1
NOTE: Mark's workaround worked. New command:
sqoop import --connect jdbc:mysql://192.168.1.17:3306/test --username drice --password hadoop --table client --hive-table default.client --hive-import -m 1 --driver com.mysql.jdbc.Driver
11-24-2015 03:39 PM - 2 Kudos
This article will shed light on the HP Big Data Reference Architecture (BDRA). Hadoop architectures have traditionally been built to move the math to the data. This has been the basis for Hadoop architectures since the initial implementations back at Yahoo. Let's take a look at the traditional approach versus HP's Big Data Architecture approach.

Current traditional Big Data approach
• Compute and storage are always collocated
• All servers are identical
• Data is partitioned across servers on direct-attached storage (DAS)
tradhadoop.png

New HP Big Data approach
• Separate compute and storage tiers connected by Ethernet networking
• Standard Hadoop installed asymmetrically, with storage components on the storage servers and YARN applications on the compute servers
hpbdra.png

With this new approach, HP engineers have challenged the conventional wisdom that compute should always be co-located with data. HP has been aggressively benchmarking with this new architecture and feels it has designed an architecture that will provide maximum elasticity for Big Data workloads. They are leveraging YARN labels and have not changed a line of Hadoop code in their deployments. Results are significant and have outperformed the traditional architecture in a number of tests. This architecture allows for optimized data analytics by running multiple applications while consolidating multiple data stores in a single, high-performance system. The HP BDRA is flexible in the face of rapid change and offers an opportunity to lower TCO. I would encourage you to read more about this on the HP and Hortonworks web sites. I hope you find this information useful in your Hadoop journey!

Mark Lochbihler
Hortonworks Partner Engineering
@MarkLochbihler
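For readers curious about the YARN labels mentioned above, here is a minimal sketch of how node labels can steer YARN applications onto a compute tier. The label name "compute", the hostnames, and the example job are hypothetical; you also need yarn.node-labels.enabled=true, a node-label store directory (yarn.node-labels.fs-store.root-dir), and accessible-node-labels configured on your capacity scheduler queues.
# Create a "compute" label and map it to the compute-tier hosts
$ yarn rmadmin -addToClusterNodeLabels "compute"
$ yarn rmadmin -replaceLabelsOnNode "computehost01=compute computehost02=compute"
# Ask YARN to place a MapReduce job's containers on the labeled nodes
$ hadoop jar hadoop-mapreduce-examples.jar pi \
    -Dmapreduce.job.node-label-expression=compute 10 100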
11-24-2015 03:07 PM - 2 Kudos
Presto is currently licensed under the Apache license and provides ANSI SQL compliance and a rules-based optimizer. In the Fall of 2014, Presto was up to 88 releases, with 41 contributors and 3,943 commits. In the Spring of 2015, Teradata provided the first-ever commercial support for Presto and is committed to a multi-phased roadmap. It is a 100% open source SQL-on-Hadoop engine. Presto offers a modern code base, proven scalability, interactive querying, and cross-platform query capability. Presto is used by a community of well-known, well-respected technology companies. And Presto can leverage the HCatalog schema and also works with ORC file formats today. If you are looking for a fast, ANSI SQL engine on Hadoop, and you also need to access data outside of Hadoop, Presto is a good fit. Keep in mind that individual tasks must fit into memory at this time, so it is not the best choice for large-scale batch on Hadoop.
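As a quick illustration of the HCatalog/ORC point above, here is a minimal sketch of the Presto CLI querying Hive-managed data through the hive connector. The coordinator hostname and table name are assumptions for illustration only; your catalog configuration may differ.
# Run an interactive query against the hive catalog from the Presto CLI
$ ./presto --server presto-coordinator:8080 --catalog hive --schema default \
    --execute "select count(*) from web_logs"
# The same CLI can also reach other configured catalogs (e.g., mysql) and join across them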
01-07-2017 11:57 PM
Hey Mark, good article! Thought I'd resurface this by adding a note on point (3) above, for those who want to set up multiple HS2s and load balance as per http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_hadoop-high-availability/content/ch_multiple_hs2s.html
And here is a sample LIBNAME syntax that would be used to connect from the SAS application:
libname h2 hadoop URI="jdbc:hive2://<server1>,<server2>,<server3>/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
user=&sysuserid server="dummy";
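Before wiring this into SAS, it can help to confirm the ZooKeeper service-discovery URL itself with Beeline from any HDP client node. A minimal sketch, assuming the default hiveserver2 namespace and placeholder ZooKeeper hosts/ports:
$ beeline -u "jdbc:hive2://<server1>:2181,<server2>:2181,<server3>:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
    -n <user> -e "show databases;"
# A successful connection confirms ZooKeeper is handing out a live HiveServer2 instance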
11-24-2015 03:58 PM
Also note that Teradata has released its Teradata Connector for Hadoop (TDCH) for HDP 2.3 through joint efforts of both engineering teams.
11-04-2015 07:22 PM
I second Ryan's comment on a dedicated disk or partition for logs. The biggest mistake made in the field is not dedicating a disk or partition to /var/log. It is not trivial to migrate logs managed by Ambari after the initial installation, although it is doable. If you fall into this situation, ask Hortonworks support for an instruction set. I had to do so and wished I had set this up on a separate disk initially.
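For anyone planning a new install, here is a minimal sketch of dedicating a partition to /var/log before Ambari (or anything else) starts writing heavily to it. The device name is a placeholder; on an already-running node you would also need to stop logging services and copy the existing contents across before switching the mount.
# Assume /dev/sdb1 is an empty partition set aside for logs
$ mkfs.ext4 /dev/sdb1
$ echo "/dev/sdb1  /var/log  ext4  defaults,noatime  0 2" >> /etc/fstab
$ mount /var/log
$ df -h /var/log   # confirm the dedicated partition is mounted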