12-10-2018
07:41 PM
3 Kudos
Deep Speech with Apache NiFi 1.8
Tools: Python 3.6, PyAudio, TensorFlow, Deep Speech, Shell, Apache NiFi
Why: Speech-to-Text
Use Case: Voice control and recognition.
Series: Holiday Use Case: Turn on Holiday Lights and Music on command.
Cool Factor: Ever want to run a query on Live Ingested Voice Commands?
Other Options: https://community.hortonworks.com/articles/155519/voice-controlled-data-flows-with-google-aiy-voice.html
We are using Python 3.6 to write some code around PyAudio, TensorFlow and Deep Speech to capture audio, store it in a wave file and then process it with Deep Speech to extract some text. This example is running on OSX without a GPU on TensorFlow v1.11.
The Mozilla Github repo for their Deep Speech implementation has nice getting started information that I used to integrate our flow with Apache NiFi.
Installation as per https://github.com/mozilla/DeepSpeech
pip3 install deepspeech
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz -
This pre-trained model is available for English. For other languages, you will need to build your own. You can use a beefy HDP 3.1 cluster to train this.
Note: THIS IS A 1.8 GIG DOWNLOAD. That may be an issue for laptops, devices or small data people.
Apache NiFi Flow
The flow is simple: we call our shell script, which runs the Python code that records audio and sends it to Deep Speech for processing. We get back a voice_string in JSON that we turn into a record for querying and filtering in Apache NiFi.
I am handling a few voice commands for "Save", "Load" and "Move". As you can imagine, you can handle pretty much anything you want. It's a simple way to use voice to control streaming data flows or just to ingest large streams of text. Even using advanced Deep Learning, text recognition is still not the strongest.
If you are going to load balance connections between nodes, you have options on compression and load balancing strategies. This can come in handy if you have a lot of servers.
Shell Script
python3.6 /Volumes/TSPANN/projects/DeepSpeech/processnifi.py /Volumes/TSPANN/projects/DeepSpeech/models/output_graph.pbmm /Volumes/TSPANN/projects/DeepSpeech/models/alphabet.txt
Schema
{
"type" : "record",
"name" : "voice",
"fields" : [ {
"name" : "systemtime",
"type" : "string",
"doc" : "Type inferred from '\"12/10/2018 14:53:47\"'"
}, {
"name" : "voice_string",
"type" : "string",
"doc" : "Type inferred from '\"\"'"
} ]
}
We can add more fields as needed.
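As a minimal sketch of how a record matching this schema could be built on the Python side, the snippet below formats the timestamp the way the sample output does (month/day/year) and checks that both fields are strings. The function names are hypothetical and are not taken from processnifi.py.

```python
import json
from datetime import datetime

# Field names come from the Avro schema above; both are plain strings.
SCHEMA_FIELDS = ("systemtime", "voice_string")


def build_voice_record(transcript, now=None):
    """Return a dict shaped like the 'voice' record (hypothetical helper)."""
    now = now or datetime.now()
    return {
        "systemtime": now.strftime("%m/%d/%Y %H:%M:%S"),
        "voice_string": transcript.strip(),
    }


def matches_schema(record):
    """True when exactly the schema fields are present and all are strings."""
    return set(record) == set(SCHEMA_FIELDS) and all(
        isinstance(record[name], str) for name in SCHEMA_FIELDS
    )


if __name__ == "__main__":
    # One JSON object per line is easy to pick up from a NiFi flow file.
    print(json.dumps(build_voice_record("one two three")))
```

If more fields are added to the schema, SCHEMA_FIELDS is the only place that needs to change in this sketch.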
Example Run
HW13125:DeepSpeech tspann$ ./runnifi.sh
TensorFlow: v1.11.0-9-g97d851f04e
DeepSpeech: unknown
2018-12-10 14:36:43.714433: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
{"systemtime": "12/10/2018 14:36:43", "voice_string": "one two three or five six seven eight nine"}
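The run above mixes TensorFlow log lines with the JSON record, so downstream code has to pick out the JSON line before looking for commands. Here is a rough sketch of that step in plain Python. The command list comes from the article ("Save", "Load", "Move"), but the helper names and the matching rule (exact word match, case-insensitive) are my assumptions; in the real flow this filtering is done with records inside Apache NiFi.

```python
import json

# Commands named in the article; how they map to actions is left open.
COMMANDS = ("save", "load", "move")


def extract_record(output_lines):
    """Return the first line that parses as a JSON object with voice_string."""
    for line in output_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip TensorFlow / Deep Speech banner lines
        try:
            record = json.loads(line)
        except ValueError:
            continue
        if isinstance(record, dict) and "voice_string" in record:
            return record
    return None


def find_commands(record):
    """List the known commands spoken, in the order they were said."""
    words = record["voice_string"].lower().split()
    return [word for word in words if word in COMMANDS]
```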
We can run this on top of YARN 3.1 as dockerized or non-dockerized workloads. Setting up nodes to run HDF 3.3 - Apache NiFi and friends is easy in the cloud or on-premise in OpenStack with super devops tools. When running Apache NiFi it is easy to monitor in Ambari:
References:
https://github.com/mozilla/DeepSpeech
https://community.hortonworks.com/articles/224268/running-tensorflow-on-yarn-31-with-or-without-gpu.html
https://arxiv.org/abs/1412.5567
https://github.com/tspannhw/nifi-deepspeech
12-06-2018
09:53 PM
5 Kudos
Apache NiFi Processor for Apache MXNet SSD: Single Shot MultiBox Object Detector (Deep Learning)
The news is out: Apache MXNet has added a Java API. As soon as I could, I got my hands on the Maven repo and an example program and got to work writing a new Apache NiFi processor for it. I have run this on standalone Apache NiFi 1.8.0 and on HDF 3.3 - Apache NiFi 1.8.0, and both work. So anyone who wants to be an alpha tester, please download it and give it a try.
Apache MXNet SSD is a good example of a pretrained deep learning model that works pretty well for general images, especially in use cases around people and cars. You can fine-tune this with some more images and runs: https://mxnet.incubator.apache.org/faq/finetune.html
The nice thing is that we can now start including Apache MXNet as part of Java applications such as Kafka Streams, Apache Storm, Apache Spark, Spring Boot and other use cases using Java. I could potentially inject this into a Hive UDF (https://community.hortonworks.com/articles/39980/creating-a-hive-udf-in-java.html#comment-40026) or Pig UDF. The performance may be fast enough.
We now have four Java options for Deep Learning: DL4J, H2O, TensorFlow and Apache MXNet. Unfortunately, neither the TensorFlow nor the MXNet Java API is quite production ready.
I may do some further research on running MXNet as a Hive UDF; it would be cool to have in a query.
For those who don't want to set up a development environment with JDK 8+, Maven 3.3+ and git, you can download a pre-built nar file here: https://github.com/tspannhw/nifi-mxnetinference-processor/releases/tag/v1.0.
As part of the recent release of HDF 3.3, I have upgraded my OpenStack Centos 7 cluster.
Important Caveats
Notice, the Java API is in preview and so is this processor. Do not use this in production! This is in development and I am the only one working on it. The Java API from Apache MXNet is in flux and will be changing. See the POM, as it is tied to the OSX/Mac version of the library. You will need to change that.
You will need to download the pre-built MXNet model and place it in a directory accessible to the Apache NiFi server/cluster.
I am still cleaning up the rectangle code for identifying objects in the pictures. As you will notice, my rectangle drawing is a bit off. I need to work on that.
Once you drop your built nar file and models in the nifi/lib directory and restart Apache NiFi, you can add it to your canvas.
We need to feed it some images. You can use my web cam processor, an image URL feed or local files. To grab images from an HTTPS site, you need an SSL Context Service such as StandardSSLContextService. You will need to point to the cacerts used by the JRE/JDK running your Apache NiFi node. The default cacerts password in Java is changeit. Hopefully you have changed it.
To configure my new processor, just put in the full path to the model directory followed by "/resnet50_ssd_model", as that is the prefix for the model.
Our example flow with the new processor being fed by traffic cameras, webcams, local files and a local webcam.
Some output of our flow:
Our top 5 probabilities and labels
Example Data:
{
"ymin_1" : "456.01",
"ymin_5" : "159.29",
"ymin_4" : "235.83",
"ymin_3" : "206.64",
"ymin_2" : "383.84",
"label_5" : "person",
"xmax_5" : "121.14",
"label_4" : "bicycle",
"xmax_4" : "137.89",
"label_3" : "dog",
"xmax_3" : "179.14",
"ymax_1" : "150.66",
"ymax_2" : "418.95",
"ymax_3" : "476.79",
"label_2" : "bicycle",
"label_1" : "car",
"probability_4" : "0.22",
"probability_5" : "0.13",
"probability_2" : "0.90",
"xmin_5" : "88.93",
"probability_3" : "0.82",
"ymax_4" : "413.43",
"probability_1" : "1.00",
"ymax_5" : "190.04",
"xmax_2" : "149.96",
"xmax_1" : "72.03",
"xmin_3" : "83.82",
"xmin_4" : "93.05",
"xmin_1" : "312.21",
"xmin_2" : "155.96"
}
Resources:
https://medium.com/apache-mxnet/introducing-java-apis-for-deep-learning-inference-with-apache-mxnet-8406a698fa5a
https://github.com/apache/incubator-mxnet/tree/java-api/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi
https://mxnet.incubator.apache.org/install/java_setup.html
Source: https://github.com/tspannhw/nifi-mxnetinference-processor
Video walk-through: https://www.youtube.com/watch?v=Q4dSGPvqXSA&t=196s&list=PL-7XqvSmQqfTSihuoIP_ZAnN7mFIHkZ_e&index=17
mxnet-processor.xml
Download the artifacts listed: https://github.com/apache/incubator-mxnet/tree/java-api/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/objectdetector#step-1
Maven POM (I used Java 8 and Maven 3.3.9)
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.dataflowdeveloper.mxnet</groupId>
<artifactId>inference</artifactId>
<version>1.0</version>
</parent>
<artifactId>nifi-mxnetinference-processors</artifactId>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-utils</artifactId>
<version>1.8.0</version>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-mock</artifactId>
<version>1.8.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.mxnet</groupId>
<artifactId>mxnet-full_2.11-osx-x86_64-cpu</artifactId>
<version>1.3.1-SNAPSHOT</version>
</dependency>
</dependencies>
</project>
I have moved from Eclipse to IntelliJ for my builds. I am looking at Apache NetBeans as well.
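The Example Data earlier in this article comes out of the processor as flat attributes (label_1, probability_1, xmin_1 and so on). Downstream it is often handier to have one entry per detected object. A small sketch of that regrouping in Python follows. The key pattern is read off the sample output, while the function names and the 0.5 threshold are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Attribute names as they appear in the sample output: <field>_<rank>.
FIELD_PATTERN = re.compile(r"^(label|probability|xmin|xmax|ymin|ymax)_(\d+)$")


def group_detections(attributes):
    """Turn flat attributes into a list of per-object dicts, ordered by rank."""
    grouped = defaultdict(dict)
    for key, value in attributes.items():
        match = FIELD_PATTERN.match(key)
        if not match:
            continue  # ignore attributes that are not detection fields
        field, rank = match.group(1), int(match.group(2))
        grouped[rank][field] = value if field == "label" else float(value)
    return [dict(rank=rank, **grouped[rank]) for rank in sorted(grouped)]


def confident(detections, threshold=0.5):
    """Keep detections at or above an (assumed) probability threshold."""
    return [d for d in detections if d.get("probability", 0.0) >= threshold]
```

With the sample data, a 0.5 threshold would keep the car, the first bicycle and the dog and drop the two low-probability entries.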
11-27-2018
03:50 AM
5 Kudos
MiNiFi Java Agent 0.5
Copy over the necessary NARs from the Apache NiFi 1.7 lib directory:
nifi-ssl-context-service-nar-1.7.0.nar
nifi-standard-services-api-nar-1.7.0.nar
nifi-kafka-1-0-nar-1.7.0.nar
This will support PublishKafka_1_0 and ConsumeKafka_1_0.
Then create a consume and/or publish flow. You can combine the two based on your needs. In my simple example I consume the Kafka messages in MiNiFi and write them to a file. I also write the metadata to a JSON file.
Consume Kafka
Publish Electric Monitoring Data To Kafka
Let's monitor the messages going through our topic, smartPlug.
Publish Messages to Kafka
Consume Any Messages From the smartPlug topic
Logs
Provenance Event file containing 377 records. In the past 5 minutes, 1512 events have been written to the Provenance Repository, totaling 839.32 KB
2018-11-26 19:42:32,473 INFO [main] o.a.n.c.s.StandardProcessScheduler Starting PutFile[id=25a86505-031a-37d9-0000-000000000000]
2018-11-26 19:42:32,474 INFO [main] o.a.n.c.s.StandardProcessScheduler Starting UpdateAttribute[id=9220d40d-ee1d-3f61-0000-000000000000]
2018-11-26 19:42:32,474 INFO [main] o.apache.nifi.controller.FlowController Started 0 Remote Group Ports transmitting
2018-11-26 19:42:32,478 INFO [main] org.apache.nifi.minifi.MiNiFiServer Flow loaded successfully.
2018-11-26 19:42:32,479 INFO [Monitor Processor Lifecycle Thread-2] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled ConsumeKafka_1_0[id=8556f1ce-a915-3fda-0000-000000000000] to run with 1 threads
2018-11-26 19:42:32,479 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
2018-11-26 19:42:32,479 INFO [Monitor Processor Lifecycle Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled AttributesToJSON[id=0628b4e5-10d0-3b09-0000-000000000000] to run with 1 threads
2018-11-26 19:42:32,479 INFO [main] org.apache.nifi.minifi.MiNiFi Controller initialization took 2787584123 nanoseconds.
2018-11-26 19:42:32,480 INFO [Monitor Processor Lifecycle Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutFile[id=25a86505-031a-37d9-0000-000000000000] to run with 1 threads
2018-11-26 19:42:32,481 INFO [Monitor Processor Lifecycle Thread-2] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled UpdateAttribute[id=9220d40d-ee1d-3f61-0000-000000000000] to run with 1 threads
2018-11-26 19:42:32,585 INFO [Timer-Driven Process Thread-2] o.a.k.clients.consumer.ConsumerConfig ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [princeton1.field.hortonworks.com:6667]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = minificonsumer1
heartbeat.interval.ms = 3000
interceptor.classes = null
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 10000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
2018-11-26 19:42:32,727 INFO [Timer-Driven Process Thread-2] o.a.kafka.common.utils.AppInfoParser Kafka version : 1.0.0
2018-11-26 19:42:32,727 INFO [Timer-Driven Process Thread-2] o.a.kafka.common.utils.AppInfoParser Kafka commitId : aaa7af6d4a11b29d
2018-11-26 19:42:33,088 INFO [Timer-Driven Process Thread-2] o.a.k.c.c.internals.AbstractCoordinator [Consumer clientId=consumer-1, groupId=minificonsumer1] Discovered coordinator princeton1.field.hortonworks.com:6667 (id: 2147482646 rack: null)
2018-11-26 19:42:33,090 INFO [Timer-Driven Process Thread-2] o.a.k.c.c.internals.ConsumerCoordinator [Consumer clientId=consumer-1, groupId=minificonsumer1] Revoking previously assigned partitions []
2018-11-26 19:42:33,091 INFO [Timer-Driven Process Thread-2] o.a.k.c.c.internals.AbstractCoordinator [Consumer clientId=consumer-1, groupId=minificonsumer1] (Re-)joining group
2018-11-26 19:42:36,391 INFO [Timer-Driven Process Thread-2] o.a.k.c.c.internals.AbstractCoordinator [Consumer clientId=consumer-1, groupId=minificonsumer1] Successfully joined group with generation 3
2018-11-26 19:42:36,394 INFO [Timer-Driven Process Thread-2] o.a.k.c.c.internals.ConsumerCoordinator [Consumer clientId=consumer-1, groupId=minificonsumer1] Setting newly assigned partitions [smartPlug-0]
2018-11-26 19:44:32,325 INFO [pool-34-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-26 19:44:40,700 INFO [Provenance Maintenance Thread-1] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 1437
2018-11-26 19:44:40,765 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.lucene.SimpleIndexManager Index Writer for provenance_repository/index-1543271506000 has been returned to Index Manager and is no longer in use. Closing Index Writer
2018-11-26 19:44:40,767 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (28 records) into single Provenance Log File provenance_repository/1409.prov in 62 milliseconds
2018-11-26 19:44:40,768 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 151 records. In the past 5 minutes, 28 events have been written to the Provenance Repository, totaling 15.43 KB
JSON Kafka Message and JSON Kafka Metadata Stored As Files
monitor/1448678223641638.attr.json
{"path":"./","filename":"1448678223641638","kafka.partition":"0","kafka.offset":"5543","kafka.topic":"smartPlug","kafka.key":"cb90ad21-b311-494c-96cc-06dd2e8747df","uuid":"041459fc-c63e-4056-ab50-1c375cd7d49f"}
monitor/1448678223641638
{"day30": 0.431, "day31": 1.15, "sw_ver": "1.2.5 Build 171206 Rel.085954", "hw_ver": "1.0", "mac": "50:C7:BF:B1:95:D5", "type": "IOT.SMARTPLUGSWITCH", "hwId": "60FF6B258734EA6880E186F8C96DDC61", "fwId": "00000000000000000000000000000000", "oemId": "FFF22CFF774A0B89F7624BFC6F50D5DE", "dev_name": "Wi-Fi Smart Plug With Energy Monitoring", "model": "HS110(US)", "deviceId": "8006ECB1D454C4428953CB2B34D9292D18A6DB0E", "alias": "Tim", "icon_hash": "", "relay_state": 1, "on_time": 886569, "active_mode": "schedule", "feature": "TIM:ENE", "updating": 0, "rssi": -75, "led_off": 0, "latitude": 40.268216, "longitude": -74.529088, "index": 18, "zone_str": "(UTC-05:00) Eastern Daylight Time (US & Canada)", "tz_str": "EST5EDT,M3.2.0,M11.1.0", "dst_offset": 60, "month10": 1.581, "month11": 30.888, "current": 0.067041, "voltage": 122.151701, "power": 1.277361, "total": 24.289, "time": "11/26/2018 21:54:22", "ledon": true, "systemtime": "11/26/2018 21:54:22"}
Resources:
https://blog.ona.io/general/2017/08/30/streaming-ona-data-with-nifi-kafka-druid-and-superset.html
https://community.hortonworks.com/articles/193945/social-media-monitoring-with-nifi-hivedruid-integr.html
https://community.hortonworks.com/articles/177561/streaming-tweets-with-nifi-kafka-tranquility-druid.html
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.0/kafka-using-kafka-streams/content/kafka-using-kafka-streams.html
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.0/minifi-quick-start/content/overview.html
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.0/minifi-quick-start/content/before_you_begin.html
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.0/minifi-quick-start/content/installing_minifi_on_linux.html
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.0/minifi-quick-start/content/using_processors_not_packaged_with_minifi.html?es_p=8055369
https://community.hortonworks.com/articles/227560/real-time-stock-processing-with-apache-nifi-and-ap.html
Files: consumekafka2.xml pushkafka1.xml configyml consume.txt configymlsend.txt
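Since the consume flow writes each Kafka message next to a matching .attr.json metadata file, a quick way to sanity check the output directory is to pair the two back up. The sketch below does that with the standard library only. The file naming follows the monitor/ example above, while the helper names and the choice of summary fields are assumptions.

```python
import json
from pathlib import Path

ATTR_SUFFIX = ".attr.json"


def load_pairs(directory):
    """Yield (metadata, payload) for every '<name>.attr.json' with a sibling '<name>' file."""
    directory = Path(directory)
    for attr_path in sorted(directory.glob("*" + ATTR_SUFFIX)):
        payload_path = directory / attr_path.name[: -len(ATTR_SUFFIX)]
        if not payload_path.is_file():
            continue  # metadata without a message file; skip it
        metadata = json.loads(attr_path.read_text())
        payload = json.loads(payload_path.read_text())
        yield metadata, payload


def summarize(metadata, payload):
    """Combine the Kafka coordinates with a couple of smart plug readings."""
    return {
        "topic": metadata.get("kafka.topic"),
        "offset": int(metadata["kafka.offset"]),
        "power": payload.get("power"),
        "voltage": payload.get("voltage"),
    }
```

Calling `summarize(*pair)` for each pair from `load_pairs("monitor")` gives one small dict per consumed message.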
11-05-2018
03:23 PM
3 Kudos
In preparation for my talk at the Philadelphia Open Source Conference (https://phillyopensource.splashthat.com/), Apache Deep Learning 201, I wanted to have some good images for running various Apache MXNet GluonCV Deep Learning algorithms for Computer Vision. See: https://gluon-cv.mxnet.io/
Using Apache open source tools - Apache NiFi 1.8 and Apache MXNet 1.3 with GluonCV - I can easily ingest live traffic camera images and run Object Detection, Semantic Segmentation and Instance Segmentation.
Code: https://github.com/tspannhw/ApacheDeepLearning201
It's so easy that I am running multiple models on the data to see which gives me the results I like. I like YOLO, which is one of my old favorites.
yolonifitraffic.py
demo_mask_rcnn_nyc.py
demo_deeplab_nyc.py
So we can find the cars and people in these webcams. Use cases can be around traffic optimization, public safety and advertisement optimization. Due to licensing, I thought it better not to show Traffic Camera data here.
To industrialize and scale out this process from a single Data Scientist to a national ingestion system, we use the power of Apache NiFi to ingest, process and control flows. I am using the latest Apache NiFi 1.8.
Apache NiFi Flow to Ingest and Process Traffic Camera Data
First we have a list of URLs that I want to process; this can be sourced and stored anywhere. For ease of use with a static set I am using GenerateFlowFile. I have a JSON file of URLs that I split and parse to call various Computer Vision Python scripts (DeepLab3, MaskRCNN, YOLO and others). YOLO seems to be the most useful so far. I am grabbing the results, some system metrics, metadata and the deep learning analytics generated by Apache MXNet.
I split the flow into two portions. One builds GluonCV result data from YOLO and the other creates a file from TensorFlow results done on the fly.
Here is a list of my webcam URLs. There are millions of them out there.
If your data is tabular, then you need a schema for fast record processing.
An Example Dataset Returned from GLUONCV - YOLO Python 3.6 Script
I turn JSON data into HDFS Writeable AVRO Data and Can Run Live SQL on It
One Output
Source Code
Be a Joint Slack Group
Object Detection: GluonCV YOLO v3 and Apache NiFi
This can be OpenCV, a static photo or from a URL.
Object Detection: Faster RCNN with GluonCV
Faster RCNN model trained on Pascal VOC dataset with ResNet-50 backbone
net = gcv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)
https://gluon-cv.mxnet.io/api/model_zoo.html
Instance Segmentation: Mask RCNN with GluonCV
Mask RCNN model trained on COCO dataset with ResNet-50 backbone
net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)
https://gluon-cv.mxnet.io/build/examples_instance/demo_mask_rcnn.html
https://github.com/matterport/Mask_RCNN
https://arxiv.org/abs/1703.06870
Photo by Ryoji Iwata on Unsplash
There's a lot of people crossing the street!
Semantic Segmentation: DeepLabV3 with GluonCV
GluonCV DeepLabV3 model on ADE20K dataset
model = gluoncv.model_zoo.get_model('deeplab_resnet101_ade', pretrained=True)
run1.sh demo_deeplab_webcam.py
This runs pretty slow on a machine with no GPU.
https://www.cityscapes-dataset.com/
http://groups.csail.mit.edu/vision/datasets/ADE20K/
https://arxiv.org/abs/1706.05587
https://gluon-cv.mxnet.io/build/examples_segmentation/demo_deeplab.html
That is the best picture of me ever!
Semantic Segmentation: Fully Convolutional Networks
GluonCV FCN model on PASCAL VOC dataset
model = gluoncv.model_zoo.get_model('fcn_resnet101_voc', pretrained=True)
run1.sh demo_fcn_webcam.py
https://gluon-cv.mxnet.io/build/examples_segmentation/demo_fcn.html
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
It found me.
For NYC DOT and PennDOT camera usage, you have to sign a developer agreement for a feed! See:
http://www.nyc.gov/html/dot/html/home/home.shtml
https://www.penndot.gov/Pages/default.aspx
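To make the "split a JSON file of URLs and call a script per URL" step from the flow description a little more concrete, here is a rough stand-alone sketch in Python. It is only an illustration of the idea: the JSON layout (a list of objects with a "url" key) and the assumption that each script takes the image URL as its single argument are mine, not something the article specifies. In the actual flow Apache NiFi does the splitting and the script execution.

```python
import json
import shlex

# Script names appear in the article; their command-line arguments are assumed.
SCRIPTS = ("yolonifitraffic.py", "demo_mask_rcnn_nyc.py", "demo_deeplab_nyc.py")


def split_urls(document):
    """Mimic a JSON split: one small record per camera URL."""
    data = json.loads(document)
    return [{"url": entry["url"]} for entry in data]


def build_commands(records, scripts=SCRIPTS, python="python3.6"):
    """Build one shell command per (URL, script) pair, quoting the URL safely."""
    return [
        "{} {} {}".format(python, script, shlex.quote(record["url"]))
        for record in records
        for script in scripts
    ]
```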
10-18-2018
09:27 PM
2 Kudos
Simple Apache NiFi Operations Dashboard - Part 2
Part 1: https://community.cloudera.com/t5/Community-Articles/Building-a-Custom-Apache-NiFi-Operations-Dashboard-Part-1/ta-p/249060
To access data to display in our dashboard we will use some Spring Boot 2.0.6 Java 8 microservices to call Apache Hive 3.1.0 tables in HDP 3.0 on Hadoop 3.1. We will have our web site hosted and make REST calls to Apache NiFi, our microservices, YARN and other APIs.
As you can see, we can easily incorporate data from HDP 3 - Apache Hive 3.1.0 in Spring Boot Java applications without much trouble. You can see the Maven build script (all code is in GitHub).
Our motivation is to put all this data somewhere and show it in a dashboard that can use REST APIs for data access and updates. We may choose to use Apache NiFi for all REST APIs or we can do some in Apache NiFi. We are still exploring. We can also decide to change the backend to HBase 2.0, Phoenix or Druid or a combination. We will see.
Spring Boot 2.0.6 Loading
JSON Output
Spring Boot Microservices and UI
https://github.com/tspannhw/operations-dashboard
To start, I have a simple web page that calls one of the REST APIs. The microservice can be run off of YARN 3.1, Kubernetes, CloudFoundry, OpenShift or any machine that can run a simple Java 8 jar.
We can have this HTML as part of a larger dashboard or hosted anywhere.
For Parsing the Monitoring Data We have some schemas for Metrics, Status and Bulletins.
Now that monitoring data is in Apache Hive, I can query it with ease in Apache Zeppelin (or any JDBC/ODBC tool).
Apache Zeppelin Screens
We have a lot of reporting tasks for Monitoring NiFi
We read from NiFi and send to NiFi, would be nice to have a dedicated reporting cluster
Just Show Me Bulletins for MonitorMemory (You can see that in Reporting Tasks)
NiFi Query To Limit Which Bulletins We Are Storing In Hive (For Now Just grab Errors)
Spring Boot Code for REST APIs
Metrics REST API Results
Bulletin REST API Results
Metrics Home Page
Run The Microservice
java -Xms512m -Xmx2048m -Dhdp.version=3.0.0 -Djava.net.preferIPv4Stack=true -jar target/operations-0.0.1-SNAPSHOT.jar
Maven POM
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.dataflowdeveloper</groupId>
<artifactId>operations</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>operations</name>
<description>Apache Hive Operations Spring Boot</description>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.5.RELEASE</version>
<relativePath/>
</parent>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<exclusions>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-logging</artifactId>
</exclusion>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-log4j2</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.0</version>
<exclusions>
<exclusion>
<groupId>org.eclipse.jetty.aggregate</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.restdocs</groupId>
<artifactId>spring-restdocs-mockmvc</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
<repositories>
<repository>
<id>spring-releases</id>
<url>https://repo.spring.io/libs-release</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>spring-releases</id>
<url>https://repo.spring.io/libs-release</url>
</pluginRepository>
</pluginRepositories>
</project>
With some help from the Internet, we have some simple JavaScript to read the Spring Boot /metrics REST API and fill in some values:
HTML and JavaScript (see src/main/resources/static/index.html)
<h1>Metrics</h1>
<div id="output" name="output" style="align: center; overflow:auto; height:400px; width:800px" class="white-frame">
<ul id="metrics"></ul>
</div>
<script language="javascript">
var myList = document.querySelector('ul');
var myRequest = new Request('./metrics/');
fetch(myRequest)
  .then(function(response) { return response.json(); })
  .then(function(data) {
    for (var i = 0; i < data.length; i++) {
      var listItem = document.createElement('li');
      listItem.innerHTML = '<strong>Timestamp' + data[i].timestamp + '</strong>Flow Files Received: ' +
        data[i].flowfilesreceivedlast5minutes + ' JVM Heap Usage:' + data[i].jvmheap_usage +
        ' Threads Waiting:' + data[i].jvmthread_statestimed_waiting +
        ' Thread Count: ' + data[i].jvmthread_count +
        ' Total Task Duration: ' + data[i].totaltaskdurationnanoseconds +
        ' Bytes Read Last 5 min: ' + data[i].bytesreadlast5minutes +
        ' Flow Files Queued: ' + data[i].flowfilesqueued +
        ' Bytes Queued: ' + data[i].bytesqueued;
      myList.appendChild(listItem);
    }
  });
</script>
Resources
https://github.com/tspannhw/operations-dashboard
https://community.hortonworks.com/articles/177256/spring-boot-20-on-acid-integrating-rest-microservi.html
https://community.hortonworks.com/articles/207858/more-devops-for-hdf-apache-nifi-and-friends.html
https://pierrevillard.com/2017/05/16/monitoring-nifi-ambari-grafana/
Example API Calls to Spring Boot
http://localhost:8090/status/Update
http://localhost:8090/bulletin/error
http://localhost:8090/metrics/
TODO: We will add more calls directly to REST APIs of Apache NiFi clusters for display in our dashboard.
REST API for NiFi of Interest
/nifi-api/flow/process-groups/root/status
/nifi-api/resources
/flow/cluster/summary
/nifi-api/flow/process-groups/root
/nifi-api/Site-to-site
/nifi-api/flow/bulletin-board
/flow/history?offset=1&count=100
/nifi-api/flow/search-results?q=NiFi+Operations
/nifi-api/flow/status
/flow/process-groups/root/controller-services
/nifi-api/system-diagnostics
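For anyone who would rather poke at the /metrics endpoint from a script than from the browser, here is a small Python sketch of the same idea as the JavaScript in index.html: fetch the JSON list and format one line per row. The field names are the lowercase keys used by that JavaScript and the URL is the localhost example from this article. The function names and the " | " separator are my own choices, so treat it as a sketch rather than part of the project.

```python
import json
from urllib.request import urlopen

# (label, JSON key) pairs, mirroring the fields shown on the Metrics page.
FIELDS = [
    ("Flow Files Received", "flowfilesreceivedlast5minutes"),
    ("JVM Heap Usage", "jvmheap_usage"),
    ("Threads Waiting", "jvmthread_statestimed_waiting"),
    ("Thread Count", "jvmthread_count"),
    ("Total Task Duration", "totaltaskdurationnanoseconds"),
    ("Bytes Read Last 5 min", "bytesreadlast5minutes"),
    ("Flow Files Queued", "flowfilesqueued"),
    ("Bytes Queued", "bytesqueued"),
]


def fetch_metrics(url="http://localhost:8090/metrics/"):
    """Call the Spring Boot microservice and return the parsed JSON list."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def format_metric(row):
    """Render one metrics row as a single readable line."""
    parts = ["Timestamp {}".format(row.get("timestamp"))]
    parts += ["{}: {}".format(label, row.get(key)) for label, key in FIELDS]
    return " | ".join(parts)
```

With the microservice running, `for row in fetch_metrics(): print(format_metric(row))` prints one line per stored metrics record.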
10-18-2018
08:38 PM
4 Kudos
Simple Apache NiFi Operations Dashboard
This is an evolving work in progress. Please get involved; everything is open source. @milind pandit and I are working on a project to build something useful for teams to analyze their flows and current cluster state, start and stop flows, and have a rich at-a-glance dashboard.
There's a lot of data provided by Apache NiFi and related tools to aggregate, sort, categorize, search and eventually do machine learning analytics on.
There are a lot of tools that come out of the box that solve parts of these problems. Ambari Metrics, Grafana and Log Search provide a ton of data and analysis abilities. You can find all your errors easily in Log Search and see nice graphs of what is going on in Ambari Metrics and Grafana.
What is cool with Apache NiFi is that it has Site-to-Site reporting tasks for sending all the provenance, analytics, metrics and operational data you need to wherever you want it. That includes to Apache NiFi! This is Monitoring Driven Development (MDD).
Monitoring Driven Development (MDD)
MDD - https://pierrevillard.com/2018/08/29/monitoring-driven-development-with-nifi-1-7/
In this little proof of concept work, we grab some of these flows, process them in Apache NiFi and then store them in Apache Hive 3 tables for analytics. We should probably push the data to HBase for aggregates and Druid for time series. We will see as this expands.
There are also other data access options including the NiFi REST API and the NiFi Python APIs.
Bootstrap Notifier
Sends a notification when NiFi starts, stops or dies unexpectedly
Two OOTB notifications
Email notification service
HTTP notification service
It’s easy to write a custom notification service
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#notification_services
Reporting Tasks
AmbariReportingTask (global, per process group)
MonitorDiskUsage (FlowFile, content and provenance repositories)
MonitorMemory
MonitorActivity
See:
https://nipyapi.readthedocs.io/en/latest/readme.html
https://community.hortonworks.com/articles/177301/big-data-devops-apache-nifi-flow-versioning-and-au.html
The NiFi Python API (NiPyApi) is especially useful for doing things like purging connections.
Purge it!
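import nipyapi
nipyapi.canvas.purge_connection(con_id)  # drop all FlowFiles queued in one connection
nipyapi.canvas.purge_process_group(process_group, stop=False)  # purge every connection in a process group
nipyapi.canvas.delete_process_group(process_group, force=True, refresh=True)  # delete the group, forcing a stop and purge first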
Use Cases
Example Metrics Data
[ {
"appid" : "nifi",
"instanceid" : "7c84501d-d10c-407c-b9f3-1d80e38fe36a",
"hostname" : "#.#.hortonworks.com",
"timestamp" : 1539411679652,
"loadAverage1min" : 0.93,
"availableCores" : 16,
"FlowFilesReceivedLast5Minutes" : 14,
"BytesReceivedLast5Minutes" : 343779,
"FlowFilesSentLast5Minutes" : 0,
"BytesSentLast5Minutes" : 0,
"FlowFilesQueued" : 59952,
"BytesQueued" : 294693938,
"BytesReadLast5Minutes" : 241681,
"BytesWrittenLast5Minutes" : 398753,
"ActiveThreads" : 2,
"TotalTaskDurationSeconds" : 273,
"TotalTaskDurationNanoSeconds" : 273242860763,
"jvmuptime" : 224997,
"jvmheap_used" : 5.15272616E8,
"jvmheap_usage" : 0.9597700387239456,
"jvmnon_heap_usage" : -5.1572632E8,
"jvmthread_statesrunnable" : 11,
"jvmthread_statesblocked" : 2,
"jvmthread_statestimed_waiting" : 26,
"jvmthread_statesterminated" : 0,
"jvmthread_count" : 242,
"jvmdaemon_thread_count" : 125,
"jvmfile_descriptor_usage" : 0.0709,
"jvmgcruns" : null,
"jvmgctime" : null
} ]
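As a small illustration of what can be done with a metrics record like the one above, the sketch below flags a node whose heap usage or queue depth crosses a threshold. The thresholds and the check_metrics helper are assumptions made up for this example, and the record is trimmed to a few of the fields shown.

```python
import json

# A trimmed copy of the metrics record shown above.
sample = '''[ {
  "hostname" : "#.#.hortonworks.com",
  "timestamp" : 1539411679652,
  "FlowFilesQueued" : 59952,
  "BytesQueued" : 294693938,
  "jvmheap_usage" : 0.9597700387239456,
  "jvmthread_count" : 242
} ]'''

# Hypothetical thresholds; tune them for your own cluster.
HEAP_LIMIT = 0.90
QUEUE_LIMIT = 50000


def check_metrics(records):
    """Return a list of human-readable warnings for the given metrics records."""
    warnings = []
    for rec in records:
        if rec["jvmheap_usage"] > HEAP_LIMIT:
            warnings.append("heap at {:.0%} on {}".format(rec["jvmheap_usage"], rec["hostname"]))
        if rec["FlowFilesQueued"] > QUEUE_LIMIT:
            warnings.append("{} flow files queued on {}".format(rec["FlowFilesQueued"], rec["hostname"]))
    return warnings


alerts = check_metrics(json.loads(sample))
for line in alerts:
    print(line)
```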
Example Status Data
{
"statusId" : "a63818fe-dbd2-44b8-af53-eaa27fd9ef05",
"timestampMillis" : "2018-10-18T20:54:38.218Z",
"timestamp" : "2018-10-18T20:54:38.218Z",
"actorHostname" : "#.#.hortonworks.com",
"componentType" : "RootProcessGroup",
"componentName" : "NiFi Flow",
"parentId" : null,
"platform" : "nifi",
"application" : "NiFi Flow",
"componentId" : "7c84501d-d10c-407c-b9f3-1d80e38fe36a",
"activeThreadCount" : 1,
"flowFilesReceived" : 1,
"flowFilesSent" : 0,
"bytesReceived" : 1661,
"bytesSent" : 0,
"queuedCount" : 18,
"bytesRead" : 0,
"bytesWritten" : 1661,
"bytesTransferred" : 16610,
"flowFilesTransferred" : 10,
"inputContentSize" : 0,
"outputContentSize" : 0,
"queuedContentSize" : 623564,
"activeRemotePortCount" : null,
"inactiveRemotePortCount" : null,
"receivedContentSize" : null,
"receivedCount" : null,
"sentContentSize" : null,
"sentCount" : null,
"averageLineageDuration" : null,
"inputBytes" : null,
"inputCount" : 0,
"outputBytes" : null,
"outputCount" : 0,
"sourceId" : null,
"sourceName" : null,
"destinationId" : null,
"destinationName" : null,
"maxQueuedBytes" : null,
"maxQueuedCount" : null,
"queuedBytes" : null,
"backPressureBytesThreshold" : null,
"backPressureObjectThreshold" : null,
"isBackPressureEnabled" : null,
"processorType" : null,
"averageLineageDurationMS" : null,
"flowFilesRemoved" : null,
"invocations" : null,
"processingNanos" : null
}
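One thing to watch in the status record above: timestampMillis arrives as an ISO-8601 string, while the Hive tables in this article declare that column as BIGINT. If you want the column to hold a number, a minimal sketch of converting the string to epoch milliseconds follows. The iso_to_epoch_millis helper is hypothetical and not part of the flow described here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_millis(value):
    """Convert a UTC timestamp like 2018-10-18T20:54:38.218Z to epoch milliseconds."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


millis = iso_to_epoch_millis("2018-10-18T20:54:38.218Z")
print(millis)
```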
Example Failure Data
[ {
"objectId" : "34c3249c-4a42-41ce-b94e-3563409ad55b",
"platform" : "nifi",
"project" : null,
"bulletinId" : 28321,
"bulletinCategory" : "Log Message",
"bulletinGroupId" : "0b69ea51-7afb-32dd-a7f4-d82b936b37f9",
"bulletinGroupName" : "Monitoring",
"bulletinLevel" : "ERROR",
"bulletinMessage" : "QueryRecord[id=d0258284-69ae-34f6-97df-fa5c82402ef3] Unable to query StandardFlowFileRecord[uuid=cd305393-f55a-40f7-8839-876d35a2ace1,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1539633295746-10, container=default, section=10], offset=95914, length=322846],offset=0,name=783936865185030,size=322846] due to Failed to read next record in stream for StandardFlowFileRecord[uuid=cd305393-f55a-40f7-8839-876d35a2ace1,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1539633295746-10, container=default, section=10], offset=95914, length=322846],offset=0,name=783936865185030,size=322846] due to -40: org.apache.nifi.processor.exception.ProcessException: Failed to read next record in stream for StandardFlowFileRecord[uuid=cd305393-f55a-40f7-8839-876d35a2ace1,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1539633295746-10, container=default, section=10], offset=95914, length=322846],offset=0,name=783936865185030,size=322846] due to -40",
"bulletinNodeAddress" : null,
"bulletinNodeId" : "91ab706b-5d92-454e-bc7a-6911d155fdca",
"bulletinSourceId" : "d0258284-69ae-34f6-97df-fa5c82402ef3",
"bulletinSourceName" : "QueryRecord",
"bulletinSourceType" : "PROCESSOR",
"bulletinTimestamp" : "2018-10-18T20:54:39.179Z"
} ]
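Bulletins like the one above are easier to act on once they are grouped. The sketch below counts bulletins per source and level and pulls the affected FlowFile UUIDs out of the message text. The record is abbreviated (the message is shortened by hand), and the regular expression is an assumption based only on the message format seen in this one example.

```python
import json
import re
from collections import Counter

# An abbreviated version of the bulletin shown above.
sample = '''[ {
  "bulletinLevel" : "ERROR",
  "bulletinSourceName" : "QueryRecord",
  "bulletinSourceType" : "PROCESSOR",
  "bulletinMessage" : "QueryRecord[id=d0258284-69ae-34f6-97df-fa5c82402ef3] Unable to query StandardFlowFileRecord[uuid=cd305393-f55a-40f7-8839-876d35a2ace1,size=322846] due to -40"
} ]'''

UUID_PATTERN = re.compile(r"uuid=([0-9a-f-]{36})")

bulletins = json.loads(sample)

# Count bulletins per (source name, level) pair.
counts = Counter((b["bulletinSourceName"], b["bulletinLevel"]) for b in bulletins)

# Collect the distinct FlowFile UUIDs mentioned in the messages.
flowfile_ids = sorted({m for b in bulletins for m in UUID_PATTERN.findall(b["bulletinMessage"])})

print(counts)
print(flowfile_ids)
```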
Apache Hive 3 Tables
CREATE EXTERNAL TABLE IF NOT EXISTS failure (statusId STRING, timestampMillis BIGINT, `timestamp` STRING, actorHostname STRING, componentType STRING, componentName STRING, parentId STRING, platform STRING, `application` STRING, componentId STRING, activeThreadCount BIGINT, flowFilesReceived BIGINT, flowFilesSent BIGINT, bytesReceived BIGINT, bytesSent BIGINT, queuedCount BIGINT, bytesRead BIGINT, bytesWritten BIGINT, bytesTransferred BIGINT, flowFilesTransferred BIGINT, inputContentSize BIGINT, outputContentSize BIGINT, queuedContentSize BIGINT, activeRemotePortCount BIGINT, inactiveRemotePortCount BIGINT, receivedContentSize BIGINT, receivedCount BIGINT, sentContentSize BIGINT, sentCount BIGINT, averageLineageDuration BIGINT, inputBytes BIGINT, inputCount BIGINT, outputBytes BIGINT, outputCount BIGINT, sourceId STRING, sourceName STRING, destinationId STRING, destinationName STRING, maxQueuedBytes BIGINT, maxQueuedCount BIGINT, queuedBytes BIGINT, backPressureBytesThreshold BIGINT, backPressureObjectThreshold BIGINT, isBackPressureEnabled STRING, processorType STRING, averageLineageDurationMS BIGINT, flowFilesRemoved BIGINT, invocations BIGINT, processingNanos BIGINT) STORED AS ORC
LOCATION '/failure';
CREATE EXTERNAL TABLE IF NOT EXISTS bulletin (objectId STRING, platform STRING, project STRING, bulletinId BIGINT, bulletinCategory STRING, bulletinGroupId STRING, bulletinGroupName STRING, bulletinLevel STRING, bulletinMessage STRING, bulletinNodeAddress STRING, bulletinNodeId STRING, bulletinSourceId STRING, bulletinSourceName STRING, bulletinSourceType STRING, bulletinTimestamp STRING) STORED AS ORC
LOCATION '/error';
CREATE EXTERNAL TABLE IF NOT EXISTS memory (objectId STRING, platform STRING, project STRING, bulletinId BIGINT, bulletinCategory STRING, bulletinGroupId STRING, bulletinGroupName STRING, bulletinLevel STRING, bulletinMessage STRING, bulletinNodeAddress STRING, bulletinNodeId STRING, bulletinSourceId STRING, bulletinSourceName STRING, bulletinSourceType STRING, bulletinTimestamp STRING) STORED AS ORC
LOCATION '/memory';
-- backpressure
CREATE EXTERNAL TABLE IF NOT EXISTS status (statusId STRING, timestampMillis BIGINT, `timestamp` STRING, actorHostname STRING, componentType STRING, componentName STRING, parentId STRING, platform STRING, `application` STRING, componentId STRING, activeThreadCount BIGINT, flowFilesReceived BIGINT, flowFilesSent BIGINT, bytesReceived BIGINT, bytesSent BIGINT, queuedCount BIGINT, bytesRead BIGINT, bytesWritten BIGINT, bytesTransferred BIGINT, flowFilesTransferred BIGINT, inputContentSize BIGINT, outputContentSize BIGINT, queuedContentSize BIGINT, activeRemotePortCount BIGINT, inactiveRemotePortCount BIGINT, receivedContentSize BIGINT, receivedCount BIGINT, sentContentSize BIGINT, sentCount BIGINT, averageLineageDuration BIGINT, inputBytes BIGINT, inputCount BIGINT, outputBytes BIGINT, outputCount BIGINT, sourceId STRING, sourceName STRING, destinationId STRING, destinationName STRING, maxQueuedBytes BIGINT, maxQueuedCount BIGINT, queuedBytes BIGINT, backPressureBytesThreshold BIGINT, backPressureObjectThreshold BIGINT, isBackPressureEnabled STRING, processorType STRING, averageLineageDurationMS BIGINT, flowFilesRemoved BIGINT, invocations BIGINT, processingNanos BIGINT) STORED AS ORC
LOCATION '/status';
10-12-2018
06:11 PM
8 Kudos
Running TensorFlow on YARN 3.1 with or without GPU
You have the option to run with or without Docker containers. If you are not using Docker containers you will need CUDA, TensorFlow and all your Data Science libraries.
See: https://community.hortonworks.com/articles/222242/running-apache-mxnet-deep-learning-on-yarn-31-hdp.html
Tips from Wangda
Basically, GPU on YARN gives you isolation of GPU devices. Say a node has 4 GPUs. The first task asks for 1 GPU (yarn.io/gpu=1), and the YARN NodeManager (NM) gives it GPU0. Then a second task asks for 2 GPUs, and the NM gives it GPU1 and GPU2. So from the TensorFlow (TF) perspective, you don't need to specify which GPUs to use. TF will automatically detect and consume whatever is available to the job. In this case, task 2 cannot see any GPUs other than GPU1 and GPU2.
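The allocation walk-through above can be pictured with a toy model. This is only an illustration of the example in the tip and not how the YARN NodeManager is implemented; ToyNodeManager and its methods are made up for this sketch.

```python
class ToyNodeManager:
    """Toy model of per-node GPU hand-out; not YARN's actual scheduler code."""

    def __init__(self, gpu_count):
        self.free = list(range(gpu_count))

    def allocate(self, requested):
        """Give a task the lowest-numbered free GPUs, or fail if too few remain."""
        if requested > len(self.free):
            raise RuntimeError("not enough free GPUs on this node")
        granted, self.free = self.free[:requested], self.free[requested:]
        return granted


node = ToyNodeManager(gpu_count=4)
task1 = node.allocate(1)  # asks for yarn.io/gpu=1
task2 = node.allocate(2)  # asks for yarn.io/gpu=2
print(task1, task2, node.free)
```

Each task only ever sees the device numbers it was granted, which is the isolation the tip describes.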
If you wish to run Apache MXNet deep learning programs, see this article: https://community.hortonworks.com/articles/222242/running-apache-mxnet-deep-learning-on-yarn-31-hdp.html
Installation
Install CUDA and NVIDIA libraries if you have NVIDIA cards.
Install Python 3.x
Install Docker
Install PIP
sudo yum groupinstall 'Development Tools' -y
sudo yum install cmake git pkgconfig -y
sudo yum install libpng-devel libjpeg-turbo-devel jasper-devel openexr-devel libtiff-devel libwebp-devel -y
sudo yum install libdc1394-devel libv4l-devel gstreamer-plugins-base-devel -y
sudo yum install gtk2-devel -y
sudo yum install tbb-devel eigen3-devel -y
pip3.6 install --upgrade pip
pip3.6 install tensorflow
pip3.6 install numpy -U
pip3.6 install scikit-learn -U
pip3.6 install opencv-python -U
pip3.6 install keras
pip3.6 install hdfs
git clone https://github.com/tensorflow/models/
You can see a docker example: https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/Dockerfile.md
https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/TensorflowOnYarnTutorial.md
Run Command for an Example Classification
yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command python3.6 -shell_args "/opt/demo/DWS-DeepLearning-CrashCourse/tf.py /opt/demo/images/photo1.jpg" -container_resources memory-mb=512,vcores=1
Without Docker
-container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2
With Docker (Enable it first: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/dosg_enable_gpu_for_docker_ambari_cluster.html)
-shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
-shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker-image-name> \
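The command variants above differ only in a few options, so they can be assembled programmatically. The sketch below is one hedged way to do that, using only the flags that appear in this article. The build_command helper and its parameter names are hypothetical, and the sketch only builds the argument list without submitting anything.

```python
import shlex

DSHELL_JAR = "/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar"


def build_command(script_args, memory_mb=512, vcores=1, gpus=0, docker_image=None):
    """Assemble the distributed shell command line as a list of arguments."""
    resources = "memory-mb={},vcores={}".format(memory_mb, vcores)
    if gpus:
        resources += ",yarn.io/gpu={}".format(gpus)
    cmd = ["yarn", "jar", DSHELL_JAR, "-jar", DSHELL_JAR]
    if docker_image:
        cmd += ["-shell_env", "YARN_CONTAINER_RUNTIME_TYPE=docker",
                "-shell_env", "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=" + docker_image]
    cmd += ["-shell_command", "python3.6",
            "-shell_args", script_args,
            "-container_resources", resources]
    return cmd


cmd = build_command("/opt/demo/DWS-DeepLearning-CrashCourse/tf.py /opt/demo/images/photo1.jpg",
                    memory_mb=3072, gpus=2)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to subprocess.run on a node where the yarn client is installed.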
Running a More Complex Training Job
https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/RunTensorflowJobUsingNativeServiceSpec.md
This is the main example: https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10_estimator
yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command python3.6 -shell_args "/opt/demo/models/tutorials/image/cifar10_estimator/cifar10_main.py --data-dir=hdfs://default/tmp/cifar-10-data --job-dir=hdfs://default/tmp/cifar-10-jobdir --train-steps=10000 --eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=0" -container_resources memory-mb=512,vcores=1
Example Output
[hdfs@princeton0 DWS-DeepLearning-CrashCourse]$ python3.6 tf.py
2018-10-15 02:37:23.892791: W tensorflow/core/framework/op_def_util.cc:355] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
2018-10-15 02:37:24.181707: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
273 racer, race car, racing car 37.46013343334198%
274 sports car, sport car 25.35209059715271%
267 cab, hack, taxi, taxicab 11.118262261152267%
268 convertible 9.854312241077423%
271 minivan 3.2295159995555878%
Output Written to HDFS
hdfs dfs -ls /tfyarn
Found 1 items
-rw-r--r-- 3 root hdfs 457 2018-10-15 02:35 /tfyarn/tf_uuid_img_20181015023542.json
hdfs dfs -cat /tfyarn/tf_uuid_img_20181015023542.json
{"node_id273": "273", "humanstr273": "racer, race car, racing car", "score273": "37.46013343334198", "node_id274": "274", "humanstr274": "sports car, sport car", "score274": "25.35209059715271", "node_id267": "267", "humanstr267": "cab, hack, taxi, taxicab", "score267": "11.118262261152267", "node_id268": "268", "humanstr268": "convertible", "score268": "9.854312241077423", "node_id271": "271", "humanstr271": "minivan", "score271": "3.2295159995555878"}
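The JSON written to HDFS is flat, with the node id appended to each key. A minimal sketch of regrouping it into one record per class, ranked by score, is shown below. The input is shortened to the first two classes, and regroup is a hypothetical helper, not part of tf.py.

```python
import json
import re

# First two classes from the HDFS output shown above.
raw = ('{"node_id273": "273", "humanstr273": "racer, race car, racing car", '
       '"score273": "37.46013343334198", "node_id274": "274", '
       '"humanstr274": "sports car, sport car", "score274": "25.35209059715271"}')


def regroup(flat):
    """Turn keys like humanstr273 and score273 into one record per node id."""
    records = {}
    for key, value in flat.items():
        match = re.fullmatch(r"(node_id|humanstr|score)(\d+)", key)
        if match:
            field, node = match.groups()
            records.setdefault(node, {})[field] = value
    # Sort by score, highest first; scores are stored as strings in the file.
    return sorted(records.values(), key=lambda r: float(r["score"]), reverse=True)


ranked = regroup(json.loads(raw))
for rec in ranked:
    print(rec["node_id"], rec["humanstr"], rec["score"])
```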
Full Source Code
https://github.com/tspannhw/TensorflowOnYARN
Resources
https://www.tensorflow.org/
https://github.com/tspannhw/ApacheDeepLearning101/blob/master/yarn.sh
https://github.com/hortonworks/hdp-assemblies/
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/configuring_gpu_scheduling_and_isolation.html
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/dosg_enable_gpu_for_docker_ambari_cluster.html
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/data-operating-system/content/dosg_recommendations_for_running_docker_containers_on_yarn.html
https://feathercast.apache.org/2018/10/02/deep-learning-on-yarn-running-distributed-tensorflow-mxnet-caffe-xgboost-on-hadoop-clusters-wangda-tan/
https://github.com/deep-diver/CIFAR10-img-classification-tensorflow
https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/RunningDistributedCifar10TFJobs.html
https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
https://github.com/tspannhw/ApacheDeepLearning101/blob/master/analyzehdfs.py
https://github.com/open-source-for-science/TensorFlow-Course
https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/Dockerfile.md
https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/TensorflowOnYarnTutorial.md
https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/RunTensorflowJobUsingHelperScript.md
Documentation
https://hadoop.apache.org/docs/r3.1.0/
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/options_distributed_shell_gpu.html
Coming Soon
https://github.com/leftnoteasy/hadoop-1/tree/submarine/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68289
https://github.com/leftnoteasy/hadoop-1/blob/submarine/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/src/site/QuickStart.md
10-05-2018
05:51 PM
2 Kudos
Posting Images with Apache NiFi 1.7 and a Custom Processor
I have been using a shell script for this, since Apache NiFi did not have a good way to natively post an image to HTTP servers such as the model server for Apache MXNet. So I wrote a quick and dirty processor that posts an image there, gathers the headers, result body, status text and status code, and returns them to you as attributes.
In this example I am downloading images from the free picsum.photos photo service.
To use this new processor, download the NAR to your lib directory and restart Apache NiFi; then you can add the PostImageProcessor.
Eclipse For Building My Processor
Configure the Post Image Processor with your URL, fieldname, imagename and image type.
MXNet Model Server Results
The Attribute Results
From the Data Results
Example Results
post.header
{Server=[Werkzeug/0.14.1 Python/3.6.6], Access-Control-Allow-Origin=[*], Content-Length=[396], Date=[Fri, 05 Oct 2018 17:47:22 GMT], Content-Type=[application/json]}
post.results
{"prediction":[[{"probability":0.24173378944396973,"class":"n02281406 sulphur butterfly, sulfur butterfly"},{"probability":0.19173663854599,"class":"n02190166 fly"},{"probability":0.052654966711997986,"class":"n02280649 cabbage butterfly"},{"probability":0.05147545784711838,"class":"n03485794 handkerchief, hankie, hanky, hankey"},{"probability":0.048753462731838226,"class":"n02834397 bib"}]]}
post.status
OK
post.statuscode
200
Results from HTTP Posting an Image to MXNet Model Server
[INFO 2018-10-05 13:47:22,217 PID:88561 /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mms/serving_frontend.py:predict_callback:467] Request input: data should be image with jpeg format.
[INFO 2018-10-05 13:47:22,218 PID:88561 /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mms/request_handler/flask_handler.py:get_file_data:137] Getting file data from request.
[INFO 2018-10-05 13:47:22,262 PID:88561 /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mms/serving_frontend.py:predict_callback:510] Response is text.
[INFO 2018-10-05 13:47:22,262 PID:88561 /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mms/request_handler/flask_handler.py:jsonify:159] Jsonifying the response: {'prediction': [[{'probability': 0.24173378944396973, 'class': 'n02281406 sulphur butterfly, sulfur butterfly'}, {'probability': 0.19173663854599, 'class': 'n02190166 fly'}, {'probability': 0.052654966711997986, 'class': 'n02280649 cabbage butterfly'}, {'probability': 0.05147545784711838, 'class': 'n03485794 handkerchief, hankie, hanky, hankey'}, {'probability': 0.048753462731838226, 'class': 'n02834397 bib'}]]}
[INFO 2018-10-05 13:47:22,263 PID:88561 /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/werkzeug/_internal.py:_log:88] 127.0.0.1 - - [05/Oct/2018 13:47:22] "POST /squeezenet/predict HTTP/1.1" 200 -
Example HTTP Server
https://github.com/awslabs/mxnet-model-server
Source Code For Processor
https://github.com/tspannhw/nifi-postimage-processor
Pre-Built NAR To Install
https://github.com/tspannhw/nifi-postimage-processor/releases/tag/1.0
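Once the post.results attribute is available, picking out the winning class is straightforward. The sketch below parses a shortened copy of the post.results value from the example and returns the most likely class. The top_prediction helper is hypothetical and is not part of the processor.

```python
import json

# Shortened copy of the post.results attribute from the example above.
post_results = ('{"prediction":[[{"probability":0.24173378944396973,'
                '"class":"n02281406 sulphur butterfly, sulfur butterfly"},'
                '{"probability":0.19173663854599,"class":"n02190166 fly"}]]}')


def top_prediction(body):
    """Return (synset id, label, probability) for the most likely class."""
    candidates = json.loads(body)["prediction"][0]
    best = max(candidates, key=lambda c: c["probability"])
    synset, label = best["class"].split(" ", 1)
    return synset, label, best["probability"]


synset, label, probability = top_prediction(post_results)
print(synset, label, round(probability, 3))
```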
10-04-2018
01:33 PM
2 Kudos
Properties File Lookup Augmentation of Data Flow in Apache NiFi 1.7.x
A really cool technologist contacted me on LinkedIn and asked an interesting question:
"Tim, how do I read values from a properties file and use them in my flow? I want to update/inject an attribute with this value."
If you don't want to use the Variable Registry but want to inject a value from a properties file, how do you do it? You could run a REST server and read from it, or do some file-reading hack. But we have a great service that does this very easily!
In my UpdateAttribute (or in your regular attributes already), I have an attribute named keytofind. This contains a lookup key such as an integer or a string key. We will find that key in the properties file and give you its value in an attribute of your choosing.
We have a Controller Service to handle this for you. It reads from your specified properties file. Make sure Apache NiFi has permissions to that path and can read the file.
PropertiesFileLookupService
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-lookup-services-nar/1.7.1/org.apache.nifi.lookup.PropertiesFileLookupService/index.html
We look up the key specified in “keytofind”. It returns a value in an extra attribute that you specify; mine is “updatedvalue”.
This is my properties file:
-rwxrwxrwx 1 tspann staff 67 Oct 4 09:15 lookup.properties
stuff1=value1
stuff2=value2
stuff3=value other
tim=spann
nifi=cool
In this example, we are using the LookupAttribute processor. You can also use the LookupRecord processor depending on your needs.
Resources:
http://discover.attunity.com/apache-nifi-for-dummies-en-report-go-c-lp8558.html
https://community.hortonworks.com/articles/140231/data-flow-enrichment-with-nifi-part-2-lookupattrib.html
https://community.hortonworks.com/articles/189213/etl-with-lookups-with-apache-hbase-and-apache-nifi.html
The Flow
lookup-from-properties-values.xml
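To make the lookup behavior concrete, here is a toy model in Python of what the properties lookup does to a FlowFile's attributes. It is not NiFi code: parse_properties handles only simple key=value lines (the real Java properties format has more rules), and lookup_attribute is a hypothetical stand-in for the LookupAttribute processor configured with keytofind and updatedvalue.

```python
def parse_properties(text):
    """Parse simple key=value lines; comments and blank lines are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Contents of lookup.properties from the article.
LOOKUP = parse_properties("""stuff1=value1
stuff2=value2
stuff3=value other
tim=spann
nifi=cool""")


def lookup_attribute(attributes, key_attr="keytofind", result_attr="updatedvalue"):
    """Copy the attributes and add the looked-up value when the key is found."""
    enriched = dict(attributes)
    key = enriched.get(key_attr)
    if key in LOOKUP:
        enriched[result_attr] = LOOKUP[key]
    return enriched


result = lookup_attribute({"keytofind": "tim"})
print(result)
```

In this model a key that is missing from the file simply leaves the attributes unchanged.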
09-28-2018
09:30 PM
This is an extension of this article: https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html