05-05-2018 01:51 PM · 3 Kudos
Tracking Air Quality with HDP and HDF: Part 1 - Apache NiFi Ingest

Part 2: Plan Data Storage. Store to Apache Hive, Apache Druid and Apache HBase. Part 3: Query and Visualize Data with Apache Zeppelin and Superset.

There was an Air Quality alert near me a few days ago, and I was curious how I could keep track of this important environmental information. So: NiFi! This data is different from weather data, but for analytics it makes a lot of sense to combine it with weather, social and locally captured camera data. It's very easy to ingest these JSON feeds and camera images via Apache NiFi. In the next section we will analyze the datasets and determine how we can aggregate and accumulate massive quantities of this data to track air quality in various areas over time, and use that as a dimension alongside other relevant data like weather. We are tracking contaminants and particles in the air. These include:
pm25, pm10 - atmospheric particulate matter
so2 - sulfur dioxide
no2 - nitrogen dioxide
o3 - ozone
co - carbon monoxide

Photos Courtesy of HazeCam - Brigantine, NJ

Example Data

{
"location" : "ARB OER",
"city" : "CA8 - ARB",
"country" : "US",
"distance" : 3848728.319714322,
"measurements" : [ {
"parameter" : "pm25",
"value" : -4,
"lastUpdated" : "2016-08-08T16:00:00.000Z",
"unit" : "µg/m³",
"sourceName" : "AirNow",
"averagingPeriod" : { "value" : 1, "unit" : "hours" }
} ],
"coordinates" : { "latitude" : 38.568504, "longitude" : -121.493256 }
}

{
"location" : "MONTG",
"parameter" : "o3",
"date" : {
"utc" : "2018-05-05T12:00:00.000Z",
"local" : "2018-05-05T06:00:00-06:00"
},
"value" : 0.004,
"unit" : "ppm",
"coordinates" : {
"latitude" : 32.4069,
"longitude" : -86.2564
},
"country" : "US",
"city" : "Montgomery"
}

Most of the data arrives as arrays of JSON, so we can easily break them down into individual JSON records, derive an Avro schema from the data, and then process it however we want. We can join the feeds together and then convert them into ORC files or HBase rows.

Data Feed Links

Haze Cam provides web camera images of potential haze:
http://hazecam.net/images/main/brigantine_right.jpg

OpenAQ (https://openaq.org/#/?_k=7mfsz6) provides open air quality data:
https://api.openaq.org/v1/latest?country=US
https://api.openaq.org/v1/measurements?country=US&date_from=2018-05-04

Air NOW API (provides forecasts and current conditions):
http://www.airnowapi.org/aq/observation/zipCode/current/?format=application/json&zipCode=08520&distance=50&API_KEY=SIGNUPFORANAPIKEY
http://www.airnowapi.org/aq/forecast/zipCode/?format=application/json&zipCode=08520&date=2018-05-02&distance=25&API_KEY=SIGNUPFORANAPIKEY

EPA's Air Quality Notifications:
http://feeds.enviroflash.info/
https://www.airnow.gov/index.cfm?action=airnow.national
http://feeds.enviroflash.info/rss/realtime/445.xml

Other Sources:
http://feeds.enviroflash.info/cap/aggregate.xml
https://docs.openaq.org/
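To make the ingest concrete, here is a minimal Python sketch of mine (not part of the NiFi flow; it assumes the requests library and the OpenAQ v1 endpoint linked above) that pulls the latest US readings and breaks the results array down into individual JSON records, the same split the flow performs:

import json

import requests

# Pull the latest US air quality readings from the OpenAQ v1 API above.
response = requests.get("https://api.openaq.org/v1/latest", params={"country": "US"})
response.raise_for_status()

# The payload wraps an array of station records under "results";
# emit each measurement as an individual flat JSON record,
# as the NiFi flow does before deriving an Avro schema.
for station in response.json().get("results", []):
    for measurement in station.get("measurements", []):
        record = {
            "location": station.get("location"),
            "city": station.get("city"),
            "country": station.get("country"),
            "coordinates": station.get("coordinates"),
        }
        record.update(measurement)
        print(json.dumps(record))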
05-03-2018 04:00 PM · 3 Kudos
Converting CSV Files to Apache Hive Tables with Apache ORC Files

I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code. I read the CSV files from a directory of files, then convert the CSV to Avro directly with ConvertRecord. I will need a schema, so I use the settings below for InferAvroSchema. If every file is different, you will need to do this every time.

CSV Reader

I use the Jackson CSV parser, which works very well. The first line of the CSV is a header, so it can figure out the fields from the header. Once I have an Apache Avro file, it's easy to convert to Apache ORC and then store in HDFS.

Template: csvprocess.xml
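As a side note, the same CSV-to-ORC conversion can be sketched outside NiFi in a few lines of Python; this is my own rough equivalent, assuming pyarrow is installed (it infers the columns from the header row, much like InferAvroSchema), with hypothetical file names:

import pyarrow.csv as pv
import pyarrow.orc as orc

# Read the CSV; column names come from the header row and
# column types are inferred from the data.
table = pv.read_csv("input.csv")  # hypothetical input file

# Write the table out as an ORC file, ready to land in HDFS
# behind a Hive external table.
orc.write_table(table, "output.orc")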
04-27-2018 04:39 PM · 2 Kudos
ETL With Lookups with Apache HBase and Apache NiFi (Microservices Style ETL)

When we are ingesting tabular / record-oriented data, we often want to enrich the data by replacing ids with descriptions, or vice versa. There are many transformations that may need to happen before the data is in a happy state. When you are denormalizing your data in Hadoop, usually building very wide tables, you often want descriptions or other data to enhance its usability. Only one call to get everything you need is nice, especially when you have 100 trillion records.

We are utilizing a lot of things built already (https://community.hortonworks.com/articles/146198/data-flow-enrichment-with-nifi-part-3-lookuprecord.html). Make sure you read Abdelkrim's first 3 lookup articles. I added some fields to his generated data for testing. I want to do my lookups against HBase, which is a great NoSQL store for lookup tables, and generate datasets. First I created an HBase table to use for lookups.

Create HBase Table For Lookups

create 'lookup_', 'family'

Table With Data

Most people would have a pre-populated table for lookups. I don't, and since we are using a generator to build the lookup ids, I am building the lookup descriptions with a REST call at the same time. We could also have a flow that adds a lookup when it isn't found, and another flow ingesting the lookup values and adding/updating them when needed.

REST API To Generate Product Descriptions

https://baconipsum.com/api/?type=meat&sentences=1&format=text

I found this cool API that returns a sentence of meat words. I use this as our description, because MEAT! Call the Bacon API!!! Let's turn our plain text into a clean JSON document. Then I store it in HBase as my lookup table. You probably already have a lookup table; this is a demo and I am filling it with my generator, which is a lazy way to populate a table, not a best practice or a good design pattern. (A Python sketch of this population step follows the generated document below.)

Example Apache NiFi Flow (Using Apache NiFi 1.5)

Generate Some Test Data (https://community.hortonworks.com/articles/146198/data-flow-enrichment-with-nifi-part-3-lookuprecord.html)

Generate A JSON Document (note the empty prod_desc):

{
"ts" : "${now():format('yyyymmddHHMMSS')}",
"updated_dt" : "${now()}",
"id_store" : ${random():mod(5):toNumber():plus(1)},
"event_type" : "generated",
"uuid" : "${UUID()}",
"hostname" : "${hostname()}",
"ip" : "${ip()}",
"counter" : "${nextInt()}",
"id_transaction" : "${random():toString()}",
"id_product" : ${random():mod(500000):toNumber()},
"value_product" : ${now():toNumber()},
"prod_desc": ""
}
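For illustration only, here is a minimal Python sketch of the lazy lookup-table population described above, assuming the HBase Thrift server is running on localhost and the happybase and requests libraries are installed (the row key is a hypothetical product id):

import happybase
import requests

# Assumption: HBase Thrift server reachable on the default localhost:9090.
connection = happybase.Connection("localhost")
table = connection.table("lookup_")

# Fetch one sentence of meat words from the Bacon Ipsum API above.
desc = requests.get(
    "https://baconipsum.com/api/",
    params={"type": "meat", "sentences": 1, "format": "text"},
).text.strip()

# Store the description under a hypothetical product id row key.
table.put(b"430672", {b"family:prod_desc": desc.encode("utf-8")})

# Read it back, which is what the lookup does during enrichment.
row = table.row(b"430672")
print(row[b"family:prod_desc"].decode("utf-8"))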
Lookup Your Record

This is the magic. We take in our records; in this case we are reading JSON records and writing JSON records, but we could choose CSV, Avro or others. We connect to the HBase Record Lookup Service and replace the current prod_desc field in the record with what is returned by the lookup, using the id_product field as the lookup key. Nothing else is needed to change records in stream.

HBase Record Lookup Service

HBase Client Service Used by HBase Record Lookup Service

We can use UpdateRecord to clean up, transform or modify any field in the records in stream.

Original File

{
"ts" : "201856271804499",
"updated_dt" : "Fri Apr 27 18:56:15 UTC 2018",
"id_store" : 1,
"event_type" : "generated",
"uuid" : "0d16967d-102d-4864-b55a-3f1cb224a0a6",
"hostname" : "princeton1",
"ip" : "172.26.217.170",
"counter" : "7463",
"id_transaction" : "5307056748245491959",
"id_product" : 430672,
"value_product" : 1524855375500,
"prod_desc": ""
}
}

Final File (note we have populated prod_desc with MEAT!)

[ {
"ts" : "201856271804499",
"prod_desc" : "Pork chop leberkas brisket chuck, filet mignon turducken hamburger.",
"updated_dt" : "Fri Apr 27 18:56:15 UTC 2018",
"id_store" : 1,
"event_type" : "generated",
"uuid" : "0d16967d-102d-4864-b55a-3f1cb224a0a6",
"hostname" : "princeton1",
"ip" : "172.26.217.170",
"counter" : "7463",
"id_transaction" : "5307056748245491959",
"id_product" : 430672,
"value_product" : 1524855375500
} ]
References:

https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte.html
https://community.hortonworks.com/articles/155527/ingesting-golden-gate-records-from-apache-kafka-an.html
https://community.hortonworks.com/questions/174144/lookuprecord-and-simplecsvfilelookupservice-in-nif.html
https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html

For those wishing not to include meat in their data, there are alternatives: https://www.vegguide.org/site/api-docs

Example Flow: etlv2.xml
04-25-2018 02:00 PM
You can use ExecuteSQL. You can use Sqoop for the initial export: https://community.hortonworks.com/articles/108718/ingesting-rdbms-data-as-new-tables-arrive-automagi.html
04-24-2018 06:44 PM
Vision Thing Part 3: Image Analytics - Open Source Computer Vision with TensorFlow, Apache MiniFi, Apache NiFi, OpenCV, Apache Tika and Python

In preparation for this talk, I am releasing some articles detailing how to work with images. In this one, for Linux machines, I recommend building OpenCV yourself and installing the Python connector.

sudo yum install -y https://centos7.iuscommunity.org/ius-release.rpm
sudo yum update -y
sudo yum groupinstall 'Development Tools' -y
sudo yum install cmake git pkgconfig -y
sudo yum install libpng-devel libjpeg-turbo-devel jasper-devel openexr-devel libtiff-devel libwebp-devel -y
sudo yum install libdc1394-devel libv4l-devel gstreamer-plugins-base-devel -y
sudo yum install gtk2-devel -y
sudo yum install tbb-devel eigen3-devel -y
sudo yum install -y python36u python36u-libs python36u-devel python36u-pip -y
pip3.6 install numpy
cd ~
git clone https://github.com/Itseez/opencv.git
cd opencv
git checkout 3.1.0
git clone https://github.com/Itseez/opencv_contrib.git
cd opencv_contrib
git checkout 3.1.0
cd ~/opencv
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
-D INSTALL_C_EXAMPLES=OFF \
-D INSTALL_PYTHON_EXAMPLES=ON \
-D BUILD_EXAMPLES=ON \
-D BUILD_OPENCV_PYTHON2=ON -D BUILD_OPENCV_PYTHON3=ON ..
sudo make
sudo make install
sudo ldconfig
pip3.6 install opencv-python
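Once the build finishes, a quick sanity check like this sketch of mine can confirm the install, assuming a webcam is attached as device 0 (the output path is arbitrary):

import cv2

# Confirm the freshly built OpenCV is the one Python picks up.
print(cv2.__version__)

# Grab a single frame from the default webcam (device 0) and save it.
capture = cv2.VideoCapture(0)
ret, frame = capture.read()
if ret:
    cv2.imwrite("/tmp/opencv_test.jpg", frame)
capture.release()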
04-15-2018 09:55 AM · 3 Kudos
TIBCO Enterprise Message Service

https://www.tibco.com/products/tibco-enterprise-message-service

I tested this against the most recent release of TIBCO Enterprise Message Service and their JMS driver, available via trial download. I followed the very easy install directions. I downloaded it to a CentOS 7 server, expanded my download to TIB_ems-dev_8.4.0_linux_x86_64, made it executable, and ran TIBCOUniversalInstaller-lnx-x86-64.bin --console. I used all the defaults (I picked server and client) and then quickly ran the finished install server.

Running TIBCO on CentOS 7

cd /opt/tibco/ems/8.4/bin/
./tibemsd64 -config ~/TIBCO_HOME/tibco/cfgmgmt/ems/data/tibemsd.conf

Example JMS Queue Settings

URL: tcp://servername:7222
class: com.tibco.tibjms.TibjmsQueueConnectionFactory
Directory: /opt/tibco/ems/8.4/lib/

I believe it just uses these files from that directory:
tibjms.jar
jms-2.0.jar

Once I have my server and port, it's easy to add those settings to Apache NiFi. The settings I need to publish messages are below. After you enter your username and queue, you need to create (or use) a controller service. Then we use our settings for our server; mine are the default ones. Make sure you enter the lib directory containing your jars, that the jars are on the Apache NiFi server, and that the Apache NiFi user has permission to read them. You can also use this same controller service to consume JMS messages from TIBCO EMS. These are example metadata attributes that Apache NiFi provides to you on message receipt.

Example Run Log of my TIBCO EMS v8.4.0 Server running on Linux.

Example Flow: tibco-jms.xml

Example Data

{
"top1pct" : "43.1",
"top5" : "n09428293 seashore, coast, seacoast, sea-coast",
"top4" : "n04371774 swing",
"top3" : "n02894605 breakwater, groin, groyne, mole, bulwark, seawall, jetty",
"top2" : "n03933933 pier",
"top1" : "n03216828 dock, dockage, docking facility",
"top2pct" : "34.3",
"imagefilename" : "/opt/demo/images/201817121004997.jpg",
"top3pct" : "3.8",
"uuid" : "mxnet_uuid_img_20180413140808",
"top4pct" : "2.7",
"top5pct" : "2.4",
"runtime" : "1.0"
}

This is example JSON data; we could use any text.

References

https://docs.tibco.com/pub/sb-lv/2.2.1/doc/html/authoring/jmsoperator.html
http://www.sourcefreak.com/2013/06/tibco-ems-sender-and-receiver-in-java/
https://docs.tibco.com/pub/adr3bs/1.2.0/doc/html/GUID-7111BD21-86C6-4F3A-89B3-B03BFCE15E0D.html
http://tutorialspedia.com/tibco-ems-how-to-send-and-receive-jms-messages-in-queues/
https://github.com/SolaceLabs/solace-integration-guides/tree/master/src/nifi-jms-jndi
https://docs.tibco.com/pub/activematrix_businessworks/6.2.0/doc/html/GUID-624942EB-89A3-400F-A9D1-B906107E6985.html
https://github.com/mcqueary/spring-jms-tibco/blob/master/README.md
04-06-2018 09:54 PM
These setup steps may help for your particular machine.
apt-get install curl wget -y
wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel-0.11.1-installer-linux-x86_64.sh
./bazel-0.11.1-installer-linux-x86_64.sh
apt-get install libblas-dev liblapack-dev python-dev libatlas-base-dev gfortran python-setuptools python-h5py -y
pip3 install six numpy wheel
pip3 install --user numpy scipy matplotlib pandas sympy nose
pip3 install --upgrade tensorflow
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
wget http://mirror.jax.hugeserver.com/apache/nifi/minifi/0.4.0/minifi-0.4.0-bin.zip
wget https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
wget http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
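After these steps, a quick Python sanity check (my own sketch, using the TensorFlow 1.x session API that matches this install) confirms TensorFlow imports and runs:

import tensorflow as tf

# Confirm which TensorFlow version the pip install produced.
print(tf.__version__)

# Minimal TF 1.x graph: run a constant through a session.
hello = tf.constant("Hello, TensorFlow!")
with tf.Session() as sess:
    print(sess.run(hello))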
04-06-2018 07:04 PM · 2 Kudos
Using a Pre-Trained Neural Network on the MS COCO Dataset with Mask R-CNN in TensorFlow / Keras

I am always looking for use cases, always. Yesterday I gave a talk at the IoT Fusion Conference in Philadelphia about IoT. I had a Raspberry Pi taking web camera images. I mentioned I wasn't pointing it at people because they may not want to be photographed; I am concerned with privacy. It gave me the idea that it would be cool to block people or other things from images. Then, right on cue, a great library showed up on GitHub. Thanks to Minimaxir's Person Blocker, I can now use Apache NiFi to remove people from images. I made a couple of minor tweaks to his code to add OpenCV image capture and output some JSON information on what happened. I am running this on an OSX laptop, but at some point I'll move it to a Raspberry Pi, TinkerBoard or NVidia Jetson TX1.
Please support this project: https://github.com/minimaxir/person-blocker
https://www.patreon.com/minimaxir
Using this pretrained neural network, we can block anything in this list of classes https://github.com/minimaxir/person-blocker/blob/master/classes.py.
You will need to download the MS COCO pre-trained weights, which aren't that large (mask_rcnn_coco.h5).
This works without a GPU!!
To Install the Libraries:
pip install --upgrade pip
pip install keras
pip install tensorflow
pip install opencv-python
pip install uuid
git clone https://github.com/minimaxir/person-blocker.git
pip3 install -r requirements.txt
There are a bunch of requirements, such as Python 3, a recent TensorFlow (I used TF 1.7), Keras, NumPy, scikit-image, SciPy, Pillow, Cython, h5py, Matplotlib and imageio. I added the uuid and json libraries. So you install them and get running. The JSON produced as a record of the run has the following schema:
Schema
{ "type" : "record", "name" : "personblocker", "fields" : [ { "name" : "uuid", "type" : "string", "doc" : "Type inferred from '\"person_uuid_20180406203059f_b7ce1056-9d88-4e7f-b4dd-0e8c8d6e7086\"'" }, { "name" : "runtime", "type" : "string", "doc" : "Type inferred from '\"27\"'" }, { "name" : "host", "type" : "string", "doc" : "Type inferred from '\"server.local\"'" }, { "name" : "ts", "type" : "string", "doc" : "Type inferred from '\"2018-04-06 20:30:59\"'" }, { "name" : "ipaddress", "type" : "string", "doc" : "Type inferred from '\"10.1\"'" }, { "name" : "imagefilename", "type" : "string", "doc" : "Type inferred from '\"person_blocked_20180406203057\"'" }, { "name" : "originalfilename", "type" : "string", "doc" : "Type inferred from '\"images2/tx1_image_b9ebdd52-9a9f-45f0-b71c-a44c54f14b71_20180406203032.jpg\"'" } ] }
Example Output JSON
{"uuid": "person_uuid_20180406201647f_1d2c31bc-c232-4976-a350-747ffabf5afe", "runtime": "76", "host": "mymachine.local", "ts": "2018-04-06 20:16:47", "ipaddress": "10.1.1.12", "imagefilename": "person_blocked_20180406201632", "originalfilename": "images2/tx1_image_0309425f-12ca-4331-a810-21067cbaa8f2_20180406201531.jpg"}
run.sh
python3 -W ignore pb.py 2>/dev/null
Now You See Me (OpenCV Captured Image)
Now You Don't (Person Blocker - it did block a stove pipe)
There's also the option to produce a GIF that moves which is cool but takes time and space. I commented that out.
My modified example:
https://github.com/tspannhw/OpenSourceComputerVision
See run.sh and pb.py.
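For reference, here is a minimal Python sketch of how a JSON run record matching the schema above could be emitted; the values and file names are illustrative placeholders, not the actual pb.py code:

import json
import socket
import uuid
from datetime import datetime

# Build a run record matching the Avro schema above.
# All values here are illustrative placeholders.
now = datetime.now()
stamp = now.strftime("%Y%m%d%H%M%S")
record = {
    "uuid": "person_uuid_{0}f_{1}".format(stamp, uuid.uuid4()),
    "runtime": "76",  # elapsed seconds for the blocking run
    "host": socket.gethostname(),
    "ts": now.strftime("%Y-%m-%d %H:%M:%S"),
    "ipaddress": socket.gethostbyname(socket.gethostname()),
    "imagefilename": "person_blocked_{0}".format(stamp),
    "originalfilename": "images2/tx1_image_example.jpg",
}
print(json.dumps(record))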
04-03-2018 05:36 PM
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/configure_livy.html
04-03-2018 04:12 PM · 1 Kudo
Using MQTT From the MiniFi Java Agent

From a standard Apache NiFi 1.5 download, copy

nifi-standard-services-api-nar-1.5.0.3.1.1.0-35.nar
nifi-mqtt-nar-1.5.0.3.1.1.0-35.nar

to the MiniFi 0.4.0 lib directory.

PublishMQTT: Publish Sensor Data Via MQTT

This is our example MiniFi flow. We can drop the GetFile (used to grab images) and the push to the NiFi flow if we just want to do a simple MQTT use case. I am using CloudMQTT; you can use any MQTT broker, like Mosquitto or HiveMQ. (A broker publish sketch follows below.)

Ingest in Apache NiFi 1.5

We can process data from MQTT and/or standard Apache NiFi S2S HTTPS. Processing continues as usual. I add another route for the mqtt source.

References: If you are doing SSL, see https://community.hortonworks.com/articles/47854/accessing-facebook-page-data-from-apache-nifi.html
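To test the broker side independently of MiniFi, a small Python sketch with the paho-mqtt client can publish a sample sensor reading; the broker host, credentials and topic below are placeholders for your CloudMQTT or Mosquitto settings:

import json

import paho.mqtt.client as mqtt

# Placeholder broker settings: substitute your own values.
BROKER_HOST = "broker.example.com"
BROKER_PORT = 1883
TOPIC = "sensors/test"

client = mqtt.Client()
client.username_pw_set("user", "password")  # placeholder credentials
client.connect(BROKER_HOST, BROKER_PORT)

# Publish one sample sensor reading as JSON.
client.publish(TOPIC, json.dumps({"sensor": "demo", "value": 42}))
client.disconnect()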