03-08-2017
02:02 PM
7 Kudos
We needed to create a data lake of all the company's data; the first dataset came from SQL Server, so using Apache NiFi 1.1.x I ingested it into Hive / ORC. A few of the smaller, constantly changing tables need to stay in SQL Server, so we need to be able to join tables in Hive with tables in SQL Server. Fortunately, Microsoft provides a very cool extension to SQL Server called Polybase that lets us build external tables pointing to Hadoop. Once those external tables are defined, they act like regular tables. So now all the company's data, including other data sources loaded into Hadoop Hive ORC tables, can be queried and joined. And it's fast!
Step 1: Apache NiFi Magic
QueryDatabaseTable: one per table, keyed on a sequence-ID primary key the tables have. A timestamp column would also work.
ConvertAVROtoORC: point it to /etc/hive/conf/hive-site.xml.
PutHDFS: store each table in a separate HDFS directory; write these down, as we need them for Polybase.
ReplaceText (GenerateHiveDDL): builds the Hive CREATE TABLE string automagically. You can do this manually.
PutHiveQL: runs the Hive table-creation DDL. You can do this manually.
For my example, I had six tables to do, so I just made five copies of that set of processors and changed the table names and HDFS directories. That's all, folks.
Step 2: Prepare Polybase
Change the yarn-site.xml file on the SQL Server machine to point to the HDP 2.5 cluster:
E:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf\yarn-site.xml <property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>
Step 3: Run the DDL Necessary for Polybase Access to Hadoop
CREATE EXTERNAL DATA SOURCE [HDP2]
WITH (TYPE = HADOOP, LOCATION = 'hdfs://hadoopserver:8020');
CREATE EXTERNAL FILE FORMAT ORC WITH (
FORMAT_TYPE = ORC );
CREATE EXTERNAL
TABLE [dbo].[myTableIsExcellent]
( [myid] int NULL,
[yourid] int NULL,
[theirid] varchar(64) NULL,
[somedata] varchar(255) NULL,
[otherdata] int NULL)
WITH (LOCATION='/import/mydirectory/',
DATA_SOURCE =HDP2, FILE_FORMAT =ORC );
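The external-table DDL above is boilerplate repeated per table, much like the Hive DDL that the GenerateHiveDDL step builds in the NiFi flow. As an illustration only (this helper is hypothetical, not part of the article's flow or any Polybase tooling), a short Python sketch that assembles the same CREATE EXTERNAL TABLE string from a column list:

```python
# Hypothetical helper: build the Polybase CREATE EXTERNAL TABLE string for
# one table from a list of (name, sql_type) columns. Table, column and
# location names below are example values, not anything Polybase requires.
def polybase_ddl(table, columns, location, data_source="HDP2", file_format="ORC"):
    cols = ",\n".join("  [%s] %s NULL" % (name, sqltype) for name, sqltype in columns)
    return (
        "CREATE EXTERNAL TABLE [dbo].[%s] (\n%s)\n"
        "WITH (LOCATION='%s', DATA_SOURCE=%s, FILE_FORMAT=%s);"
        % (table, cols, location, data_source, file_format)
    )

print(polybase_ddl("myTableIsExcellent",
                   [("myid", "int"), ("somedata", "varchar(255)")],
                   "/import/mydirectory/"))
```

With six tables, you would call this once per table with the matching HDFS directory, mirroring the copy-and-edit approach used for the NiFi processors.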
3a. Create an external data source pointing to your HDFS on HDP.
3b. Create an external file format, such as ORC, that your tables use. I recommend ORC.
3c. Create your external tables pointing to their HDFS directories containing ORC files.
Step 4: Polybase Federated Query
SELECT TOP (1000) c.[id], hs.[id], hs.[name], c.[description]
FROM [database].[dbo].[MyHiveTable] hs,
[LocalSQLServerTableName] c
WHERE c.id = hs.id
ORDER BY c.name desc
It doesn't get much easier than that. The external table looks like a regular table, acts like a regular table and queries like a regular table. Users won't know or care where the data is; they don't have to know you have 100 petabytes of data sitting in a massive Hortonworks Data Platform.
References:
https://msdn.microsoft.com/en-us/library/mt163689.aspx
http://blog.pragmaticworks.com/sql-server-2016-polybase
https://msdn.microsoft.com/en-us/library/dn935026.aspx
https://hernandezpaul.wordpress.com/2016/05/29/polybase-query-service-and-hadoop-welcome-sql-server-2016/
https://realizeddesign.blogspot.com/2015/09/setting-up-polybase-for-yarn-in-sql.html
https://blogs.msdn.microsoft.com/sqlcat/2016/06/21/polybase-setup-errors-and-possible-solutions/
03-07-2017
03:51 PM
6 Kudos
Use Case
I want to hide text messages inside images; when the images arrive somewhere else, I want to extract those messages. The LSB-Steganography library lets you hide text in images, binaries in images and images in images. I was interested in hiding text messages in images. After seeing https://en.wikipedia.org/wiki/Turn:_Washington's_Spies I thought secret messages were cool. So using the library, I take an image and some text and hide the text in the image. The library produces a new image (PNG) that has the message in it, and I have a second script that extracts the text. The images look the same to my eyes. A future test would be to run a deep learning library or image-analysis tool on the images to see if they spot the changed bits; they should be able to. A future NiFi tool would be to spot hidden images. It's a fun exercise to use NiFi for this, and it seems plausible that images with encoded messages were passing through Niagara Files back in the NSA days.
Step 1: Hide Text (ExecuteStreamCommand)
Step 2: Fetch File
Step 3: UnHide Text (ExecuteStreamCommand)
The left image is the original and the right PNG is the output image with the text in it. The size on disk increases noticeably.
The Python source code is on GitHub and referenced below.
hide.sh
wget $1 -O img.jpg
python hidetext.py img.jpg "$2"
hidetext.py
import cv
from LSBSteg import LSBSteg
import sys
imagename=sys.argv[1]
textstring=sys.argv[2]
carrier = cv.LoadImage(imagename)
steg = LSBSteg(carrier)
steg.hideText(textstring)
steg.saveImage(imagename + ".png")
# Image that contains the data
unhide.sh
python unhidetext.py $1
unhidetext.py
import cv
from LSBSteg import LSBSteg
import sys
imagename = sys.argv[1]
im = cv.LoadImage(imagename)
steg = LSBSteg(im)
print steg.unhideText()
For installation, you need to download the LSB-Steganography script and install OpenCV:
pip install cv
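The LSBSteg library hides each bit of the message in the least significant bit of successive carrier values, so each pixel byte changes by at most 1. A minimal pure-Python sketch of that idea (this is an illustration of the LSB technique on a flat list of byte values, not the LSBSteg API):

```python
# LSB steganography sketch: write each message bit into the least
# significant bit of one carrier byte (MSB of each message byte first).
def hide_bits(carrier, message):
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    out = list(carrier)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & 0xFE) | bit  # carrier byte changes by at most 1
    return out

def unhide_bits(carrier, length):
    data = []
    for b in range(length):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (carrier[b * 8 + i] & 1)
        data.append(byte)
    return bytes(data)

pixels = list(range(200))        # stand-in for flattened pixel bytes
stego = hide_bits(pixels, b"spy")
print(unhide_bits(stego, 3))     # b'spy'
```

A real tool also has to embed the message length (or a terminator) and write the result to a lossless format like PNG, since JPEG compression would destroy the low-order bits.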
Reference:
https://en.wikipedia.org/wiki/Steganography
https://github.com/tspannhw/spy
https://github.com/RobinDavid/LSB-Steganography
03-03-2017
08:37 PM
We could have NiFi rewrite, update and add to the Sonic Pi code so the music is constantly changing.
03-03-2017
07:17 PM
3 Kudos
Working With S3-Compatible Data Stores (and handling single-source failure)
With the major outage of S3 in my region, I decided I needed an alternative file store. I found a great open source server called Minio that I run on a mini PC running CentOS 7. We could also use this solution for connecting to other S3-compatible stores such as RiakCS and Google Cloud Storage; I like to remain cloud- and location-neutral. In Apache NiFi, it's really easy: you can have two sources and two destinations, so instead of just your regular AWS S3 you can have one for AWS S3 and one for another store, or use the second as a disaster-recovery backup. Since my Minio box is local, I can store data locally, and it's pretty affordable to get a few terabytes connected to a small Linux box to hold some backups. With Apache NiFi, you also have queues to buffer a potentially slower ingest/egress.
Minio Setup
wget https://dl.minio.io/server/minio/release/linux-amd64/minio
chmod 755 minio
nohup ./minio server files &
Find the version that matches your hardware and OS. The server will report back the endpoint (use this in the NiFi endpoint URL), the access key, the secret key and the region. Enter this information in Apache NiFi and in any S3-compatible tool such as the AWS CLI or S3cmd.
S3 Tool Install
pip install awscli
aws configure
AWS Access Key ID [****************3P2F]: 45454545zfgfgfgfgfgzgggzggggFFF
AWS Secret Access Key [****************Y3TG]: FFFDFDFDFDF7d8f7d87f8&D*F7d*&F78
Default region name [us-east-1]:
Default output format [None]:
aws configure set default.s3.signature_version s3v4
aws --endpoint-url http://192.168.1.155:9000 s3 ls s3://nifi
2017-03-01 16:17:19 13729 Retry_Count_Loop.xml
2017-03-01 16:19:58 19929 tspann7.jpg
aws --endpoint-url http://192.168.1.155:9000 s3 ls
2017-03-01 11:19:58 nifi
These are just for testing connectivity.
NiFi Setup
Flow 1:
GetTwitter: ingest Twitter data with keywords: AWS Outage, ...
EvaluateJSONPath: parse the main Twitter fields out of the JSON.
CoreNLPProcessor: my custom processor to run Stanford CoreNLP sentiment analysis on the message.
NLPProcessor: my custom processor to run the Apache OpenNLP name and location entity resolver on the message.
AttributeToJSON: convert all the attributes, including the output from the two custom processors, into one unified JSON file.
PutS3Object: store to my S3-compatible datastore. Here you can tee the data from AttributeToJSON to a number of different S3 stores, including Amazon S3.
Flow 2:
ListS3: list all the files from the S3-compatible data store. This is where you can add additional sources to ingest: Amazon S3, Google Cloud Storage, RiakCS, Minio and others.
FetchS3Object: get the actual file from S3.
PutFile: store it locally.
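The two-destination idea can be reduced to plain failover logic: try the primary store, fall back to the secondary on error. A small sketch of that pattern; the store classes here are illustrative stand-ins for whatever S3 client you actually use (e.g. one client for AWS and the same client pointed at the Minio endpoint), not a real S3 API:

```python
# Failover put: try each S3-compatible store in order, return the name of
# the one that accepted the object, raise if all fail.
def put_with_failover(stores, key, data):
    errors = []
    for store in stores:
        try:
            store.put(key, data)
            return store.name
        except Exception as exc:
            errors.append((store.name, exc))
    raise RuntimeError("all stores failed: %r" % errors)

class MemoryStore(object):
    """In-memory stand-in for an S3-compatible client (for illustration)."""
    def __init__(self, name, fail=False):
        self.name, self.fail, self.objects = name, fail, {}
    def put(self, key, data):
        if self.fail:
            raise IOError("%s is down" % self.name)
        self.objects[key] = data

aws = MemoryStore("aws-s3", fail=True)   # simulate the S3 outage
minio = MemoryStore("minio")             # local Minio keeps working
used = put_with_failover([aws, minio], "tweets/1.json", b"{}")
print(used)                              # minio
```

In NiFi you get the same effect declaratively: tee the flow to two PutS3Object processors, or route the failure relationship of the primary into the secondary.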
S3.properties file
# Setup endpoint
host_base = 192.168.1.155:9000
host_bucket = 192.168.1.155:9000
bucket_location = us-east-1
use_https = True
# Setup access keys
access_key = DF&D*F&*D&F*&DF&DFDF
secret_key = &d7df7f77DDFdjfiqeworsdfFDr34fd
accessKey = DF&D*F&*D&F*&DF&DFDF
secretKey = &d7df7f77DDFdjfiqeworsdfFDr34fd
# Enable S3 v4 signature APIs
signature_v2 = False
After sending Twitter JSON files to S3.
References:
https://github.com/minio/minio
https://www.minio.io/
https://dzone.com/articles/aftermath-of-the-aws-s3-outagean-interview-with-ni
https://aws.amazon.com/message/41926/
https://cloud.google.com/storage/docs/interoperability
https://docs.minio.io/docs/aws-cli-with-minio
https://aws.amazon.com/cli/
http://s3tools.org/s3cmd
https://github.com/minio/minio-java
03-02-2017
05:25 PM
3 Kudos
1. Host a Web Page (index.html) via HTTP GET with 200 OK Status
2. Receive POST from that page via AJAX with browser data
3. Extract Content and Attributes
4. Build a JSON file of HTTP data
5. Store it
To access location from a phone or a modern browser, the page must be served over SSL, so I added that for this HTTP listener.
Use openssl to create your 2048-bit RSA X.509 certificate, PKCS12 file and JKS keystore, import the trust store, and import the certificate into your browser.
Your web page can be any web page, just POST back via AJAX or Form Submit.
<html>
<head>
<title>NiFi Browser Data Acquisition</title>
</head>
<body>
<script>
// Usage
window.onload = function() {
navigator.getBattery().then(function(battery) {
console.log(battery.level);
battery.addEventListener('levelchange', function() {
console.log(this.level);
});
});
};
////////////// print these
var latitude = "";
var longitude = "";
var ips = "";
var batteryInfo = "";
var screenInfo = screen.width +","+ screen.height + "," +
screen.availWidth +","+ screen.availHeight + "," +
screen.colorDepth + "," + screen.pixelDepth;
var pluginsInfo = "";
var coresInfo = "";
/////////////
////// Set Plugins
for (var i = 0; i < 12; i++) {
if ( typeof window.navigator.plugins[i] !== 'undefined' ) {
pluginsInfo += window.navigator.plugins[i].name + ', ';
}
}
////// Set Cores
if ( window.navigator.hardwareConcurrency > 0 ) {
coresInfo = window.navigator.hardwareConcurrency + " cores";
}
/////////////
/// send the information to the server
function loadDoc() {
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
if (this.readyState == 4 && this.status == 200) {
document.getElementById("demo").innerHTML = 'Sent.';
}
};
// /send
xhttp.open("POST", "/send", true);
xhttp.setRequestHeader("Content-type", "application/json");
xhttp.send('{"plugins":"' + pluginsInfo +
'", "screen":"' + screenInfo +
'", "cores":"' + coresInfo +
'", "battery":"' + batteryInfo +
'", "ip":"' + ips +
'", "lat":"' + latitude + '", "lng":"' + longitude + '"}')
}
////////////
function geoFindMe() {
var output = document.getElementById("out");
if (!navigator.geolocation){
output.innerHTML = "<p>Geolocation is not supported by your browser</p>";
return;
}
function success(position) {
latitude = position.coords.latitude;
longitude = position.coords.longitude;
output.innerHTML = '<p>Latitude is ' + latitude + '° <br>Longitude is ' + longitude + '°</p>';
var img = new Image();
img.src="https://maps.googleapis.com/maps/api/staticmap?center=" + latitude + "," + longitude + "&zoom=13&size=300x300&sensor=false";
output.appendChild(img);
}
function error() {
output.innerHTML = "Unable to retrieve your location";
}
output.innerHTML = "<p>Locating…</p>";
navigator.geolocation.getCurrentPosition(success, error);
}
//get the IP addresses associated with an account
function getIPs(callback){
var ip_dups = {};
//compatibility for firefox and chrome
var RTCPeerConnection = window.RTCPeerConnection
|| window.mozRTCPeerConnection
|| window.webkitRTCPeerConnection;
var useWebKit = !!window.webkitRTCPeerConnection;
//bypass naive webrtc blocking using an iframe
if(!RTCPeerConnection){
//NOTE: you need to have an iframe in the page right above the script tag
//
//<iframe id="iframe" sandbox="allow-same-origin" style="display: none"></iframe>
//<script>...getIPs called in here...
//
var win = iframe.contentWindow;
RTCPeerConnection = win.RTCPeerConnection
|| win.mozRTCPeerConnection
|| win.webkitRTCPeerConnection;
useWebKit = !!win.webkitRTCPeerConnection;
}
//minimal requirements for data connection
var mediaConstraints = {
optional: [{RtpDataChannels: true}]
};
var servers = {iceServers: [{urls: "stun:stun.services.mozilla.com"}]};
//construct a new RTCPeerConnection
var pc = new RTCPeerConnection(servers, mediaConstraints);
function handleCandidate(candidate){
//match just the IP address
var ip_regex = /([0-9]{1,3}(\.[0-9]{1,3}){3}|[a-f0-9]{1,4}(:[a-f0-9]{1,4}){7})/
var ip_addr = ip_regex.exec(candidate)[1];
//remove duplicates
if(ip_dups[ip_addr] === undefined)
callback(ip_addr);
ip_dups[ip_addr] = true;
}
//listen for candidate events
pc.onicecandidate = function(ice){
//skip non-candidate events
if(ice.candidate)
handleCandidate(ice.candidate.candidate);
};
//create a bogus data channel
pc.createDataChannel("");
//create an offer sdp
pc.createOffer(function(result){
//trigger the stun server request
pc.setLocalDescription(result, function(){}, function(){});
}, function(){});
//wait for a while to let everything done
setTimeout(function(){
//read candidate info from local description
var lines = pc.localDescription.sdp.split('\n');
lines.forEach(function(line){
if(line.indexOf('a=candidate:') === 0)
handleCandidate(line);
});
}, 1000);
}
window.addEventListener("load", function (ev) {
"use strict";
var log = document.getElementById("log");
// https://dvcs.w3.org/hg/dap/raw-file/tip/sensor-api/Overview.html
window.addEventListener("devicetemperature", function (ev) {
log.textContent += "devicetemperature " + ev.value + "\n";
}, false);
window.addEventListener("devicepressure", function (ev) {
log.textContent += "devicepressure " + ev.value + "\n";
}, false);
window.addEventListener("devicelight", function (ev) {
log.textContent += "devicelight " + ev.value + "\n";
// toy trick
log.style.color = "rgb(" + (255 - 2*ev.value) + ",0,0)";
log.style.backgroundColor = "rgb(0,0," + (2*ev.value) + ")";
}, false);
window.addEventListener("deviceproximity", function (ev) {
log.textContent += "deviceproximity " + ev.value + "\n";
// toy trick
if (ev.value < 3) navigator.vibrate([300, 100, 100]);
}, false);
window.addEventListener("devicenoise", function (ev) {
log.textContent += "devicenoise " + ev.value + "\n";
}, false);
window.addEventListener("devicehumidity", function (ev) {
log.textContent += "devicehumidity " + ev.value + "\n";
}, false);
//https://wiki.mozilla.org/Magnetic_Field_Events
window.addEventListener("devicemagneticfield", function (ev) {
log.textContent += "devicemagneticfield " + [ev.x, ev.y, ev.x]+ "\n";
}, false);
// https://dvcs.w3.org/hg/dap/raw-file/default/pressure/Overview.html
window.addEventListener("atmpressure", function (ev) {
log.textContent += "atmpressure " + ev.value + "\n";
}, false);
// https://dvcs.w3.org/hg/dap/raw-file/tip/humidity/Overview.html
window.addEventListener("humidity", function (ev) {
log.textContent += "humidity " + ev.value + "\n";
}, false);
// https://dvcs.w3.org/hg/dap/raw-file/tip/temperature/Overview.html
window.addEventListener("temperature", function (ev) {
log.textContent += "temperature " + [ev.f, ev.c, ev.k, ev.value] + "\n";
}, false);
// https://dvcs.w3.org/hg/dap/raw-file/tip/battery/Overview.html
try {
if (typeof navigator.getBattery === "function") {
navigator.getBattery().then(function (battery) {
log.textContent += "battery.level " + battery.level + "\n";
log.textContent += "battery.charging " + battery.charging + "\n";
batteryInfo = "battery.level=" + battery.level + "," +
"battery.charging=" + battery.charging;
log.textContent += "battery.chargeTime " + battery.chargeTime + "\n";
log.textContent += "battery.dischargeTime " + battery.dischargeTime + "\n";
battery.addEventListener("levelchange", function (ev) {
log.textContent += "change battery.level " + battery.level + "\n";
}, false);
}).catch(function (err) {
log.textContent += err.toString() + "\n";
});
} else {
log.textContent += "";
}
} catch (ex) {
log.textContent += ex.toString() + "\n";
}
}, false);
</script>
<p>
<br>
DEMO: Send Data to HDF / Apache NiFi via HandleHTTPRequest
<br>
<p><button onclick="geoFindMe()">Show my location</button></p>
<div id="out"></div>
<div id="demo"></div>
<pre id="log"></pre>
<button type="button" onclick="loadDoc()">Send data to Apache NiFi SSL Server</button>
<iframe id="iframe" sandbox="allow-same-origin" style="display: none"></iframe>
<script>
getIPs(function(ip){ips = ip;});
</script>
</body>
</html>
index.html : A web page to grab user information.
mobile-ingest-v3.xml : Apache NiFi 1.1.x template.
Note: Different browsers, devices, phones, tablets and versions will send different values. Users should get a location-request pop-up.
JSON Result File
{
"http.request.uri" : "/send",
"http.context.identifier" : "a4f9ae25-5f49-463e-97eb-c8a6bf3be8a7",
"http.remote.host" : "192.xxx.1.xxx",
"http.headers.Host" : "192.xxx.1.xxx:9178",
"http.local.name" : "192.xxx.1.xxx",
"http.headers.DNT" : "1",
"plugins" : "Widevine Content Decryption Module, Shockwave Flash, Chrome PDF Viewer, Native Client, Chrome PDF Viewer, ",
"latitude" : "40.2681799",
"http.headers.Accept" : "*/*",
"battery" : "battery.level=1,battery.charging=true",
"uuid" : "a2f299ae-6ef6-480d-a359-1362d25abe76",
"http.request.url" : "https://192.168.1.151:9178/send",
"http.server.name" : "192.168.1.151",
"http.character.encoding" : "UTF-8",
"path" : "./",
"cores" : "8 cores",
"http.remote.addr" : "192.168.1.151",
"http.headers.User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
"http.method" : "POST",
"http.headers.Connection" : "keep-alive",
"longitude" : "-74.5291745",
"http.server.port" : "9178",
"ip" : "192.168.1.151",
"mime.type" : "application/json",
"http.locale" : "en_US",
"http.headers.Accept-Encoding" : "gzip, deflate, br",
"http.headers.Origin" : "https://192.168.1.151:9178",
"http.servlet.path" : "",
"http.local.addr" : "192.168.1.151",
"filename" : "1082639525534467",
"http.headers.Referer" : "https://192.168.1.151:9178/",
"http.headers.Accept-Language" : "en-US,en;q=0.8",
"http.headers.Content-Length" : "253",
"http.headers.Content-Type" : "application/json",
"RouteOnAttribute.Route" : "isjsonpost"
}
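The flat JSON above is what you get after HandleHttpRequest attaches the http.* attributes and AttributeToJSON merges them with the fields posted by the page. A small Python sketch of that merge, for illustration; the attribute names follow the example output, and the values here are made-up samples:

```python
import json

# Sketch: merge HTTP attributes (as added by HandleHttpRequest) with the
# fields posted by the browser into one flat JSON object, the way
# AttributeToJSON flattens them. All values below are example data.
http_attributes = {
    "http.request.uri": "/send",
    "http.method": "POST",
    "http.headers.Content-Type": "application/json",
}
posted_body = {"cores": "8 cores", "lat": "40.26", "lng": "-74.52"}

flat = dict(http_attributes)
flat.update(posted_body)
print(json.dumps(flat, sort_keys=True))
```

One flat namespace makes the downstream flow simple: EvaluateJSONPath or RouteOnAttribute can address browser fields and HTTP metadata uniformly.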
References:
https://github.com/tspannhw/webdataingest
http://webkay.robinlinus.com/
https://github.com/RobinLinus/autofill-phishing
https://github.com/RobinLinus/ubercookie
https://github.com/RobinLinus/socialmedia-leak
https://www.w3schools.com/jsref/prop_screen_availheight.asp
https://community.hortonworks.com/articles/27033/https-endpoint-in-nifi-flow.html
http://www.batchiq.com/nifi-configuring-ssl-auth.html
https://community.hortonworks.com/articles/886/securing-nifi-step-by-step.html
http://mobilehtml5.org/
https://gist.github.com/bellbind/c60d7008e86c34a76aa1
https://github.com/coremob/camera
http://www.girliemac.com/presentation-slides/html5-mobile-approach/deviceAPIs.html?full#23
https://github.com/girliemac/sushi-compass/blob/master/js/app.js
https://github.com/noipfraud/IPLock
http://www.tomanthony.co.uk/blog/detect-visitor-social-networks/
https://appsec-labs.com/html5/#toggle-id-5
https://mobiforge.com/design-development/sense-and-sensor-bility-access-mobile-device-sensors-with-javascript
https://www.html5rocks.com/en/tutorials/device/orientation/
http://qnimate.com/html5-proximity-api/
02-14-2017
03:13 PM
That Livy is only for Zeppelin; it's not safe to use it for anything else. In HDP 2.6, there will be a Livy available for general usage.
02-14-2017
02:59 PM
The processor takes a property to run against; you just need to pass something in the sentence parameter, and you can concatenate a few fields there. The source is open, so it would be easy to ingest a flowfile and process that instead of an input attribute. It's a matter of changing 2-3 lines and rebuilding.
02-13-2017
03:33 AM
3 Kudos
Overview
I have been running a similar program on Raspberry Pi devices with TensorFlow. Now that MXNet has entered Apache incubation, it has become incredibly interesting to me; with the backing of Apache and Amazon, this library cannot be ignored. So I tried it on the same Raspberry Pi 3B that I was using for TensorFlow. For this example, we grab images from the standard Raspberry Pi camera and run live image analysis on them with MXNet, using the pre-built Inception model from the MXNet Model Zoo. This is nearly the same as the TensorFlow example; what I noticed is slightly faster execution and a smoother process. For accuracy, I have not run enough tests to weigh the two libraries against each other, but that is something I will look at doing for a large number of images. Training both with my camera and with images I am interested in would be very helpful. Some use cases I am thinking of: security camera, water-leak detection, evil-cat sensing, engine vibration and a self-driving model car.
Raspberry Pi v3 B with Pi Camera
Setup Your Device For Running MXNet
sudo apt-get -y install git cmake build-essential g++-4.8 c++-4.8 liblapack* libblas* libopencv*
git clone https://github.com/dmlc/mxnet.git --recursive
cd mxnet
make
cd python
sudo python setup.py install
curl --header 'Host: data.mxnet.io' --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Language: en-US,en;q=0.5' --header 'Referer: http://data.mxnet.io/models/imagenet/' --header 'Connection: keep-alive' 'http://data.mxnet.io/models/imagenet/inception-bn.tar.gz' -o 'inception-bn.tar.gz' -L
tar -xvzf inception-bn.tar.gz
mv Inception_BN-0126.params Inception_BN-0000.params
The primary code is Python, taken from examples for MXNet, OpenCV and PiCamera.
topn = inception_predict.predict_from_local_file(filename, N=5)
This calls inception_predict from the MXNet example; the inception_predict code is linked in the references below.
Main Python Code
#!/usr/bin/python
# 2017 load pictures and analyze
import time
import sys
import datetime
import subprocess
import urllib2
import os
import ftplib
import traceback
import math
import random, string
import base64
import json
import paho.mqtt.client as mqtt
import picamera
from time import sleep, gmtime, strftime
import inception_predict
packet_size=3000
def randomword(length):
return ''.join(random.choice(string.lowercase) for i in range(length))
# Create camera interface
camera = picamera.PiCamera()
while True:
# Create unique image name
uniqueid = 'mxnet_uuid_{0}_{1}'.format(randomword(3),strftime("%Y%m%d%H%M%S",gmtime()))
# Take the jpg image from camera
filename = '/home/pi/cap.jpg'
# Capture image from RPI
camera.capture(filename)
# Run inception prediction on image
topn = inception_predict.predict_from_local_file(filename, N=5)
# CPU Temp
p = subprocess.Popen(['/opt/vc/bin/vcgencmd','measure_temp'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
# MQTT
client = mqtt.Client()
client.username_pw_set("username","password")
client.connect("mqttcloudprovider", 14162, 60)
# CPU Temp
out = out.replace('\n','')
out = out.replace('temp=','')
# 5 MXNET Analysis
top1 = str(topn[0][1])
top1pct = str(round(topn[0][0],3) * 100)
top2 = str(topn[1][1])
top2pct = str(round(topn[1][0],3) * 100)
top3 = str(topn[2][1])
top3pct = str(round(topn[2][0],3) * 100)
top4 = str(topn[3][1])
top4pct = str(round(topn[3][0],3) * 100)
top5 = str(topn[4][1])
top5pct = str(round(topn[4][0],3) * 100)
row = [ { 'uuid': uniqueid, 'top1pct': top1pct, 'top1': top1, 'top2pct': top2pct, 'top2': top2,'top3pct': top3pct, 'top3': top3,'top4pct': top4pct,'top4': top4, 'top5pct': top5pct,'top5': top5, 'cputemp': out} ]
json_string = json.dumps(row)
client.publish("mxnet",payload=json_string,qos=1,retain=False)
client.disconnect()
We grab an image from a camera, run it through MXNet, convert the results to JSON and then send the message to a cloud hosted MQTT broker. I also grab the CPU temperature to show we can add more sensors.
Example JSON Sent via MQTT
[{"top1pct": "54.5", "top5": "n04590129 window shade", "top4": "n03452741 grand piano, grand", "top3": "n03018349 china cabinet, china closet", "top2": "n03201208 dining table, board", "top1": "n04099969 rocking chair, rocker", "top2pct": "9.1", "top3pct": "8.0", "uuid": "mxnet_uuid_oqy_20170211203727", "top4pct": "2.8", "top5pct": "2.2", "cputemp": "75.2'C"}]
Our schema is pretty consistent, so we can create a Hive or Phoenix table and insert into it.
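Downstream, EvaluateJSONPath pulls individual fields out of this payload. For illustration, the same extraction in plain Python against a shortened copy of the example message above:

```python
import json

# Parse the MQTT payload (a JSON array with one object) and pull out the
# top prediction, the way EvaluateJSONPath does in the NiFi flow.
payload = '[{"top1pct": "54.5", "top1": "n04099969 rocking chair, rocker", "cputemp": "75.2\'C"}]'
msg = json.loads(payload)[0]
print("%s at %s%%" % (msg["top1"], msg["top1pct"]))  # n04099969 rocking chair, rocker at 54.5%
```

Note that the percentages are sent as strings; if you insert into a Hive or Phoenix table you would cast them to numeric columns there.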
HDF / NiFi Flow
ConsumeMQTT: this processor will receive messages from a cloud-based MQTT broker, sent by a few Raspberry Pis I have set up.
Extract Fields from MXNet (EvaluateJSONPath)
Build a Message (UpdateAttribute):
Category 1 ${top1} at ${top1pct}%
Category 2 ${top2} at ${top2pct}%
Category 3 ${top3} at ${top3pct}%
Category 4 ${top4} at ${top4pct}%
Category 5 ${top5} at ${top5pct}%
UUID ${uuid}
CPU Temp ${cputemp}
Send Msg to Slack Channel (PutSlack): the channel is mxnet.
Store Files (PutFile)
We take the JSON and convert it to a text message for a Slack channel. That's all it takes to ingest data from an edge device with a camera, run deep learning on that tiny device, and send the data asynchronously to a cloud-hosted broker that can distribute it to cloud and on-premise Apache NiFi servers. We could also use Site-to-Site, HTTP or TCP/IP. MQTT is very lightweight, works over the Internet, has an easy Python library and works well with Apache NiFi.
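The UpdateAttribute message is just Expression Language templating over the flowfile attributes. An equivalent Python sketch, with example values standing in for the attributes:

```python
# Sketch of the UpdateAttribute message template. The attrs dict stands in
# for the flowfile attributes extracted by EvaluateJSONPath; values are
# examples from the payload shown earlier.
attrs = {"top1": "n04099969 rocking chair, rocker", "top1pct": "54.5",
         "uuid": "mxnet_uuid_oqy_20170211203727", "cputemp": "75.2'C"}
message = ("Category 1 {top1} at {top1pct}%\n"
           "UUID {uuid}\n"
           "CPU Temp {cputemp}").format(**attrs)
print(message)
```

PutSlack then posts this text to the channel verbatim.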
Reference:
This sample program is critical and gave me most of the code needed to run: http://mxnet.io/tutorials/embedded/wine_detector.html
http://data.mxnet.io/models/imagenet/
https://community.hortonworks.com/content/repo/77987/rpi-picamera-mqtt-nifi.html
https://github.com/tspannhw/mxnet_rpi/blob/master/analyze.py
https://community.hortonworks.com/content/kbentry/80339/iot-capturing-photos-and-analyzing-the-image-with.html
CloudMQTT has proven to be awesome: instant setup and a free instance for testing. This is great for getting data from my remote Raspberry Pis to the cloud and back into HDF 2.1 servers behind firewalls.
http://cloudmqtt.com
http://www.jsonpath.com/
GitHub Repo: https://github.com/tspannhw/mxnet_rpi
https://community.hortonworks.com/repos/83001/python-mxnet-raspberry-pi-example.html?shortDescriptionMaxLength=140
Pushing to Slack Channel: https://nifi-se.slack.com/messages/mxnet/details/
Apache MXNet Incubation: https://wiki.apache.org/incubator/MXNetProposal
Awesome MXNet: https://github.com/dmlc/mxnet/tree/master/example
Install MXNet on Raspbian: http://mxnet.io/get_started/raspbian_setup.html
Example Program for MXNet on Raspberry Pi 3: http://mxnet.io/tutorials/embedded/wine_detector.html
MQTT: https://github.com/tspannhw/rpi-picamera-mqtt-nifi/blob/master/upload.py
Real Image with Pretrained Model: http://mxnet.io/tutorials/r/classifyRealImageWithPretrainedModel.html
MXNet GTC Tutorial: https://github.com/dmlc/mxnet-gtc-tutorial
MXNet for Facial Identification: https://github.com/tornadomeet/mxnet-face
http://vis-www.cs.umass.edu/fddb/results.html
http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
MXNet Models for ImageNet 1K Inception BN: https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-1k-inception-bn.md
MXNet Example Image Classification: https://github.com/dmlc/mxnet/tree/master/example/image-classification
To inspect the captured image:
sudo apt-get install imagemagick
identify -verbose /home/pi/cap.jpg
02-10-2017
03:34 PM
Why simulate? A Raspberry Pi or two can send thousands of MQTT messages a second. You could simulate that with Gatling or JMeter.
02-06-2017
08:26 PM
5 Kudos
ExtractText: a Custom NiFi Processor Powered by Apache Tika
Apache Tika is amazing; it is very easy to use it to analyze a file and then extract its text. Apache Tika builds on other powerful Apache projects like Apache PDFBox and Apache POI.
Example Usage
Feed in documents. I use my LinkProcessor, which grabs links from a website and returns a JSON list.
SplitJSON: split the resulting JSON list into individual JSON rows.
EvaluateJSONPath: extract just the URLs.
InvokeHTTP: do a GET on each parsed URL.
RouteOnAttribute: only process the file types I am interested in, like Microsoft Word.
ExtractTextProcessor: the new processor extracts the text of the document.
Then we save the text as a file in some data store, perhaps HDFS. If you have a directory of files, you can just use GetFile to ingest them en masse.
LinkProcessor (https://github.com/tspannhw/linkextractorprocessor)
URL: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/index.html
This is an example of a URL that I want to grab all the documents from. You can point it at any URL that has links to documents (HTML, Word, Excel, PowerPoint, etc.).
RouteOnAttribute
I only want to process a few types of files, so I limit them here.
${filename:endsWith('.doc'):or(${filename:endsWith('.pdf')}):or(${filename:endsWith('.rtf')}):or(${filename:endsWith('.ppt')}):or(
${filename:endsWith('.docx')}):or(${filename:endsWith('.pptx')}):or(${filename:endsWith('.html')}):or(${filename:endsWith('.htm')}):or(${filename:endsWith('.xls')}):or(
${filename:endsWith('.xlsx')}):or(${filename:endsWith('.xml')}):or(${Content-Type:contains('text/html')}):or(${Content-Type:contains('application/pdf')}):or(
${Content-Type:contains('application/msword')}):or(${Content-Type:contains('application/vnd')}):or(${Content-Type:contains('text/xml')})}
Release: https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/1.0
Reference:
https://tika.apache.org/
https://tika.apache.org/1.14/formats.html
http://pdfbox.apache.org/
https://pdfbox.apache.org/1.8/cookbook/documentcreation.html
http://poi.apache.org/
https://community.hortonworks.com/repos/81693/nifi-custom-processor-for-extracting-text-from-doc.html?shortDescriptionMaxLength=140
https://dzone.com/articles/cool-projects-big-data-machine-learning-apache-nifi
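The RouteOnAttribute expression shown earlier is just a big OR over filename suffixes and Content-Type substrings. For illustration, the same predicate in Python (this is a sketch of the routing logic, not NiFi Expression Language):

```python
# Routing predicate from the RouteOnAttribute expression: accept a flowfile
# if its filename ends with one of the document extensions, or its
# Content-Type header matches one of the listed substrings.
EXTENSIONS = ('.doc', '.pdf', '.rtf', '.ppt', '.docx', '.pptx',
              '.html', '.htm', '.xls', '.xlsx', '.xml')
CONTENT_TYPES = ('text/html', 'application/pdf', 'application/msword',
                 'application/vnd', 'text/xml')

def should_process(filename, content_type=""):
    return filename.endswith(EXTENSIONS) or \
           any(ct in content_type for ct in CONTENT_TYPES)

print(should_process("guide.docx"))   # True
print(should_process("image.png"))    # False
```

Matching on both filename and Content-Type is deliberate: URLs fetched by InvokeHTTP don't always carry a meaningful extension, but the server usually returns a usable Content-Type header.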