06-01-2017
11:55 AM
Avro 1.8.2 is now available
05-30-2017
01:42 PM
What are your settings for MinIO? You must be running a MinIO server and have permissions on it. You need to set your access and secret keys, as well as the host base and host bucket.
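If the client is s3cmd (which the host base / host bucket settings suggest), the relevant ~/.s3cfg entries look roughly like this; the endpoint and keys below are placeholders for your MinIO server:

access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = minio.example.com:9000
host_bucket = minio.example.com:9000
use_https = False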
05-26-2017
05:04 PM
MySQL can enable query logging: https://dev.mysql.com/doc/refman/5.7/en/query-log.html
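For example, the general query log can be switched on at runtime with two system variables (a minimal sketch; the log file path is illustrative):

SET GLOBAL general_log_file = '/var/log/mysql/query.log';
SET GLOBAL general_log = 'ON';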
05-26-2017
02:01 PM
That's interesting. You can read it just like regular XML with a SAX or DOM parser, or with a data-binding library such as Jackson's XML module. What do you need from the XML? You could read the configuration file elsewhere or pass it into your processor. https://nifi.apache.org/developer-guide.html
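A minimal DOM-parsing sketch in Python (custom NiFi processors are Java, but the parsing idea is the same); the file name and the Hadoop-style property/name/value layout are assumptions:

import xml.dom.minidom

# parse the config file and walk its <property> entries
doc = xml.dom.minidom.parse('core-site.xml')
for prop in doc.getElementsByTagName('property'):
    name = prop.getElementsByTagName('name')[0].firstChild.data
    value = prop.getElementsByTagName('value')[0].firstChild.data
    print('%s = %s' % (name, value))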
05-25-2017
04:59 PM
1 Kudo
https://dzone.com/articles/deep-learning-on-big-data-platforms
https://dzone.com/articles/tensorflow-on-the-edge-part-2-of-5
https://dzone.com/articles/deep-learning-for-data-engineers-part-1
https://dzone.com/articles/deep-learning-and-machine-learning-guide-part-iii
https://dzone.com/articles/deep-learning-and-machine-learning-guide-part-ii
https://dzone.com/articles/machine-learning-resources
https://dzone.com/articles/tensorflow-on-the-edge

TensorFlow, MXNet, and Deep Learning4J all work well with HDP 2.6:

https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
https://community.hortonworks.com/articles/104649/using-cloudbreak-recipes-to-deploy-anaconda-and-te.html
https://community.hortonworks.com/articles/58265/analyzing-images-in-hdf-20-using-tensorflow.html
https://community.hortonworks.com/articles/54954/setting-up-gpu-enabled-tensorflow-to-work-with-zep.html
https://community.hortonworks.com/articles/80339/iot-capturing-photos-and-analyzing-the-image-with.html
https://community.hortonworks.com/articles/59349/hdf-20-flow-for-ingesting-real-time-tweets-from-st.html
https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
05-25-2017
04:52 PM
1 Kudo
In HDP 3.0, YARN 3 will run TensorFlow jobs in Docker containers. TensorFlowOnSpark and CaffeOnSpark both run on Spark on YARN and will be distributed across your Hadoop cluster. For many applications I am running (already trained) TensorFlow Python scripts on NiFi nodes, but they could run on HDP nodes, which all have Python installed. The other option is to run TensorFlow with TensorFlow Serving and call it from existing applications, NiFi, and big data apps via gRPC.
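A rough sketch of that gRPC call, modeled on the TensorFlow Serving Python client of that era; the host, port, and 'inception' model name are assumptions, not part of the original post:

from grpc.beta import implementations
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

# read an image and connect to a TensorFlow Serving instance (placeholders)
image_data = open('TimSpann.jpg', 'rb').read()
channel = implementations.insecure_channel('tfserving-host', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# build and send a Predict request for a hypothetical 'inception' model
request = predict_pb2.PredictRequest()
request.model_spec.name = 'inception'
request.inputs['images'].CopyFrom(
    tf.contrib.util.make_tensor_proto(image_data, shape=[1]))
result = stub.Predict(request, 10.0)  # 10-second timeout
print(result)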
05-23-2017
04:53 PM
SpaCy and NLTK are for other purposes on the Tinker: NLP / sentiment analysis. TensorFlow is used to analyze images and figure out what an image is. The device has a camera (or can acquire images elsewhere) and can process them on the edge before you send them to your data lake. The Tinker, with its 2 GB of RAM and decent processor, can easily run this. A 32 GB microSD card is cheap and stores 10x what I need. These all run very fast and complete in seconds. That does not kill this box. And it's not a Raspberry Pi.
05-22-2017
11:23 PM
2 Kudos
Ingesting JMS Data into Hive

A company has a lot of data moving around the enterprise asynchronously with Apache ActiveMQ; they want to tap into it, convert the JSON messages coming from web servers, and store them in Hadoop. I am storing the data in Apache Phoenix / HBase via SQL. And since it's so easy, I am also storing the data as ORC files in HDFS for Apache Hive access.
Apache NiFi 1.2 generates the DDL for a Hive table for us in the hive.ddl attribute:
CREATE EXTERNAL TABLE IF NOT EXISTS meetup
(id INT, first_name STRING, last_name STRING, email STRING, ip_address STRING, company STRING, macaddress STRING, cell_phone STRING) STORED AS ORC
LOCATION '/meetup'
insert into meetup
(id , first_name , last_name , email , ip_address , company , macaddress , cell_phone)
values
(?,?,?,?,?,?,?,?)
HDF Flow
ConsumeJMS

Path 1: Store in Hadoop as ORC with a Hive Table
InferAvroSchema: get a schema from the JSON data
ConvertJSONtoAVRO: build an AVRO file from the JSON data
MergeContent: build a larger chunk of AVRO data
ConvertAvroToORC: build ORC files
PutHDFS: land in your Hadoop data lake
Path 2: Upsert into Phoenix (or any SQL database)
EvaluateJSONPath: extract fields from the JSON file
UpdateAttribute: set SQL fields
ReplaceText: create the SQL statement with ? placeholders
PutSQL: send to Phoenix through a Connection Pool (JDBC)

Path 3: Store Raw JSON in Hadoop
PutHDFS: Store JSON data on ingest
Path 4: Call Original REST API to Obtain Data and Send to JMS
GetHTTP: call a REST API to retrieve JSON arrays
SplitJSON: split the JSON file into individual records
PutJMS or PublishJMS: two ways to push messages to JMS; one uses a JMS controller service and the other uses a JMS client without a controller. I should benchmark this.
Error Message from Failed Load
If I get errors on JMS send, I send the UUID of the file to Slack for ChatOps.
Zeppelin Display of SQL Data
To check the tables I use Apache Zeppelin to query Phoenix and Hive tables.
Formatting in UpdateAttribute for SQL Arguments

To set the ? properties for the JDBC prepared statement, we number the arguments starting from 1. The type is the JDBC type (12 is String), and the value is the value of the FlowFile field. Yet another message queue ingested with no fuss.
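For the meetup insert above, that means UpdateAttribute sets sql.args.N.type / sql.args.N.value attribute pairs that PutSQL reads; the attribute values here are illustrative (4 is the JDBC code for INTEGER, 12 for VARCHAR):

sql.args.1.type = 4
sql.args.1.value = ${id}
sql.args.2.type = 12
sql.args.2.value = ${first_name}
sql.args.3.type = 12
sql.args.3.value = ${last_name}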
05-22-2017
08:16 PM
4 Kudos
Backup Files from Hadoop
ListHDFS: set parameters; pick a high-level directory and work down, e.g. /etc/hadoop/conf/core-site.xml
FetchHDFS: ${path}/${filename}
PutFile: store in a local backup

Backup Hive Tables

SelectHiveQL: output format AVRO, with SQL: select * from beaconstatus
ConvertAVROtoORC: generic for all the tables
UpdateAttribute: tablename = ${hive.ddl:substringAfter('CREATE EXTERNAL TABLE IF NOT EXISTS '):substringBefore(' (')} (worked example below)
PutFile: use replace directories and create missing directories, with directory: /Volumes/Transcend/HadoopBackups/hive/${tablename}
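As a concrete illustration of that tablename expression, here is a hypothetical evaluation (the DDL is the meetup example from the post above):

hive.ddl = CREATE EXTERNAL TABLE IF NOT EXISTS meetup (id INT, first_name STRING, ...) STORED AS ORC
tablename = meetup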
For Phoenix tables, I use the same ConvertAvroToORC, UpdateAttribute, and PutFile boxes and just add ExecuteSQL to ingest the Phoenix data. For every new table, I add one box and link it to ConvertAvroToORC. Done! This is enough of a backup that if I need to rebuild and refill my development cluster, I can do so easily. I also have these scheduled once a day to rewrite everything. This is not for production or extremely large data! It works great for a development cluster or a personal dev cluster. You can easily back up files by ingesting with GetFile, and other things can be backed up by calling ExecuteStreamCommand.

Local File Storage of Backed Up Data

drwxr-xr-x 3 tspann staff 102 May 20 23:00 any_data_trials2
drwxr-xr-x 3 tspann staff 102 May 20 22:59 any_data_meetup
drwxr-xr-x 3 tspann staff 102 May 20 22:59 any_data_ibeacon
drwxr-xr-x 3 tspann staff 102 May 20 22:57 any_data_gpsweather
drwxr-xr-x 3 tspann staff 102 May 20 10:53 any_data_beaconstatus
drwxr-xr-x 3 tspann staff 102 May 20 10:52 any_data_beacongateway
drwxr-xr-x 3 tspann staff 102 May 19 17:36 any_data_atweetshive2
drwxr-xr-x 3 tspann staff 102 May 19 17:31 any_data_atweetshive
Other Tools to Extract Data

SHOW TABLES gets your list, and then you can grab all the DDL for the Hive tables. ddl.sql:

show create table atweetshive;
show create table atweetshive2;
show create table beacongateway;
show create table beaconstatus;
show create table dronedata;
show create table gps;
show create table gpsweather;
show create table ibeacon;
show create table meetup;
show create table trials2;

Hive Script to Export Table DDL

beeline -u jdbc:hive2://myhiveserverthrift:10000/default --color=false --showHeader=false --verbose=false --silent=true --outputformat=csv -f ddl.sql

Backup Zeppelin Notebooks in Bulk

tar -cvf notebooks.tar /usr/hdp/current/zeppelin-server/notebook/
gzip -9 notebooks.tar
scp userid@pservername:/opt/demo/notebooks.tar.gz .
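To restore on a rebuilt node, reverse the steps (a sketch; tar stores these paths without the leading slash, so extracting from / lands them back under the Zeppelin install):

gunzip notebooks.tar.gz
tar -xvf notebooks.tar -C /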
05-21-2017
02:59 PM
6 Kudos
Raspberry Pi Killer? Nope, but this device has twice the RAM and a bit more performance. It's mostly compatible with the Pi, but not fully. It is very new and has little ecosystem, but it can get the job done.

Device Setup

It is easy to install Python and all the libraries required for IoT and some deep learning. I found most Raspberry Pi instructions worked on this device, and the extra RAM helps with some of these activities. I downloaded a MicroSD image of TinkerOS_Debian V1.8 (Beta version) and burned it with Etcher. It's a Debian variant close enough to Raspbian for most IoT developers and users to be comfortable. An Android OS is now available for download as well, and that may be worth trying. I am wondering if Google will add this device to the AndroidThings supported devices. Perhaps. One quirk, make sure you remember this: the TinkerOS default username is "linaro" and the password is "linaro". Connect to the device via ssh linaro@SOMEIP.

Python Setup

sudo apt-get update
sudo apt-get install cmake gcc g++ libxml2 libxml2-* leveldb*
sudo apt-get install python-dev python3-dev
sudo apt-get install python-setuptools
sudo apt-get install python3-setuptools
pip install twython
pip install numpy
pip install wheel
pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
sudo pip install -U nltk
python
import nltk
nltk.download()
quit()
pip install -U spacy
python -m spacy.en.download all
sudo python -m textblob.download_corpora
# TensorFlow
wget https://github.com/samjabrahams/tensorflow-on-raspberry-pi/releases/download/v1.0.1/tensorflow-1.0.1-cp27-none-linux_armv7l.whl
sudo pip install tensorflow-1.0.1-cp27-none-linux_armv7l.whl
# For Python 3.4
wget https://github.com/samjabrahams/tensorflow-on-raspberry-pi/releases/download/v1.0.1/tensorflow-1.0.1-cp34-cp34m-linux_armv7l.whl
sudo pip3 install tensorflow-1.0.1-cp34-cp34m-linux_armv7l.whl
# For Python 2.7
sudo pip uninstall mock
sudo pip install mock
# For Python 3.4
sudo pip3 uninstall mock
sudo pip3 install mock
sudo apt-get install git
git clone https://github.com/tensorflow/tensorflow.git
# PAHO for MQTT
pip install paho-mqtt
# Flask for Web Apps
pip install flask
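Not in the original post, but a quick sanity check that the installs above worked:

python -c "import tensorflow as tf; print(tf.__version__)"
python -c "import nltk; print(nltk.__version__)"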
Python 2.7 and 3.4 both work fine on this device. I was also able to install the major NLP libraries, including SpaCy and NLTK. TensorFlow installed using the Raspberry Pi build and ran without incident; I believe it's a bit faster than the RPi version, but I will have to run some tests on that.

Run The Regular TensorFlow Inception V3 Demo

python -W ignore /tensorflow/models/tutorials/image/imagenet/classify_image.py --image_file /opt/demo/tensorflow/TimSpann.jpg

I hacked that version to add code that sends the results to MQTT so I can process them with most IoT hubs and with Apache NiFi with ease. JSON is a very simple format to work with.

Custom Python to Call NiFi

# .... imports
import paho.mqtt.client as paho
import os
import json
# .... later in the code
top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
for node_id in top_k:
    human_string = node_lookup.id_to_string(node_id)
    score = predictions[node_id]
    print('==> %s (score = %.5f)' % (human_string, score))
# publish the final prediction as JSON to the MQTT broker
row = [{'human_string': str(human_string), 'score': str(score)}]
json_string = json.dumps(row)
client = paho.Client()
client.connect("192.168.1.151", 1883, 60)
client.publish("tinker1", payload=json_string, qos=0, retain=True)
NiFi Ingest

Ingesting MQTT is easy, and again that's our choice from the TinkerBoard. I have formatted the TensorFlow data as JSON, and we quickly ingest it and drop it to a file. We could do anything with this flow file, including storing it in Hadoop, Hive, Phoenix, or HBase, sending it to Kafka, or transforming it. So now we have yet another platform that can be used for IoT and basic Deep Learning and NLP, all enabled by a small, fast Linux device that runs Python. Enjoy your SBC! I am hoping that they add hats, a hard drive, and some other ASUS accessories. Making your own mini Debian laptop would be cool. The next device I am looking at is NVIDIA's powerful GPU SBCs. There are a couple of options, from 192 GPU cores up to 256, with smoking high-end specs.

Example Data

[{"score": "0.034884", "human_string": "neck brace"}]

Downloads

ASUS SBC Download
TinkerBoard FAQ

Scripts

Modified TensorFlow example: /models/tutorials/image/imagenet/classify_image.py