09-21-2018 03:37 PM
3 Kudos
Running Apache MXNet Deep Learning on YARN 3.1 - HDP 3.0
With Hadoop 3.1 / HDP 3.0, we can easily run distributed classification, training and other deep learning jobs. I am using Apache MXNet with Python; you can also use TensorFlow or PyTorch.
If you need GPU resources, you can specify them as such:
yarn.io/gpu=2
Unfortunately, my cluster does not have an NVIDIA GPU.
See:
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/data-operating-system/content/dosg_recommendations_for_running_docker_containers_on_yarn.html
Running App on YARN
[root@princeton0 ApacheDeepLearning101]# ./yarn.sh
18/09/21 15:31:22 INFO distributedshell.Client: Initializing Client
18/09/21 15:31:22 INFO distributedshell.Client: Running Client
18/09/21 15:31:22 INFO client.RMProxy: Connecting to ResourceManager at princeton0.field.hortonworks.com/172.26.208.140:8050
18/09/21 15:31:23 INFO client.AHSProxy: Connecting to Application History server at princeton0.field.hortonworks.com/172.26.208.140:10200
18/09/21 15:31:23 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=1
18/09/21 15:31:23 INFO distributedshell.Client: Got Cluster node info from ASM
18/09/21 15:31:23 INFO distributedshell.Client: Got node report from ASM for, nodeId=princeton0.field.hortonworks.com:45454, nodeAddress=princeton0.field.hortonworks.com:8042, nodeRackName=/default-rack, nodeNumContainers=4
18/09/21 15:31:23 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.4, queueMaxCapacity=1.0, queueApplicationCount=8, queueChildQueueCount=0
18/09/21 15:31:23 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
18/09/21 15:31:23 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
18/09/21 15:31:23 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
18/09/21 15:31:23 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
18/09/21 15:31:23 INFO distributedshell.Client: Max mem capability of resources in this cluster 15360
18/09/21 15:31:23 INFO distributedshell.Client: Max virtual cores capability of resources in this cluster 12
18/09/21 15:31:23 WARN distributedshell.Client: AM Memory not specified, use 100 mb as AM memory
18/09/21 15:31:23 WARN distributedshell.Client: AM vcore not specified, use 1 mb as AM vcores
18/09/21 15:31:23 WARN distributedshell.Client: AM Resource capability=<memory:100, vCores:1>
18/09/21 15:31:23 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
18/09/21 15:31:24 INFO distributedshell.Client: Set the environment for the application master
18/09/21 15:31:24 INFO distributedshell.Client: Setting up app master command
18/09/21 15:31:24 INFO distributedshell.Client: Completed setting up app master command {{JAVA_HOME}}/bin/java -Xmx100m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type GUARANTEED --container_memory 512 --container_vcores 1 --num_containers 1 --priority 0 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
18/09/21 15:31:24 INFO distributedshell.Client: Submitting application to ASM
18/09/21 15:31:24 INFO impl.YarnClientImpl: Submitted application application_1536697796040_0022
18/09/21 15:31:25 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=AM container is launched, waiting for AM container to Register with RM, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:26 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=AM container is launched, waiting for AM container to Register with RM, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:27 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=AM container is launched, waiting for AM container to Register with RM, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:28 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=, appMasterHost=princeton0/172.26.208.140, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:29 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=, appMasterHost=princeton0/172.26.208.140, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:30 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=, appMasterHost=princeton0/172.26.208.140, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:31 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=, appMasterHost=princeton0/172.26.208.140, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:32 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=, appMasterHost=princeton0/172.26.208.140, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:33 INFO distributedshell.Client: Got application report from ASM for, appId=22, clientToAMToken=null, appDiagnostics=, appMasterHost=princeton0/172.26.208.140, appQueue=default, appMasterRpcPort=-1, appStartTime=1537543884622, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=http://princeton0.field.hortonworks.com:8088/proxy/application_1536697796040_0022/, appUser=root
18/09/21 15:31:33 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop
18/09/21 15:31:33 INFO distributedshell.Client: Application completed successfully
Results:
https://github.com/tspannhw/ApacheDeepLearning101/blob/master/run.log
Script:
https://github.com/tspannhw/ApacheDeepLearning101/blob/master/yarn.sh

yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command python3.6 -shell_args "/opt/demo/ApacheDeepLearning101/analyzex.py /opt/demo/images/201813161108103.jpg" -container_resources memory-mb=512,vcores=1

For pre-HDP 3.0, see my older script using the DMLC YARN runner. We don't need that anymore. No Spark either.
https://github.com/tspannhw/nifi-mxnet-yarn
Python MXNet Script:
https://github.com/tspannhw/ApacheDeepLearning101/blob/master/analyzehdfs.py
Since we are distributed, let's write the results to HDFS. We can use the Python hdfs library, which works on Python 2.7 and 3.x. So let's pip install it:
pip install hdfs
In our code:
from json import dumps
from hdfs import InsecureClient

# Connect to WebHDFS on the NameNode as the root user
client = InsecureClient('http://princeton0.field.hortonworks.com:50070', user='root')

# uniqueid and row come from the surrounding script;
# write the result row as a JSON file named for this run's unique id
client.write('/mxnetyarn/' + uniqueid + '.json', dumps(row))
We write our row as JSON to HDFS. When the job completes in YARN, we get a new JSON file written to HDFS.

hdfs dfs -ls /mxnetyarn
Found 2 items
-rw-r--r-- 3 root hdfs 424 2018-09-21 17:50 /mxnetyarn/mxnet_uuid_img_20180921175007.json
-rw-r--r-- 3 root hdfs 424 2018-09-21 17:55 /mxnetyarn/mxnet_uuid_img_20180921175552.json
hdfs dfs -cat /mxnetyarn/mxnet_uuid_img_20180921175552.json
{"uuid": "mxnet_uuid_img_20180921175552", "top1pct": "49.799999594688416", "top1": "n03063599 coffee mug", "top2pct": "21.50000035762787", "top2": "n07930864 cup", "top3pct": "12.399999797344208", "top3": "n07920052 espresso", "top4pct": "7.500000298023224", "top4": "n07584110 consomme", "top5pct": "5.200000107288361", "top5": "n04263257 soup bowl", "imagefilename": "/opt/demo/images/201813161108103.jpg", "runtime": "0"}
HDP Assemblies
https://github.com/hortonworks/hdp-assemblies/
https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/Dockerfile.md
https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/TensorflowOnYarnTutorial.md
https://github.com/hortonworks/hdp-assemblies/blob/master/tensorflow/markdown/RunTensorflowJobUsingHelperScript.md
Submarine

Coming soon: Submarine is a really cool new way to run deep learning jobs on YARN.
https://github.com/leftnoteasy/hadoop-1/tree/submarine/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
See this awesome presentation from Strata NYC 2018 by Wangda Tan (Hortonworks): https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68289
See the quick start for setting Docker and GPU options:
https://github.com/leftnoteasy/hadoop-1/blob/submarine/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/src/site/QuickStart.md
Resources:
https://community.hortonworks.com/articles/60480/using-images-stored-in-hdfs-for-web-pages.html
09-11-2018 01:27 PM
If you look, ParseSQL is an ExtractText processor. We use ExtractText to get the SQL statement created by GenerateTableFetch. We add a new attribute, sql, with the value ^(.*).
09-10-2018 04:23 PM
3 Kudos
IoT Edge Processing with Apache NiFi and MiniFi and Multiple Deep Learning Libraries Series

For: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68140
See Part 1: https://community.hortonworks.com/articles/215079/iot-edge-processing-with-deep-learning-on-hdf-32-a.html
See Part 2: https://community.hortonworks.com/articles/215258/iot-edge-processing-with-deep-learning-on-hdf-32-a-1.html
See Part 3: https://community.hortonworks.com/articles/215271/iot-edge-processing-with-deep-learning-on-hdf-32-a-2.html

You will notice a bit of a travel theme in this article; some of the images and work were done while on various holidays in August and September.

Deep Learning

We are running TensorFlow 1.10, Apache MXNet 1.3, NCSDK 2.05 and the Neural Compute Application Zoo (NC App Zoo).

Device Type 1: Plain Raspberry Pi (Found some old Kodak slides...)

The main things to do are upgrading to Python 3.6, upgrading the Raspberry Pi to Stretch, upgrading libraries and doing a few reboots. Install OpenCV (or upgrade it) and install Apache MXNet. You want to make sure you are on the latest version of Stretch and everything is cleaned up. Example:

sudo apt-get upgrade
sudo apt-get install build-essential tk-dev libncurses5-dev libncursesw5-dev libreadline6-dev libdb5.3-dev libgdbm-dev libsqlite3-dev libssl-dev libbz2-dev libexpat1-dev liblzma-dev zlib1g-dev
sudo apt autoremove
pip3.6 install --upgrade pip
pip3.6 install mxnet
git clone https://github.com/apache/incubator-mxnet.git --recursive

Device Type 2: Raspberry Pi Enhanced with Movidius Neural Compute Stick

I have updated the code to work with the new Movidius NCSDK 2.05. See: https://github.com/tspannhw/StrataNYC2018/blob/master/all2.py

I also updated some variable formatting and added some additional values. Evolve that schema! So you can see some additional data:

{"uuid": "mxnet_uuid_json_20180911021437.json", "label3": "n04081281 restaurant, eating house, eating place, eatery", "label1": "n03179701 desk", "roll": 4.0, "y": 0.0, "value5": "3.5%", "ipaddress": "192.168.1.156", "top5": "n03637318 lampshade, lamp shade", "label5": "n02788148 bannister, banister, balustrade, balusters, handrail", "host": "sensehatmovidius", "cputemp": 53, "top3pct": "6.5%", "diskfree": "5289.1 MB", "pressure": 1018.6, "cafferuntime": "111.685844ms", "label4": "n04009552 projector", "top4": "n03742115 medicine chest, medicine cabinet", "humidity": 42.5, "cputemp2": 52.62, "value2": "6.1%", "value3": "6.0%", "top2pct": "6.9%", "top1": "n02788148 bannister, banister, balustrade, balusters, handrail", "top4pct": "6.4%", "currenttime": "2018-09-11 02:14:44", "label2": "n03924679 photocopier", "top1pct": "7.3%", "top3": "n04286575 spotlight, spot", "starttime": "2018-09-11 02:14:33", "top5pct": "3.9%", "memory": 35.2, "value4": "5.0%", "top2": "n03250847 drumstick", "runtime": "11", "z": 1.0, "pitch": 360.0, "imagefilename": "/opt/demo/images/2018-09-10_2214.jpg", "tempf": 75.25, "temp": 35.14, "yaw": 86.0, "value1": "8.5%", "x": 0.0}
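The scripts I actually run are linked above; as a self-contained taste of the same idea, here is a hedged sketch of top-5 image classification with Apache MXNet's Gluon model zoo. The model choice, image path and preprocessing constants are assumptions for illustration, not the contents of all2.py.

import mxnet as mx
from mxnet.gluon.model_zoo import vision

# A small pretrained network that fits on a Raspberry Pi class device
net = vision.mobilenet_v2_1_0(pretrained=True)

img = mx.image.imread('/opt/demo/images/test.jpg')        # HWC uint8 image (assumed path)
img = mx.image.imresize(img, 224, 224).astype('float32') / 255.0
img = mx.image.color_normalize(img,
                               mean=mx.nd.array([0.485, 0.456, 0.406]),
                               std=mx.nd.array([0.229, 0.224, 0.225]))
img = img.transpose((2, 0, 1)).expand_dims(axis=0)        # to NCHW batch of one

probs = net(img).softmax()
print(probs.topk(k=5)[0])   # indices of the five most likely ImageNet classes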
Apache NiFi and MiniFi

Process, Proxy, Access, Filter and Transform Data Anywhere, Anytime, Any Platform

Apache NiFi and MiniFi work in Moab, Utah.

Resources:

https://github.com/tspannhw/StrataNYC2018
https://www.geomesa.org/documentation/current/tutorials/geomesa-quickstart-nifi.html
https://github.com/cinci/rpi-sense-hat-java
https://movidius.github.io/ncsdk/install.html
https://movidius.github.io/ncsdk/tf_modelzoo.html
https://github.com/movidius/ncappzoo/
https://github.com/movidius/ncappzoo/blob/ncsdk2/tensorflow/facenet/README.md
https://github.com/movidius/ncappzoo/blob/ncsdk2/tensorflow/inception_v4/README.md
https://medium.com/tensorflow/tensorflow-1-9-officially-supports-the-raspberry-pi-b91669b0aa0
https://github.com/lhelontra/tensorflow-on-arm/releases/download/v1.10.0/tensorflow-1.10.0-cp35-none-linux_armv7l.whl
https://github.com/movidius/ncappzoo/blob/ncsdk2/apps/image-classifier/README.md
08-31-2018 05:31 PM
4 Kudos
IoT Edge Processing with Apache NiFi and MiniFi and Multiple Deep Learning Libraries Series

For: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68140
See Part 1: https://community.hortonworks.com/articles/215079/iot-edge-processing-with-deep-learning-on-hdf-32-a.html
See Part 2: https://community.hortonworks.com/articles/215258/iot-edge-processing-with-deep-learning-on-hdf-32-a-1.html

Hive - SQL - IoT Data Storage

In this section, we will focus on converting JSON to Avro to Apache ORC and on storage options in Apache Hive 3. I am doing two styles of storage for one of the tables, rainbow: storing ORC files with an external table, as well as using the Streaming API to store into an ACID table.

NiFi - SQL - On Streams - Calcite

SELECT *
FROM FLOWFILE
WHERE CAST(memory AS FLOAT) > 0

SELECT *
FROM FLOWFILE
WHERE CAST(tempf AS FLOAT) > 65

I check the flows as they are ingested in real time and filter based on conditions such as memory or temperature. This makes for powerful, easy simple event processing, and it is very handy when you want to filter out standard conditions where no anomaly has occurred.

IoT Data Storage Options

For time series data, we are blessed with many options in HDP 3.x. I am doing the simplest choice first: a plain Apache Hive 3.x table. This is where we have some tough decisions about which engine to use. Hive has the best, most complete SQL and lots of interfaces, so it is my default choice for where and how to store my data. If it were more than a few thousand rows a second with a timestamp, then we would have to think about the architecture. Apache Druid has a lot of amazing abilities with time series data like what's coming out of these IoT devices. Since we can join Hive and Druid data and put Hive tables on top of Druid, we really should consider using Druid for our storage handler.

https://cwiki.apache.org/confluence/display/Hive/Druid+Integration
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
https://hortonworks.com/blog/apache-hive-druid-part-1-3/
https://github.com/apache/hive/blob/master/druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandlerUtils.java
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/using-druid/content/druid_anatomy_of_hive_to_druid.html

We could create a Hive table backed by Druid like this:

CREATE TABLE rainbow_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "MONTH",
  "druid.query.granularity" = "DAY")
AS SELECT ts AS `__time`, cast(tempf as string) s_tempf, ipaddress,
  cast(altitude as string) s_altitude, host, diskfree
FROM RAINBOW;

For second and sub-second data, we need to consider either Druid or HBase. The nice thing is that these NoSQL options also have SQL interfaces to use; it comes down to how you are going to query the data and which one you like. HBase + Phoenix is performant and has been used in production forever, and with HBase 2.x there are really impressive updates that make it a good option. For richer analytics, including some really cool analytics with Apache Superset, it's hard not to recommend Druid. Apache Druid has really improved recently and is well integrated with Hive 3's rich querying.

Example of Our Geo Data

{"speed": "0.145", "diskfree": "4643.2 MB", "altitude": "6.2", "ts": "2018-08-30 17:47:03", "cputemp": 52.0, "latitude": "38.9789405", "track": "0.0", "memory": 26.5, "host": "rainbow", "uniqueid": "gps_uuid_20180830174705", "ipaddress": "172.20.10.8", "epd": "nan", "utc": "2018-08-30T17:47:05.000Z", "epx": "21.91", "epy": "31.536", "epv": "73.37", "ept": "0.005", "eps": "63.07", "longitude": "-74.824475167", "mode": "3", "time": "1535651225.0", "climb": "0.0", "epc": "nan"}

Hive 3 Tables
CREATE EXTERNAL TABLE IF NOT EXISTS rainbow (tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING, tempf2 DOUBLE, memory DOUBLE)
STORED AS ORC LOCATION '/rainbow';

create table rainbowacid(tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING, tempf2 DOUBLE, memory DOUBLE)
STORED AS ORC TBLPROPERTIES ('transactional'='true');

CREATE EXTERNAL TABLE IF NOT EXISTS gps (speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING, track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING, utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING, mode STRING, time STRING, climb STRING, epc STRING)
STORED AS ORC LOCATION '/gps';

CREATE TABLE IF NOT EXISTS gpsacid (speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING, track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING, utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING, mode STRING, `time` STRING, climb STRING, epc STRING)
STORED AS ORC TBLPROPERTIES ('transactional'='true');

CREATE EXTERNAL TABLE IF NOT EXISTS movidiussense (label5 STRING, runtime STRING, label1 STRING, diskfree STRING, top1 STRING, starttime STRING, label2 STRING, label3 STRING, top3pct STRING, host STRING, top5pct STRING, humidity DOUBLE, currenttime STRING, roll DOUBLE, uuid STRING, label4 STRING, tempf DOUBLE, y DOUBLE, top4pct STRING, cputemp2 DOUBLE, top5 STRING, top2pct STRING, ipaddress STRING, cputemp INT, pitch DOUBLE, x DOUBLE, z DOUBLE, yaw DOUBLE, pressure DOUBLE, top3 STRING, temp DOUBLE, memory DOUBLE, top4 STRING, imagefilename STRING, top1pct STRING, top2 STRING)
STORED AS ORC LOCATION '/movidiussense';

CREATE EXTERNAL TABLE IF NOT EXISTS minitensorflow2 (image STRING, ts STRING, host STRING, score STRING, human_string STRING, node_id INT)
STORED AS ORC LOCATION '/minifitensorflow2';
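With these tables in place, you can query them from Python over HiveServer2. Here is a minimal hedged sketch using PyHive (pip install 'pyhive[hive]'); the host, port and user are assumptions for this demo cluster, not values from the article.

from pyhive import hive

# Connect to HiveServer2 (host/port/user are assumed for this demo cluster)
conn = hive.Connection(host='princeton0.field.hortonworks.com', port=10000, username='root')
cursor = conn.cursor()

# Pull the ten most recent warm readings from the external rainbow table
cursor.execute("SELECT ts, tempf, memory FROM rainbow "
               "WHERE tempf > 65 ORDER BY ts DESC LIMIT 10")
for ts, tempf, memory in cursor.fetchall():
    print(ts, tempf, memory)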
Resources:

https://github.com/tspannhw/StrataNYC2018
https://www.geomesa.org/documentation/current/tutorials/geomesa-quickstart-nifi.html
https://github.com/cinci/rpi-sense-hat-java
08-31-2018 02:34 PM
1 Kudo
IoT Edge Processing with Deep Learning on HDF 3.2 and HDP 3.0 - Part 2

For: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68140
See Pre-Work: https://community.hortonworks.com/articles/203638/ingesting-multiple-iot-devices-with-apache-nifi-17.html
See Part 1: https://community.hortonworks.com/articles/215079/iot-edge-processing-with-deep-learning-on-hdf-32-a.html

Step By Step Processing

Step 1: Install Apache NiFi (one or more nodes or clusters)
Choose: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.2.0/installing-hdf/content/install-ambari.html
or
docker pull hortonworks/nifi
Apache NiFi Configuration for IoT
https://community.hortonworks.com/articles/67756/ingesting-log-data-using-minifi-nifi.html
You will need to set nifi.remote.input.host and nifi.remote.input.socket.port in conf/nifi.properties or in the Ambari settings.
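If you are not using Ambari, a tiny script can set those two properties in place. This is a hedged sketch; the install path and values are assumptions for your environment.

from pathlib import Path

# Hypothetical NiFi install path; adjust for your environment
props = Path('/opt/nifi/conf/nifi.properties')

settings = {
    'nifi.remote.input.host': 'princeton1.field.hortonworks.com',
    'nifi.remote.input.socket.port': '10000',
}

# Rewrite only the two site-to-site lines, leaving everything else untouched
lines = []
for line in props.read_text().splitlines():
    key = line.split('=', 1)[0]
    lines.append(key + '=' + settings[key] if key in settings else line)
props.write_text('\n'.join(lines) + '\n')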
Step 2: Install Apache NiFi - MiniFi on Your Device(s)

Download MiniFi (https://nifi.apache.org/minifi/download.html). You can choose Java or C++; for your first usage, I recommend the Java edition unless your device is too small. You can also install it on a RHEL or Debian Linux machine or on OS X.
Download MiniFi Toolkit (https://nifi.apache.org/minifi/minifi-toolkit.html)
Resources:
https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes#ReleaseNotes-Versioncpp-0.5.0
https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes#ReleaseNotes-Version0.5.0
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_release-notes/content/ch_hdf_relnotes.html#centos7
https://community.hortonworks.com/articles/108947/minifi-for-ble-bluetooth-low-energy-beacon-data-in.html
https://community.hortonworks.com/content/kbentry/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html

Step 3: Install Apache MXNet (on MiniFi devices and NiFi nodes - optional)

https://mxnet.incubator.apache.org/install/index.html?platform=Devices&language=Python&processor=CPU
Install build tools and build from scratch
Walk through install: https://community.hortonworks.com/articles/176932/apache-deep-learning-101-using-apache-mxnet-on-the.html
Resources and Source
https://github.com/tspannhw/StrataNYC2018
rainbow-processing.xml
rainbow-gateway-processing.xml
display-images-server.xml
rainbowminifi.xml
https://community.hortonworks.com/articles/203638/ingesting-multiple-iot-devices-with-apache-nifi-17.html
08-29-2018 03:43 PM
3 Kudos
IoT Edge Processing with Apache NiFi and MiniFi and Multiple Deep Learning Libraries Series

For: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68140

In preparation for my talk on utilizing edge devices for deep learning, IoT sensor reading and big data processing, I have updated my environment to the latest and greatest tools available. With the upgrade of HDF to 3.2, I can now use Apache NiFi 1.7 and MiniFi 0.5 for IoT data ingestion, simple event processing, conversion, data processing, data flow and storage. The architecture diagram above shows the basic flow we are utilizing.

IoT Step by Step

1. A Raspberry Pi with the latest patches, Python, GPS software, a USB camera, sensor libraries, Java 8, MiniFi 0.5, TensorFlow and Apache MXNet installed.
2. The MiniFi flow pushes JSON and JPEGs over HTTP(S) / Site-to-Site to an Apache NiFi gateway server.
3. Option: NiFi can push to a central NiFi cloud cluster and/or a Kafka cluster, both of which run on HDF 3.2 environments.
4. The Apache NiFi cluster pushes to Hive, HDFS, a Dockerized API running in HDP 3.0 and third-party APIs.
5. NiFi and Kafka integrate with Schema Registry for our tabular data, including the rainbow and gps JSON data.

SQL Tables in Hive

I stream my data into Apache ORC files stored in HDP 3.0 HDFS directories and build external tables on them.

CREATE EXTERNAL TABLE IF NOT EXISTS rainbow (tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING,
tempf2 DOUBLE, memory DOUBLE)
STORED AS ORC LOCATION '/rainbow';
CREATE EXTERNAL TABLE IF NOT EXISTS gps (speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING, track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING, utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING, mode STRING, time STRING, climb STRING, epc STRING)
STORED AS ORC LOCATION '/gps';
For my processing needs I also have a Hive 3 ACID table for general table usage and updates.

create table rainbowacid(tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING,
tempf2 DOUBLE, memory DOUBLE) STORED AS ORC
TBLPROPERTIES ('transactional'='true');
CREATE TABLE IF NOT EXISTS gpsacid (speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING, track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING, utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING, mode STRING, time STRING, climb STRING, epc STRING) STORED AS ORC
TBLPROPERTIES ('transactional'='true');
Then I load my initial data.

insert into rainbowacid
select * from rainbow;
insert into gpsacid
select * from gps;

Hive 3.x Updates

%jdbc(hive)
CREATE TABLE Persons_default (
ID Int NOT NULL,
Name String NOT NULL,
Age Int,
Creator String DEFAULT CURRENT_USER(),
CreateDate Date DEFAULT CURRENT_DATE()
);

One of the cool new features in Hive is that you can now have column defaults, as you can see above; these are helpful for standard defaults you might want, like the current date. This gives us even more relational-style features in Hive. Another very interesting feature is materialized views, which give you clean and fast subqueries. Here is a cool example:

CREATE MATERIALIZED VIEW mv1
AS
SELECT dest,origin,count(*)
FROM flights_hdfs
GROUP BY dest,origin;

References:

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/hive-overview/content/hive_whats_new_in_this_release_hive.html
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/using-hiveql/content/hive_3_internals.html
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/hive-overview/content/hive-apache-hive-3-architecturural-overview.html
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/materialized-view/content/hive_create_materialized_view.html
07-26-2018 11:56 AM
If you add -ot json to the end, your output will be in JSON format, which you can parse with your favorite tool; again, I was thinking to call it from NiFi and process the output. Perhaps this is NiFi-ception all over again. By default the output is nice, easy-to-read text, which is perfect if a person is watching it. It's really cool to be able to start and stop process groups remotely via a command-line command.
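As a hedged sketch of that idea, you could shell out to the toolkit CLI from Python and parse the JSON; the toolkit path and URL are assumptions, and the exact JSON layout depends on your NiFi version.

import json
import subprocess

# Run the NiFi Toolkit CLI with JSON output and capture stdout
out = subprocess.run(
    ['./bin/cli.sh', 'nifi', 'pg-list', '-u', 'http://localhost:8080', '-ot', 'json'],
    capture_output=True, text=True, check=True).stdout

# Pretty-print whatever came back; drill into fields once you see the layout
print(json.dumps(json.loads(out), indent=2))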
07-24-2018 09:02 PM
6 Kudos
HDF DevOps
It's become enough of an ask that I had to post an answer to it. The ask is something like this: "What's with all this UI stuff? I want DevOps, automation, command line et al." So did I in 2002. It's nice to see everything and have a nice diagram on a website without any extra tools or SSH. Okay, that didn't convince anyone, so here is a proper DevOps solution for you.
Option 1: REST
The full documentation for the NiFi REST API is here: https://nifi.apache.org/docs/nifi-docs/rest-api/
The following are some examples I have accessed via curl (if you have security enabled, you will need to account for that; see the specifications).
curl http://hw13125.local:8080/nifi-api/resources
curl http://hw13125.local:8080/nifi-api/tenants/user-groups
curl http://hw13125.local:8080/nifi-api/tenants/users
curl http://hw13125.local:8080/nifi-api/flow/about
curl http://hw13125.local:8080/nifi-api/flow/banners
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/bulletin-board
curl http://hw13125.local:8080/nifi-api/flow/cluster/summary
curl http://hw13125.local:8080/nifi-api/flow/config
{"flowConfiguration":{"supportsManagedAuthorizer":false,"supportsConfigurableAuthorizer":false,"supportsConfigurableUsersAndGroups":false,"autoRefreshIntervalSeconds":30,"currentTime":"16:12:51 EDT","timeOffset":-14400000,"defaultBackPressureObjectThreshold":10000,"defaultBackPressureDataSizeThreshold":"1 GB"}}%
curl http://hw13125.local:8080/nifi-api/flow/controller/bulletins
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/history\?offset\=1\&count\=100
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/prioritizers
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/processor-types
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/registries
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/reporting-tasks
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/search-results\?\q\=mxnet
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/status
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/templates
curl http://hw13125.local:8080/nifi-api/system-diagnostics
curl http://hw13125.local:8080/nifi-api/flow/controller/bulletins
curl http://hw13125.local:8080/nifi-api/flow/status
curl http://hw13125.local:8080/nifi-api/flow/cluster/summary
curl http://hw13125.local:8080/nifi-api/site-to-site
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/process-groups/root
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/process-groups/root/controller-services
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/process-groups/root/status
curl http://princeton1.field.hortonworks.com:8080/nifi-api/flow/process-groups/7a01d441-0164-1000-ec7a-54109819f084
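The same endpoints are easy to call from Python with requests. A minimal sketch, assuming an unsecured NiFi like the ones above:

import requests

base = 'http://hw13125.local:8080/nifi-api'   # one of the hosts used above

# Version and build info for this instance
about = requests.get(base + '/flow/about').json()
print(about['about']['version'])

# Overall controller status, including active thread count
status = requests.get(base + '/flow/status').json()
print(status['controllerStatus']['activeThreadCount'])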
Option 2: Python: http://nipyapi.readthedocs.io/en/latest/readme.html Now in version 0.91.
This library is awesome, very easy to use and I love it. See here for a deep dive: https://community.hortonworks.com/articles/177301/big-data-devops-apache-nifi-flow-versioning-and-au.html
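A minimal hedged sketch of nipyapi against the same servers (the URLs are assumptions; a secured cluster needs credentials configured first):

import nipyapi

# Point the client at the NiFi and Registry REST APIs
nipyapi.config.nifi_config.host = 'http://hw13125.local:8080/nifi-api'
nipyapi.config.registry_config.host = 'http://localhost:18080/nifi-registry-api'

# Walk every process group under root and print its name
root_id = nipyapi.canvas.get_root_pg_id()
for pg in nipyapi.canvas.list_all_process_groups(root_id):
    print(pg.status.name)

# The equivalent of "registry list-buckets" in the CLI
for bucket in nipyapi.versioning.list_registry_buckets():
    print(bucket.identifier, bucket.name)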
Option 3: Forget about it, just use Ambari, NiFi, Cloudbreak and DPS. Problem solved. WebGUIs are killer.
Option 4: The New NiFi Toolkit CLI
Let's examine the New NiFi CLI. I am using the version for Apache NiFi 1.7.
To install the CLI, you need to download Apache NiFi Toolkit (https://github.com/apache/nifi/tree/master/nifi-toolkit/nifi-toolkit-cli)
(https://www.apache.org/dyn/closer.lua?path=/nifi/1.7.1/nifi-toolkit-1.7.1-bin.zip)
Once you unzip it, you can run it one of two ways. Run it with no parameters and you will bring up an interactive console.
Now you can type help to see a nice list of commands. I think of this like the Spark shell or Apache Zeppelin: you can experiment, find out what you want, and then use that single command in your automation suite. The toolkit lets you automate a number of actions in Apache NiFi and its registry.
Below are a number of non-interactive commands:
./bin/cli.sh nifi pg-list -u http://hw13125.local:8080 -ot json
./bin/cli.sh registry list-buckets -u http://localhost:18080
./bin/cli.sh nifi pg-status -u http://hw13125.local:8080 --processGroupId f10700ba-3d5e-30a8-ea5d-33c59771d4f1
./bin/cli.sh nifi pg-get-services -u http://hw13125.local:8080 --processGroupId f10700ba-3d5e-30a8-ea5d-33c59771d4f1
./bin/cli.sh registry list-flows -bucketId 36cb79a4-f735-4f77-ba55-606718a9c3c9 -u http://localhost:18080
./bin/cli.sh registry list-buckets -u http://princeton1.field.hortonworks.com:18080/
./bin/cli.sh registry list-flows -u http://princeton1.field.hortonworks.com:18080/ -bucketIdentifier 36cb79a4-f735-4f77-ba55-606718a9c3c9
#  Name                 Id                                     Description
-  ------------------   ------------------------------------   -------------------------
1  NiFi 1.7 Features    54b37ad8-274b-4d9d-a09c-0ee2816f271c   NiFi 1.7
2  Rainbow Processing   5ebc2183-954e-4887-a28c-9d0ee54a02ed   server rainbow processing
./bin/cli.sh registry export-flow-version -u http://princeton1.field.hortonworks.com:18080/ -f 5ebc2183-954e-4887-a28c-9d0ee54a02ed -o rainbow.json -ot json
How to Back Up the Registry
You can run this from the interactive command line or as one-off commands. You would capture the list of buckets, use it to get the flows, and then use the list of flows to get the versions. This could easily be a for-loop in shell, Python, Go or the automation scripting tool of your choice. I would probably do this in NiFi.
registry list-buckets -u http://localhost:18080
registry list-flows -u http://localhost:18080 -b 36cb79a4-f735-4f77-ba55-606718a9c3c9
registry export-flow-version -f 5ebc2183-954e-4887-a28c-9d0ee54a02ed -o rainbow.json -ot json
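Here is that loop as a hedged Python sketch using nipyapi's versioning helpers instead of the CLI; the registry URL and output directory are assumptions.

import os
import nipyapi

nipyapi.config.registry_config.host = 'http://localhost:18080/nifi-registry-api'

backup_dir = '/tmp/registry-backup'   # hypothetical output directory
os.makedirs(backup_dir, exist_ok=True)

# Every bucket, then every flow: export the latest version as JSON
for bucket in nipyapi.versioning.list_registry_buckets():
    for flow in nipyapi.versioning.list_flows_in_bucket(bucket.identifier):
        nipyapi.versioning.export_flow_version(
            bucket.identifier, flow.identifier,
            file_path=os.path.join(backup_dir, flow.name + '.json'),
            mode='json')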
List What’s Running
nifi pg-list -u http://princeton1.field.hortonworks.com:8080
You will get a list of all the Processor Groups.
An Example Processor Group List from HDF NiFi Server in the Cloud
List of Commands
commands:
demo quick-import
nifi current-user
nifi get-root-id
nifi list-reg-clients
nifi create-reg-client
nifi update-reg-client
nifi get-reg-client-id
nifi pg-import
nifi pg-start
nifi pg-stop
nifi pg-get-vars
nifi pg-set-var
nifi pg-get-version
nifi pg-change-version
nifi pg-get-all-versions
nifi pg-list
nifi pg-status
nifi pg-get-services
nifi pg-enable-services
nifi pg-disable-services
registry current-user
registry list-buckets
registry create-bucket
registry delete-bucket
registry list-flows
registry create-flow
registry delete-flow
registry list-flow-versions
registry export-flow-version
registry import-flow-version
registry sync-flow-versions
registry transfer-flow-version
session keys
session show
session get
session set
session remove
session clear
exit
help
Transfer Between Servers (NiFi Registries)
registry transfer-flow-version
Transfers a version of a flow directly from one flow to another, without needing
to export/import. If --sourceProps is not specified, the source flow is assumed
to be in the same registry as the destination flow. If --sourceFlowVersion is
not specified, then the latest version will be transferred.
usage: transfer-flow-version
-f,--flowIdentifier <arg> A flow identifier
-h,--help Help
-kp,--keyPasswd <arg> The key password of the keystore being used
-ks,--keystore <arg> A keystore to use for TLS/SSL connections
-ksp,--keystorePasswd <arg> The password of the keystore being used
-kst,--keystoreType <arg> The type of key store being used (JKS or
PKCS12)
-ot,--outputType <arg> The type of output to produce (json or
simple)
-p,--properties <arg> A properties file to load arguments from,
command line values will override anything
in the properties file, must contain full
path to file
-pe,--proxiedEntity <arg> The identity of an entity to proxy
-sf,--sourceFlowIdentifier <arg> A flow identifier from the source registry
-sfv,--sourceFlowVersion <arg> A version of a flow from the source registry
-sp,--sourceProps <arg> A properties file to load for the source
-ts,--truststore <arg> A truststore to use for TLS/SSL connections
-tsp,--truststorePasswd <arg> The password of the truststore being used
-tst,--truststoreType <arg> The type of trust store being used (JKS or
PKCS12)
-u,--baseUrl <arg> The URL to execute the command against
-verbose,--verbose Indicates that verbose output should be
provided
An Example List of My Local Apache NiFi Flows
NIFI TOOLKIT Flow Analyzer
bin/flow-analyzer.sh
To run this with my massive number of flows, I edited flow-analyzer.sh and upped the Java memory as below:
${JAVA_OPTS:--Xms2G -Xmx2G}
The rest of this article is a big command-line dump; it seems a huge text list is the way to go:
➜ nifi-toolkit-1.7.0 bin/flow-analyzer.sh /Volumes/seagate/apps/nifi-1.7.0/conf/flow.xml.gz
Using flow=/Volumes/seagate/apps/nifi-1.7.0/conf/flow.xml.gz
Total Bytes Utilized by System=519 GB
Max Back Pressure Size=1 GB
Min Back Pressure Size=1 GB
Average Back Pressure Size=0.990458015 GB
Max Flowfile Queue Size=10000
Min Flowfile Queue Size=10000
Avg Flowfile Queue Size=9904.580152672
bin/file-manager.sh
usage: org.apache.nifi.toolkit.admin.filemanager.FileManagerTool [-b <arg>] [-c <arg>] [-d <arg>] [-h] [-i <arg>] [-m] [-o <arg>] [-r <arg>] [-t <arg>] [-v] [-x]
This tool is used to perform backup, install and restore activities for a NiFi node.
-b,--backupDir <arg> Backup NiFi Directory (used with backup or restore operation)
-c,--nifiCurrentDir <arg> Current NiFi Installation Directory (used optionally with install or restore operation)
-d,--nifiInstallDir <arg> NiFi Installation Directory (used with install or restore operation)
-h,--help Print help info (optional)
-i,--installFile <arg> NiFi Install File
-m,--moveRepositories Allow repositories to be moved to new/restored nifi directory from existing installation, if available (used optionally with
install or restore operation)
-o,--operation <arg> File operation (install | backup | restore)
-r,--nifiRollbackDir <arg> NiFi Installation Directory (used with install or restore operation)
-t,--bootstrapConf <arg> Current NiFi Bootstrap Configuration File (optional)
-v,--verbose Set mode to verbose (optional, default is false)
-x,--overwriteConfigs Overwrite existing configuration directory with upgrade changes (used optionally with install or restore operation)
Java home: /Library/Java/Home
NiFi Toolkit home: /Volumes/seagate/apps/nifi-toolkit-1.7.0
Backups
nifi-toolkit-1.7.0 bin/file-manager.sh -o backup -b /Volumes/seagate/backupsNIFI/ -c /Volumes/seagate/apps/nifi-1.7.0 -v
➜ nifi-toolkit-1.7.0 bin/notify.sh
usage: org.apache.nifi.toolkit.admin.notify.NotificationTool [-b <arg>] [-d <arg>] [-h] [-l <arg>] [-m <arg>] [-p <arg>] [-v]
This tool is used to send notifications (bulletins) to a NiFi cluster.
-b,--bootstrapConf <arg> Existing Bootstrap Configuration file
-d,--nifiInstallDir <arg> NiFi Installation Directory
-h,--help Print help info
-l,--level <arg> Level for notification bulletin INFO,WARN,ERROR
-m,--message <arg> Notification message for nifi instance or cluster
-p,--proxyDn <arg> User or Proxy DN that has permission to send a notification. User must have view and modify privileges to 'access the controller'
in NiFi
-v,--verbose Set mode to verbose (default is false)
Java home: /Library/Java/Home
NiFi Toolkit home: /Volumes/seagate/apps/nifi-toolkit-1.7.0
nifi-toolkit-1.7.0 bin/s2s.sh
Must specify either Port Name or Port Identifier to build Site-to-Site client
s2s is a command line tool that can either read a list of DataPackets from stdin to send over site-to-site or write the received DataPackets to stdout
The s2s cli input/output format is a JSON list of DataPackets. They can have the following formats:
[{"attributes":{"key":"value"},"data":"aGVsbG8gbmlmaQ=="}]
Where data is the base64 encoded value of the FlowFile content (always used for received data) or
[{"attributes":{"key":"value"},"dataFile":"/Volumes/seagate/apps/nifi-toolkit-1.7.0/EXAMPLE"}]
Where dataFile is a file to read the FlowFile content from
Example usage to send a FlowFile with the contents of "hey nifi" to a local unsecured NiFi over http with an input port named input:
echo '[{"data":"aGV5IG5pZmk="}]' | bin/s2s.sh -n input -p http
usage: s2s
--batchCount <arg> Number of flow files in a batch
--batchDuration <arg> Duration of a batch
--batchSize <arg> Size of flow files in a batch
-c,--compression Use compression
-d,--direction <arg> Direction (valid directions: SEND, RECEIVE) (default: SEND)
-h,--help Show help message and exit
-i,--portIdentifier <arg> Port id
--keyStore <arg> Keystore
--keyStorePassword <arg> Keystore password
--keyStoreType <arg> Keystore type (default: JKS)
-n,--portName <arg> Port name
--needClientAuth Need client auth
-p,--transportProtocol <arg> Site to site transport protocol (default: RAW)
--peerPersistenceFile <arg> File to write peer information to so it can be recovered on restart
--penalization <arg> Penalization period
--proxyHost <arg> Proxy hostname
--proxyPassword <arg> Proxy password
--proxyPort <arg> Proxy port
--proxyUsername <arg> Proxy username
--timeout <arg> Timeout
--trustStore <arg> Truststore
--trustStorePassword <arg> Truststore password
--trustStoreType <arg> Truststore type (default: JKS)
-u,--url <arg> NiFI URL to connect to (default: http://localhost:8080/nifi)
I can see this being used for integration testing.
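Building that stdin payload by hand gets old quickly; here is a hedged Python helper that emits the DataPacket JSON shown above (the attribute values and usage line are assumptions).

import base64
import json

# One DataPacket: attributes plus base64-encoded FlowFile content
packets = [{
    'attributes': {'source': 'cli-demo'},
    'data': base64.b64encode(b'hey nifi').decode('ascii'),
}]
print(json.dumps(packets))

# Usage (assumed paths): python make_packets.py | bin/s2s.sh -n input -p http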
Notify
Send a bulletin to a NiFi server
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_administration/content/notify.html
notify.sh
References:
https://community.hortonworks.com/articles/183217/devops-backing-up-apache-nifi-registry-flows.html
For Hadoop Friends https://community.hortonworks.com/articles/108610/hadoop-devops-better-together.html
https://community.hortonworks.com/articles/177349/big-data-devops-apache-nifi-hwx-schema-registry-sc.html
https://community.hortonworks.com/articles/177301/big-data-devops-apache-nifi-flow-versioning-and-au.html
https://community.hortonworks.com/articles/161761/new-features-in-apache-nifi-15-apache-nifi-registr.html
https://community.hortonworks.com/articles/191658/devops-tips-using-the-apache-nifi-toolkit-with-apa.html
https://community.hortonworks.com/articles/191546/automated-provisioning-of-hdp-for-data-governance.html
https://community.hortonworks.com/articles/202559/distributed-pricing-engine-using-dockerized-spark.html
https://github.com/tspannhw/BackupRegistry
07-19-2018 01:49 PM
3 Kudos
Topic: IoT Edge Processing with Apache NiFi and MiniFi and Multiple Deep Learning Libraries

Part 1: Multiple Devices with Data

Keywords: Deep Learning On The Edge, GPS Ingestion, Sense-Hat and Rainbow Hat Sensor Ingest, WebCam Image Ingest

In preparation for my talk at Strata in NYC, I am updating my IoT demos for more devices, more data types and more actions. I have three streams coming from each device, including web camera images.

When we are sending data from a MiniFi agent, we need to define a port on an Apache NiFi server/cluster to receive it. So I design my MiniFi flow in the Apache NiFi UI (pretty soon there will be a special designer for this). You then highlight everything there and hit Create Template. You can then export it and convert it to config.yml. Again, this process will be automated and connected with the NiFi Registry very shortly to make this fewer clicks. This is an example. When you connect to it in the flow you design in the Apache NiFi UI, you will connect to this port on the Remote Process Group. If you are manually editing one (okay, never do this, but sometimes I have to), you can copy that ID from the Port Details and paste it into the file. Once MiniFi has its config.yml and is started, we will start getting messages on that port. You can see I have two inputs, one for Movidius and one for Rainbow. I could have just one and route to what I want; it's up to you how you want to segment these flows.

Welcome to Apache NiFi Registry v0.2.0; this one works just as well. Very stable, but with some new magic: you can now connect to Git and GitHub!

We have structured JSON, so let's infer a schema, clean it up and store it in the Hortonworks Schema Registry. That will make it versioned and REST enabled. I add one for each of the two JSON file types I am sending from the rainbow device. You can see the schemas in full at the bottom of the article.

The data is received from MiniFi on my local NiFi edge server for simple event processing, filtering and analysis. I route based on the two types of files, apply their schemas, do a simple filter via SQL and send the converted Avro-formatted files to my cloud-hosted cluster. Once I get the data, I send it from my edge server to my cloud HDF 3.2 cluster. For images, I send them to my existing image storage processor group. For my other two types of files, I convert them to Apache ORC and store them in HDFS as Apache Hive tables.

Server Dashboard

Rainbow Processing

Routing is Easy

On High Humidity, Send a Slack Message (Query on humidity value)

We can dive into any flowfile as it travels through the system and examine its data and metadata. Now that my data is saved in HDFS with Hive tables on top, I can use the latest version of Apache Zeppelin to analyze it. I added some maps to Zeppelin via Helium, which is now available in HDP 3.0. I found a bunch of new chart types; this one could be insightful. So with the latest NiFi 1.7.1 and HDP 3.0 I can do a lot of interesting things. Next up, let's run a Dockerized TensorFlow application in my HDP 3.0 cluster.

Strata Talk: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68140

Python Scripts: https://github.com/tspannhw/StrataNYC2018/tree/master

Schemas

rainbow

{
"type": "record",
"name": "rainbow",
"fields": [
{
"name": "tempf",
"type": "double",
"doc": "Type inferred from '84.15'"
},
{
"name": "cputemp",
"type": "double",
"doc": "Type inferred from '53.0'"
},
{
"name": "pressure",
"type": "double",
"doc": "Type inferred from '101028.56'"
},
{
"name": "host",
"type": "string",
"doc": "Type inferred from '\"rainbow\"'"
},
{
"name": "uniqueid",
"type": "string",
"doc": "Type inferred from '\"rainbow_uuid_20180718234222\"'"
},
{
"name": "ipaddress",
"type": "string",
"doc": "Type inferred from '\"192.168.1.165\"'"
},
{
"name": "temp",
"type": "double",
"doc": "Type inferred from '38.58'"
},
{
"name": "diskfree",
"type": "string",
"doc": "Type inferred from '\"4831.2 MB\"'"
},
{
"name": "altitude",
"type": "double",
"doc": "Type inferred from '80.65'"
},
{
"name": "ts",
"type": "string",
"doc": "Type inferred from '\"2018-07-18 23:42:22\"'"
},
{
"name": "tempf2",
"type": "double",
"doc": "Type inferred from '28.97'"
},
{
"name": "memory",
"type": "double",
"doc": "Type inferred from '32.3'"
}
]
}
gps {
"type": "record",
"name": "gps",
"fields": [
{
"name": "speed",
"type": "string",
"doc": "Type inferred from '\"0.066\"'"
},
{
"name": "diskfree",
"type": "string",
"doc": "Type inferred from '\"4830.3 MB\"'"
},
{
"name": "altitude",
"type": "string",
"doc": "Type inferred from '\"43.0\"'"
},
{
"name": "ts",
"type": "string",
"doc": "Type inferred from '\"2018-07-18 23:46:39\"'"
},
{
"name": "cputemp",
"type": "double",
"doc": "Type inferred from '54.0'"
},
{
"name": "latitude",
"type": "string",
"doc": "Type inferred from '\"40.2681555\"'"
},
{
"name": "track",
"type": "string",
"doc": "Type inferred from '\"0.0\"'"
},
{
"name": "memory",
"type": "double",
"doc": "Type inferred from '32.3'"
},
{
"name": "host",
"type": "string",
"doc": "Type inferred from '\"rainbow\"'"
},
{
"name": "uniqueid",
"type": "string",
"doc": "Type inferred from '\"gps_uuid_20180718234640\"'"
},
{
"name": "ipaddress",
"type": "string",
"doc": "Type inferred from '\"192.168.1.165\"'"
},
{
"name": "epd",
"type": "string",
"doc": "Type inferred from '\"nan\"'"
},
{
"name": "utc",
"type": "string",
"doc": "Type inferred from '\"2018-07-18T23:46:40.000Z\"'"
},
{
"name": "epx",
"type": "string",
"doc": "Type inferred from '\"40.135\"'"
},
{
"name": "epy",
"type": "string",
"doc": "Type inferred from '\"42.783\"'"
},
{
"name": "epv",
"type": "string",
"doc": "Type inferred from '\"171.35\"'"
},
{
"name": "ept",
"type": "string",
"doc": "Type inferred from '\"0.005\"'"
},
{
"name": "eps",
"type": "string",
"doc": "Type inferred from '\"85.57\"'"
},
{
"name": "longitude",
"type": "string",
"doc": "Type inferred from '\"-74.529094\"'"
},
{
"name": "mode",
"type": "string",
"doc": "Type inferred from '\"3\"'"
},
{
"name": "time",
"type": "string",
"doc": "Type inferred from '\"2018-07-18T23:46:40.000Z\"'"
},
{
"name": "climb",
"type": "string",
"doc": "Type inferred from '\"0.0\"'"
},
{
"name": "epc",
"type": "string",
"doc": "Type inferred from '\"nan\"'"
}
]
}
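These schemas are plain Avro, so you can sanity-check a record against them before it ever hits NiFi. A hedged sketch with fastavro (pip install fastavro), assuming the rainbow schema above is saved locally as rainbow.avsc:

import json
from fastavro import parse_schema
from fastavro.validation import validate

# Load the rainbow schema shown above (assumed saved as rainbow.avsc)
with open('rainbow.avsc') as f:
    schema = parse_schema(json.load(f))

# One reading, using the sample values from the schema docs
row = {"tempf": 84.15, "cputemp": 53.0, "pressure": 101028.56,
       "host": "rainbow", "uniqueid": "rainbow_uuid_20180718234222",
       "ipaddress": "192.168.1.165", "temp": 38.58, "diskfree": "4831.2 MB",
       "altitude": 80.65, "ts": "2018-07-18 23:42:22",
       "tempf2": 28.97, "memory": 32.3}

print(validate(row, schema))   # True if the record matches the schema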
SQL

%sql
CREATE EXTERNAL TABLE IF NOT EXISTS movidiussense (label5 STRING, runtime STRING, label1 STRING, diskfree STRING, top1 STRING, starttime STRING, label2 STRING, label3 STRING, top3pct STRING, host STRING, top5pct STRING, humidity DOUBLE, currenttime STRING, roll DOUBLE, uuid STRING, label4 STRING, tempf DOUBLE, y DOUBLE, top4pct STRING, cputemp2 DOUBLE, top5 STRING, top2pct STRING, ipaddress STRING, cputemp INT, pitch DOUBLE, x DOUBLE, z DOUBLE, yaw DOUBLE, pressure DOUBLE, top3 STRING, temp DOUBLE, memory DOUBLE, top4 STRING, imagefilename STRING, top1pct STRING, top2 STRING) STORED AS ORC LOCATION '/movidiussense'
%sql
CREATE EXTERNAL TABLE IF NOT EXISTS minitensorflow2 (image STRING, ts STRING, host STRING, score STRING, human_string STRING, node_id INT) STORED AS ORC LOCATION '/minifitensorflow2'
%sql
CREATE EXTERNAL TABLE IF NOT EXISTS gps (speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING, track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING, utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING, mode STRING, time STRING, climb STRING, epc STRING) STORED AS ORC LOCATION '/gps'
%sql
CREATE EXTERNAL TABLE IF NOT EXISTS rainbow (tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING,
tempf2 DOUBLE, memory DOUBLE) STORED AS ORC LOCATION '/rainbow'
References
https://community.hortonworks.com/articles/176932/apache-deep-learning-101-using-apache-mxnet-on-the.html
https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes#ReleaseNotes-Versioncpp-0.5.0
https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes#ReleaseNotes-Version0.5.0
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_release-notes/content/ch_hdf_relnotes.html#centos7
https://community.hortonworks.com/articles/108947/minifi-for-ble-bluetooth-low-energy-beacon-data-in.html
https://community.hortonworks.com/content/kbentry/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html

NiFi Flows

rainbow-server-processing.xml
rainbow-minifi-ingest-in-nifi.xml