1973 Posts | 1225 Kudos Received | 124 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2494 | 04-03-2024 06:39 AM |
| | 3850 | 01-12-2024 08:19 AM |
| | 2083 | 12-07-2023 01:49 PM |
| | 3075 | 08-02-2023 07:30 AM |
| | 4215 | 03-29-2023 01:22 PM |
03-15-2018
01:30 PM
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/spark-dataframe-api.html
http://spark.apache.org/docs/2.2.0/
http://spark.apache.org/docs/2.2.0/rdd-programming-guide.html
http://spark.apache.org/docs/2.2.0/quick-start.html
https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html

Submit a local jar:

curl -H "Content-Type: application/json" -H "X-Requested-By: admin" -X POST -d '{"file": "/apps/example.jar","className": "com.dataflowdeveloper.example.Links"}' http://server:8999/batches

Submit a jar stored in HDFS:

curl -H "Content-Type: application/json" -H "X-Requested-By: admin" -X POST -d '{"file": "hdfs://server:8020/apps/example_2.11-1.0.jar","className": "com.dataflowdeveloper.example.Links"}' http://server:8999/batches

FYI: if the jar is already in HDFS, Livy will not copy it again:

18/03/14 11:54:54 INFO LineBufferedStream: stdout: 18/03/14 11:54:54 INFO Client: Source and destination file systems are the same. Not copying hdfs://server:8020/opt/demo/example.jar
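The two curl calls above can also be issued from Python. Here is a minimal sketch using only the standard library; the host name, port, jar path, and class name are the same placeholder values as in the curl examples, not real endpoints:

```python
import json
from urllib import request

def livy_batch_request(livy_url, jar, class_name):
    """Build the POST request for Livy's /batches endpoint, mirroring the
    curl calls above (Content-Type and X-Requested-By headers, JSON body)."""
    payload = {"file": jar, "className": class_name}
    return request.Request(
        livy_url + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "X-Requested-By": "admin"},
        method="POST",
    )

req = livy_batch_request("http://server:8999",
                         "hdfs://server:8020/apps/example_2.11-1.0.jar",
                         "com.dataflowdeveloper.example.Links")
# request.urlopen(req) would actually submit the batch; here we just inspect it.
print(req.full_url)  # http://server:8999/batches
```

On a live cluster, `request.urlopen(req)` returns the new batch session as JSON, which you can then poll via GET on /batches.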
03-11-2018
03:02 PM
2 Kudos
Extracting Text or HTML from PDF, Excel and Word Documents via Apache NiFi. This version has been tested with HDF 3.1 and Apache NiFi 1.5. The processor uses Apache Tika 1.17 and is an unsupported open-source community processor that I have written. A user asked about HTML output; I took a look, it was easy, so I added an option for that. Apache NiFi Flow: you must download or build the nifi-extracttextprocessor NAR and put it in your lib directory; then you can add the processor. Select html or text. Here is the autogenerated documentation. You can see we set the output mime.type to text/html. Apache NiFi Example Flow to Read a File and Convert to HTML. Source and JUnit in Eclipse. Example Output HTML: <html xmlns="http://www.w3.org/1999/xhtml">
<head><meta name="pdf:PDFVersion" content="1.3"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.pdf.PDFParser"/>
<meta name="xmp:CreatorTool" content="Rave (http://www.nevrona.com/rave)"/>
<meta name="access_permission:modify_annotations" content="true"/>
<meta name="access_permission:can_print_degraded" content="true"/>
<meta name="meta:creation-date" content="2006-03-01T07:28:26Z"/>
<meta name="created" content="Wed Mar 01 02:28:26 EST 2006"/>
<meta name="access_permission:extract_for_accessibility" content="true"/><meta name="access_permission:assemble_document" content="true"/><meta name="xmpTPg:NPages" content="2"/><meta name="Creation-Date" content="2006-03-01T07:28:26Z"/><meta name="dcterms:created" content="2006-03-01T07:28:26Z"/><meta name="dc:format" content="application/pdf; version=1.3"/><meta name="access_permission:extract_content" content="true"/><meta name="access_permission:can_print" content="true"/><meta name="pdf:docinfo:creator_tool" content="Rave (http://www.nevrona.com/rave)"/><meta name="access_permission:fill_in_form" content="true"/><meta name="pdf:encrypted" content="false"/><meta name="producer" content="Nevrona Designs"/><meta name="access_permission:can_modify" content="true"/><meta name="pdf:docinfo:producer" content="Nevrona Designs"/><meta name="pdf:docinfo:created" content="2006-03-01T07:28:26Z"/>
<meta name="Content-Type" content="application/pdf"/>
<title></title></head>
<body>
<div class="page"><p/><p>
A Simple PDF File
This is a small demonstration .pdf file -</p><p> just for use in the Virtual Mechanics tutorials. More text. And moretext. And more text. And more text. And more text.
</p><p> And more text. And more text. And more text. And more text. And moretext. And more text. Boring, zzzzz. And more text. And more text. Andmore text. And more text. And more text. And more text. And more text.And more text. And more text.</p><p> And more text. And more text. And more text. And more text. And moretext. And more text. And more text. Even more. Continued on page 2 ...</p><p/></div>
<div class="page"><p/><p>
Simple PDF File 2...continued from page 1. Yet more text. And more text. And more text.And more text. And more text. And more text. And more text. And moretext. Oh, how boring typing this stuff. But not as boring as watching paint dry. And more text. And more text. And more text. And more text.Boring. More, a little more text. The end, and just as well.
</p><p/></div></body></html>

Source Code: https://github.com/tspannhw/nifi-extracttext-processor
NAR Release: https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/html

Resources:
See Part 1: https://community.hortonworks.com/articles/81694/extracttext-nifi-custom-processor-powered-by-apach.html
https://community.hortonworks.com/articles/76924/data-processing-pipeline-parsing-pdfs-and-identify.html
https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html
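Downstream consumers of the processor's HTML output often only want the Tika metadata headers. As a rough sketch (not part of the processor itself), the <meta> tags can be collected with Python's standard-library HTML parser:

```python
from html.parser import HTMLParser

class TikaMetaParser(HTMLParser):
    """Collect <meta name=... content=...> pairs from Tika's XHTML output."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags also arrive here via handle_startendtag.
        if tag == "meta":
            d = dict(attrs)
            if "name" in d:
                self.meta[d["name"]] = d.get("content")

# Trimmed from the example output above.
sample = ('<html><head>'
          '<meta name="Content-Type" content="application/pdf"/>'
          '<meta name="xmpTPg:NPages" content="2"/>'
          '</head><body><p>A Simple PDF File</p></body></html>')
p = TikaMetaParser()
p.feed(sample)
print(p.meta["Content-Type"])  # application/pdf
```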
03-11-2018
01:43 AM
3 Kudos
Big Data DevOps: Part 2: Schemas, Schemas, Schemas. Know Your Records, Know Your DataTypes, Know Your Fields, Know Your Data. Since we can process records in Apache NiFi, Streaming Analytics Manager, Apache Kafka, and any tool that can work with a schema, we have a real need for a Schema Registry. I have mentioned them before. It is important to be able to automate the management of schemas. Today we will be listing and exporting schemas for backup and migration purposes. We will also cover how to upload new schemas and new versions of schemas. Backing up schemas with Apache NiFi 1.5+ is easy. Backup All Schemas
The flow:

1. GetHTTP: get the list of schemas from the Schema Registry via GET.
2. SplitJson: turn the list into individual records.
3. EvaluateJsonPath: get the schema name.
4. InvokeHTTP: get the schema body.
5. EvaluateJsonPath: turn the schema text into a separate flow file.
6. Rename and save both the full JSON record from the registry and the schema text only.

NiFi Flow. Initial Call to List All Schemas. Get the Schema Name. Example Schema with Text. An Example of JSON Schema Text. Build a New Flow File from the Schema Text JSON. Get the Latest Version of the Schema Text for This Schema by Name. The List Returned. Swagger Documentation for SR. Example Flow: backup-schema.xml

Schema List JSON Formatting:

"entities" : [ {
"schemaMetadata" : {
"type" : "avro",
"schemaGroup" : "Kafka",
"name" : "adsb",
"description" : "adsb",
"compatibility" : "BACKWARD",
"validationLevel" : "ALL",
"evolve" : true
},
"id" : 3,
  "timestamp" : 1520460239420
} ]

Get Schema List REST URL (GET): http://server:7788/api/v1/schemaregistry/schemas
Get Schema Body REST URL (GET): http://server:7788/api/v1/schemaregistry/schemas/${schema}/versions/latest?branch=MASTER

See: https://community.hortonworks.com/articles/177301/big-data-devops-apache-nifi-flow-versioning-and-au.html

If you wish, you can use the Confluent-style API against the Hortonworks Schema Registry and against the Confluent Schema Registry. It is slightly different, but it is easy to change our REST calls to process it. Swagger Docs: http://YourHWXRegistry:7788/swagger#!/4._Confluent_Schema_Registry_compatible_API/getSubjects Hortonworks Schema Registry ships with HDF 3.1. https://community.hortonworks.com/articles/171893/hdf-31-executing-apache-spark-via-executesparkinte-1.html
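The same list-then-fetch pattern the NiFi flow implements can be sketched in a few lines of Python. The host name is the placeholder from the REST URLs above, and the sample response is trimmed from the listing JSON:

```python
import json

BASE = "http://server:7788/api/v1/schemaregistry"  # placeholder host

def latest_version_urls(list_response_text):
    """Turn the /schemas list response into per-schema 'latest version' URLs,
    mirroring the SplitJson -> EvaluateJsonPath -> InvokeHTTP steps."""
    entities = json.loads(list_response_text)["entities"]
    return {
        e["schemaMetadata"]["name"]:
            "%s/schemas/%s/versions/latest?branch=MASTER"
            % (BASE, e["schemaMetadata"]["name"])
        for e in entities
    }

# Trimmed sample of the schema list response shown above.
sample = json.dumps({"entities": [
    {"schemaMetadata": {"type": "avro", "schemaGroup": "Kafka",
                        "name": "adsb", "compatibility": "BACKWARD"},
     "id": 3, "timestamp": 1520460239420}
]})
print(latest_version_urls(sample)["adsb"])
```

Each resulting URL can then be fetched (for example with urllib) and the body written out, exactly as the InvokeHTTP step does in the flow.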
03-12-2018
01:37 PM
For now, you can use this NiFi flow to do schema registry stuff: https://community.hortonworks.com/articles/177349/big-data-devops-apache-nifi-hwx-schema-registry-sc.html
03-08-2018
08:18 PM
2 Kudos
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time. This example requires Apache NiFi 1.5 or newer. This is part 2 of https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html. Our flow that receives the JSON files from the server does some minimal processing: we add some metadata fields and infer an AVRO schema from the JSON file (we only need to do this once in development, and then you can delete that box from your flow). As you can see, I can easily push that data to HDFS as a Parquet file. This is useful if you do not wish to install Apache MXNet on your HDF, HDP, or related nodes: you can install Apache MXNet plus MMS on a cloud or edge server and call it via HTTP from Apache NiFi for processing. Local Apache NiFi Flow To Call Our SSD Predict and Squeeze Net Predict REST Services. Cluster Receiving the Two Remote Ports. Server Apache NiFi Flow. Example Squeeze Net JSON Data Processed by Apache NiFi. Set the Schema and Mime Type. Storage Settings for Apache Parquet Files on HDFS. SSD MMS Logs. Squeeze Net MMS Logs. Schemas Used. An example prediction is returned; as you can see, you get the coordinates for drawing a box. To store Apache Parquet files:

hdfs dfs -mkdir /ssdpredict
hdfs dfs -chmod 755 /ssdpredict

Inside one of the files stored by Apache NiFi in HDFS, as you can see, there is an embedded Apache Avro schema in JSON format built by the Avro Parquet MR tool version 1.8.2.
parquet.avro.schema:
{"type":"record","name":"ssdpredict","fields":[{"name":"prediction","type":{"type":"array","items":{"type":"array","items":["string","int"]}},"doc":"Type inferred from '[[\"person\",385,329,466,498],[\"bicycle\",96,386,274,498]]'"}]}
writer.model.name: avro
parquet-mr version 1.8.2 (build c6522788629e590a53eb79874b95f6c3ff11f16c)

Example File:

-rw-r--r-- 3 nifi hdfs 688 2018-03-08 18:32 /ssdpredict/201801081202602.jpg.parquet.avro

Apache NiFi Flow File: apache-mxnet-cluster-processing.xml
Reference: http://parquet.apache.org/documentation/latest/
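Since the embedded parquet.avro.schema value is ordinary JSON, you can pull it out of the footer and inspect it directly. A small sketch using a trimmed copy of the schema string shown above:

```python
import json

# Schema string as embedded in the Parquet footer above (doc field omitted).
footer_schema = ('{"type":"record","name":"ssdpredict","fields":'
                 '[{"name":"prediction","type":{"type":"array","items":'
                 '{"type":"array","items":["string","int"]}}}]}')

schema = json.loads(footer_schema)
print(schema["name"])               # ssdpredict
print(schema["fields"][0]["name"])  # prediction
```

This is handy for confirming that the schema NiFi inferred matches what consumers of the Parquet files expect.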
03-07-2018
09:26 PM
3 Kudos
Ingest All The Things Series: Flight Data Via Radio. I am using the FlightAware Pro Stick Plus ADS-B USB Receiver with Built-in Filter on a Mac; I should hook this up to one of my Raspberry Pis and add a longer antenna outside. You need a good antenna, a good location, and nothing blocking your signal. It also depends on what air traffic is nearby. For a proof of concept, it's pretty cool to see air data going through a cheap USB stick into a computer, stored in a file, and loaded into Apache NiFi to send on for data processing. There is a web server you can run to see the planes on a map, which is pretty cool, but I want to just ingest the data for processing. My Equipment. If you wish to watch the data flash by in a command-line interface, you can run with the interactive flag and watch all the updates. We are dumping the data as it streams as raw text into a file. A snippet of it tailed in Apache NiFi is shown below. We are also ingesting ADS-B data that is provided by an open data REST API (https://public-api.adsbexchange.com..) at https://www.adsbexchange.com/. Like everything else, we may want to add a schema to parse it into records. Our ingest flow: I am getting the REST data from the ADS-B Exchange REST API, tailing the raw text dump from dump1090, and reading the aircraft history JSON files produced by dump1090 as well. For further processing, I send the aircraft history JSON files to my server cluster to send to a cloud-hosted MongoDB database, thanks to a free tier from mLab. And our data quickly arrives as JSON in Mongo. The main install is from the dump1090 GitHub repo and is pretty straightforward.

Installation on OSX:

brew update
brew install librtlsdr pkg-config
make

Running:

./run2.sh >> raw.txt 2>&1

run2.sh:

./dump1090 --net --lat 40.265887 --lon -74.534610 --modeac --mlat --write-json-every 1 --json-location-accuracy 2 --write-json /volumes/seagate/projects/dump1090/data

I have entered my local latitude and longitude above. I also write to a local directory that we will read from in Apache NiFi.

Example Data:

{ "now" : 1507344034.5,
"messages" : 1448,
"aircraft" : [
{"hex":"a6cb48","lat":40.169403,"lon":-74.526123,"nucp":7,"seen_pos":6.1,"altitude":33000,"mlat":[],"tisb":[],"messages":9,"seen":4.9,"rssi":-6.1},
{"hex":"a668e2","altitude":17250,"mlat":[],"tisb":[],"messages":31,"seen":4.2,"rssi":-7.9},
{"hex":"a8bcdd","flight":"NKS710 ","lat":40.205841,"lon":-74.491150,"nucp":7,"seen_pos":1.5,"altitude":9875,"vert_rate":0,"track":45,"speed":369,"category":"A0","mlat":[],"tisb":[],"messages":17,"seen":1.5,"rssi":-5.0},
{"hex":"a54cd9","mlat":[],"tisb":[],"messages":44,"seen":94.4,"rssi":-7.2},
{"hex":"a678c3","mlat":[],"tisb":[],"messages":60,"seen":133.2,"rssi":-7.1},
{"hex":"a1ff83","mlat":[],"tisb":[],"messages":47,"seen":212.3,"rssi":-7.9},
{"hex":"a24ce0","mlat":[],"tisb":[],"messages":160,"seen":276.3,"rssi":-6.2}
]
}
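Before building the NiFi record parsing, it is worth sanity-checking the dump1090 JSON by hand. A small sketch (trimmed sample from the data above) that keeps only aircraft carrying a position fix:

```python
import json

# Trimmed from the dump1090 example data shown above.
sample = """{ "now": 1507344034.5, "messages": 1448, "aircraft": [
  {"hex":"a6cb48","lat":40.169403,"lon":-74.526123,"altitude":33000,"messages":9},
  {"hex":"a668e2","altitude":17250,"messages":31},
  {"hex":"a8bcdd","flight":"NKS710 ","lat":40.205841,"lon":-74.491150,
   "altitude":9875,"speed":369,"messages":17}
]}"""

data = json.loads(sample)
# Not every received aircraft has a decoded position; filter for lat/lon.
with_position = [a for a in data["aircraft"] if "lat" in a and "lon" in a]
for a in with_position:
    # Flight callsigns are padded with spaces in the feed; strip them.
    print(a["hex"], a.get("flight", "?").strip(), a["altitude"])
```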
cat /usr/local/var/dump1090-mut-data/history_75.json
There is also an open data API available at https://www.adsbexchange.com/data/#, so I grabbed this via the REST API: https://public-api.adsbexchange.com/VirtualRadar/AircraftList.json, again using my latitude and longitude. Alternative Approach For Ingestion: @Hellmar Becker has a really well-developed example and presentation on how he is processing this data. See the Apache NiFi code, Python, setup scripts, and presentation here: https://github.com/hellmarbecker/plt-airt-2000. My example is with a different USB stick and a different continent.

Resources:
http://realadsb.com/
http://realadsb.com/piaware.html
https://github.com/mutability/dump1090.git
https://www.dzombak.com/blog/2017/01/Monitoring-aircraft-via-ADS-B-on-OS-X.html
https://www.faa.gov/nextgen/programs/adsb/
https://community.hortonworks.com/articles/177232/apache-deep-learning-101-processing-apache-mxnet-m.html
https://www.dzombak.com/blog/2017/08/Quick-ADS-B-monitoring-on-OS-X.html
https://github.com/fredpalmer/flightaware
https://walac.github.io/pyusb/
http://www.stuffaboutcode.com/2015/09/read-piaware-flight-data-with-python.html
https://github.com/hellmarbecker/plt-airt-2000
https://github.com/jojonas/py1090
https://gist.github.com/fasiha/c123a9c6b6c78df7597bb45e0fed808f
03-05-2018
07:38 PM
3 Kudos
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time. This is for running Apache MXNet on a Raspberry Pi. Let's get it installed!

git clone https://github.com/apache/incubator-mxnet.git

The installation instructions at Apache MXNet's website (http://mxnet.incubator.apache.org/install/index.html) are amazing. Pick your platform and your style. I am doing this the simplest way, via the Linux path. Installation: this builds on previous builds, so see those articles. We installed the drivers for Sense HAT, Intel Movidius, and the USB web cam previously. Please note that versions for Raspberry Pi, Apache MXNet, Python, and other drivers are updated every few months, so if you are reading this after DataWorks Summit 2018 you should check the relevant libraries and update to the latest versions. You need Python, Python devel, and pip installed, and you may need to run as root. You will also need OpenCV installed, as mentioned in the previous article. In this combined Python script we grab Sense HAT sensor readings for temperature, humidity, and more. We also run Movidius image analysis and Apache MXNet Inception on the image that we capture with our web cam. Apache MXNet is now at version 1.1, so you may want to upgrade.

pip install --upgrade pip
pip install scikit-image
git clone https://github.com/tspannhw/mxnet_rpi.git
sudo apt-get update -y
sudo apt-get install python-pip python-opencv python-scipy python-picamera -y
sudo apt-get -y install git cmake build-essential g++-4.8 c++-4.8 liblapack* libblas* libopencv*
git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet --branch 1.0.0
cd mxnet
export USE_OPENCV=0
make
cd python
pip install --upgrade pip
pip install -e .
pip install mxnet==1.0.0
MiniFi Flow to Run Python Script and Send Over Images (Running on Raspberry Pi). Routing on Server to Process Either an Image or JSON. Our Apache NiFi Server Receiving Input from Raspberry Pi. Apache NiFi Server Processing the Input. We route to two different processing flows: one saves images, the other adds a schema and converts the JSON data into Apache AVRO. The AVRO content is merged, and we send it to a central HDF 3.1 cluster that can write to HDFS. We can either stream to an ACID Hive table, or convert AVRO to Apache ORC, store it in HDFS, and autogenerate an external Hive table on top of it. You can find many examples of both of these processes in my links below. We could also insert into Apache HBase or into an Apache Phoenix table. Or do all of those and send it to Slack or email, store it in an RDBMS like MySQL, and anything else you can think of. Generated Schema. Running: we are using the Apache MiniFi Java Agent 0.3.0. I will be adding a follow-up covering MiniFi 0.4.0 with the native C++ TensorFlow and USB cam. See this awesome article for TensorFlow: https://community.hortonworks.com/articles/174520/minifi-c-iot-cat-sensor.html Source Code: https://github.com/tspannhw/rpi-mxnet-movidius-minifi This is too easy!
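The schema step above (infer once in development, then keep it fixed) can be approximated with a naive type mapping. This is a rough sketch, not NiFi's actual InferAvroSchema logic, and the sensor field names here are hypothetical:

```python
import json

# Naive mapping from Python JSON types to Avro primitive types.
AVRO_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}

def infer_avro_schema(record, name):
    """Infer a flat Avro record schema from one sample record
    (no nesting, no nullable unions -- a development-time sketch)."""
    fields = [{"name": k, "type": AVRO_TYPES[type(v)]}
              for k, v in record.items()]
    return {"type": "record", "name": name, "fields": fields}

# Hypothetical Sense HAT sensor record.
sensor = {"temperature": 38.2, "humidity": 45.1,
          "uuid": "mxnet_uuid_img_1", "cputemp": 52}
print(json.dumps(infer_avro_schema(sensor, "rpisensors")))
```

In production you would pin the inferred schema in the Schema Registry rather than re-inferring it on every record.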
References:
https://github.com/tspannhw/ApacheBigData101/
https://community.hortonworks.com/articles/171960/using-apache-mxnet-on-an-apache-nifi-15-instance-w.html
https://community.hortonworks.com/articles/174227/apache-deep-learning-101-using-apache-mxnet-on-an.html
https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
https://community.hortonworks.com/articles/176784/deep-learning-101-using-apache-mxnet-in-dsx-notebo.html
https://community.hortonworks.com/articles/176789/apache-deep-learning-101-using-apache-mxnet-in-apa.html
https://community.hortonworks.com/articles/174538/apache-deep-learning-101-using-apache-mxnet-with-h.html
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/167193/building-and-running-minifi-cpp-in-orangepi-zero.html
https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
https://community.hortonworks.com/articles/130814/sensors-and-image-capture-and-deep-learning-analys.html
https://community.hortonworks.com/articles/155475/powering-apache-minifi-flows-with-a-movidius-neura.html
http://mxnet.incubator.apache.org/install/index.html
https://mxnet.incubator.apache.org/tutorials/embedded/wine_detector.html
https://github.com/tspannhw/mxnet-in-notebooks
https://github.com/tspannhw/nifi-mxnet-yarn/
https://github.com/tspannhw/nvidiajetsontx1-mxnet
https://github.com/tspannhw/mxnet_rpi
https://github.com/tspannhw/rpi-sensehat-minifi-python/
https://github.com/tspannhw/rpi-minifi-movidius-sensehat
03-02-2018
05:32 PM
3 Kudos
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time. Another way to work with Apache MXNet is to use your Apache Zeppelin notebook to run your Python deep learning scripts. Apache Zeppelin Notebook: as you can see, we can format the data as a table using Apache Zeppelin's display system. Use this print statement (the header columns must line up, tab for tab, with the values):

print("%table top1pct\ttop1\ttop2pct\ttop2\ttop3pct\ttop3\ttop4pct\ttop4\ttop5pct\ttop5\timagefilename\truntime\tuuid\n" + top1pct + "\t" + top1 + "\t" + top2pct + "\t" + top2 + "\t" + top3pct + "\t" + top3 + "\t" + top4pct + "\t" + top4 + "\t" + top5pct + "\t" + top5 + "\t" + filename + "\t" + str(round(end - start)) + "\t" + uniqueid + "\n")

We use the pyspark interpreter to run this Python script, but there is no Spark in it yet. This data also gets loaded into Apache Hive via Apache NiFi, as shown here: Deep Learning Models: you will need to download the pre-built Inception models and reference them on your server:

synset.txt
Inception-BN-0000.params
Inception-BN-symbol.json

See: https://mxnet.incubator.apache.org/tutorials/embedded/wine_detector.html

curl --header 'Host: data.mxnet.io' --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Language: en-US,en;q=0.5' --header 'Referer: http://data.mxnet.io/models/imagenet/' --header 'Connection: keep-alive' 'http://data.mxnet.io/models/imagenet/inception-bn.tar.gz' -o 'inception-bn.tar.gz' -L
curl http://data.mxnet.io/models/imagenet/synset.txt

More Models: http://data.mxnet.io/models/imagenet/

Source Code:
https://github.com/tspannhw/mxnet-in-notebooks
https://github.com/tspannhw/ApacheBigData101

References: If you want to run in DSX or Jupyter: https://community.hortonworks.com/articles/176784/deep-learning-101-using-apache-mxnet-in-dsx-notebo.html
Setup: If you need to set up Apache MXNet on HDF: https://community.hortonworks.com/articles/174227/apache-deep-learning-101-using-apache-mxnet-on-an.html

Other Articles in The Series:
https://community.hortonworks.com/articles/174538/apache-deep-learning-101-using-apache-mxnet-with-h.html
https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
https://community.hortonworks.com/articles/171960/using-apache-mxnet-on-an-apache-nifi-15-instance-w.html
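Hand-concatenating that %table string is easy to get wrong, since a single missing tab silently shifts every column. A small helper is safer; this is a sketch with hypothetical column names and values:

```python
def zeppelin_table(headers, rows):
    """Render rows in Zeppelin's %table display format
    (tab-separated columns, newline-separated rows)."""
    lines = ["\t".join(headers)]
    lines += ["\t".join(str(c) for c in row) for row in rows]
    return "%table " + "\n".join(lines)

# Hypothetical prediction row matching the print statement above.
out = zeppelin_table(
    ["top1pct", "top1", "imagefilename", "runtime"],
    [["0.9832", "n02749479 assault rifle", "images/tx1_image_1.jpg", "3"]],
)
print(out)
```

In a Zeppelin paragraph, `print(out)` renders as a sortable table rather than raw text.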
03-02-2018
04:47 PM
4 Kudos
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time. Many people are using IBM's excellent DSX platform, which uses Jupyter notebooks and the ever-popular Kubernetes. I wanted to try out Apache MXNet in this environment. It's great. Create or reuse an existing notebook. For Python, the default is Jupyter; Zeppelin is now also supported. I am using Python 2.7 with DSX Desktop on an OSX workstation, which supports Apache MXNet. My local Apache MXNet installation and MXNet Python installation worked well with DSX. I needed OpenCV for this example, so I was able to install it right inside IBM DSX via !pip install --user opencv-python. It is very easy to start a notebook and add your code; you get nice syntax coloring. I uploaded the precompiled model. Here we can check our list of Python libraries with !pip list --isolated --format=columns. It is very easy to run your Apache MXNet code right in a notebook, and easy to share with other data scientists and engineers in your group and beyond. IBM DSX Assets: you will need to download the pre-built Inception model and add it to assets.
synset.txt
Inception-BN-0000.params
Inception-BN-symbol.json

See: https://mxnet.incubator.apache.org/tutorials/embedded/wine_detector.html

curl --header 'Host: data.mxnet.io' --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Language: en-US,en;q=0.5' --header 'Referer: http://data.mxnet.io/models/imagenet/' --header 'Connection: keep-alive' 'http://data.mxnet.io/models/imagenet/inception-bn.tar.gz' -o 'inception-bn.tar.gz' -L
curl http://data.mxnet.io/models/imagenet/synset.txt

More Models: http://data.mxnet.io/models/imagenet/

Source Code:
https://github.com/tspannhw/mxnet-in-notebooks
https://github.com/tspannhw/ApacheBigData101
02-27-2018
08:44 PM
1 Kudo
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time. See: https://community.hortonworks.com/content/kbentry/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html To do proper analytics and provide fast SQL access to the Inception data generated by Apache MXNet from our images, we need to land it in Apache Hive transactional tables. We will use the Apache NiFi PutHiveStreaming processor to insert data into our ACID table at a rapid rate. This only works if you create a transactional table stored as Apache ORC; see the DDL below. You must also be running a recent version of HDP (2.6+) with ACID turned on. Tip: in HDP 2.6.4, create and work with Apache Hive ACID tables through Hive itself; in Apache Zeppelin use %jdbc(hive), not %sql, since %sql is Apache Spark while %jdbc(hive) is Apache Hive. See the configuration below with Hive CBO and Tez enabled as well. Ambari View of Hive SQL DDL:

%jdbc(hive)
CREATE TABLE `inception`(
uuid STRING, top1pct STRING, top1 STRING, top2pct STRING, top2 STRING, top3pct STRING, top3 STRING, top4pct STRING, top4 STRING, top5pct STRING, top5 STRING, imagefilename STRING,
runtime STRING)
CLUSTERED BY ( top1)
INTO 3 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES ( 'transactional'='true')
%jdbc(hive)
select * from inception;

The PutHiveStreaming processor requires a table that is bucketed and stored as Apache ORC, and on which you have permissions; see the table DDL above. You also need ACID and LLAP enabled on your Apache Hive cluster. Details for PutHiveStreaming Processor. An Example Apache MXNet to Hive Streaming View. The Hive View 2.0 of the Data. Apache Zeppelin Table DDL and Query.
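PutHiveStreaming matches incoming Avro record fields to the table's columns by name, so each record sent into the flow should carry exactly the columns declared in the DDL. A sketch with hypothetical values:

```python
import json

# Column names exactly as declared in the inception table DDL above.
INCEPTION_COLUMNS = [
    "uuid", "top1pct", "top1", "top2pct", "top2", "top3pct", "top3",
    "top4pct", "top4", "top5pct", "top5", "imagefilename", "runtime",
]

# Start with every column present (empty string), then fill in
# hypothetical prediction values for one record.
record = {c: "" for c in INCEPTION_COLUMNS}
record.update({"uuid": "mxnet_uuid_img_20180227",
               "top1": "n03642806 laptop",
               "top1pct": "0.96",
               "imagefilename": "images/tx1_image_1.jpg",
               "runtime": "4"})

# A record missing a DDL column (or carrying an extra one) will not
# stream cleanly, so verify the shape before sending.
assert set(record) == set(INCEPTION_COLUMNS)
print(json.dumps(record, sort_keys=True))
```

Keeping the Avro schema's field list in lock-step with the Hive DDL is the single most common fix when PutHiveStreaming rejects records.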