
Apache Deep Learning 101 Series


This article is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time.

56692-dwsberling.jpg

You can easily run Apache MXNet on an OSX machine or a Linux workstation using a Python script. I have forked the standard Apache MXNet Wine Detector Tutorial (http://mxnet.incubator.apache.org/tutorials/embedded/wine_detector.html) to read our local OSX webcam. You may need to change your OpenCV webcam port from 0 to 1 or 2, depending on how many webcams you have and which one you want to use. Webcam numbering starts at 0, so if you only have one, use 0. I am running this on an OSX laptop connected to a monitor with a built-in webcam, so I use that one, which is port 1.
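As a minimal sketch of the webcam selection described above, assuming the opencv-python package is installed: the snippet opens a camera by index and saves a single frame. The function names are hypothetical and the file-name pattern is modeled on the example output later in this article. This is not the code from analyze.py.

```python
import os
import time


def image_path(out_dir, now=None):
    """Build a timestamped name such as images/tx1_image_img_20180208204131.jpg."""
    stamp = time.strftime("%Y%m%d%H%M%S", now or time.localtime())
    return os.path.join(out_dir, "tx1_image_img_{0}.jpg".format(stamp))


def capture_frame(camera_index=0, out_dir="images"):
    """Grab one frame from the webcam at camera_index and write it to out_dir."""
    import cv2  # imported here so image_path() works without OpenCV installed

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(camera_index)  # 0 = first webcam, 1 = second, ...
    try:
        if not cap.isOpened():
            raise RuntimeError("No webcam at index {0}".format(camera_index))
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("Could not read a frame from webcam {0}".format(camera_index))
        path = image_path(out_dir)
        cv2.imwrite(path, frame)
        return path
    finally:
        cap.release()  # always free the camera, even on errors
```

Calling capture_frame(1) would use the second camera, which matches the external monitor webcam setup described above.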

Let's get this installed!

git clone https://github.com/apache/incubator-mxnet.git

The installation instructions at Apache MXNet's website (http://mxnet.incubator.apache.org/install/index.html) are amazing. Pick your platform and your style. I am doing this the simplest way on a Mac, but you can use a Python virtual environment, which may be best for you.

git clone https://github.com/tspannhw/ApacheBigData101.git

You will want to copy my shell script osxlocalrun.sh, the Inception copy, and the analyze.py script to your machine. If you don't have a webcam, use the CentOS version of the shell script and Python script instead; that one works with a static image that you supply. I am assuming you are running a recently updated Mac with 16 GB of RAM or more, and with pip, Homebrew, and Python 3 already installed. If not, install them first. If you have a pre-1.0 Apache MXNet, please upgrade. You will also need curl and tar, which should already be installed.

cd incubator-mxnet
mkdir images
curl --header 'Host: data.mxnet.io' --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Language: en-US,en;q=0.5' --header 'Referer: http://data.mxnet.io/models/imagenet/' --header 'Connection: keep-alive' 'http://data.mxnet.io/models/imagenet/inception-bn.tar.gz' -o 'inception-bn.tar.gz' -L
tar -xvzf inception-bn.tar.gz
cp Inception-BN-0126.params Inception-BN-0000.params
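The cp step makes sense once you know how MXNet names checkpoint files: mx.model.load_checkpoint(prefix, epoch) looks for prefix-symbol.json and prefix-NNNN.params, where NNNN is the epoch zero-padded to four digits. The archive ships epoch 126, so copying it to 0000 presumably lets the script load epoch 0. A small sketch of that naming rule follows. The helper name is hypothetical, and it only builds file names without calling MXNet.

```python
def checkpoint_files(prefix, epoch):
    """Return the (symbol, params) file names that load_checkpoint(prefix, epoch) reads."""
    return ("{0}-symbol.json".format(prefix),
            "{0}-{1:04d}.params".format(prefix, epoch))


# Epoch 0 maps to the file created by the cp command above.
print(checkpoint_files("Inception-BN", 0))
```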


Then install the dependencies:

brew update
pip install --upgrade pip
pip install --upgrade setuptools
pip install mxnet==1.0.0
brew install graphviz
pip install graphviz

If you have two versions of Python on your machine, you may need to use pip3, and you may need to run it via sudo. It depends on how your machine is set up and how locked down it is.

We are creating a directory called images that will fill with OpenCV capture images. You will probably want to delete them or ingest them. It's very easy to ingest them with Apache NiFi or MiNiFi, both of which run on OSX with ease. See: https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe...
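If you choose to delete rather than ingest, a cleanup pass can be as small as the sketch below. It assumes captures are .jpg, .jpeg, or .png files in a flat directory. The function name and the one-hour default are illustrative choices, not something from the original scripts.

```python
import os
import time


def delete_old_images(directory="images", max_age_seconds=3600, now=None):
    """Delete image files older than max_age_seconds and return the removed paths."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and anything that is not a captured image.
        if not os.path.isfile(path) or not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        if now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(path)
    return removed
```

Running this from cron or at the start of each capture run keeps the directory from growing without bound.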

We call a simple shell script (osxlocalrun.sh), which calls our custom Python 3 script. You can easily convert this to Python 2 if you need to; in a future article I have this running on Python 2.7 on a CentOS 7 HDP 2.6.4 cluster node. I send warnings to /dev/null because they relate to OSX configuration that you may or may not have and cannot easily change. You will probably need to chmod 755 your osxlocalrun.sh. If you are running on a Linux variant, follow the directions on the Apache MXNet site or wait for my next article on installing and using Apache MXNet in CentOS-based HDP 2.6.4 and HDF 3.1 clusters.

 python3 -W ignore analyze.py 2>/dev/null
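If you want to drive the same command from Python instead of a shell script, the sketch below shows one way. It assumes the script prints its JSON record as the last non-empty line of standard output. The helper name is hypothetical.

```python
import json
import subprocess
import sys


def run_and_parse(command):
    """Run command, discard stderr, and parse the last non-empty stdout line as JSON."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null in the shell
        universal_newlines=True,    # decode stdout to str
        check=True,                 # raise if the script exits with an error
    )
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("command produced no output")
    return json.loads(lines[-1])
```

For example, run_and_parse([sys.executable, "-W", "ignore", "analyze.py"]) would return the classification record as a dict, assuming analyze.py is in the current directory.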


For Apache NiFi Flow Templates

You can download my Apache NiFi flows from GitHub or from this article.

Architecture

  • Local Apache NiFi 1.5 with NiFi Registry running with JDK 8 on OSX
  • Local Apache MXNet installation with Python 3
  • Remote HDF 3.1 cluster running on CentOS 7 on OpenStack with Apache Ambari, Apache NiFi, NiFi Registry, and Hortonworks Schema Registry
  • Remote HDP 2.6.4 cluster running on CentOS 7 on OpenStack with Apache Hive and Apache Ambari

The flow is easy:

  • ExecuteProcess: Execute that shell script
  • UpdateAttribute: Add the schema name
  • InferAvroSchema: You only need this once, and only if you don't want to create your schema by hand; it pushes the inferred schema to an attribute
  • Remote Process Group: Send via HTTP Site-to-Site to an HDF 3.1 cluster.
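To give a feel for what the InferAvroSchema step produces, here is a rough sketch that builds an Avro record schema from one flat JSON record. Every value in the example output is a JSON string, so every field is typed as string. The schema name and namespace are placeholders, and the real processor output may differ in detail.

```python
import json


def infer_string_schema(record, name="mxnet_record", namespace="example.mxnet"):
    """Build an Avro record schema that types every field of a flat record as string."""
    return {
        "type": "record",
        "name": name,            # placeholder, not the schema name used in the flow
        "namespace": namespace,  # placeholder
        "fields": [{"name": key, "type": "string"} for key in record],
    }


sample = {
    "uuid": "mxnet_uuid_img_20180208204131",
    "top1pct": "30.0999999046",
    "top1": "n02871525 bookshop, bookstore, bookstall",
    "runtime": "2",
}
print(json.dumps(infer_string_schema(sample), indent=2))
```

Once you have a schema like this, you can store it in the schema registry and drop InferAvroSchema from the flow.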


Local OSX Processing

56684-apachemxnetlocal.png


Cluster based Record Processing

56685-apachemxnetprocessing.png

In the cloud, we use ConvertRecord to convert the JSON generated by the Apache MXNet Python script into Avro. We merge a batch of those records together, then convert the larger Avro file to ORC. This ORC file is stored in HDFS. Apache NiFi automatically generates Hive DDL that we can execute right away via Apache NiFi or run manually. I do this manually in Apache Zeppelin. I could easily augment this data with weather, Twitter, and other REST feeds; those have been covered in other articles I have written. I could also push the results to Kafka 1.0 for additional processing in Hortonworks Streaming Analytics Manager. I will do that at a future time.

Apache Hive SQL DDL

CREATE EXTERNAL TABLE IF NOT EXISTS inception3 (uuid STRING, top1pct STRING, top1 STRING, top2pct STRING, top2 STRING, top3pct STRING, top3 STRING, top4pct STRING, top4 STRING, top5pct STRING, top5 STRING, imagefilename STRING, runtime STRING) STORED AS ORC
LOCATION '/mxnet/local'
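Because every column is a STRING, the statement above follows mechanically from the field names. The sketch below rebuilds it from a list of fields. It illustrates the pattern only; it is not how Apache NiFi generates its DDL, and the function name is hypothetical.

```python
def hive_ddl(table, fields, location):
    """Return a CREATE EXTERNAL TABLE statement with every column typed STRING."""
    columns = ", ".join("{0} STRING".format(field) for field in fields)
    return ("CREATE EXTERNAL TABLE IF NOT EXISTS {0} ({1}) STORED AS ORC\n"
            "LOCATION '{2}'").format(table, columns, location)


FIELDS = ["uuid", "top1pct", "top1", "top2pct", "top2", "top3pct", "top3",
          "top4pct", "top4", "top5pct", "top5", "imagefilename", "runtime"]
print(hive_ddl("inception3", FIELDS, "/mxnet/local"))
```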

Example Output

{"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3": "n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", "top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}

Query Results

56686-mxnetlocalzeppelinquery.png


Example OpenCV Captured Image

56689-nanotiemxnet.jpeg

{"top1pct": "67.6", "top5": "n03485794 handkerchief, hankie, hanky, hankey", "top4": "n04590129 window shade", "top3": "n03938244 pillow", "top2": "n04589890 window screen", "top1": "n02883205 bow tie, bow-tie, bowtie", "top2pct": "11.5", "imagefilename": "nanotie7.png", "top3pct": "4.5", "uuid": "mxnet_uuid_img_20180211161220", "top4pct": "2.8", "top5pct": "2.8", "runtime": "3.0"}
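A record with this shape could be assembled from a model's output roughly as follows. This is a sketch under assumptions: it expects a list of class probabilities between 0 and 1 plus a parallel list of labels, and it rounds percentages to one decimal as in the record above. The function name is hypothetical and analyze.py may build its record differently.

```python
def build_record(uuid, probabilities, labels, image_filename, runtime):
    """Return a flat dict holding the five most likely labels and their percentages."""
    # Indices of the classes, most probable first.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    record = {"uuid": uuid}
    for rank, index in enumerate(ranked[:5], start=1):
        record["top{0}pct".format(rank)] = str(round(probabilities[index] * 100, 1))
        record["top{0}".format(rank)] = labels[index]
    record["imagefilename"] = image_filename
    record["runtime"] = str(runtime)
    return record
```

All values are kept as strings so the record lines up with the all-STRING Hive table.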

My cat assists me in some Deep Learning work, so I use Apache NiFi to track him to make sure he's working and hasn't taken his tie off during office hours. I run a strict office here in the Princeton lab.

Source Code

References:

In the Series:

  • Interfacing with MXNet Model Server
  • Using Apache MXNet with HDF 3.1 Clusters
  • Using Apache MXNet with HDP 2.6.4 Clusters
  • Using Apache MXNet with Hadoop 3.0 YARN 3.0 HDP 3.0 Dockerized GPU Aware Clusters

