If you have not attended a DataWorks Summit, I highly recommend it. It is an amazing event, held in three locations a year, and a great community experience. The content is deep and highly technical: you will learn about the current state of the art and what is coming next. It's not just Big Data, but also AI, streaming, microservices, containers, cloud and many other topics that startups and enterprises alike need to know about.

My talk was a simple one on using Apache NiFi to ingest and transform various data types.

A small group is forming around my quickly released Inception v3 TensorFlow processor for Apache NiFi. I encourage you to try it and contribute feedback, pull requests, bug reports, documentation, unit tests, examples and more. The Java API for TensorFlow is new, so the processor is still very basic. Thanks to @Simon Elliston Ball for a major cleanup.

https://github.com/tspannhw/nifi-tensorflow-processor


What do we want to do?

  • MiNiFi ingests camera images and sensor data
  • Run TensorFlow Inception v3 to recognize objects in images
  • NiFi stores images, metadata and enriched data in Hadoop
  • NiFi ingests social data and feeds
  • NiFi analyzes sentiment of textual data

Options for running TensorFlow with NiFi:

  • TensorFlow (C++, Python, Java) via ExecuteStreamCommand
  • TensorFlow NiFi Java custom processor
  • TensorFlow running on edge nodes (MiNiFi)
  • TensorFlow Mobile (iOS, Android, RPi)
  • TensorFlow on Spark (Yahoo) via Livy, S2S, Kafka
  • TensorFlow running in containers in YARN 3.0 on Hadoop
  • (NiFi 1.4) gRPC call to TensorFlow Serving

python classify_image.py --image_file /dir/solarroofpanel.jpg

solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
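When classify_image.py runs inside a NiFi flow (for example via ExecuteStreamCommand), its plain-text predictions are easier to route and store as JSON. Below is a minimal sketch, assuming the one-prediction-per-line "label (score = N)" format shown above; parse_predictions is a hypothetical helper, not part of TensorFlow or NiFi.

```python
import json
import re

# classify_image.py prints one prediction per line, e.g.
# "window screen (score = 0.00196)"
PREDICTION_RE = re.compile(r"^(?P<label>.+?)\s*\(score\s*=\s*(?P<score>[0-9.]+)\)$")


def parse_predictions(text):
    """Turn classify_image.py's text output into a list of {label, score} dicts."""
    predictions = []
    for line in text.splitlines():
        match = PREDICTION_RE.match(line.strip())
        if match:  # skip blank lines and anything that is not a prediction
            predictions.append({
                "label": match.group("label"),
                "score": float(match.group("score")),
            })
    return predictions


if __name__ == "__main__":
    # The output shown above; in a flow this text would arrive as FlowFile content.
    sample = """solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)"""
    print(json.dumps(parse_predictions(sample), indent=2))
```

Emitting a JSON array lets downstream processors such as EvaluateJsonPath pull out the top label and score without regex work in the flow itself.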

Python libraries used:

pip install -U textblob
python -m textblob.download_corpora
pip install -U spacy
python -m spacy.en.download all
pip install -U nltk
python -m nltk.downloader vader_lexicon
pip install -U numpy

run.sh
#!/bin/bash
python sentiment.py "$@"

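sentiment.py

# Score the sentiment of the first command-line argument with NLTK's VADER analyzer
import sys

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
# polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
ss = sid.polarity_scores(sys.argv[1])
print('Compound {0} Negative {1} Neutral {2} Positive {3} '.format(
    ss['compound'], ss['neg'], ss['neu'], ss['pos']))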
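sentiment.py prints its four scores on a single line. To route messages by sentiment in NiFi, that line can be turned into JSON with a coarse label attached. This is a minimal sketch, assuming exactly the print format of sentiment.py; parse_sentiment_line and label_from_compound are hypothetical helpers, and the +/-0.05 cut-offs on the compound score are the conventional thresholds suggested in the VADER README, not something this flow requires.

```python
import json
import re

# Matches the line printed by sentiment.py:
# "Compound <c> Negative <n> Neutral <u> Positive <p> "
LINE_RE = re.compile(
    r"Compound\s+(?P<compound>-?[0-9.]+)\s+"
    r"Negative\s+(?P<neg>[0-9.]+)\s+"
    r"Neutral\s+(?P<neu>[0-9.]+)\s+"
    r"Positive\s+(?P<pos>[0-9.]+)"
)


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse label using +/-threshold cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def parse_sentiment_line(line):
    """Turn one line of sentiment.py output into a dict, or None if it doesn't match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["label"] = label_from_compound(scores["compound"])
    return scores


if __name__ == "__main__":
    # Illustrative input shaped like sentiment.py's output; the numbers are made up.
    example = "Compound 0.5 Negative 0.1 Neutral 0.4 Positive 0.5 "
    print(json.dumps(parse_sentiment_line(example)))
```

In a flow, the input would be the FlowFile content produced by run.sh, and the "label" field could drive RouteOnAttribute after extracting it.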

These are all good Python libraries to use. I recommend Python 3.x unless you are stuck on 2.6/2.7.

I have also created two NiFi processors for working with text/NLP: one for Apache OpenNLP and one for Stanford CoreNLP.

Please comment here in HCC, check out my GitHub (https://github.com/tspannhw) and submit pull requests, and come to a meetup (https://www.meetup.com/futureofdata-princeton/).
