InvokeHttp: I used this to download the image at the first image URL in each tweet.
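That URL is pulled out of the tweet's JSON before InvokeHttp runs. A minimal sketch of the extraction, assuming the standard Twitter v1.1 payload (the function name is mine):

import json

def first_image_url(raw_tweet):
    # In Twitter v1.1 JSON, attached images live under entities.media
    media = json.loads(raw_tweet).get("entities", {}).get("media", [])
    return media[0]["media_url_https"] if media else None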
GetTwitter: This is our primary source of data and the most important. You must have a Twitter account, a Twitter developer account, and a Twitter application; then you can subscribe to the keywords and hashtags above. So far I've ingested 14,211 tweets into Phoenix, a count that includes the many times I shut the flow down for testing and moving things around. I've had this running live as I added pieces. I do not recommend that development process, but it's good for exploring data.
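Under the hood, GetTwitter is doing roughly what this tweepy 3.x sketch does against the Twitter streaming API; the credentials and track terms below are placeholders, not the real flow's configuration:

import tweepy

# Placeholder credentials from your Twitter application
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

class RawListener(tweepy.StreamListener):
    def on_data(self, data):
        print(data)  # one raw JSON tweet, the same payload GetTwitter emits
        return True

# The track list stands in for the keywords and hashtags configured above
tweepy.Stream(auth, RawListener()).filter(track=["#Strata", "#Hadoop"])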
RouteOnAttribute: To process only tweets that have an actual message; sometimes the message is damaged or missing, and there's no point wasting time on those.
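The equivalent guard in plain Python, assuming the flow has already extracted the tweet text into an attribute I'm calling msg:

def has_message(attributes):
    # Mirror of the RouteOnAttribute rule: drop flowfiles whose
    # tweet text is missing or empty
    return bool(attributes.get("msg", "").strip())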
ExecuteStreamCommand: To call shell scripts that call TensorFlow C++ binaries and Python scripts. There are many ways to do this, but this is the easiest.
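On the script side of that handoff, ExecuteStreamCommand streams the flowfile content to the command's stdin and captures stdout as the result (you can also pass values as command arguments, which is what the sentiment script below does). A minimal sketch of a script it could call:

import sys

# ExecuteStreamCommand pipes the flowfile content to stdin and
# captures whatever we print to stdout
text = sys.stdin.read()
print(len(text))  # stand-in for the real TensorFlow or NLTK work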
UpdateAttribute: To change the file names of downloaded files before writing them to HDFS.
For output sinks:
PutHDFS: Saved to HDFS in a few different directories (see the first attached image): the raw JSON tweet; a limited set of fields such as handle, message, and geolocation; and a fully processed file enriched with TensorFlow Inception v3 image recognition on images attached to Strata tweets and VADER sentiment analysis on the tweet text.
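A rough sketch of how that limited field set falls out of the raw tweet, again assuming the standard Twitter v1.1 JSON (the output shape is my choice):

import json

def limited_fields(raw_tweet):
    tweet = json.loads(raw_tweet)
    # Keep only the handle, message and geolocation
    return json.dumps({
        "handle": tweet.get("user", {}).get("screen_name"),
        "message": tweet.get("text"),
        "geolocation": tweet.get("coordinates"),
    })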
PutSQL: I upserted all the tweets that HDF enriched with TensorFlow and Python sentiment analysis into a Phoenix table.
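PutSQL just executes the upsert over JDBC; the same statement through the Phoenix Query Server's Python driver looks roughly like this (the host, table, column names, and values are all hypothetical):

import phoenixdb

# Phoenix Query Server endpoint; everything below is a placeholder
conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cursor = conn.cursor()
cursor.execute(
    "UPSERT INTO tweets (handle, msg, sentiment, image_label) VALUES (?, ?, ?, ?)",
    ("some_handle", "Loving Strata", "POSITIVE", "seashore"),
)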
If you have Python 2.7 installed, I have shown in previous articles how to install pip and NLTK; with those it's very easy to do some simple sentiment analysis. I also have a version that just returns the polarity_scores (compound, negative, neutral, and positive).
import sys
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# The tweet text arrives as command-line arguments
sid = SentimentIntensityAnalyzer()
ss = sid.polarity_scores(" ".join(sys.argv[1:]))
if ss['compound'] == 0.00:
    print("NEUTRAL")
elif ss['compound'] < 0.00:
    print("NEGATIVE")
else:
    print("POSITIVE")
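Called from ExecuteStreamCommand or the shell, it takes the tweet text as arguments; for example, python sentiment.py "Loving the Strata keynote" prints POSITIVE (the script name here is mine).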