Created on 05-07-2018 08:19 PM - edited 08-17-2019 07:27 AM
Flow
We can remove Sentiment if you don't want to install my custom processor:
This is what they look like:
I grab some fields I like:
These are fields I want to save:
This is a simple version of the flow that just ingests tweets, runs sentiment analysis, and stores the results in a directory as clean JSON.
You can drop the sentiment analysis and do it later. You can also run a Python script for that.
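If you do the sentiment step later in Python, it could look something like the sketch below. This is only a toy word-list scorer, not the logic of the custom processor; the word lists and the `label_sentiment` name are made up for illustration, and a real script would swap in a proper sentiment library or model.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE_WORDS = {"awesome", "great", "love", "good"}
NEGATIVE_WORDS = {"awful", "bad", "hate", "broken"}


def label_sentiment(text):
    """Return POSITIVE, NEGATIVE or NEUTRAL from simple word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "POSITIVE"
    if negatives > positives:
        return "NEGATIVE"
    return "NEUTRAL"


print(label_sentiment("RT @PaasDev Tim said @ApacheNiFi is awesome"))
```

The labels are uppercase so they line up with the `sentiment` value in the stored JSON.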
We could make this simpler and just have GetTwitter followed by PutFile. That stores the raw Twitter JSON, which is a very sparse, nested JSON file. If you want the raw data, that is an option, but the format is a pain to work with and not ideal for analytics. I flatten it and grab what I have seen as the core attributes; you can easily add more or drop some of them.
This is a simple version that could be used for art or personal projects, or by anyone who wants to store their own tweets and related items.
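To give a feel for what flattening means, here is a minimal Python sketch that does roughly the same thing outside of NiFi. The `flatten_tweet` helper is hypothetical, and the mapping from raw Twitter fields (`id_str`, `text`, `timestamp_ms`, `user.screen_name` and so on) to the flat names is my best guess from the tweet object documentation, not an export of the flow. The sample input is a hand-built stand-in, not a real API response.

```python
import json


def flatten_tweet(raw):
    """Pick a few core attributes out of a raw, nested tweet dict."""
    user = raw.get("user") or {}
    entities = raw.get("entities") or {}
    return {
        "tweet_id": raw.get("id_str", ""),
        "msg": raw.get("text", ""),
        "time": raw.get("created_at", ""),
        "unixtime": raw.get("timestamp_ms", ""),  # assumed source field
        "handle": user.get("screen_name", ""),
        "user_name": user.get("name", ""),
        "location": user.get("location") or "",
        # Values are kept as strings, like the stored JSON in this article.
        "followers_count": str(user.get("followers_count", "")),
        "hashtags": json.dumps([tag["text"] for tag in entities.get("hashtags", [])]),
    }


# Hand-built stand-in for a raw tweet, trimmed to the fields used above.
raw_tweet = {
    "created_at": "Mon May 07 20:24:05 +0000 2018",
    "id_str": "993587294715203584",
    "text": "RT @PaasDev Tim said @ApacheNiFi is awesome",
    "timestamp_ms": "1525724645676",
    "user": {
        "screen_name": "PaasDev",
        "name": "Tim Spann",
        "location": "Princeton, NJ",
        "followers_count": 1963,
    },
    "entities": {"hashtags": [{"text": "ApacheNiFi"}]},
}

flat = flatten_tweet(raw_tweet)
print(json.dumps(flat, indent=2))
```

Adding or dropping an attribute is a one-line change in the returned dict.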
Get Your Twitter ID: https://tweeterid.com/
Documentation: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
Create Your Application: https://apps.twitter.com/ https://apps.twitter.com/app/new
Application Settings
You Need: Consumer Key (API Key) and Consumer Secret (API Secret)
Your Access Token
You Need: Access Token and Access Token Secret.
Make sure you keep the secrets secure as you don't want people tweeting in your name or reading your stuff.
You will place these in the GetTwitter processor. Click Start once you have added them. You can also filter by language, such as en for English and es for Spanish.
We just save these JSON files to a directory for later use. We could also aggregate and compress them if you like, send them to Amazon S3, email them, or whatever. We could also retweet them, but now we are getting fancy, and we already wrote that article this morning.
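If you would rather aggregate and compress the saved files outside of NiFi, a small Python script is one way to do it. The sketch below assumes each file in the directory holds one JSON object and ends in `.json`; the `bundle_tweets` name and the JSON Lines output format are my own choices for illustration.

```python
import gzip
import json
from pathlib import Path


def bundle_tweets(source_dir, bundle_path):
    """Write every *.json file in source_dir as one line of a gzipped JSON Lines file."""
    count = 0
    with gzip.open(bundle_path, "wt", encoding="utf-8") as bundle:
        for path in sorted(Path(source_dir).glob("*.json")):
            # Re-serialize so each tweet ends up on exactly one line.
            record = json.loads(path.read_text(encoding="utf-8"))
            bundle.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Something like `bundle_tweets("tweets", "tweets.jsonl.gz")` would then pack a `tweets` directory into a single archive and return the number of tweets written. Both names are placeholders.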
Custom Processor:
Example Tweet in JSON Stored:
{ "msg" : "RT @PaasDev Tim said @ApacheNiFi is awesome", "unixtime" : "1525724645676", "friends_count" : "5268", "sentiment" : "POSITIVE", "hashtags" : "[\"ApacheNiFi\"]", "listed_count" : "25", "tweet_id" : "993587294715203584", "user_name" : "Tim Spann", "favourites_count" : "5348", "source" : "NiFiTweetBot", "placename" : "", "media_url" : "[]", "retweet_count" : "0", "user_mentions_name" : "[]", "geo" : "", "urls" : "[]", "countryCode" : "", "user_url" : "", "place" : "", "timestamp" : "1525724645676", "coordinates" : "", "handle" : "PaasDev", "profile_image_url" : "http://pbs.twimg.com/profile_images/34343/34343.jpg", "time_zone" : "Eastern Time (US & Canada)", "ext_media" : "[]", "statuses_count" : "5994", "followers_count" : "1963", "location" : "Princeton, NJ", "time" : "Mon May 07 20:24:05 +0000 2018", "user_mentions" : "[]", "user_description" : "Tim NiFi Guy" }
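Note that every value in the stored JSON is a string, including the counts, and that list fields such as `hashtags` are JSON arrays encoded inside a string. If you read these files back in Python, you will probably want to convert them. Here is a minimal sketch; the `load_stored_tweet` helper is hypothetical, and the sample is a trimmed copy of the tweet above.

```python
import json
from datetime import datetime, timezone

COUNT_FIELDS = ("friends_count", "listed_count", "favourites_count",
                "retweet_count", "statuses_count", "followers_count")
LIST_FIELDS = ("hashtags", "media_url", "user_mentions_name",
               "urls", "ext_media", "user_mentions")


def load_stored_tweet(text):
    """Parse one stored tweet and convert its string values to richer types."""
    record = json.loads(text)
    for name in COUNT_FIELDS:
        if record.get(name):
            record[name] = int(record[name])
    for name in LIST_FIELDS:
        if record.get(name):
            # The list was stored as a JSON string, so decode it a second time.
            record[name] = json.loads(record[name])
    if record.get("unixtime"):
        # unixtime is in milliseconds since the epoch.
        millis = int(record["unixtime"])
        record["unixtime"] = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return record


sample = r'{"msg": "RT @PaasDev Tim said @ApacheNiFi is awesome", "unixtime": "1525724645676", "followers_count": "1963", "hashtags": "[\"ApacheNiFi\"]", "urls": "[]", "geo": ""}'
tweet = load_stored_tweet(sample)
print(tweet["followers_count"], tweet["hashtags"], tweet["unixtime"].isoformat())
```

Empty strings such as `geo` are left untouched, so you can still tell a missing value from a zero.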
Download This Template and Import It into Apache NiFi
Setup
Get some Apache NiFi
https://www.apache.org/dyn/closer.lua?path=/nifi/1.6.0/nifi-1.6.0-bin.zip
Unzip it. On some Linux distributions you may first need to run apt-get install unzip or yum install unzip. You may need to be root for that, so you can do something like sudo su.
You will need Java installed. For a small, low-cost Linux server, you can use one of these two services, and they also tell you how to install Java. There are many low-cost options. This application is small enough to also run on your laptop, an old desktop PC, or a small cloud instance.
https://www.digitalocean.com/community/tutorials/how-to-install-java-on-centos-and-fedora
https://www.linode.com/docs/development/java/install-java-on-centos/
https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
Generally something like this:
or
sudo yum install java-1.8.0-openjdk-devel
OpenJDK 8 or Oracle JDK 8 are perfect.
You can also run some Docker containers if you like that sort of thing: https://github.com/minyk/nifi-sandbox
You can also download one of the Hortonworks HDF 3.1 Sandboxes to run this as well:
https://hortonworks.com/downloads/#sandbox
Those have Apache NiFi and Java preinstalled!
Here are some Docker Instructions:
https://hortonworks.com/tutorial/sandbox-deployment-and-install-guide/section/3/
Resources: