
Flow

72672-simpleflow.png

You can remove the Sentiment processor if you don't want to install my custom processor:

72673-sentiment.png

This is what the tweet attributes look like:

72674-tweetattributes.png

I grab some fields I like:

72676-twitterfields.png

These are fields I want to save:

72677-usefulfields.png

This is a simple version of the flow that just ingests tweets, runs sentiment analysis, and stores them in a directory as clean JSON.

You can drop the sentiment analysis and do it later. You can also run a Python script for that.
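As a rough illustration of doing sentiment later in a script, here is a minimal sketch. It is not the method used by the custom processor: it is a toy keyword lexicon, and the word lists, the function names, and the NEGATIVE and NEUTRAL labels are all assumptions made for this example. It assumes the tweet text is stored in a msg field, as in the example record in this article.

```python
import re

# Hypothetical, deliberately tiny word lists; a real project would use a
# proper sentiment library or model instead.
POSITIVE_WORDS = {"awesome", "great", "good", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "awful", "terrible", "hate", "broken"}


def simple_sentiment(text):
    """Label text POSITIVE, NEGATIVE or NEUTRAL by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(w in POSITIVE_WORDS for w in words)
    negative_hits = sum(w in NEGATIVE_WORDS for w in words)
    score = positive_hits - negative_hits
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


def add_sentiment(record):
    """Return a copy of a stored tweet record with a 'sentiment' field set."""
    updated = dict(record)
    updated["sentiment"] = simple_sentiment(record.get("msg", ""))
    return updated
```

Calling add_sentiment on each stored record and writing the result back would give you a sentiment field without needing the processor in the flow.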

We could make this simpler and just have GetTwitter followed by PutFile. That stores the raw Twitter JSON, which is a very sparse, nested JSON file. If you want the raw data, that is an option, but the format is a pain to work with and not ideal for analytics. I flatten it and grab just what I have seen to be the core attributes; you can easily add more or drop some of them.
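If you do keep the raw JSON, the flattening step can also be done outside NiFi. The sketch below shows the idea in Python. The raw field names (id_str, text, created_at, timestamp_ms, user.screen_name, entities.hashtags and so on) come from the Twitter tweet object; the function name and the exact mapping to the flat field names are assumptions for illustration and may not match what the flow does.

```python
import json


def flatten_tweet(raw):
    """Pick a few core attributes out of a raw nested tweet dict."""
    user = raw.get("user") or {}
    entities = raw.get("entities") or {}
    hashtags = [h.get("text", "") for h in entities.get("hashtags", [])]
    return {
        "tweet_id": raw.get("id_str", ""),
        "msg": raw.get("text", ""),
        "time": raw.get("created_at", ""),
        "unixtime": raw.get("timestamp_ms", ""),
        "handle": user.get("screen_name", ""),
        "user_name": user.get("name", ""),
        "location": user.get("location") or "",
        # Counts are kept as strings, like the stored example record.
        "followers_count": str(user.get("followers_count", "")),
        "retweet_count": str(raw.get("retweet_count", "")),
        # Lists are stored as JSON-encoded strings, like the example record.
        "hashtags": json.dumps(hashtags),
    }
```

Adding or dropping an attribute is then a one-line change to the returned dictionary.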

This is a simple version that could be used for art or personal projects, or by anyone who wants to store their own tweets and related items.

Get Your Twitter ID: https://tweeterid.com/

Documentation: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object

Create Your Application: https://apps.twitter.com/ https://apps.twitter.com/app/new

Application Settings

You Need: Consumer Key (API Key) and Consumer Secret (API Secret)

Your Access Token

You Need: Access Token and Access Token Secret.

Make sure you keep the secrets secure as you don't want people tweeting in your name or reading your stuff.

72675-gettwitter.png

You will place these in the GetTwitter processor. Click start once you have added them. You can also filter by language, such as en for English and es for Spanish.

We just save these JSON files to a directory for later use. We could also aggregate and compress them if you like, or send them to Amazon S3, email them, or whatever. We can also retweet them, but now we are getting fancy, and we already wrote that article this morning.
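Aggregating and compressing can be done inside NiFi, but as a sketch of the same idea after the fact, the Python below bundles a directory of stored tweet files into one gzip-compressed file with one JSON record per line. The function name, the *.json file pattern, and the output file name are assumptions for this example.

```python
import gzip
import json
from pathlib import Path


def bundle_tweets(input_dir, output_path):
    """Write every *.json file in input_dir as one line of a gzip file.

    Returns the number of records written.
    """
    count = 0
    with gzip.open(output_path, "wt", encoding="utf-8") as out:
        for path in sorted(Path(input_dir).glob("*.json")):
            with path.open(encoding="utf-8") as f:
                record = json.load(f)
            # One compact JSON document per line (JSON Lines).
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

For example, bundle_tweets("tweets", "tweets.jsonl.gz") would collect everything in a hypothetical tweets directory into a single compressed file.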

Custom Processor:

Example Tweet in JSON Stored:

{
  "msg" : "RT @PaasDev Tim said @ApacheNiFi is awesome",
  "unixtime" : "1525724645676",
  "friends_count" : "5268",
  "sentiment" : "POSITIVE",
  "hashtags" : "[\"ApacheNiFi\"]",
  "listed_count" : "25",
  "tweet_id" : "993587294715203584",
  "user_name" : "Tim Spann",
  "favourites_count" : "5348",
  "source" : "NiFiTweetBot",
  "placename" : "",
  "media_url" : "[]",
  "retweet_count" : "0",
  "user_mentions_name" : "[]",
  "geo" : "",
  "urls" : "[]",
  "countryCode" : "",
  "user_url" : "",
  "place" : "",
  "timestamp" : "1525724645676",
  "coordinates" : "",
  "handle" : "PaasDev",
  "profile_image_url" : "http://pbs.twimg.com/profile_images/34343/34343.jpg",
  "time_zone" : "Eastern Time (US & Canada)",
  "ext_media" : "[]",
  "statuses_count" : "5994",
  "followers_count" : "1963",
  "location" : "Princeton, NJ",
  "time" : "Mon May 07 20:24:05 +0000 2018",
  "user_mentions" : "[]",
  "user_description" : "Tim NiFi Guy"
}
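Note that every value in the stored record is a string, including the counts and the list-like fields such as hashtags. If you want typed values for analysis, something like the following Python sketch could convert them. The function name and the added created key are assumptions for this example; the field names are taken from the record above.

```python
import json
from datetime import datetime, timezone

COUNT_FIELDS = ("friends_count", "listed_count", "favourites_count",
                "retweet_count", "statuses_count", "followers_count")
LIST_FIELDS = ("hashtags", "media_url", "user_mentions_name", "urls",
               "ext_media", "user_mentions")


def parse_stored_tweet(text):
    """Parse one stored tweet and convert string fields to richer types."""
    record = json.loads(text)
    for name in COUNT_FIELDS:
        if record.get(name):  # skip missing or empty values
            record[name] = int(record[name])
    for name in LIST_FIELDS:
        if record.get(name):
            # These fields hold JSON encoded as a string, e.g. "[\"ApacheNiFi\"]".
            record[name] = json.loads(record[name])
    if record.get("unixtime"):
        # unixtime is in milliseconds since the epoch.
        record["created"] = datetime.fromtimestamp(
            int(record["unixtime"]) / 1000, tz=timezone.utc)
    return record
```

Empty strings such as geo or place are left untouched, so you can decide separately how to treat missing values.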


Download This Template and Import It into Apache NiFi

simplenifitwitter.xml

Setup

Get some Apache NiFi

https://www.apache.org/dyn/closer.lua?path=/nifi/1.6.0/nifi-1.6.0-bin.zip

Unzip it. On some Linux distributions you may need to run apt-get install unzip or yum install unzip first. You may need to be root, so you can do something like sudo su.

You will need Java installed. For a low-cost small Linux server, you can use one of these two services, which also explain how to install Java. There are many low-cost options. This application is also small enough to run on your laptop, an old desktop PC, or a small cloud instance.

https://www.digitalocean.com/community/tutorials/how-to-install-java-on-centos-and-fedora

https://www.linode.com/docs/development/java/install-java-on-centos/

https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04

Generally something like this:

  • sudo add-apt-repository ppa:webupd8team/java
  • sudo apt-get update
  • sudo apt-get install oracle-java8-installer

or

sudo yum install java-1.8.0-openjdk-devel

Either OpenJDK 8 or Oracle JDK 8 works well.
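To confirm which Java is on your PATH before starting NiFi, you can run java -version by hand. As a small sketch of the same check in Python, the code below runs that command and pulls out the quoted version string. The function names are assumptions for this example, and the is_java_8 check relies on Java 8 reporting versions of the form 1.8.0_NNN.

```python
import re
import shutil
import subprocess


def parse_java_version(output):
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_java_version():
    """Return the version of the `java` on PATH, or None if not found."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"],
                            capture_output=True, text=True)
    # `java -version` writes to stderr; fall back to stdout just in case.
    return parse_java_version(result.stderr or result.stdout)


def is_java_8(version):
    """True if the version string looks like a Java 8 release."""
    return version is not None and version.startswith("1.8.")
```

If installed_java_version() returns None, Java is either not installed or not on your PATH.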

You can also run some Docker containers if you like that sort of thing: https://github.com/minyk/nifi-sandbox

You can also download one of the Hortonworks HDF 3.1 Sandboxes to run this as well:

https://hortonworks.com/downloads/#sandbox

Those have Apache NiFi and Java preinstalled!

Here are some Docker Instructions:

https://hortonworks.com/tutorial/sandbox-deployment-and-install-guide/section/3/

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_installing-nifi/content/ch_nifi-installa...

Last update: 08-17-2019 07:27 AM