How to add sentiment analytics to Twitter/Apache N...
Created on 08-03-2016 09:19 PM - edited 08-17-2019 10:58 AM
Starting assumption: your NiFi data flow is similar to the TwitterSolr.xml template on the NiFi website (NiFi Templates).
Here is the template workflow (MergeContent is optional):
To add sentiment analytics, we need to alter the data flow by updating the flowfile (the actual data being passed through NiFi) with attributes that represent the sentiment score of the tweet in question. The script that does this works in three steps:
1) Importing the JSON and accessing the tweet text
2) Running a sentiment analyzer on the tweet text and getting sentiment scores
3) Appending the sentiment scores to the JSON and writing it back to the flowfile
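The three steps above can be sketched roughly as follows. This is a minimal, standalone Python illustration rather than the script used in this article: the word lists and the `score_text` and `add_sentiment` names are hypothetical, the toy scorer stands in for a real sentiment library, and the sketch assumes one tweet per flowfile with the tweet body in the JSON `text` field. The two stream arguments stand in for the flowfile content being read and written.

```python
import io
import json

# Hypothetical toy lexicon, for illustration only. A real flow would call
# a sentiment library (for example a VADER port) instead.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def score_text(text):
    """Return toy positive/negative/compound scores for a piece of text."""
    words = [w.strip(".,!?#@\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    compound = (pos - neg) / total if total else 0.0
    return {"positive": pos, "negative": neg, "compound": compound}


def add_sentiment(in_stream, out_stream):
    """Read one tweet as JSON, attach scores, and write the JSON back out."""
    # Step 1: import the JSON and access the tweet text.
    tweet = json.loads(in_stream.read().decode("utf-8"))
    text = tweet.get("text", "")
    # Step 2: run the sentiment analyzer on the tweet text.
    scores = score_text(text)
    # Step 3: append the scores (under an assumed "sentiment" key) and write back.
    tweet["sentiment"] = scores
    out_stream.write(json.dumps(tweet).encode("utf-8"))


if __name__ == "__main__":
    source = io.BytesIO(b'{"id": 1, "text": "I love NiFi, great tool!"}')
    sink = io.BytesIO()
    add_sentiment(source, sink)
    print(sink.getvalue().decode("utf-8"))
```

Nesting the scores under a single key keeps the original tweet fields untouched, which means only the new fields need new mappings downstream.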
For the actual script itself I used Groovy (a Java-compatible scripting language), but you can use other scripting languages such as Lua, Python, Ruby, or ECMAScript (Script Code). For the sentiment analytics portion, I used a Java port of the Python sentiment analytics module vaderSentiment, but there is a plethora of open-source sentiment analytics APIs, such as Python’s NLTK module, Stanford’s CoreNLP API, and Alchemy’s NLP API, to name a few.
To actually use the above script, we will use an experimental feature of Apache NiFi called ExecuteScript.
1) Drag the processor into the NiFi data flow
2) Right-click the processor and press “Configure”
3) Go to Settings and, under “Auto terminate relationships”, check the box for failure
4) Go to Scheduling and put a non-zero input into “Run
5) Go to Properties and choose the “Scripting Language”
7) Then either add the script body to the “Script Body” section or put the location of the script in “Script File”, but not both.
8) In “Module Directory”, add the external JAR files required by any external libraries or modules.
9) If you face issues with external dependencies, make sure to update the dependencies in the pom.xml file (this will vary based on the external modules/libraries used)
10) Next, place the ExecuteScript processor after MergeContent and before PutSolrContentStream. (When connecting, make sure all the relationships are valid.)
1) Right-click PutSolrContentStream and click “Configure”
2) Go to Properties and add the new properties
4) Name each of the properties f.# (the number being one more than that of the last existing property)
5) In the Value section, enter the desired name for the field in Solr, followed by a colon and the path to the field in the incoming JSON. Don’t forget to update the paths to the existing attributes in the JSON, now that the JSON has been modified.
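To check the f.# values before wiring them into PutSolrContentStream, it can help to preview what each "name:path" pair would pick out of a sample JSON document. The sketch below is only an offline illustration of that mapping, not how Solr or NiFi resolves paths internally. The helper names, the Solr field names, and the `/sentiment/compound` path are all assumptions: the path presumes the scores were nested under a `sentiment` key.

```python
import json


def parse_mapping(value):
    """Split 'solr_field:/path/in/json' into (field, [path parts])."""
    field, path = value.split(":", 1)
    return field, [part for part in path.split("/") if part]


def resolve(doc, parts):
    """Walk nested dicts along the path; return None if a key is missing."""
    current = doc
    for part in parts:
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def preview_fields(doc, properties):
    """Show which value each f.# property would pick out of the JSON."""
    preview = {}
    for value in properties.values():
        field, parts = parse_mapping(value)
        preview[field] = resolve(doc, parts)
    return preview


if __name__ == "__main__":
    # Hypothetical sample document and property values.
    sample = json.loads(
        '{"id": 1, "text": "I love NiFi", "sentiment": {"compound": 1.0}}'
    )
    props = {
        "f.1": "id:/id",
        "f.2": "twitter_text_t:/text",
        "f.3": "sentiment_compound_d:/sentiment/compound",
    }
    print(preview_fields(sample, props))
```

A `None` in the preview usually means the path no longer matches the modified JSON, which is the case the last step above warns about.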
Updated Solr Dashboard:
If you run into any issues with the data showing up in Solr, you may need to either restart Solr or create a new shard. If you are using Banana UI for visualization, all you need to do is update default.json to include the new shard and add a panel that uses the added sentiment score fields.