How to add sentiment analytics to Twitter/Apache Nifi Demo

monilpat · ‎08-03-2016

Starting Assumption: Your NiFi data flow is similar to the TwitterSolr.xml template on the NiFi website (NiFi Templates).

Here is the template workflow (MergeContent is optional):

To add sentiment analytics, we need to alter the data flow so that the flowfile (the actual data being passed through NiFi) is updated with attributes that represent the sentiment score of the tweet in question.

Script Components:

1) Importing the JSON and accessing the tweet text
2) Running a sentiment analyzer on the tweet text and getting sentiment scores
3) Appending the sentiment scores to the JSON and writing it back to the flowfile
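The three steps above can be sketched in plain Python. This is only an illustration of the transformation, not the author's Groovy script: the tiny `LEXICON`, the `score_text` and `add_sentiment` helpers, and the nested `sentiment` object are all hypothetical stand-ins, and a real flow would call an actual analyzer (such as a VADER port) and read and write the flowfile content through ExecuteScript rather than through plain strings.

```python
import json
import re

# Hypothetical toy lexicon; a real sentiment analyzer would replace this.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "awful": -2.0, "hate": -2.0}


def score_text(text):
    """Step 2: score the words of the text that appear in the lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    positive = sum(v for v in hits if v > 0)
    negative = -sum(v for v in hits if v < 0)
    return {"positive": positive, "negative": negative, "net": positive - negative}


def add_sentiment(tweet_json):
    """Parse a tweet, score its text, and return the JSON with scores appended."""
    tweet = json.loads(tweet_json)               # step 1: import JSON
    scores = score_text(tweet.get("text", ""))   # step 2: run the analyzer
    tweet["sentiment"] = scores                  # step 3: append the scores
    return json.dumps(tweet)                     # content to write back


if __name__ == "__main__":
    sample = '{"id": 1, "text": "I love this great phone, but the battery is bad"}'
    print(add_sentiment(sample))
```

Keeping the scoring in one function makes it straightforward to swap the toy lexicon for a real analyzer without touching the JSON handling.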

***Note***:

For the script itself I used Groovy (a Java-compatible scripting language), but you can use other scripting languages such as Lua, Python, Ruby, or ECMAScript (Script Code). For the sentiment analytics portion, I used a Java port of the Python sentiment analytics module vaderSentiment, but there are plenty of other sentiment analytics APIs, such as Python’s NLTK module, Stanford’s CoreNLP API, and Alchemy’s NLP API.

To run the above script, we will use an experimental feature of Apache NiFi called ExecuteScript.

ExecuteScript Configuration:

1) Drag the ExecuteScript processor into the NiFi data flow.
2) Right-click the processor and select “Configure”.
3) Go to Settings and, under the auto-terminate relationships option, check the box for "failure".
4) Go to Scheduling and enter a non-zero value for “Run Schedule”.
5) Go to Properties and choose the "Scripting Language".
6) Either add the script to the “Script Body” property or put the location of the script in “Script File”, but not both.
7) In “Module Directory”, add the external JAR files required by any external libraries or modules.
8) If you face issues with external dependencies, make sure to update the dependencies in the pom.xml file (this will vary based on the external modules/libraries used).
9) Place ExecuteScript after MergeContent and before PutSolrContentStream. (After connecting them, make sure all the relationships are valid.)

PutSolrContentStream Configuration:

1) Right-click PutSolrContentStream and click “Configure”.
2) Go to Properties and add the new properties.
3) Title each of the properties f.# (the number being one more than the highest existing f.# property).
4) In the Value section, put the desired name of the field in Solr, followed by a colon and the path to the field in the incoming JSON. Now that the JSON has been modified, don’t forget to update the paths to the existing attributes as well.
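As a rough illustration of the naming rule in the steps above, the helper below builds the new property names and values. The function name, the Solr field names, and the JSON paths (`/sentiment/net` and so on) are hypothetical; they assume the script stored its scores under a `sentiment` object, so substitute whatever your own script actually writes.

```python
def next_field_properties(existing_count, mappings):
    """Build f.<n> properties that continue after f.1 .. f.<existing_count>.

    mappings is a list of (solr_field_name, json_path) pairs.
    """
    props = {}
    for offset, (solr_field, json_path) in enumerate(mappings, start=1):
        # Value format: <Solr field name>:<path to the field in the JSON>
        props["f.%d" % (existing_count + offset)] = "%s:%s" % (solr_field, json_path)
    return props


if __name__ == "__main__":
    # Assumes three f.# properties already exist on the processor.
    new_props = next_field_properties(3, [
        ("sentiment_net_f", "/sentiment/net"),            # hypothetical field and path
        ("sentiment_positive_f", "/sentiment/positive"),  # hypothetical field and path
    ])
    for name in sorted(new_props):
        print(name, "=", new_props[name])
```

Each printed pair corresponds to one property name and value to enter on the PutSolrContentStream Properties tab.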

Updated Workflow:

Updated Solr Dashboard:

***Note***:

If you run into any issues with the data showing up in Solr, you may need to either restart Solr or create a new shard. If you are using Banana UI for visualization, all you need to do is update default.json to include the new shard and add a panel that uses the added sentiment score fields.

TimothySpann · ‎08-04-2016

What JARs are needed? Can you attach a NiFi template?
