
Starting assumption: your NiFi data flow is similar to the TwitterSolr.xml template on the NiFi website (NiFi Templates).

Here is the template workflow (MergeContent is optional):

6314-starting-workflow.png

To add sentiment analysis, we need to alter the data flow so that the flowfile (the actual data being passed through NiFi) is updated with fields that represent the sentiment scores of the tweet in question.

Script Components:

  • 1) Importing the JSON and accessing the tweet text
  • 2) Running a sentiment analyzer on the tweet text and getting sentiment scores
  • 3) Appending the sentiment scores to the JSON and writing it back to the flowfile

6312-groovy-script-snippet.png
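
As a rough illustration of the three components listed above, here is a minimal standalone sketch in plain Python that runs outside NiFi. It is not the Groovy script from the screenshot. The word lists, the `score_text` and `add_sentiment` helpers, and the `sentiment` field name are all assumptions made for this example, and the toy scorer only stands in for a real analyzer such as vaderSentiment. The only detail taken from Twitter's JSON is that the tweet body lives under the `text` key.

```python
import json

# Toy word lists standing in for a real sentiment lexicon (assumption, not VADER).
POSITIVE_WORDS = {"good", "great", "love", "happy", "awesome"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad", "awful"}


def score_text(text):
    """Return hypothetical positive/negative/compound scores for a string."""
    words = [w.strip(".,!?:;\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    pos = sum(1 for w in words if w in POSITIVE_WORDS)
    neg = sum(1 for w in words if w in NEGATIVE_WORDS)
    total = len(words) or 1  # avoid division by zero on empty text
    return {
        "positive": pos / total,
        "negative": neg / total,
        "compound": (pos - neg) / total,
    }


def add_sentiment(flowfile_content):
    """Parse tweet JSON, score its text, and return the enriched JSON string."""
    tweet = json.loads(flowfile_content)        # 1) import the JSON
    scores = score_text(tweet.get("text", ""))  # 2) run the analyzer on the text
    tweet["sentiment"] = scores                 # 3) append the scores
    return json.dumps(tweet)                    # content to write back


if __name__ == "__main__":
    sample = json.dumps({"id": 1, "text": "I love NiFi, it is great!"})
    print(add_sentiment(sample))
```

Inside ExecuteScript, the same transform would be wrapped in the processor's read and write calls on the flowfile; that wiring is left out here because it depends on the scripting language chosen.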

***Note***:

For the script itself I used Groovy (a Java-compatible scripting language), but you can use other scripting languages such as Lua, Python, Ruby, or ECMAScript (Script Code). For the sentiment analysis portion, I used a Java port of the Python sentiment analysis module vaderSentiment, but there are plenty of other sentiment analysis libraries and APIs, such as Python’s NLTK module, Stanford’s CoreNLP API, and Alchemy’s NLP API, to name a few.

To actually use the above script, we will use an experimental feature of Apache NiFi called ExecuteScript.

ExecuteScript Configuration:

  • 1) Drag the processor into the NiFi data flow.
  • 2) Right-click the processor and press “Configure”.
  • 3) Go to Settings and, under auto-terminate relationships, check the box for failure.
  • 4) Go to Scheduling and enter a non-zero value for “Run Schedule”.
  • 5) Go to Properties and choose the "Scripting Language".
  • 6) Then either add the script body to the “Script Body” section or put the location of the script in “Script File”, but not both.
  • 7) In the Module Directory, add the JAR files required by any external libraries or modules.
  • 8) If you face issues with external dependencies, make sure to update the dependencies in the pom.xml file (this will vary based on the external modules/libraries used).
  • 9) Next, place ExecuteScript after MergeContent and before PutSolrContentStream. (Once connected, make sure all the relationships are valid.)
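
The "Script Body or Script File, but not both" rule above can be written down as a small check. The sketch below is purely illustrative: it takes a plain Python dict keyed by the property labels used in this article and is not a NiFi API. The `check_execute_script_config` helper and the sample values are hypothetical.

```python
def check_execute_script_config(props):
    """Return a list of problems found in a dict of ExecuteScript-style properties."""
    problems = []
    if not props.get("Scripting Language"):
        problems.append("choose a Scripting Language")
    has_body = bool(props.get("Script Body"))
    has_file = bool(props.get("Script File"))
    if has_body and has_file:
        problems.append("set Script Body or Script File, but not both")
    elif not has_body and not has_file:
        problems.append("set either Script Body or Script File")
    return problems


if __name__ == "__main__":
    # Hypothetical configuration that sets both script sources by mistake.
    config = {
        "Scripting Language": "Groovy",
        "Script Body": "// script text here",
        "Script File": "/path/to/script.groovy",
    }
    for problem in check_execute_script_config(config):
        print(problem)
```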

PutSolrContentStream Configuration:

  • 1) Right-click PutSolrContentStream and click “Configure”.
  • 2) Go to Properties and add a new property for each sentiment field.
  • 3) Name each new property f.#, where # is one more than the highest number among the existing f.# properties.
  • 4) In the Value section, enter the desired name of the field in Solr, followed by a colon and the path to the field in the incoming JSON. Don’t forget to update the paths of the existing properties if the structure of the JSON has changed.
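
To make the f.# naming and the "name:path" value format concrete, here is a minimal Python sketch that works out the next property entries to type into the Properties tab. The `next_field_properties` helper, the existing property values, and the sentiment field names and JSON paths are all hypothetical; use the names and paths that match your own flow and script output.

```python
import re


def next_field_properties(existing, new_fields):
    """Return f.# properties for new_fields, numbered after the highest existing f.#."""
    matches = (re.fullmatch(r"f\.(\d+)", name) for name in existing)
    numbers = [int(m.group(1)) for m in matches if m]
    start = max(numbers, default=0) + 1
    return {
        "f.{}".format(start + i): "{}:{}".format(solr_field, json_path)
        for i, (solr_field, json_path) in enumerate(new_fields)
    }


if __name__ == "__main__":
    # Hypothetical properties already present on the processor.
    existing = {"f.1": "id:/id", "f.2": "twitter_text:/text"}
    # Hypothetical Solr field names and JSON paths for the added scores.
    new_fields = [
        ("sentiment_positive", "/sentiment/positive"),
        ("sentiment_negative", "/sentiment/negative"),
    ]
    for name, value in next_field_properties(existing, new_fields).items():
        print(name, "=", value)
```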

Updated Workflow:

6316-final-work-flow.png

Updated Solr Dashboard:

6317-solr-screenshot.png

***Note***:

If you run into any issues with the data showing up in Solr, you may need to either restart Solr or create a new shard. If you are using Banana UI for visualization, all you need to do is update default.json to include the new shard and add a panel that uses the added sentiment score fields.


before-nifi-screenshot.png
Comments
Super Guru

What jars are needed? Can you attach a NiFi template?

Version history
Revision #:
2 of 2
Last update:
‎08-17-2019 10:58 AM