OpenNLP Example Apache NiFi Processor

I wanted to add NLP processing to my dataflow without calling out to Apache Spark jobs or other disconnected services, so I wrote a custom processor. A custom processor lets me write fast Java 8 microservices that handle this kind of functionality inside my stream in a concise way. All the source code for this processor is available under the Apache license on GitHub.

11913-nifinlpoverview.png

See the attached generated HTML documentation for the processor.

11914-nlpdocs.png

If you would like to use this processor, clone the repository, build it with Maven, and copy the resulting NAR into NiFi's lib directory:

git clone https://github.com/tspannhw/nifi-nlp-processor

cd nifi-nlp-processor

mvn package

cp nifi-nlp-nar/target/nifi-nlp-nar-1.0.nar /usr/hdf/current/nifi/lib/

You can also download a prebuilt NAR from GitHub.

Then restart NiFi via Ambari and you can start using it.

11922-nlpnifirestart.png

This has been tested with NiFi on HDF 2.x.

Add the NLP Processor.

11919-nlpaddprocessor.png

Then set the properties. You need to set the sentence that you want parsed. You can use Expression Language to pull the value from an attribute, as I am doing here to grab the Tweet.

11921-nifinlpproperties.png

Send it a sentence, say from Twitter, and you will get back the names and locations found in it. You also need to set the Extra Resources property to a directory where you have downloaded the Apache OpenNLP prebuilt models referenced below.

Results

11915-nlpresults.png

Two attributes get added to your flowfile. They contain JSON arrays of the locations and names extracted from your sentence (or page of text).

Locations

{"locations":[{"location":"Sydney"}]}

Names

{"names":[{"name":"Tim Spann"},{"name":"Peter Smith"}]}

Entities extracted from the text using Apache OpenNLP via a custom NiFi Processor.
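If you want to produce or compare against the same JSON shape in your own code, something like the following might help. This is only a minimal sketch using the Java 8 standard library. The class EntityJson and its methods are hypothetical names for illustration and are not taken from the processor's source, which may well build its output differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the processor's source.
class EntityJson {

    // Escape quotes, backslashes and control characters so the value is a valid JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build {"<plural>":[{"<singular>":"value"},...]} from a list of entity strings.
    static String toJson(String plural, String singular, List<String> values) {
        String items = values.stream()
                .map(v -> "{\"" + singular + "\":\"" + escape(v) + "\"}")
                .collect(Collectors.joining(","));
        return "{\"" + plural + "\":[" + items + "]}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("names", "name", Arrays.asList("Tim Spann", "Peter Smith")));
        System.out.println(toJson("locations", "location", Arrays.asList("Sydney")));
    }
}
```

Run with the sample values, this prints the same two JSON strings shown under Locations and Names.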

Current Version Uses (Apache OpenNLP Pre-built Models v1.5)

  • en-token.bin
  • en-ner-person.bin
  • en-ner-location.bin

You can add other languages and models as enhancements.
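Before restarting NiFi, it can save some debugging to confirm that the three model files are actually in the directory you plan to use. The following is a rough sketch of such a check in plain Java 8. ModelCheck and missingModels are hypothetical names, and the assumption that the directory you pass in is the one configured in the processor is mine, not something the processor enforces.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check for illustration only; not part of the processor.
class ModelCheck {

    // File names of the OpenNLP 1.5 models listed in the article.
    static final List<String> REQUIRED = Arrays.asList(
            "en-token.bin", "en-ner-person.bin", "en-ner-location.bin");

    // Return the required model files that are not present as regular files in modelDir.
    static List<String> missingModels(Path modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(modelDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The directory comes from the first argument; it defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingModels(dir);
        System.out.println(missing.isEmpty()
                ? "All models found in " + dir
                : "Missing models: " + missing);
    }
}
```

If you add models for other languages, extending the REQUIRED list is the only change this sketch would need.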

11917-nifinlpproperties.png

If you would like to extend the processor, it includes a JUnit test for you to run and extend. It uses the NiFi TestRunner, which lets you inspect the flowfile, set inputs and check outputs.

11918-nlpnifijunit.png

Note:

The current version supports English only. If you want to extend it, please fork the project and I will merge your code in.

References:

Models to Download and install to /usr/hdf/current/nifi/lib/

http://opennlp.sourceforge.net/models-1.5/

https://community.hortonworks.com/articles/76240/using-opennlp-for-identifying-names-from-text.html

twittertonlp.xml


nlpconfigureprocessor.png
Comments
Contributor

Really cool implementation of OpenNLP, thanks for sharing.

I am having one issue with it, but I think it is down to my understanding. Can I send the content of the entire flowfile to the processor to be analyzed, or must it go through the JSON parser first?

Master Guru

The processor takes a property to run against. You just need to pass something in the sentence parameter. You can concatenate a few fields there.

The source is open, so it would be easy to ingest a flowfile's content and process that instead of an input attribute. It means changing 2-3 lines and rebuilding.
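For anyone attempting that change, the main extra step is turning the flowfile content, which NiFi hands to a processor as an InputStream (for example inside a ProcessSession.read callback), into a String for the NLP code. Here is a minimal sketch of just that step in plain Java 8. ContentReader and readUtf8 are hypothetical names, UTF-8 content is an assumption, and none of this is copied from the processor's source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the processor's source.
class ContentReader {

    // Read the whole stream into memory and decode it as UTF-8 (assumed encoding).
    static String readUtf8(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A ByteArrayInputStream stands in for the flowfile content stream here.
        InputStream fake = new ByteArrayInputStream(
                "Tim Spann flew to Sydney.".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8(fake));
    }
}
```

Since this loads the whole content into memory, it only makes sense for small text payloads such as tweets or short pages.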