Open NLP Example Apache NiFi Processor

I wanted to add NLP processing to my dataflow without calling out to Apache Spark jobs or other disconnected services. A custom processor lets me write fast Java 8 microservices that process data in my stream in a concise way, so I wrote one. All the source code for this processor is available under the Apache license on GitHub.


See the attached generated HTML documentation for the processor.


If you would like to use this processor:

git clone

mvn package

cp nifi-nlp-nar/target/nifi-nlp-nar-1.0.nar /usr/hdf/current/nifi/lib/

You can also download a prebuilt NAR from GitHub.

Then restart NiFi via Ambari and you can start using it.


This has been tested for HDF 2.x NiFi.

Add the NLP Processor.


Then set the properties. You need to set the sentence that you want parsed. You can use Expression Language to grab the value from an attribute, as I am doing to grab the Tweet.


Send it a sentence, say from Twitter, and you will get back the names and locations found in it. You need to set Extra Resources to a directory where you have downloaded the Apache OpenNLP prebuilt models referenced below.



Two attributes get added to your flow. They contain JSON arrays of locations and names extracted from your sentence (or page of text).




{"names":[{"name":"Tim Spann"},{"name":"Peter Smith"}]}

Entities extracted from the text using Apache OpenNLP via a custom NiFi Processor.
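
As a rough illustration of the attribute format shown above, here is a minimal sketch in plain Java of how a list of extracted names could be rendered into that JSON shape. The class and method names are hypothetical and are not taken from the processor's source, and the escaping is deliberately minimal.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: renders extracted names in the JSON shape shown above. */
final class NamesJsonSketch {

    /** Minimal escaping: only backslashes and double quotes are handled, not control characters. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds {"names":[{"name":"..."}, ...]} from a list of extracted names. */
    static String toJson(List<String> names) {
        return names.stream()
                .map(n -> "{\"name\":\"" + escape(n) + "\"}")
                .collect(Collectors.joining(",", "{\"names\":[", "]}"));
    }
}
```

Calling `NamesJsonSketch.toJson(Arrays.asList("Tim Spann", "Peter Smith"))` yields the same string as the example attribute above. A real processor would more likely use a JSON library than string concatenation.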

Current Version Uses (Apache OpenNLP Pre-built Models v1.5)

  • en-token.bin
  • en-ner-person.bin
  • en-ner-location.bin
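
To show how these three models fit together: the tokenizer model splits the sentence into tokens, and the person and location models then mark entities as ranges of token indexes, where the end index is exclusive. The sketch below mimics only that last step in plain Java, with no OpenNLP classes. The class name, the method names, and the use of `int[][]` pairs in place of OpenNLP spans are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for turning name finder output into strings; no OpenNLP classes are used. */
final class SpanJoinSketch {

    /** Joins tokens[start..end) with single spaces; end is exclusive. */
    static String join(String[] tokens, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            if (i > start) {
                sb.append(' ');
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    /** Converts each {start, end} pair into the entity text it covers. */
    static List<String> entities(String[] tokens, int[][] spans) {
        List<String> out = new ArrayList<>();
        for (int[] span : spans) {
            out.add(join(tokens, span[0], span[1]));
        }
        return out;
    }
}
```

For the tokens `Tim`, `Spann`, `met`, `Peter`, `Smith` and the ranges `{0, 2}` and `{3, 5}`, `entities` returns the two names `Tim Spann` and `Peter Smith`.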

You can add other languages and models as enhancements.


If you would like to extend the processor, it includes a JUnit test for you to run and extend. It uses the NiFi TestRunner and lets you see the flowfile, set inputs, and get outputs.



The current version supports English only. If you want to extend it, please fork the project and I will merge your code in.


Models to Download and install to /usr/hdf/current/nifi/lib/



Really cool implementation of OpenNLP, thanks for sharing.

Having one issue with it, but I think it's my understanding. Can I send the content of the entire flow file to the processor to be analyzed, or must it go through the JSON parser first?

Super Guru

The processor takes a property to run against. You just need to pass something in the sentence parameter. You can concatenate a few fields there.

The source is open, so it would be easy to ingest a flowfile and process that instead of using an input attribute. It's a matter of changing 2-3 lines and rebuilding.

Last update: 08-17-2019 05:18 AM