01-15-2017
03:23 AM
Bigger files are better than millions of little ones.
08-09-2017
03:03 PM
I think there's a bug though - if you revert to a previous config (as you usually would, when undoing a change), the HS2 interactive is not removed from the previously selected host.
01-11-2017
04:25 PM
ghost.xml NIFI Template
08-22-2017
04:58 PM
For sentiment analysis with NiFi processors, download and build these: https://github.com/tspannhw/nifi-corenlp-processor and https://github.com/tspannhw/nifi-nlp-processor
05-07-2018
08:22 PM
That is in com.dataflowdeveloper. It is a one method class I wrote to hold the string.
01-04-2017
05:01 PM
6 Kudos
My first caveat is that, in my tests, the pre-trained models are missing a lot of names. If this is for a production workload, I would recommend training your own models on your own data, for example your corporate directory, client list, Salesforce data, LinkedIn and social media. I would include full names, first names and any commonly used nicknames.

The current version of OpenNLP is 1.7.0, and the pre-trained 1.5.0 models work with it. Pre-trained models are available for a few human languages. I chose English (http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin).

Walk Through:
1. Create a TokenNameFinderModel from the pre-built person model.
2. Tokenize the input sentence.
3. Find the identified people.
4. Convert to a JSON array.

You can easily plug this into a custom NiFi processor, microservice, command line tool or routine in a larger Apache Storm or Apache Spark pipeline.

Code (JavaBean)
public class PersonName {
    private String name = "";
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
}

Code (getPeople)
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import com.google.gson.Gson;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.util.InvalidFormatException;
import opennlp.tools.util.Span;
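public String getPeople(String sentence) {
    TokenNameFinderModel model = null;
    try {
        // Load the pre-trained English person model from the working directory
        model = new TokenNameFinderModel(
                new File("en-ner-person.bin"));
    } catch (InvalidFormatException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    // If the model failed to load, return an empty result rather than hit a NullPointerException
    if (model == null) {
        return "{\"names\":[]}";
    }
    NameFinderME finder = new NameFinderME(model);
    Tokenizer tokenizer = SimpleTokenizer.INSTANCE;
    String[] tokens = tokenizer.tokenize(sentence);
    // Each Span marks the token range of one detected name
    Span[] nameSpans = finder.find(tokens);
    List<PersonName> people = new ArrayList<PersonName>();
    String[] spanns = Span.spansToStrings(nameSpans, tokens);
    for (int i = 0; i < spanns.length; i++) {
        // PersonName only has a no-argument constructor, so set the name through the setter
        PersonName person = new PersonName();
        person.setName(spanns[i]);
        people.add(person);
    }
    String outputJSON = new Gson().toJson(people);
    // Reset the finder's adaptive data before the next call
    finder.clearAdaptiveData();
    return "{\"names\":" + outputJSON + "}";
}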
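
To make the span step a little more concrete: the name finder reports token ranges, and those ranges are then turned back into text. Below is a minimal plain-Java sketch of that idea, assuming half-open ranges (start inclusive, end exclusive) and tokens joined with single spaces. `SpanSketch` and `TokenSpan` are hypothetical names used only for illustration; they are not OpenNLP classes, and the span in `main` is hard-coded rather than produced by a model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics turning token ranges into strings.
class SpanSketch {

    // A half-open token range [start, end), standing in for a name span.
    static final class TokenSpan {
        final int start;
        final int end;

        TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Join the tokens covered by each span with single spaces.
    static List<String> spansToStrings(List<TokenSpan> spans, String[] tokens) {
        List<String> names = new ArrayList<String>();
        for (TokenSpan span : spans) {
            StringBuilder name = new StringBuilder();
            for (int i = span.start; i < span.end; i++) {
                if (name.length() > 0) {
                    name.append(' ');
                }
                name.append(tokens[i]);
            }
            names.add(name.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        String[] tokens = {"Tim", "Spann", "is", "going", "to", "the", "store", "."};
        List<TokenSpan> spans = new ArrayList<TokenSpan>();
        spans.add(new TokenSpan(0, 2)); // assumed span covering "Tim Spann"
        System.out.println(spansToStrings(spans, tokens)); // prints [Tim Spann]
    }
}
```

Working with token indexes rather than character offsets is why the sentence has to be tokenized before names can be found.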
I used Eclipse for building and testing, and you can build it with mvn package.

Maven
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.dataflowdeveloper</groupId>
  <artifactId>categorizer</artifactId>
  <packaging>jar</packaging>
  <version>1.0</version>
  <name>categorizer</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>1.7.7</version>
    </dependency>
    <dependency>
      <groupId>org.apache.opennlp</groupId>
      <artifactId>opennlp-tools</artifactId>
      <version>1.7.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.code.gson</groupId>
      <artifactId>gson</artifactId>
      <version>2.8.0</version>
    </dependency>
  </dependencies>
</project>

Run
Input: Tim Spann is going to the store. Peter Smith is using Hortonworks Hive.
Output: {"names":[{"name":"Tim Spann"},{"name":"Peter Smith"}]}

Reference:
http://opennlp.apache.org/
http://opennlp.apache.org/documentation/1.7.0/manual/opennlp.html#tools.namefind
https://www.packtpub.com/books/content/finding-people-and-things
http://opennlp.sourceforge.net/models-1.5/
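
On the earlier caveat about training your own models: as far as I know, the OpenNLP name finder trainer expects one whitespace-tokenized sentence per line, with each name wrapped in `<START:person>` and `<END>` markers (check the name finder section of the manual before relying on this). Here is a rough plain-Java sketch of producing such lines from a list of known names. `TrainingLineSketch`, `matchesAt` and `annotate` are hypothetical names, the names list is made-up sample data, and exact token matching is a simplifying assumption; real data would need better tokenization and matching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: marks known names in a tokenized sentence.
class TrainingLineSketch {

    // Returns true if every token of 'name' matches 'tokens' starting at 'pos'.
    static boolean matchesAt(String[] tokens, int pos, String[] name) {
        if (pos + name.length > tokens.length) {
            return false;
        }
        for (int j = 0; j < name.length; j++) {
            if (!tokens[pos + j].equals(name[j])) {
                return false;
            }
        }
        return true;
    }

    // Builds one training line, wrapping each known name in person markers.
    static String annotate(String[] tokens, List<String[]> knownNames) {
        StringBuilder line = new StringBuilder();
        int i = 0;
        while (i < tokens.length) {
            // Pick the longest known name that starts at this token, if any.
            String[] best = null;
            for (String[] name : knownNames) {
                if (name.length > 0 && matchesAt(tokens, i, name)
                        && (best == null || name.length > best.length)) {
                    best = name;
                }
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            if (best != null) {
                line.append("<START:person> ")
                    .append(String.join(" ", best))
                    .append(" <END>");
                i += best.length;
            } else {
                line.append(tokens[i]);
                i++;
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        List<String[]> names = new ArrayList<String[]>();
        names.add(new String[] {"Tim", "Spann"});
        names.add(new String[] {"Peter", "Smith"});
        String[] tokens = "Tim Spann is going to the store .".split(" ");
        System.out.println(annotate(tokens, names));
        // prints: <START:person> Tim Spann <END> is going to the store .
    }
}
```

Preferring the longest match keeps a full name such as "Tim Spann" from being split when the list also contains the first name on its own.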
01-01-2017
03:29 PM
You may need to open a JIRA with spark.apache.org or Parquet. It seems to be an issue in one of them.
01-16-2017
10:23 PM
You install both, not even HDP Client