About TimothySpann

TimothySpann · ‎01-15-2017

Bigger files are better than millions of little ones.

rick_moritz · ‎08-09-2017

I think there's a bug though - if you revert to a previous config (as you usually would, when undoing a change), the HS2 interactive is not removed from the previously selected host.

TimothySpann · ‎01-11-2017

ghost.xml NIFI Template

TimothySpann · ‎08-22-2017

For Sentiment Analysis with NiFi Processors, download and build these: https://github.com/tspannhw/nifi-corenlp-processor and https://github.com/tspannhw/nifi-nlp-processor

TimothySpann · ‎05-07-2018

That is in com.dataflowdeveloper. It is a one-method class I wrote to hold the string.

TimothySpann · ‎01-04-2017

I increased it in nifi-ambari-config: Max memory allocation.

TimothySpann · ‎01-04-2017

My first caveat is that, in my tests, the pre-trained models are missing a lot of names. If this is for a production workload, I would recommend training your own models on your own data, for example your corporate directory, client list, Salesforce data, LinkedIn and social media. I would include full names, first names and any commonly used nicknames.

The current version is 1.7.0, and there are pre-trained 1.5.0 models that work with it. They have a number of pre-trained models in a few human languages. I chose English (http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin).

Walk Through:

1. Create TokenNameFinderModel from pre-built person model.
2. Tokenize the input sentence.
3. Find the identified people.
4. Convert to JSON array.

You can easily plug this into a custom NiFi processor, microservice, command line tool or routine in a larger Apache Storm or Apache Spark pipeline.

Code (JavaBean)

    public class PersonName {
        private String name = "";

        public PersonName() {
        }

        // getPeople constructs a PersonName directly from a name string
        public PersonName(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

Code (getPeople)

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import com.google.gson.Gson;
    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.tokenize.SimpleTokenizer;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.util.InvalidFormatException;
    import opennlp.tools.util.Span;

    public String getPeople(String sentence) {
        String outputJSON = "";
        TokenNameFinderModel model = null;
        try {
            // Load the pre-trained English person model from the working directory
            model = new TokenNameFinderModel(new File("en-ner-person.bin"));
        } catch (InvalidFormatException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (model == null) {
            // The model failed to load, so there is nothing to search with
            return "{\"names\":[]}";
        }

        NameFinderME finder = new NameFinderME(model);
        Tokenizer tokenizer = SimpleTokenizer.INSTANCE;
        String[] tokens = tokenizer.tokenize(sentence);
        Span[] nameSpans = finder.find(tokens);

        // Turn each token span back into the text of the name it covers
        List<PersonName> people = new ArrayList<PersonName>();
        String[] spanns = Span.spansToStrings(nameSpans, tokens);
        for (int i = 0; i < spanns.length; i++) {
            people.add(new PersonName(spanns[i]));
        }

        outputJSON = new Gson().toJson(people);
        finder.clearAdaptiveData();
        return "{\"names\":" + outputJSON + "}";
    }

I used Eclipse for building and testing, and you can build it with mvn package.

Maven

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.dataflowdeveloper</groupId>
      <artifactId>categorizer</artifactId>
      <packaging>jar</packaging>
      <version>1.0</version>
      <name>categorizer</name>
      <url>http://maven.apache.org</url>
      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-simple</artifactId>
          <version>1.7.7</version>
        </dependency>
        <dependency>
          <groupId>org.apache.opennlp</groupId>
          <artifactId>opennlp-tools</artifactId>
          <version>1.7.0</version>
        </dependency>
        <dependency>
          <groupId>com.google.code.gson</groupId>
          <artifactId>gson</artifactId>
          <version>2.8.0</version>
        </dependency>
      </dependencies>
    </project>

Run

Input: Tim Spann is going to the store. Peter Smith is using Hortonworks Hive.

Output: {"names":[{"name":"Tim Spann"},{"name":"Peter Smith"}]}

Reference:

http://opennlp.apache.org/
http://opennlp.apache.org/documentation/1.7.0/manual/opennlp.html#tools.namefind
https://www.packtpub.com/books/content/finding-people-and-things
http://opennlp.sourceforge.net/models-1.5/
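The last two walk-through steps (turning token spans into name strings, then into the JSON array) can be tried without OpenNLP or Gson on the classpath. What follows is a minimal stand-alone sketch, not the code from the article: NameSpanSketch, joinTokens, escape and toNamesJson are hypothetical names, the spans are hand-written {start, end} pairs rather than NameFinderME output, and it assumes span end indexes are exclusive and that tokens are joined with single spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not part of OpenNLP, Gson or the article.
class NameSpanSketch {

    // Joins tokens[start..end) with single spaces (end index exclusive).
    // Assumption: this mirrors how token spans are turned back into name strings.
    static String joinTokens(String[] tokens, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            if (i > start) {
                sb.append(' ');
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    // Minimal escaping: backslashes first, then double quotes.
    // Control characters are not handled; a real JSON library should be preferred.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds {"names":[{"name":"..."}]} from spans given as {start, end} pairs.
    static String toNamesJson(String[] tokens, int[][] spans) {
        List<String> entries = new ArrayList<>();
        for (int[] span : spans) {
            String name = joinTokens(tokens, span[0], span[1]);
            entries.add("{\"name\":\"" + escape(name) + "\"}");
        }
        return "{\"names\":[" + String.join(",", entries) + "]}";
    }

    public static void main(String[] args) {
        String[] tokens = {"Tim", "Spann", "is", "going", "to", "the", "store", "."};
        int[][] spans = {{0, 2}}; // hand-written span covering "Tim Spann"
        System.out.println(toNamesJson(tokens, spans));
    }
}
```

Running main prints {"names":[{"name":"Tim Spann"}]}, which has the same shape as the Output in the Run section.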

TimothySpann · ‎01-01-2017

You may need to open a JIRA with Apache Spark (spark.apache.org) or Parquet. It seems to be an issue in one of them.

TimothySpann · ‎01-16-2017

You install both, not even HDP Client

TimothySpann · ‎01-15-2017

A complete uninstall, removing the old repos, and a reboot fixed it.

Last Visited	‎05-20-2024 05:42 PM

Member Since	‎01-07-2019 11:58 AM
Posts	1,973
Kudos received	1,122

Cloudera Community

Re: Has anyone tried NiFi consuming (JMSConsume) f...

Re: NiFi Crash after runing chain of lookups

Re: Recommend approach for listening to RSS Feed i...

Re: NiFi ListenFTP Processor Default Data Port

Re: Nifi: Kafka Producer with Avro format in both ...

Re: Suggestions to handle high volume streaming da...

Re: How do you change HiveServer2 Interactive Host

Re: Basic Image Processing and Linux Utilities As ...

Re: Using Sentiment Analysis and NLP Tools With HD...

Re: Data Processing Pipeline: Parsing PDFs and Id...

Re: NIFI Warnings and Errors

Using OpenNLP for Identifying Names From Text

Re: Spark 1.6.1 - how to skip corrupted parquet bl...

Re: Trying to install NiFi 1.1 from HDF Repo

Re: HDF 2.0 Install Fail