Code Repositories
Super Guru
Repo Description

NiFi Custom Processor for Extracting Text from Documents via Apache Tika

This processor extracts the raw text from PDF, Word, HTML, XML, Excel, PowerPoint, and other formats supported by Apache Tika.

Repo Info
GitHub repo URL: https://github.com/tspannhw/nifi-extracttext-processor
GitHub account name: tspannhw
Repo name: nifi-extracttext-processor
Comments
Contributor

Does that mean NiFi has Apache Tika built in, or do we need to install Apache Tika separately?

Super Guru

If you look at my processor, you'll see that it includes Tika as a dependency:

https://github.com/tspannhw/nifi-extracttext-processor/blob/master/nifi-extracttext-processors/pom.x...

There are other processors using Tika in Apache NiFi.
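
For readers unfamiliar with Maven, here is a rough sketch of what bundling Tika with a custom processor can look like in a pom.xml. It is an illustration only, not a copy of the linked file: the `tika.version` property and its placeholder value are assumptions, and the artifact names shown are the ones published for Tika 1.x, so check the actual pom.xml in the repo for what this processor really declares.

```xml
<!-- Hypothetical fragment for illustration; not copied from the linked pom.xml -->
<properties>
  <!-- Assumption: set this to whichever Tika release you build against -->
  <tika.version>REPLACE_WITH_TIKA_VERSION</tika.version>
</properties>

<dependencies>
  <!-- Tika facade, type detection, and parser interfaces -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>${tika.version}</version>
  </dependency>
  <!-- Parser implementations for PDF, Office, HTML, and so on (Tika 1.x artifact name) -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>${tika.version}</version>
  </dependency>
</dependencies>
```

With a dependency declared this way, Maven pulls the Tika jars into the processor's build, so nothing has to be installed next to NiFi by hand.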

Version history
Revision #: 1 of 1
Last update: 02-06-2017 08:16 PM