NiFi Custom Processor for Extracting Text from Documents via Apache Tika
This processor extracts the raw text from PDF, Word, HTML, XML, Excel, PowerPoint, and other formats supported by Apache Tika.
Does that mean NiFi has Apache Tika built in, or do we need to install Apache Tika separately?
If you look at my processor, you will see that I include Tika in it, so you do not need to install it separately.
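For anyone wondering what "including Tika" can look like in practice: a NiFi processor built with Maven usually declares its libraries in pom.xml, and they are then packaged into the processor's NAR file. The fragment below is only a sketch under that assumption, not a copy of this processor's actual build file. The `tika.version` property and its value are placeholders, and the artifact names are the ones used by Tika 1.x.

```xml
<!-- Hypothetical pom.xml fragment; not taken from the processor's real build file. -->
<properties>
  <!-- Placeholder: set this to the Tika release you actually build against. -->
  <tika.version>REPLACE_WITH_TIKA_VERSION</tika.version>
</properties>

<dependencies>
  <!-- Core Tika API (detection, the Tika facade, parser interfaces). -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>${tika.version}</version>
  </dependency>
  <!-- Format-specific parsers (PDF, Office, HTML, and so on) for Tika 1.x.
       In Tika 2.x the equivalent artifact is tika-parsers-standard-package. -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>${tika.version}</version>
  </dependency>
</dependencies>
```

With dependencies declared this way, the Tika jars travel with the processor, which is why nothing extra has to be installed on the NiFi host.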
Other processors in Apache NiFi use Tika as well.