Created 08-10-2023 10:23 PM
Hello, I am new to NiFi and have a requirement to use Apache Tika in NiFi to extract the metadata of a file. Any help would be much appreciated.
Created 08-11-2023 05:47 AM
I am not aware of any direct connectivity between Tika and NiFi.
Off the top of my head, the only solution I can think of is to create a brand new NiFi processor and integrate the parsing logic from Tika directly into it. The processor can be written in Java and then deployed to NiFi (have a look here, maybe: https://medium.com/hashmapinc/creating-custom-processors-and-controllers-in-apache-nifi-e14148740ea ).
Another option, if you are not working on something too complex, might be to implement this logic in a script and execute it in NiFi with ExecuteScript (see some great tutorials here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148 ).
Created 08-11-2023 08:18 AM
@Madhav_VD
Apache NiFi contains no native processors that use Apache Tika other than IdentifyMimeType (which does not do any extraction), but you can find others in the Apache community who have created custom processors that use Apache Tika. Adding a custom NAR to Apache NiFi is as easy as placing it in the auto-load directory:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#autoloading-processors
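As a rough illustration of that step, here is a minimal Python sketch that copies a NAR file into the auto-load directory. It assumes the directory is the default `extensions` folder under the NiFi home (controlled by `nifi.nar.library.autoload.directory` in nifi.properties); the function name `install_nar` and the paths in the usage note are hypothetical, so adjust them to your installation.

```python
import shutil
from pathlib import Path


def install_nar(nar_file, nifi_home, autoload_dir="extensions"):
    """Copy a NAR into NiFi's auto-load directory and return the destination path.

    Assumption: autoload_dir matches nifi.nar.library.autoload.directory
    (the default is ./extensions, relative to the NiFi home directory).
    """
    source = Path(nar_file)
    if source.suffix != ".nar":
        raise ValueError(f"not a NAR file: {source}")

    target_dir = Path(nifi_home) / autoload_dir
    # Create the directory if this installation does not have it yet.
    target_dir.mkdir(parents=True, exist_ok=True)

    destination = target_dir / source.name
    shutil.copy2(source, destination)
    return destination
```

For example, `install_nar("/tmp/custom-processor.nar", "/opt/nifi")` (both paths are placeholders) would place the file at `/opt/nifi/extensions/custom-processor.nar`. Copying the file by hand works just as well; the script only makes the target location explicit.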
While I have no experience with any of these custom NARs, you can give them a try to see if they meet your needs. If not, they may provide you with a stepping stone for creating your own custom variant.
https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/html
https://community.cloudera.com/t5/Community-Articles/ExtractText-NiFi-Custom-Processor-Powered-by-Ap...
https://community.cloudera.com/t5/Community-Articles/Creating-HTML-from-PDF-Excel-and-Word-Documents...
https://github.com/tspannhw/nifi-extracttext-processor
If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.
Thank you,
Matt