How to use Apache Tika in NiFi to extract file metadata

Explorer

Hello, I am new to NiFi and I have a requirement to use Apache Tika in NiFi to extract the metadata of a file. Any help would be much appreciated.

2 REPLIES


I am not aware of any direct connectivity between Tika and NiFi.

Off the top of my head, the only solution I can think of is to create a new custom NiFi processor and integrate Tika's parsing logic directly into it. The processor can be written in Java and then deployed to NiFi (have a look here, maybe: https://medium.com/hashmapinc/creating-custom-processors-and-controllers-in-apache-nifi-e14148740ea ).

Another option, if you are not working on something too complex, might be to implement this logic in a script and execute it in NiFi with ExecuteScript (see some great tutorials here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148 ).
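To make the script idea a bit more concrete, here is a minimal Python sketch of the parsing logic only. It assumes a separate Tika Server instance is running at http://localhost:9998 and uses its /meta endpoint; the function names and the "tika." attribute prefix are made up for illustration. How you wire it into a flow (ExecuteScript, ExecuteStreamCommand, or something else) depends on your NiFi version, so treat this as a starting point rather than a tested solution.

```python
import json
import urllib.request

# Assumption: a Tika Server instance is reachable at this address.
TIKA_META_URL = "http://localhost:9998/meta"


def build_meta_request(data: bytes, url: str = TIKA_META_URL) -> urllib.request.Request:
    """Build a PUT request that asks Tika Server for metadata as JSON."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def to_attributes(metadata: dict, prefix: str = "tika.") -> dict:
    """Flatten Tika metadata into string key/value pairs.

    Multi-valued fields arrive as lists, so they are joined into one string,
    which is the shape flow file attributes would need.
    """
    attrs = {}
    for key, value in metadata.items():
        if isinstance(value, list):
            value = ", ".join(str(v) for v in value)
        attrs[prefix + key] = str(value)
    return attrs


def extract_metadata(data: bytes, url: str = TIKA_META_URL) -> dict:
    """Send the file content to Tika Server and return flattened metadata."""
    with urllib.request.urlopen(build_meta_request(data, url), timeout=30) as resp:
        return to_attributes(json.loads(resp.read().decode("utf-8")))
```

Calling extract_metadata(open("some-file.pdf", "rb").read()) should return a dictionary of metadata fields, which a script could then write out as flow file attributes or as JSON content.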

Master Mentor

@Madhav_VD
Apache NiFi contains no native processors that use Apache Tika other than IdentifyMimeType (which does not do any extraction), but others in the Apache community have created custom processors that use Apache Tika. Adding a custom NAR to Apache NiFi is as easy as placing it in the auto-load directory:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#autoloading-processors
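As a rough illustration of the auto-load setup: the directory is set in nifi.properties. The property name and default value below are what I would expect from the administration guide, so please verify them against your NiFi version. The NAR file name in the comment is hypothetical.

```properties
# nifi.properties (assumed default value; verify against your installation)
nifi.nar.library.autoload.directory=./extensions
# Copying a NAR (for example, a hypothetical nifi-extracttext-nar.nar) into this
# directory should let a running NiFi pick it up without a restart.
```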

While I have no experience with any of these custom NARs, you can give them a try to see if they meet your needs. If not, they may provide you with a stepping stone for creating your own custom variant.

https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/html
https://community.cloudera.com/t5/Community-Articles/ExtractText-NiFi-Custom-Processor-Powered-by-Ap...

https://community.cloudera.com/t5/Community-Articles/Creating-HTML-from-PDF-Excel-and-Word-Documents...
https://github.com/tspannhw/nifi-extracttext-processor

Thank you,

Matt