<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to use Apache Tika in NiFi to extract metadata of a file in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-Apache-Tika-in-NIFI-to-extract-metadata-of-file/m-p/375150#M242307</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/106024"&gt;@Madhav_VD&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Apache NiFi contains no native processors that utilize Apache Tika other than &lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.23.0/org.apache.nifi.processors.standard.IdentifyMimeType/index.html" target="_self"&gt;IdentifyMimeType&lt;/A&gt;&amp;nbsp;(this processor does not do any extraction), but others in the community have created custom processors that utilize Apache Tika.&amp;nbsp; Adding a custom NAR to Apache NiFi is as easy as placing it in the auto-load directory:&lt;BR /&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#autoloading-processors" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#autoloading-processors&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;While I have no experience with any of these custom NARs, you can give them a try to see if they meet your needs.&amp;nbsp; If not, they may provide you with a stepping stone for creating your own custom variant.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/html" target="_blank"&gt;https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.cloudera.com/t5/Community-Articles/ExtractText-NiFi-Custom-Processor-Powered-by-Apache-Tika/ta-p/249392" target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/ExtractText-NiFi-Custom-Processor-Powered-by-Apache-Tika/ta-p/249392&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://community.cloudera.com/t5/Community-Articles/Creating-HTML-from-PDF-Excel-and-Word-Documents-using-Apache/ta-p/247968" 
target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/Creating-HTML-from-PDF-Excel-and-Word-Documents-using-Apache/ta-p/247968&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://github.com/tspannhw/nifi-extracttext-processor" target="_blank"&gt;https://github.com/tspannhw/nifi-extracttext-processor&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="batang,apple gothic"&gt;If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click&lt;/FONT&gt;&amp;nbsp;&lt;FONT face="arial black,avant garde" color="#FF0000"&gt;Accept as Solution&amp;nbsp;&lt;/FONT&gt;&lt;FONT face="batang,apple gothic" color="#000000"&gt;below each response that helped.&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="batang,apple gothic" color="#000000"&gt;Matt&lt;/FONT&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 11 Aug 2023 15:18:01 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2023-08-11T15:18:01Z</dc:date>
    <item>
      <title>How to use Apache Tika in NiFi to extract metadata of a file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-Apache-Tika-in-NIFI-to-extract-metadata-of-file/m-p/375128#M242299</link>
      <description>&lt;P&gt;Hello, I am new to NiFi and I have a requirement to use Apache Tika in NiFi to extract the metadata of a file. &lt;SPAN&gt;Any help would be much appreciated.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Aug 2023 05:23:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-use-Apache-Tika-in-NIFI-to-extract-metadata-of-file/m-p/375128#M242299</guid>
      <dc:creator>Madhav_VD</dc:creator>
      <dc:date>2023-08-11T05:23:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to use Apache Tika in NiFi to extract metadata of a file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-Apache-Tika-in-NIFI-to-extract-metadata-of-file/m-p/375146#M242304</link>
      <description>&lt;P&gt;I am not aware of any direct integration between Tika and NiFi.&lt;BR /&gt;&lt;BR /&gt;Off the top of my head, the only solution I can think of is to create a new NiFi processor that embeds Tika's parsing logic. The processor can be written in Java and then integrated into NiFi (this article may help: &lt;A href="https://medium.com/hashmapinc/creating-custom-processors-and-controllers-in-apache-nifi-e14148740ea" target="_blank"&gt;https://medium.com/hashmapinc/creating-custom-processors-and-controllers-in-apache-nifi-e14148740ea&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Another option, if you are not working on something too complex, is to implement this logic in a script and run it in NiFi with ExecuteScript (there are some great tutorials here: &lt;A href="https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148" target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148&lt;/A&gt;).&lt;/P&gt;</description>
      <pubDate>Fri, 11 Aug 2023 12:47:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-use-Apache-Tika-in-NIFI-to-extract-metadata-of-file/m-p/375146#M242304</guid>
      <dc:creator>cotopaul</dc:creator>
      <dc:date>2023-08-11T12:47:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to use Apache Tika in NiFi to extract metadata of a file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-Apache-Tika-in-NIFI-to-extract-metadata-of-file/m-p/375150#M242307</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/106024"&gt;@Madhav_VD&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Apache NiFi contains no native processors that utilize Apache Tika other than &lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.23.0/org.apache.nifi.processors.standard.IdentifyMimeType/index.html" target="_self"&gt;IdentifyMimeType&lt;/A&gt;&amp;nbsp;(this processor does not do any extraction), but others in the community have created custom processors that utilize Apache Tika.&amp;nbsp; Adding a custom NAR to Apache NiFi is as easy as placing it in the auto-load directory:&lt;BR /&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#autoloading-processors" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#autoloading-processors&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;While I have no experience with any of these custom NARs, you can give them a try to see if they meet your needs.&amp;nbsp; If not, they may provide you with a stepping stone for creating your own custom variant.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/html" target="_blank"&gt;https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.cloudera.com/t5/Community-Articles/ExtractText-NiFi-Custom-Processor-Powered-by-Apache-Tika/ta-p/249392" target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/ExtractText-NiFi-Custom-Processor-Powered-by-Apache-Tika/ta-p/249392&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://community.cloudera.com/t5/Community-Articles/Creating-HTML-from-PDF-Excel-and-Word-Documents-using-Apache/ta-p/247968" 
target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/Creating-HTML-from-PDF-Excel-and-Word-Documents-using-Apache/ta-p/247968&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://github.com/tspannhw/nifi-extracttext-processor" target="_blank"&gt;https://github.com/tspannhw/nifi-extracttext-processor&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="batang,apple gothic"&gt;If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click&lt;/FONT&gt;&amp;nbsp;&lt;FONT face="arial black,avant garde" color="#FF0000"&gt;Accept as Solution&amp;nbsp;&lt;/FONT&gt;&lt;FONT face="batang,apple gothic" color="#000000"&gt;below each response that helped.&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="batang,apple gothic" color="#000000"&gt;Matt&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Aug 2023 15:18:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-use-Apache-Tika-in-NIFI-to-extract-metadata-of-file/m-p/375150#M242307</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2023-08-11T15:18:01Z</dc:date>
    </item>
  </channel>
</rss>

