<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to configure Extracting Text custom process in nifi in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227215#M189075</link>
    <description>&lt;P&gt;My custom processor is pretty easy to customize.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/tspannhw/nifi-extracttext-processor" target="_blank"&gt;https://github.com/tspannhw/nifi-extracttext-processor&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can tweak it to extract only certain things; Apache Tika is very powerful.&lt;/P&gt;</description>
    <pubDate>Mon, 07 May 2018 20:30:02 GMT</pubDate>
    <dc:creator>TimothySpann</dc:creator>
    <dc:date>2018-05-07T20:30:02Z</dc:date>
    <item>
      <title>How to configure Extracting Text custom process in nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227210#M189070</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I followed the article below and added the ExtractText custom processor, which is powered by Apache Tika.&lt;/P&gt;&lt;H4&gt;&lt;A href="https://community.hortonworks.com/articles/81694/extracttext-nifi-custom-processor-powered-by-apach.html"&gt;ExtractText NiFi Custom Processor Powered by Apache Tika&lt;/A&gt;&lt;/H4&gt;&lt;P&gt;Could you please tell me which properties I need to set to configure it? I used GetFile to ingest the PDF file, but I am not sure how this custom processor should be configured. Any help would be appreciated.&lt;/P&gt;&lt;P&gt;SJ&lt;/P&gt;</description>
      <pubDate>Thu, 24 Aug 2017 01:58:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227210#M189070</guid>
      <dc:creator>Sanaz_janbakhsh</dc:creator>
      <dc:date>2017-08-24T01:58:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure Extracting Text custom process in nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227211#M189071</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11035/sanazjanbakhsh.html" nodeid="11035" target="_blank"&gt;@Sanaz Janbakhsh&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I just did a quick test: I used GetFile to ingest a PDF and ran the custom processor as is, without any configuration. I then used PutFile to write the extracted text to a directory. As expected, the output is the text lifted from the original PDF, saved as a plain-text file. No special configuration is required. If you want to work with the metadata that Tika exposes, take a look at the ExtractMediaMetadata processor, which ships with recent versions of NiFi and uses Tika under the hood.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="38478-screen-shot-2017-08-31-at-110825-am.png" style="width: 855px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15301iC5E7F5D80D3C40D0/image-size/medium?v=v2&amp;amp;px=400" role="button" title="38478-screen-shot-2017-08-31-at-110825-am.png" alt="38478-screen-shot-2017-08-31-at-110825-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 01:20:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227211#M189071</guid>
      <dc:creator>ssahi</dc:creator>
      <dc:date>2019-08-18T01:20:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure Extracting Text custom process in nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227212#M189072</link>
      <description>&lt;P&gt;Hi Sonu,&lt;/P&gt;&lt;P&gt;Thanks for the advice. Just one question: what if I want to extract only specific text from the PDF rather than converting the whole PDF to text? Is that possible?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 06 Sep 2017 12:46:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227212#M189072</guid>
      <dc:creator>Sanaz_janbakhsh</dc:creator>
      <dc:date>2017-09-06T12:46:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure Extracting Text custom process in nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227213#M189073</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11035/sanazjanbakhsh.html" nodeid="11035"&gt;@Sanaz Janbakhsh&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You could probably achieve that by combining processors. Use the Tika-based processor to extract everything from the PDF as plain text, and then use another processor (for example, ExtractText with a regular expression that matches your content) to pull out the specific text you want. From there you can decide what to do with that content.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Sep 2017 22:36:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227213#M189073</guid>
      <dc:creator>ssahi</dc:creator>
      <dc:date>2017-09-06T22:36:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure Extracting Text custom process in nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227214#M189074</link>
      <description>&lt;P&gt;Thanks Sonu&lt;/P&gt;</description>
      <pubDate>Fri, 08 Sep 2017 21:47:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227214#M189074</guid>
      <dc:creator>Sanaz_janbakhsh</dc:creator>
      <dc:date>2017-09-08T21:47:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure Extracting Text custom process in nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227215#M189075</link>
      <description>&lt;P&gt;My custom processor is pretty easy to customize.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/tspannhw/nifi-extracttext-processor" target="_blank"&gt;https://github.com/tspannhw/nifi-extracttext-processor&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can tweak it to extract only certain things; Apache Tika is very powerful.&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 20:30:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-configure-Extracting-Text-custom-process-in-nifi/m-p/227215#M189075</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2018-05-07T20:30:02Z</dc:date>
    </item>
  </channel>
</rss>

