<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question NiFi best practice to filter flowfiles using an external file in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-best-parctice-to-filter-flowfiles-using-external-file/m-p/166868#M45422</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm wondering what the best practice is for my use case. My flowfiles are JSON objects, and I need to filter/route them using an external file containing a list of values, i.e. for each flowfile, check whether the value of some field (key) X appears in the file.&lt;/P&gt;&lt;P&gt;The only two processors I found that I can use for this are ScanContent and ReplaceTextWithMapping (which would "replace" a value with an identical one).&lt;/P&gt;&lt;P&gt;ScanContent seems more appropriate since it does not perform a redundant 'Replace' action, but on the other hand it does not have the 'File Refresh Interval' property that ReplaceTextWithMapping has. Hence I'm guessing it continuously refreshes the dictionary file (I didn't find relevant information about this in the documentation), which is also an expensive (and, for my use case, redundant) action that can harm the performance of the flow.&lt;/P&gt;&lt;P&gt;I'm leaning toward the ReplaceTextWithMapping approach to avoid the continuous refreshing of the file, but I wanted to ask here whether there is another best-practice approach and to make sure I have things right and didn't miss anything.&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Liran&lt;/P&gt;</description>
    <pubDate>Sun, 06 Nov 2016 13:08:37 GMT</pubDate>
    <dc:creator>nahum_liran</dc:creator>
    <dc:date>2016-11-06T13:08:37Z</dc:date>
    <item>
      <title>NiFi best practice to filter flowfiles using an external file</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-best-parctice-to-filter-flowfiles-using-external-file/m-p/166868#M45422</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm wondering what the best practice is for my use case. My flowfiles are JSON objects, and I need to filter/route them using an external file containing a list of values, i.e. for each flowfile, check whether the value of some field (key) X appears in the file.&lt;/P&gt;&lt;P&gt;The only two processors I found that I can use for this are ScanContent and ReplaceTextWithMapping (which would "replace" a value with an identical one).&lt;/P&gt;&lt;P&gt;ScanContent seems more appropriate since it does not perform a redundant 'Replace' action, but on the other hand it does not have the 'File Refresh Interval' property that ReplaceTextWithMapping has. Hence I'm guessing it continuously refreshes the dictionary file (I didn't find relevant information about this in the documentation), which is also an expensive (and, for my use case, redundant) action that can harm the performance of the flow.&lt;/P&gt;&lt;P&gt;I'm leaning toward the ReplaceTextWithMapping approach to avoid the continuous refreshing of the file, but I wanted to ask here whether there is another best-practice approach and to make sure I have things right and didn't miss anything.&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Liran&lt;/P&gt;</description>
      <pubDate>Sun, 06 Nov 2016 13:08:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-best-parctice-to-filter-flowfiles-using-external-file/m-p/166868#M45422</guid>
      <dc:creator>nahum_liran</dc:creator>
      <dc:date>2016-11-06T13:08:37Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi best practice to filter flowfiles using an external file</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-best-parctice-to-filter-flowfiles-using-external-file/m-p/166869#M45423</link>
      <description>&lt;P&gt;The code for ScanContent looks like it watches the specified file for changes; otherwise it shouldn't refresh the dictionary file (unless something weird happens with the internal search mechanism).&lt;/P&gt;&lt;P&gt;Alternatively, I answered a &lt;A target="_blank" href="https://stackoverflow.com/questions/37577453/apache-nifi-executescript-groovy-script-to-replace-json-values-via-a-mapping-fi/37581695"&gt;Stack Overflow question&lt;/A&gt; with a similar use case, using ExecuteScript to check the JSON (and in their case, replace the value from an external file). That example also reads the file every time, but you could use a similar approach with &lt;A target="_blank" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html"&gt;InvokeScriptedProcessor&lt;/A&gt; and read the file in the initialize() method, so that it is not re-read during onTrigger (which is called when the processor is scheduled).&lt;/P&gt;</description>
      <pubDate>Sun, 06 Nov 2016 23:15:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-best-parctice-to-filter-flowfiles-using-external-file/m-p/166869#M45423</guid>
      <dc:creator>mburgess</dc:creator>
      <dc:date>2016-11-06T23:15:56Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi best practice to filter flowfiles using an external file</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-best-parctice-to-filter-flowfiles-using-external-file/m-p/166870#M45424</link>
      <description>&lt;P&gt;If ScanContent watches the file for changes, I think that solves my problem. Thanks! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Nov 2016 12:55:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-best-parctice-to-filter-flowfiles-using-external-file/m-p/166870#M45424</guid>
      <dc:creator>nahum_liran</dc:creator>
      <dc:date>2016-11-07T12:55:18Z</dc:date>
    </item>
  </channel>
</rss>

