<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Compare attributes of different flowfiles in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181081#M143317</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18856/jrodriguez.html" nodeid="18856"&gt;@Jon Rodriguez Breton&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Glad this worked for you.&lt;/P&gt;&lt;P&gt;As for your new question:&lt;/P&gt;&lt;P&gt;The value written to the DistributedMapCache remains in the cache for a configured amount of time or until a configured number of entries exists. So you can compare many FlowFiles against this stored value: any FlowFile that matches a stored value is considered a duplicate. It is not a one-time match of a single duplicate.&lt;/P&gt;&lt;P&gt;It would be very expensive to build a NiFi processor that reads large batches of queued FlowFiles from an inbound queue to compare FlowFile attributes (FlowFile attributes live in heap memory, so the more FlowFiles you pull in for a comparison, the more likely you are to run out of memory). And if you limit the size of the batches, how do you know a given batch contains the actual FlowFiles you want to compare?&lt;/P&gt;&lt;P&gt;This is why the DetectDuplicate processor makes use of an external service and compares FlowFiles against a stored value one FlowFile at a time.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
    <pubDate>Wed, 13 Sep 2017 19:48:49 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2017-09-13T19:48:49Z</dc:date>
    <item>
      <title>Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181073#M143309</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Is it possible to compare the attributes of two different flowfiles and pass only one of them if the comparison matches?&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Jon&lt;/P&gt;</description>
      <pubDate>Thu, 07 Sep 2017 22:12:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181073#M143309</guid>
      <dc:creator>j_rodriguez</dc:creator>
      <dc:date>2017-09-07T22:12:36Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181074#M143310</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18856/jrodriguez.html" nodeid="18856"&gt;@Jon Rodriguez Breton&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Have you checked out NiFi's &lt;EM&gt;RouteOnAttribute&lt;/EM&gt; processor? It can compare the attributes of incoming flowfiles and route them accordingly, based on the routing strategy you select.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Sep 2017 19:28:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181074#M143310</guid>
      <dc:creator>rmoran</dc:creator>
      <dc:date>2017-09-12T19:28:29Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181075#M143311</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18856/jrodriguez.html" nodeid="18856"&gt;@Jon Rodriguez Breton&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Are you trying to see if all attributes from both FlowFiles match exactly, or is there a specific attribute from each FlowFile you want to compare?&lt;BR /&gt;&lt;BR /&gt;My initial thought would be to use the DetectDuplicate processor.&lt;BR /&gt;&lt;BR /&gt;You could write the unique attribute to the DistributedMapCache service.&lt;BR /&gt;Then compare new FlowFiles against that stored value and delete any duplicates.&lt;BR /&gt;That way only the first FlowFile would get passed on.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Tue, 12 Sep 2017 19:45:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181075#M143311</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2017-09-12T19:45:16Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181076#M143312</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/525/mclark.html" nodeid="525"&gt;@Matt Clarke&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;I just want to compare some attributes from both flowfiles... I'll try with that processor and I'll be back!&lt;/P&gt;</description>
      <pubDate>Tue, 12 Sep 2017 19:48:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181076#M143312</guid>
      <dc:creator>j_rodriguez</dc:creator>
      <dc:date>2017-09-12T19:48:33Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181077#M143313</link>
      <description>&lt;P&gt;Yes, I've tried to use RouteOnAttribute, but the thing is that I want to compare the attributes of two different flowfiles, and as far as I understand, RouteOnAttribute doesn't allow this kind of comparison. Tell me if I'm wrong!&lt;/P&gt;</description>
      <pubDate>Tue, 12 Sep 2017 19:49:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181077#M143313</guid>
      <dc:creator>j_rodriguez</dc:creator>
      <dc:date>2017-09-12T19:49:56Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181078#M143314</link>
      <description>&lt;P&gt;Ah, I overlooked the "only pass one" goal in the original question. As &lt;A rel="user" href="https://community.cloudera.com/users/525/mclark.html" nodeid="525"&gt;@Matt Clarke&lt;/A&gt; mentioned, looks like DetectDuplicate might help with that part. &lt;/P&gt;</description>
      <pubDate>Tue, 12 Sep 2017 19:53:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181078#M143314</guid>
      <dc:creator>rmoran</dc:creator>
      <dc:date>2017-09-12T19:53:39Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181079#M143315</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/525/mclark.html" nodeid="525"&gt;@Matt Clarke&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;That processor did the trick. It's exactly what I was looking for. Thank you so much.&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;Jon&lt;/P&gt;</description>
      <pubDate>Wed, 13 Sep 2017 16:38:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181079#M143315</guid>
      <dc:creator>j_rodriguez</dc:creator>
      <dc:date>2017-09-13T16:38:16Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181080#M143316</link>
      <description>&lt;P&gt;So, I've accomplished the comparison of two flowfiles. To complicate this a bit more: what if I have more than two flowfiles to compare? Is there any processor for this? Another option would be to "play" with DetectDuplicate and create different levels of comparison, but I don't like that much, although DetectDuplicate itself is very clean.&lt;/P&gt;&lt;P&gt;Any idea &lt;A rel="user" href="https://community.cloudera.com/users/525/mclark.html" nodeid="525"&gt;@Matt Clarke&lt;/A&gt;, @Rob ?&lt;/P&gt;&lt;P&gt;Thanks! &lt;/P&gt;</description>
      <pubDate>Wed, 13 Sep 2017 17:44:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181080#M143316</guid>
      <dc:creator>j_rodriguez</dc:creator>
      <dc:date>2017-09-13T17:44:31Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181081#M143317</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18856/jrodriguez.html" nodeid="18856"&gt;@Jon Rodriguez Breton&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Glad this worked for you.&lt;/P&gt;&lt;P&gt;As for your new question:&lt;/P&gt;&lt;P&gt;The value written to the DistributedMapCache remains in the cache for a configured amount of time or until a configured number of entries exists. So you can compare many FlowFiles against this stored value: any FlowFile that matches a stored value is considered a duplicate. It is not a one-time match of a single duplicate.&lt;/P&gt;&lt;P&gt;It would be very expensive to build a NiFi processor that reads large batches of queued FlowFiles from an inbound queue to compare FlowFile attributes (FlowFile attributes live in heap memory, so the more FlowFiles you pull in for a comparison, the more likely you are to run out of memory). And if you limit the size of the batches, how do you know a given batch contains the actual FlowFiles you want to compare?&lt;/P&gt;&lt;P&gt;This is why the DetectDuplicate processor makes use of an external service and compares FlowFiles against a stored value one FlowFile at a time.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Wed, 13 Sep 2017 19:48:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181081#M143317</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2017-09-13T19:48:49Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181082#M143318</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/525/mclark.html" nodeid="525"&gt;@Matt Clarke&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Thank you. So, how about clearing this cache eventually? Is it possible to clear it whenever a duplicate is found? I'm trying the Eviction Strategy property but not getting anywhere so far.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 19:03:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181082#M143318</guid>
      <dc:creator>j_rodriguez</dc:creator>
      <dc:date>2017-09-15T19:03:49Z</dc:date>
    </item>
    <item>
      <title>Re: Compare attributes of different flowfiles</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181083#M143319</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/18856/jrodriguez.html" nodeid="18856"&gt;@Jon Rodriguez Breton&lt;/A&gt;&lt;P&gt;There are no dedicated processors for removing cached entries from the distributed map cache.&lt;/P&gt;&lt;P&gt;You can try using the "Age Off Duration" property in the DetectDuplicate processor, or use a scripting processor in NiFi to execute a script that clears the cache.&lt;/P&gt;&lt;P&gt;The following Jira covers this missing processor and provides a sample template:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/NIFI-4173" target="_blank"&gt;https://issues.apache.org/jira/browse/NIFI-4173&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 20:42:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Compare-attributes-of-different-flowfiles/m-p/181083#M143319</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2017-09-15T20:42:25Z</dc:date>
    </item>
  </channel>
</rss>

