<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: how would I do a "stream-join" of one data-source to another in NiFi? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164301#M24872</link>
    <description>&lt;P&gt;&lt;A href="#"&gt;@Randy Gelhausen&lt;/A&gt; For the specific case you mention, you should be able to use ExtractText to extract the domain of the URL into an attribute. Then you can use ScanAttribute to match that attribute against your list of malicious domains and route the FlowFile accordingly. While it doesn't appear to be documented, the dictionary file is scanned periodically for updates, so when you replace the file, ScanAttribute will match against the updated list.&lt;/P&gt;</description>
    <pubDate>Mon, 11 Apr 2016 06:17:04 GMT</pubDate>
    <dc:creator>jfrazee</dc:creator>
    <dc:date>2016-04-11T06:17:04Z</dc:date>
    <item>
      <title>how would I do a "stream-join" of one data-source to another in NiFi?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164300#M24871</link>
      <description>&lt;P&gt;I have flat files of metadata that are updated every few minutes throughout the day. I have another stream that I need to join to this metadata in real time. I know I can accomplish this in Storm or Spark Streaming with some code. Can NiFi help me do this without writing code?&lt;/P&gt;&lt;P&gt;For example, I have a list of malicious websites (the metadata), and I'm streaming in HTTP requests. I need to join the domains in those requests against the list of malicious websites and emit an alert if there are any matches.&lt;/P&gt;&lt;P&gt;A slightly more complex version of the same requirement: how would I incorporate regular updates to the metadata?&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2016 05:55:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164300#M24871</guid>
      <dc:creator>rgelhausen</dc:creator>
      <dc:date>2016-04-11T05:55:03Z</dc:date>
    </item>
    <item>
      <title>Re: how would I do a "stream-join" of one data-source to another in NiFi?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164301#M24872</link>
      <description>&lt;P&gt;&lt;A href="#"&gt;@Randy Gelhausen&lt;/A&gt; For the specific case you mention, you should be able to use ExtractText to extract the domain of the URL into an attribute. Then you can use ScanAttribute to match that attribute against your list of malicious domains and route the FlowFile accordingly. While it doesn't appear to be documented, the dictionary file is scanned periodically for updates, so when you replace the file, ScanAttribute will match against the updated list.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2016 06:17:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164301#M24872</guid>
      <dc:creator>jfrazee</dc:creator>
      <dc:date>2016-04-11T06:17:04Z</dc:date>
    </item>
    <item>
      <title>Re: how would I do a "stream-join" of one data-source to another in NiFi?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164302#M24873</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/2956/jfrazee.html" nodeid="2956"&gt;@jfrazee&lt;/A&gt;&lt;P&gt;If I match on multiple entries in the dictionary, will this processor emit one FlowFile for every matching entry, or a single FlowFile with attributes for all the matched entries?&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2016 08:22:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164302#M24873</guid>
      <dc:creator>rgelhausen</dc:creator>
      <dc:date>2016-04-11T08:22:35Z</dc:date>
    </item>
    <item>
      <title>Re: how would I do a "stream-join" of one data-source to another in NiFi?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164303#M24874</link>
      <description>&lt;P&gt;It'll be just a single FlowFile. This is true whether Match Criteria is set to 'All Must Match' or 'At Least One Must Match'. That setting is about the attributes being scanned rather than the entries in the dictionary. With 'All Must Match', all of the scanned attributes have to match against the single dictionary file; it does not mean there must be multiple matches within the dictionary file.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2016 21:58:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-would-I-do-a-quot-stream-join-quot-of-one-data-source-to/m-p/164303#M24874</guid>
      <dc:creator>jfrazee</dc:creator>
      <dc:date>2016-04-11T21:58:17Z</dc:date>
    </item>
  </channel>
</rss>

