How would I do a "stream-join" of one data source to another in NiFi?

I have flat files of metadata (with updates every few minutes throughout the day). I have another stream that I need to join to this metadata in real time. I know I can accomplish this in Storm or Spark Streaming with some code. Can NiFi help me do this without writing code?

For example, I have a list of malicious websites (the metadata), and I'm streaming in HTTP requests. I need to join the domains on those requests with the list of malicious websites and emit an alert if there are any matches.

A slightly more complex version of the same requirement: how would I incorporate regular updates to the metadata?

1 ACCEPTED SOLUTION

@Randy Gelhausen For the specific case you mention, you should be able to use ExtractText to extract the domain of the URL to an attribute. Then you can use ScanAttribute to match against your list of malicious domains and route the FlowFile accordingly. While it doesn't appear to be documented, the dictionary file is scanned periodically for updates, so when you replace the file, ScanAttribute will be running against the updated list.
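To make the logic of that flow concrete, here is a minimal Python sketch of the same idea outside NiFi: pull the domain out of each request URL, look it up in a dictionary file, and pick up changes when the file is replaced. It is not NiFi's implementation. The names extract_domain, ReloadingDictionary, and route are hypothetical, and checking the file's modification time on every lookup is an assumption made for simplicity (the processor is only described above as rescanning periodically).

```python
import os
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lower-cased host part of a URL, or None if there is none."""
    host = urlsplit(url).hostname
    return host.lower() if host else None


class ReloadingDictionary:
    """A set of terms backed by a file, re-read when the file's mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._terms = frozenset()

    def terms(self):
        # Assumption: compare mtime on each call instead of on a timer.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            with open(self.path, encoding="utf-8") as fh:
                self._terms = frozenset(
                    line.strip().lower() for line in fh if line.strip()
                )
            self._mtime = mtime
        return self._terms


def route(url, dictionary):
    """Return 'matched' if the URL's domain is in the dictionary, else 'unmatched'."""
    domain = extract_domain(url)
    return "matched" if domain in dictionary.terms() else "unmatched"
```

With a dictionary file containing one domain per line, route("http://evil.example/login", ReloadingDictionary(path)) would return "matched" when evil.example is listed, and the result changes on the next call after the file is replaced.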

@jfrazee If I match on multiple entries in the dictionary, will this processor emit one FlowFile for every matching entry, or a single FlowFile with attributes for all the matched entries?

It'll be just a single FlowFile. This is true whether Match Criteria is set to 'All Must Match' or 'At Least One Must Match'. The distinction there isn't about entries in the dictionary so much as it is about the attributes. If it's 'All Must Match', then every attribute being scanned has to match against the single dictionary file (it does not mean multiple matches within the dictionary file).
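The difference between the two settings can be sketched in a few lines of Python. This is an illustration of the semantics described above, not the processor's code. The function name scan_attributes is hypothetical, and it assumes every attribute passed in is one that should be scanned.

```python
def scan_attributes(attributes, dictionary, all_must_match=False):
    """Return one routing decision for one FlowFile's attributes.

    attributes: mapping of attribute name to value (all are scanned here).
    dictionary: set of terms loaded from the single dictionary file.
    """
    hits = [value in dictionary for value in attributes.values()]
    if not hits:
        return "unmatched"
    # 'All Must Match' requires every attribute to hit;
    # 'At Least One Must Match' requires any attribute to hit.
    ok = all(hits) if all_must_match else any(hits)
    return "matched" if ok else "unmatched"
```

Either way the function returns exactly one result per call, which mirrors the point that a single FlowFile comes out regardless of how many dictionary entries are involved.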