
how would I do a "stream-join" of one data-source to another in NiFi?


I have flat files of metadata, with updates arriving every few minutes throughout the day. I have another stream that I need to join to this metadata in real time. I know I can accomplish this in Storm or Spark Streaming with some code. Can NiFi help me do this without writing code?

For example, I have a list of malicious websites (the metadata), and I'm streaming in HTTP requests. I need to join the domains in those requests against the list of malicious websites and emit an alert if there is a match.

A slightly more complex version of the same requirement: how would I incorporate regular updates to the metadata?

Accepted Solution

Re: how would I do a "stream-join" of one data-source to another in NiFi?

@Randy Gelhausen For the specific case you mention, you should be able to use ExtractText to extract the domain of the URL into an attribute. Then you can use ScanAttribute to match that attribute against your list of malicious domains and route the FlowFile accordingly. While it doesn't appear to be documented, the dictionary file is rescanned periodically for updates, so when you replace the file, ScanAttribute will match against the updated list.
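
For readers who want to see the logic spelled out, here is a rough Python sketch of what the ExtractText plus ScanAttribute combination does conceptually. It is not NiFi code and does not reflect NiFi's internals. The regex, the `ReloadingDictionary` class, the `route` function, and the reload-on-file-change check are all assumptions made for illustration.

```python
import os
import re

# Hypothetical pattern standing in for the ExtractText regex; a real flow
# would tune this to the shape of its own request data.
DOMAIN_RE = re.compile(r"^[a-z][a-z0-9+.-]*://([^/:?#]+)", re.IGNORECASE)


def extract_domain(url):
    """Return the lower-cased host part of a URL, or None if it does not match."""
    m = DOMAIN_RE.match(url)
    return m.group(1).lower() if m else None


class ReloadingDictionary:
    """Terms from a newline-delimited file, reloaded when the file changes.

    Assumption: change detection via modification time and size. This only
    mimics the "replace the file and the new list is picked up" behaviour.
    """

    def __init__(self, path):
        self.path = path
        self._stamp = None
        self._terms = set()

    def terms(self):
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:
            with open(self.path, encoding="utf-8") as fh:
                # Empty lines are skipped; terms are compared case-insensitively.
                self._terms = {line.strip().lower() for line in fh if line.strip()}
            self._stamp = stamp
        return self._terms


def route(url, dictionary):
    """Return 'matched' if the URL's domain is in the dictionary, else 'unmatched'."""
    domain = extract_domain(url)
    return "matched" if domain in dictionary.terms() else "unmatched"
```

The point of the sketch is that the "join" is really a lookup: each request carries its domain as an attribute, and the metadata lives in a file that can be swapped out while the flow keeps running.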


Re: how would I do a "stream-join" of one data-source to another in NiFi?

@jfrazee

If I match on multiple entries in the dictionary, will this processor emit one FlowFile for every matching entry, or a single FlowFile with attributes for all the matched entries?

Re: how would I do a "stream-join" of one data-source to another in NiFi?

It'll be just a single FlowFile. This is true whether Match Criteria is set to 'All Must Match' or 'At Least One Must Match'. The distinction there isn't about entries in the dictionary so much as it is about the attributes. With 'All Must Match', every scanned attribute has to match an entry in the single dictionary file; it does not mean that multiple entries in the dictionary file have to match.
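
To make the distinction concrete, here is a simplified Python model of the behaviour described above. It is an illustration only, not ScanAttribute's actual implementation. The `scan_attributes` function is hypothetical, and treating a FlowFile with no scanned attributes as unmatched is an assumption.

```python
def scan_attributes(attributes, dictionary, match_criteria="At Least One Must Match"):
    """Return a single routing decision for one FlowFile.

    attributes: dict of attribute name -> value (the attributes being scanned)
    dictionary: set of terms loaded from the dictionary file
    """
    # One boolean per attribute: is its value present in the dictionary?
    hits = [value in dictionary for value in attributes.values()]
    if match_criteria == "All Must Match":
        # Every scanned attribute must be found in the dictionary.
        # Assumption: no scanned attributes counts as "unmatched".
        matched = bool(hits) and all(hits)
    else:
        # A single matching attribute is enough.
        matched = any(hits)
    # Always exactly one result per FlowFile, however many attributes hit.
    return "matched" if matched else "unmatched"
```

For example, with a dictionary of `{"evil.example", "bad.example"}` and the attributes `{"domain": "evil.example", "referer": "good.example"}`, 'At Least One Must Match' gives `matched`, while 'All Must Match' gives `unmatched` because `referer` is not in the dictionary. Either way the result is one decision for one FlowFile.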
