NiFi best practice to filter flowfiles using an external file
- Labels: Apache NiFi
Created ‎11-06-2016 05:08 AM
Hi,
Just wondering what the best practice is for my use case. My flowfiles are JSON objects and I need to filter/route them using an external file (containing a list of values), i.e., for each flowfile, check whether the value of some field (key) X is in the file or not.
The only two processors I noticed I can use for that are ScanContent and ReplaceTextWithMapping (which would "replace" a value with an identical one).
ScanContent seems more appropriate since it does not perform a redundant 'Replace' action, but on the other hand it does not have the 'File Refresh Interval' property that ReplaceTextWithMapping has. Hence I'm guessing it continuously refreshes the dictionary file (I didn't find relevant information about this in the documentation), which is also an expensive (and, for my use case, redundant) action that can harm the performance of the flow.
I tend towards the ReplaceTextWithMapping approach, skipping the continuous refreshing of the file, but I wanted to ask here whether there is another best-practice approach, and to make sure I've understood things correctly and haven't missed something.
Thanks in advance,
Liran
Created ‎11-06-2016 03:15 PM
From the code, ScanContent looks like it watches the specified dictionary file for changes and reloads it only when the file has changed; otherwise it shouldn't refresh the dictionary file (unless something weird happens with the internal search mechanism).
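To illustrate the general "reload only when the file changed" idea, here is a minimal plain-Python sketch. It is not ScanContent's actual code, and the class name `WatchedDictionary`, its `terms()` method and the `reload_count` counter are all hypothetical names made up for this example. It assumes the dictionary file holds one value per line and uses the file's modification time to decide whether a reload is needed.

```python
import os


class WatchedDictionary:
    """Hypothetical helper: keeps a set of terms, reloading only on file change."""

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None      # modification time seen at the last load
        self._terms = frozenset()
        self.reload_count = 0      # only here to make the behaviour observable

    def terms(self):
        # A stat() call is cheap compared with re-reading and re-parsing the file.
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self.path, encoding="utf-8") as f:
                # Assumption: one dictionary value per line, blank lines ignored.
                self._terms = frozenset(
                    line.strip() for line in f if line.strip()
                )
            self._mtime_ns = mtime_ns
            self.reload_count += 1
        return self._terms
```

Calling `terms()` repeatedly only touches the file contents again after the file has been modified, so the per-flowfile cost is a single `stat()` plus a set lookup.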
Alternatively, I answered a Stack Overflow question with a similar use case, using ExecuteScript to check the JSON (and, in their case, replace the value from an external file). That example reads the file on every execution, but you could take a similar approach with InvokeScriptedProcessor and read the file in the initialize() method; it would then not be re-read in onTrigger (which is called each time the processor is scheduled to run).
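A rough sketch of that "read once, then filter" pattern, in plain Python rather than as an actual NiFi script: the class `FieldFilter`, its method names (chosen only to mirror initialize()/onTrigger) and the route names "matched", "unmatched" and "failure" are hypothetical, and it assumes the external file holds one value per line. A real InvokeScriptedProcessor script would need to implement NiFi's Processor interface and transfer flowfiles to relationships instead of returning strings.

```python
import json


class FieldFilter:
    """Hypothetical stand-in for a scripted processor that routes JSON by field."""

    def __init__(self, field):
        self.field = field
        self.allowed = frozenset()

    def initialize(self, path):
        # Read the external file once, up front (assumption: one value per line).
        with open(path, encoding="utf-8") as f:
            self.allowed = frozenset(line.strip() for line in f if line.strip())

    def on_trigger(self, content):
        # Called per flowfile; no file I/O happens here, only a set lookup.
        try:
            record = json.loads(content)
        except ValueError:
            return "failure"
        if not isinstance(record, dict):
            return "failure"
        value = record.get(self.field)
        if value is None:
            return "unmatched"
        return "matched" if str(value) in self.allowed else "unmatched"
```

For example, with a file containing `a1` and `b2`, a filter created as `FieldFilter("X")` routes `{"X": "a1"}` to "matched" and `{"X": "zz"}` to "unmatched". The trade-off is that changes to the file are not picked up until the processor is re-initialized.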
Created ‎11-07-2016 04:55 AM
If ScanContent watches the file for changes, I think that solves my problem. Thanks! 🙂
