Member since 05-01-2019
3 Posts, 0 Kudos Received, 0 Solutions
05-03-2019 06:55 PM
I just wanted to take a moment to say thank you for the excellent explanation and for making this all much clearer to me. Have a great rest of your day! Chris
05-03-2019 03:51 PM
First of all, thank you for your response, Matt. Sorry I wasn't clear enough in my MergeContent description, but it's as you said: I'm using the Bin-Packing Algorithm merge strategy, and I've set "Minimum Number of Entries" equal to the number of ReplaceText processors I have (2 in this case), so that MergeContent merges a given bin once it meets the configured values for "Minimum Number of Entries" (2) and group size ("Minimum Group Size" is set to 1 B and I haven't set a "Maximum Group Size"). With respect to the third point you outlined, I've set the "Max Bin Age" property to 10 sec, so that if the bin doesn't meet the configured values for "Minimum Number of Entries" and group size within 10 sec, it will merge the flowfiles currently allocated to the bin.

For more context, here's a template of the situation I'm describing, where the second screenshot shows the configuration of the MergeContent processor:

For "List1" and "List2" above, as an example, the "Replacement Value" properties contain the following, respectively: abc1 abc2 abc3 and abc2 abc3 abc4

I've since found a way to avoid adding an arbitrary max latency. I've opted to use a global variable instead, which will be incremented after each ReplaceText processor using UpdateAttribute (with UpdateAttribute connected directly to MergeContent), and I'll set "Minimum Number of Entries" to that global variable. This way, each time a ReplaceText "stream" is executed, the global variable is incremented, signifying that we're expecting one more list, and the MergeContent processor will wait for all the required flowfiles (each containing a different list, of course) before merging the bin, assuming that I don't assign a value to "Max Bin Age". I believe this approach is more robust than using Max Bin Age/max latency. What do you think?
Also, regarding this comment: "Also keep in mind that with a NiFi cluster, each NiFi node is running the MergeContent processor independently of the other nodes and can only merge FlowFiles on the same node. So while an inbound connection queue to the MergeContent processor may show 2 queued FlowFiles, each of those FlowFiles may exist on different nodes and thus would not be merged together in to one FlowFile." Would you then suggest setting the "Execution" of MergeContent to Primary node instead of All nodes?

Thank you for your time and I look forward to your response, Chris
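For readers following the thread, the merge conditions described in this post can be sketched as a small Python model. This is only an illustration of the rules as stated above (merge once both minimums are met, or once the bin is older than "Max Bin Age"); the `Bin` class and `should_merge` function are hypothetical names, not NiFi's implementation, and the real processor has further conditions (for example the maximum limits and "Maximum number of Bins") that are not modeled here. The 14-byte size is simply the length of the example list "abc1 abc2 abc3".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bin:
    """Simplified stand-in for a MergeContent bin (hypothetical, not NiFi's class)."""
    created_at: float                               # time the bin was opened, in seconds
    sizes: List[int] = field(default_factory=list)  # content size of each FlowFile, in bytes


def should_merge(bin_: Bin, now: float, min_entries: int,
                 min_group_size: int, max_bin_age: Optional[float]) -> bool:
    """Return True if the bin would be merged under the rules described in the post."""
    meets_minimums = (len(bin_.sizes) >= min_entries
                      and sum(bin_.sizes) >= min_group_size)
    # max_bin_age=None models leaving "Max Bin Age" unset.
    expired = max_bin_age is not None and (now - bin_.created_at) >= max_bin_age
    return meets_minimums or expired


# Settings from the post: 2 entries, 1 B minimum group size, 10 sec Max Bin Age.
one_list = Bin(created_at=0.0, sizes=[14])
both_lists = Bin(created_at=0.0, sizes=[14, 14])

# Only one list arrived and the bin is 5 sec old: keep waiting.
waiting = should_merge(one_list, now=5.0, min_entries=2, min_group_size=1, max_bin_age=10.0)
# Only one list arrived but the bin reached Max Bin Age: merged anyway.
aged_out = should_merge(one_list, now=10.0, min_entries=2, min_group_size=1, max_bin_age=10.0)
# Both lists arrived within a second: merged immediately.
complete = should_merge(both_lists, now=1.0, min_entries=2, min_group_size=1, max_bin_age=10.0)
# No Max Bin Age and only one list: in this model the bin never becomes eligible.
stuck = should_merge(one_list, now=3600.0, min_entries=2, min_group_size=1, max_bin_age=None)

print(waiting, aged_out, complete, stuck)  # False True True False
```

The last case is the trade-off raised in the post: dropping "Max Bin Age" removes the arbitrary timeout, but a bin that never reaches "Minimum Number of Entries" then waits indefinitely in this simplified model.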
05-02-2019 11:18 PM
We currently have 2 ReplaceText processors (and we'll potentially add more in the future), each of which contains a list of file names in its "Replacement Value" property, so that the corresponding flowfiles contain this list of file names in their content. We'd like to know if there are recommended strategies for merging these flowfiles coming from different streams. I need them to be merged at the same time, because the output will be used to count the total number of files, and that count will be passed to a Wait processor for another part of the flow that uses the Wait/Notify pattern (the total number of files is assigned to "Target Signal Count" in the Wait processor).

The main problem we're facing is that we don't know in advance which ReplaceText processors will run (it could be one or the other, or both), so we don't know which flowfiles we have to merge. We need a way to merge the flowfiles of whichever ReplaceText processors do run.

One way I've solved this is by sending the flowfiles to a MergeContent processor and setting "Minimum Number of Entries" to 2 and "Max Bin Age" to 10 sec. With this approach, the flowfiles in the bin will wait for 10 sec before being merged, regardless of whether we have 2 flowfiles (2 lists) in the bin or not. The idea is to add enough buffer time for the other flowfiles to reach MergeContent.

I'm concerned that this approach is not robust enough: the 10 sec buffer is arbitrary and may not account for network-related delays and other variables. Is there a foolproof way of solving this issue that doesn't involve the MergeContent approach described above or hardcoding the total number of files in the Wait processor used in the Wait/Notify pattern?
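To make the counting step concrete, here is a minimal Python sketch of deriving the file count from merged content. It assumes the file names are whitespace-separated and that the merged flowfile is just the lists joined by a newline; the `count_files` helper is hypothetical, and the sample lists are the "List1"/"List2" values given elsewhere in this thread. Whether "Target Signal Count" should use the total or the distinct count is not stated in the post, so both are shown.

```python
from typing import Tuple


def count_files(merged_content: str) -> Tuple[int, int]:
    """Return (total, distinct) file-name counts for whitespace-separated names."""
    names = merged_content.split()
    return len(names), len(set(names))


# Example lists from the thread; the newline join is an assumed merge format.
list1 = "abc1 abc2 abc3"
list2 = "abc2 abc3 abc4"
merged = "\n".join([list1, list2])

total, distinct = count_files(merged)
print(total, distinct)  # 6 4

# If only one ReplaceText stream ran, the same helper still works.
total_one, distinct_one = count_files(list1)
print(total_one, distinct_one)  # 3 3
```

Because the count is computed from whatever content actually reached the merge, it does not need to know in advance which ReplaceText processors ran.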
Labels: Apache NiFi