Created 10-20-2017 05:56 PM
Thanks a lot for the help in this community.
Since the event rate is very high (15,000 events per second), I am using a DistributeLoad processor and two MergeContent processors in parallel to keep the flow fast and efficient. My question is: will this change the order of the incoming events?
Created on 10-20-2017 07:39 PM - edited 08-17-2019 05:36 PM
It may be helpful to understand your entire use case here.
There is no guaranteed order in which FlowFiles are merged, regardless of whether one MergeContent processor or multiple MergeContent processors are used.
With your setup, the DistributeLoad processor will round-robin FlowFiles from its incoming queue to its two outbound connections, which feed your individual MergeContent processors. Each of those MergeContent processors will generate its own merged FlowFile. One MergeContent processor with 2 concurrent tasks will perform the same as 2 MergeContent processors with 1 concurrent task each.
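To make the round-robin split concrete, here is a minimal Python sketch. It is only a toy model, not NiFi code: the function name `round_robin_split` and the use of integers as event sequence numbers are assumptions made for illustration. It shows that, even before any merging happens, each outbound connection only ever sees every other event.

```python
from itertools import cycle


def round_robin_split(events, n_destinations=2):
    """Toy model (hypothetical helper, not a NiFi API):
    hand events to destinations in strict rotation."""
    queues = [[] for _ in range(n_destinations)]
    # cycle() keeps yielding the same list objects in rotation.
    for event, queue in zip(events, cycle(queues)):
        queue.append(event)
    return queues


events = list(range(1, 9))  # event sequence numbers 1..8, in arrival order
queue_a, queue_b = round_robin_split(events)
print(queue_a)  # [1, 3, 5, 7]
print(queue_b)  # [2, 4, 6, 8]
```

Each list stands in for the queue feeding one MergeContent processor, so neither merged output could contain a contiguous run of the original events.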
If your goal here is to control heap usage by your MergeContent processors, you may want to use two MergeContent processors in series rather than in parallel.
Created on 10-20-2017 07:53 PM - edited 08-17-2019 05:36 PM
By default there is no guaranteed order in which FlowFiles are pulled from the queue feeding any given processor. This is because NiFi favors performance over order. If you want to enforce some sort of order in which FlowFiles are pulled from an inbound queue, you must add a "Prioritizer" to the inbound connection. By default, no prioritizers are added.
To apply a prioritizer, simply drag the desired prioritizer(s) to the "Selected Prioritizers" box.
Regardless of the strategy used in your DistributeLoad processor (round robin or next available), there will not be a continuous order to the FlowFiles queued to either MergeContent processor.
Created on 10-20-2017 07:40 PM - edited 08-17-2019 05:36 PM
When you run the DistributeLoad processor with "next available" as the distribution strategy,
a destination (1 or 2) that won't accept FlowFiles (for example, because its queue has reached its maximum size) is skipped, and those FlowFiles are transferred to the next available destination.
If you keep the distribution strategy as "round robin" with the number of relationships set to 2,
the processor distributes the load only while both destinations (1 and 2) are able to accept FlowFiles.
If the queue for one destination is full, the processor won't transfer FlowFiles to either destination, even if the second destination's queue is empty,
because the round robin strategy transfers FlowFiles only when both destinations accept them.
So if you have configured "next available" and one of the destinations is not accepting FlowFiles, the order of incoming events changes: all the FlowFiles go to the destination that is still accepting them.
Otherwise the distribution itself does not reorder events: if you have 2 FlowFiles, the first goes to one destination and the second goes to the other, or vice versa. There is no guarantee about the order in which the FlowFiles are transferred.
If you have configured "round robin", the processor distributes the load evenly across both destinations and, before sending FlowFiles, checks whether the destinations are accepting them.
You need to choose the strategy that fits your use case.
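The difference between the two strategies can be sketched with a small Python simulation. This is a toy model built only from the behavior described above, not NiFi's implementation: the function `distribute`, the `capacities` parameter (standing in for the maximum queue size of each destination) and the strategy names are all hypothetical.

```python
def distribute(events, capacities, strategy):
    """Toy model (hypothetical, not NiFi code) of two distribution strategies.

    capacities: how many events each destination queue accepts before it is full.
    Returns (queues, pending), where pending holds the events that stay in the
    incoming queue because they could not be transferred.
    """
    n = len(capacities)
    queues = [[] for _ in range(n)]
    pending = list(events)
    nxt = 0  # index of the destination whose turn it is
    while pending:
        full = [len(q) >= cap for q, cap in zip(queues, capacities)]
        if strategy == "round_robin":
            if any(full):
                break  # wait until every destination can accept again
            target = nxt
        elif strategy == "next_available":
            # Starting from the current turn, skip destinations that are full.
            open_slots = [(nxt + i) % n for i in range(n)
                          if not full[(nxt + i) % n]]
            if not open_slots:
                break  # every destination is full
            target = open_slots[0]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        queues[target].append(pending.pop(0))
        nxt = (target + 1) % n
    return queues, pending


events = [1, 2, 3, 4, 5, 6]
# Destination 1 fills up after a single event; destination 2 has plenty of room.
print(distribute(events, [1, 10], "round_robin"))
# ([[1], []], [2, 3, 4, 5, 6])
print(distribute(events, [1, 10], "next_available"))
# ([[1], [2, 3, 4, 5, 6]], [])
```

In this model, round robin stops transferring as soon as one destination is full, while next available keeps going and sends everything else to the destination that still has room.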