Using the DistributeLoad processor to feed MergeContent processors in parallel

Expert Contributor

Hi All,

Thanks a lot for the help on this community.

Since the number of events is huge (15,000 events per second), I am using a DistributeLoad processor feeding 2 MergeContent processors in parallel to keep the flow efficient and quick. However, my confusion is: will this change the order of the incoming events?

3 REPLIES

Master Mentor
@dhieru singh

It may be helpful to understand your entire use case here.

There is no guaranteed order in which FlowFiles are merged, regardless of whether one MergeContent processor or multiple MergeContent processors are used.

With your setup, the DistributeLoad processor will round-robin FlowFiles from its incoming queue to its two outbound connections, which feed your individual MergeContent processors. Each of those MergeContent processors will generate its own merged FlowFile. One MergeContent processor with 2 concurrent tasks will perform the same as 2 MergeContent processors with 1 concurrent task each.
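As a rough illustration of the round-robin split described above, here is a minimal Python sketch. It is not NiFi code: the `round_robin` function and the integer "events" are hypothetical stand-ins, and it assumes the best case where the input happens to arrive in order.

```python
from itertools import cycle


def round_robin(flowfiles, num_relationships=2):
    """Hand items to the outbound queues in strict rotation (simplified model)."""
    queues = [[] for _ in range(num_relationships)]
    targets = cycle(range(num_relationships))
    for flowfile in flowfiles:
        queues[next(targets)].append(flowfile)
    return queues


# Hypothetical event sequence numbers 1..8, assumed to arrive in order.
events = list(range(1, 9))
queue_1, queue_2 = round_robin(events)
print(queue_1)  # [1, 3, 5, 7]
print(queue_2)  # [2, 4, 6, 8]
```

Even in this idealized case, each MergeContent processor only sees every other event, so no single merged FlowFile would contain a contiguous run of the original sequence.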

If your goal here is to control heap usage by your MergeContent processors, you may want to use two MergeContent processors in series rather than in parallel.
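To see why merging in series can help, the sketch below counts how many entries a single bin has to hold at once. It is only a counting model with made-up numbers (10,000 FlowFiles, bins of 100); `merge_in_bins` is a hypothetical helper, and the sketch does not measure real heap usage.

```python
def merge_in_bins(items, bin_size):
    """Group items into consecutive bins of at most bin_size entries."""
    return [items[i:i + bin_size] for i in range(0, len(items), bin_size)]


flowfiles = list(range(10_000))  # hypothetical FlowFile ids

# Single stage: one bin has to track all 10,000 FlowFiles at once.
single = merge_in_bins(flowfiles, 10_000)

# Two stages in series: first bundles of 100, then 100 bundles per final bin.
stage_1 = merge_in_bins(flowfiles, 100)
stage_2 = merge_in_bins(stage_1, 100)

print(max(len(b) for b in single))   # 10000
print(max(len(b) for b in stage_1))  # 100
print(max(len(b) for b in stage_2))  # 100
```

Under these assumed sizes, both layouts end with one merged result covering all 10,000 inputs, but no bin in the two-stage layout ever tracks more than 100 entries.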

Thank you,

Matt

Master Mentor

@Shu @dhieru singh

By default there is no guaranteed order in which FlowFiles are pulled from the queue feeding any given processor. This is because NiFi favors performance over order. If you want to enforce some sort of order in which FlowFiles are pulled from an inbound queue, you must add a "Prioritizer" to the inbound connection. By default, no prioritizers are added.

To apply a prioritizer, simply drag the desired prioritizer(s) to the "Selected Prioritizers" box.
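Conceptually, a prioritizer turns the connection's queue into a priority queue. The Python sketch below models that idea only; it is not NiFi's implementation, and `drain_with_prioritizer` and the `queued_at` field are hypothetical names chosen for illustration.

```python
import heapq


def drain_with_prioritizer(flowfiles, key):
    """Pull items from a queue in the order given by a priority key."""
    # The enumerate index breaks ties so the dicts themselves are never compared.
    heap = [(key(ff), index, ff) for index, ff in enumerate(flowfiles)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        ordered.append(heapq.heappop(heap)[2])
    return ordered


# Hypothetical FlowFiles sitting in a queue in arbitrary order.
queued = [
    {"name": "c", "queued_at": 3},
    {"name": "a", "queued_at": 1},
    {"name": "b", "queued_at": 2},
]

# Oldest entry first, similar in spirit to a first-in-first-out prioritizer.
fifo = drain_with_prioritizer(queued, key=lambda ff: ff["queued_at"])
print([ff["name"] for ff in fifo])  # ['a', 'b', 'c']
```

Note that a prioritizer only orders what is currently sitting in that one queue; it does not restore ordering that was already lost upstream.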

Regardless of the strategy used in your DistributeLoad processor (round robin or next available), there will not be a continuous order to the FlowFiles queued to either MergeContent processor.

Thanks,
Matt

Master Guru

Hi @dhieru singh

When you run the DistributeLoad processor with Next Available as the distribution strategy, if one of the destinations (either 1 or 2) won't accept FlowFiles (for example, because its queue has reached its maximum size), the processor transfers those FlowFiles to the next available destination.

If you keep the distribution strategy as Round Robin with the number of relationships set to 2, the processor distributes the load only when both destinations (1 and 2) are able to accept FlowFiles.

If one destination queue is full and the second destination queue is empty, the processor still won't transfer FlowFiles to either destination, because with Round Robin it transfers FlowFiles only when both destinations accept them.

So if you have configured Next Available as the strategy and one of the destinations is not accepting FlowFiles, the order of incoming events changes (all the FlowFiles go to the destination that is accepting them).

Otherwise it won't change the order of events: if you have 2 FlowFiles, the first goes to one destination and the second goes to the other, or vice versa. There is no guarantee about which FlowFile goes to which destination.

If you have configured Round Robin, the processor distributes the load evenly across both destinations, and in addition it checks whether the destinations are accepting FlowFiles before sending anything to them.
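The difference between the two strategies under a full queue can be sketched with a small simulation. This is a simplified model of the behavior described above, not NiFi's implementation: `distribute` is a hypothetical function, `capacities` stands in for each connection's back pressure limit, and nothing drains the queues while it runs.

```python
def distribute(flowfiles, capacities, strategy):
    """Return (queues, waiting) after distributing under the given strategy."""
    queues = [[] for _ in capacities]
    waiting = []
    target = 0
    for index, flowfile in enumerate(flowfiles):
        available = [len(q) < cap for q, cap in zip(queues, capacities)]
        if strategy == "round robin":
            # Round robin needs every destination to be accepting.
            blocked = not all(available)
        elif strategy == "next available":
            # Next available needs at least one accepting destination.
            blocked = not any(available)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        if blocked:
            waiting = list(flowfiles[index:])
            break
        # Skip full destinations (a no-op for round robin, where all are free).
        while not available[target]:
            target = (target + 1) % len(queues)
        queues[target].append(flowfile)
        target = (target + 1) % len(queues)
    return queues, waiting


events = [1, 2, 3, 4, 5, 6]
capacities = [1, 10]  # destination 1 fills up after a single FlowFile

print(distribute(events, capacities, "round robin"))
# ([[1], []], [2, 3, 4, 5, 6])
print(distribute(events, capacities, "next available"))
# ([[1], [2, 3, 4, 5, 6]], [])
```

In this model, Round Robin stalls as soon as one destination is full, while Next Available keeps going and sends everything else to the destination that still has room.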

You now need to select whichever strategy fits your case.