MergeContent processor - NiFi: Is there a way to specify that a bin should wait for at least some minimum amount of time (a Min Bin Age)? I use a split processor to split incoming flowfiles, enrich each split, and finally merge them back into the original flowfile. The enrichment might be delayed, so I want to wait until all the splits have arrived. I can't use the Defragment strategy because I may not have all the splits (I want to reject some splits based on some criteria). Can you please help?
There is no Min Bin Age property in MergeContent, but the processor has a Max Bin Age property where you can specify a time after which the processor force-merges the flowfiles that are waiting.
In addition, if you are filtering records to exclude them from the flowfile, use the QueryRecord processor.
- Configure/enable the Record Reader/Writer controller services and add the SQL query as a property.
The processor then runs that SQL query and sends the results as the output flowfile.
This way we don't need the MergeContent or SplitText processors.
@Shu: Thanks for the response. I want my MergeContent processor to wait for some time before it merges into a single flowfile when using the Bin-Packing Algorithm (right now files seem to be merged in less than a second). Is there a way to do that? I use a RouteOnAttribute processor to filter, so that part is fine. What are the exact criteria for merging with the Bin-Packing Algorithm? With Min Entries: 1, Max Entries: 1000, Max Bin Age: 10 seconds, and Number of Bins: 500, I have still seen files (fewer than 5 splits) merged in under one second.
Change the MergeContent processor's Min Entries to more than 1. If it stays at 1, the processor merges as soon as one flowfile reaches the queue (the minimum number of entries is already satisfied), and Max Bin Age is not considered at that point.
For example, with Min Entries set to 10k and Max Bin Age set to 1 min, the processor waits until the number of entries reaches 10k; if it cannot reach 10k within 1 min, Max Bin Age applies instead.
The processor then force-merges all flowfiles that are waiting in the queue.
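The rule described above can be sketched as a small Python model. This is only an illustration of the reasoning in this thread, not NiFi's actual implementation: the names `BinConfig` and `should_merge` are hypothetical, and the model ignores the size-based thresholds and the maximum-entries limit.

```python
from dataclasses import dataclass


@dataclass
class BinConfig:
    """Hypothetical stand-in for two MergeContent properties."""
    min_entries: int      # Minimum Number of Entries
    max_bin_age_s: float  # Max Bin Age, in seconds


def should_merge(entry_count: int, bin_age_s: float, cfg: BinConfig) -> bool:
    """Simplified merge decision for a single bin (assumption, not NiFi code)."""
    if entry_count == 0:
        return False  # nothing to merge
    if entry_count >= cfg.min_entries:
        return True   # minimum reached: merge without waiting for the age limit
    return bin_age_s >= cfg.max_bin_age_s  # otherwise only the age limit forces a merge


# Settings from the question: Min Entries 1, Max Bin Age 10 seconds.
asker = BinConfig(min_entries=1, max_bin_age_s=10)
# Settings discussed in the answer: Min Entries 10k, Max Bin Age 1 minute.
suggested = BinConfig(min_entries=10_000, max_bin_age_s=60)

print(should_merge(3, 0.5, asker))      # True: 3 splits already satisfy a minimum of 1
print(should_merge(3, 0.5, suggested))  # False: keeps waiting
print(should_merge(3, 60, suggested))   # True: age limit reached, forced merge
```

With a minimum of 1, the age limit never gets a chance to matter, which matches the sub-second merges reported in the question.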
Don't keep more bins than necessary, as bins are held in NiFi memory.
@Shu Wow. That's a good idea to make the processor wait, thank you. I am expecting 500 flowfiles/second at most, which is why I have kept 500 bins. If I set Max Bin Age to 10 seconds, would that cost me 5000 bins?
I don't think so. Bins are assigned per unique value of the correlation attribute (if Correlation Attribute Name is specified); in your case you haven't specified any attribute name in this property.
We need to keep Maximum number of Bins greater than 1; there is a Jira addressing this issue.
Refer to this link for more details on how bins are assigned in the MergeContent processor.
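To make the point about bin assignment concrete, here is a rough Python sketch. It assumes flowfiles are represented as plain attribute dictionaries and uses `fragment.identifier` purely as an example correlation attribute; `assign_bins` is a hypothetical helper, and the sketch ignores that a group can spill into another bin once one fills up.

```python
from collections import defaultdict


def assign_bins(flowfiles, correlation_attribute=None):
    """Group flowfiles by the value of the correlation attribute.

    With no correlation attribute configured, every flowfile lands in the
    same group (simplified model, not NiFi's actual code).
    """
    bins = defaultdict(list)
    for attrs in flowfiles:
        key = attrs.get(correlation_attribute) if correlation_attribute else None
        bins[key].append(attrs)
    return dict(bins)


flowfiles = [
    {"fragment.identifier": "a"},
    {"fragment.identifier": "a"},
    {"fragment.identifier": "b"},
]

print(len(assign_bins(flowfiles)))                         # 1: no correlation attribute
print(len(assign_bins(flowfiles, "fragment.identifier")))  # 2: one group per unique value
```

In this model the number of bins follows the number of distinct attribute values, not the flowfile rate multiplied by the bin age.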