
The inner workings of the MergeContent processor can certainly be challenging to understand. If you are running into the nifi.queue.swap.threshold limit in MergeContent as described in NIFI-697, increase that value in the nifi.properties file and restart NiFi. A multiple of 10000 is recommended. You will also likely have to increase your Java memory settings in bootstrap.conf.
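As a rough illustration of those two changes, the fragment below shows what the edited entries might look like. The specific values (40000 and a 2 GB heap) are assumptions chosen only for the example, not recommendations from the original post. The java.arg.2 and java.arg.3 keys are the heap entries as they appear in a default bootstrap.conf, so check your own file before copying.

```properties
# conf/nifi.properties
# Illustrative value only: a multiple of 10000, sized to your largest merge.
nifi.queue.swap.threshold=40000

# conf/bootstrap.conf
# Illustrative heap sizes only: more FlowFiles held in memory need more heap.
java.arg.2=-Xms2g
java.arg.3=-Xmx2g
```

Both files are read at startup, so NiFi has to be restarted for the new values to take effect.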

MergeContent works like this.

When a FlowFile arrives at MergeContent, it is assigned to a bin based on Merge Strategy and Correlation Attribute Name. Maximum Number of Bins controls resource usage such that if all bins have FlowFiles in them and another FlowFile arrives that doesn't fit into one of those bins, then the oldest bin is automatically marked as complete, and the new FlowFile starts its own new bin.
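The binning behaviour described above can be sketched in a few lines of Python. This is a simplified model, not NiFi's actual implementation: FlowFiles are plain dicts, the function name `assign_to_bins` and the attribute name `source` are made up for the example, and insertion order stands in for bin age.

```python
from collections import OrderedDict


def assign_to_bins(flowfiles, correlation_attr, max_bins):
    """Group FlowFiles (plain dicts here) by a correlation attribute.

    Returns (open_bins, evicted). `evicted` lists the bins that were forced
    complete because every bin slot was already in use.
    """
    open_bins = OrderedDict()  # insertion order doubles as bin age
    evicted = []
    for ff in flowfiles:
        key = ff.get(correlation_attr)
        if key not in open_bins:
            if len(open_bins) >= max_bins:
                # No free slot: the oldest bin is marked complete.
                oldest_key, oldest_bin = open_bins.popitem(last=False)
                evicted.append((oldest_key, oldest_bin))
            open_bins[key] = []  # the new FlowFile starts its own bin
        open_bins[key].append(ff)
    return open_bins, evicted


flowfiles = [
    {"name": "a1", "source": "A"},
    {"name": "b1", "source": "B"},
    {"name": "a2", "source": "A"},
    {"name": "c1", "source": "C"},
]
open_bins, evicted = assign_to_bins(flowfiles, "source", max_bins=2)
for key, contents in evicted:
    print("forced complete:", key, [ff["name"] for ff in contents])
print("still open:", list(open_bins.keys()))
```

With only two bins allowed, the arrival of the first FlowFile from source C forces the oldest bin (source A) to complete, even though it has not reached any minimum.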

A bin is complete once both minimums are met, that is, (number of files in bin) >= Minimum Number of Entries AND (number of bytes in bin) >= Minimum Group Size, or once the bin has existed for Max Bin Age, whichever comes first. The FlowFiles in the bin are then merged and sent to an output relationship.
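The completion rule reads more easily as a predicate. The sketch below is a hypothetical model of that rule only: the `Bin` class, its field names, and `is_complete` are invented for illustration and are not NiFi classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bin:
    entry_count: int    # number of FlowFiles in the bin
    size_bytes: int     # total content size of the bin
    age_seconds: float  # how long the bin has existed


def is_complete(b: Bin, min_entries: int, min_size_bytes: int,
                max_bin_age_seconds: Optional[float] = None) -> bool:
    """(entries >= min AND bytes >= min) OR bin has reached Max Bin Age."""
    meets_minimums = (b.entry_count >= min_entries
                      and b.size_bytes >= min_size_bytes)
    timed_out = (max_bin_age_seconds is not None
                 and b.age_seconds >= max_bin_age_seconds)
    return meets_minimums or timed_out


# Enough entries but too few bytes, and still young: not complete.
print(is_complete(Bin(10, 500, 5.0), 10, 1000, 60.0))
# Both minimums met: complete.
print(is_complete(Bin(10, 1000, 5.0), 10, 1000, 60.0))
# Neither minimum met, but the bin has aged out: complete.
print(is_complete(Bin(1, 10, 61.0), 10, 1000, 60.0))
```

Note that the two minimums are joined by AND, so setting only one of them high is enough to hold a bin open until Max Bin Age expires.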

The Maximum Number of Entries and Maximum Group Size prevent bins from becoming "over full". For example, when Maximum Group Size is 1 GB, a bin currently holds 900 MB, and a 200 MB FlowFile arrives, the 200 MB FlowFile is not added to that bin, which would make it "over full", but instead gets a bin all to itself.
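The 900 MB / 200 MB example can be checked with a small sketch. It assumes a single correlation group, represents each bin as a list of FlowFile sizes, and treats 1 GB as 1024 MB. The `offer` function is a made-up name for illustration, not a NiFi API.

```python
MB = 1024 * 1024
GB = 1024 * MB


def offer(bins, flowfile_size, max_group_size, max_entries):
    """Add a FlowFile to the first bin it fits in, else start a new bin.

    `bins` is a list of lists of FlowFile sizes in bytes. Returns the index
    of the bin that received the FlowFile.
    """
    for index, b in enumerate(bins):
        fits_count = len(b) + 1 <= max_entries
        fits_size = sum(b) + flowfile_size <= max_group_size
        if fits_count and fits_size:
            b.append(flowfile_size)
            return index
    # Adding it anywhere would make a bin "over full": give it its own bin.
    bins.append([flowfile_size])
    return len(bins) - 1


bins = [[900 * MB]]
first = offer(bins, 200 * MB, max_group_size=1 * GB, max_entries=1000)
second = offer(bins, 100 * MB, max_group_size=1 * GB, max_entries=1000)
print(first, second, [sum(b) // MB for b in bins])
```

The 200 MB FlowFile lands in a new bin because 900 MB + 200 MB exceeds the limit, while a later 100 MB FlowFile still fits in the original bin.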

Credit goes to Michael Moser from the NiFi user list.

Comments

@Andrew Grande - the good news is that in the next version of NiFi and HDF, swapping has been refactored quite a bit, so adjusting the nifi.queue.swap.threshold property will no longer be needed.