Support Questions

varun_rathinam · 03-04-2020

Hi,

I'm currently using the merge_content processor to merge Avro files from two sources, a kafka_consumer processor and a fetchHDFS file. Yesterday, while it was trying to merge Avro files into one file of around 680 MB, the processor drop the file and join with new files, and I'm not able to recover that data either, because content_repository backup i limit. Can you please help me with this use case? What size is reasonable for this processor, or is there any setting that needs to be modified in nifi.properties?

MattWho · 03-04-2020

@varun_rathinam

Can you please elaborate on "processor drop the file and join with new files"?

And also "content_repository backup i limit"? <-- Are you referring to the "nifi.content.repository.archive.max.retention.period" and "nifi.content.repository.archive.max.usage.percentage" configuration settings in the nifi.properties file?

Also, please share a screenshot of your current MergeContent processor's configuration along with more details about your use case. What result are you seeing now and what is the desired result?

The MergeContent processor takes multiple FlowFiles from the same NiFi node and merges their content, based on the processor's configuration, into one or more new FlowFiles per node. The processor cannot merge FlowFiles residing on different NiFi nodes in a NiFi cluster into one FlowFile.

FlowFiles from the inbound connection queue are allocated to bins based on the following configuration properties:
Correlation Attribute Name <-- (Optional) When used, only FlowFiles with the same value in the configured FlowFile attribute will be placed in the same bin.

Maximum Number of Entries <-- Maximum number of FlowFiles that can be allocated to a single bin before a new bin is used.

Maximum Group Size <-- (Optional) Maximum cumulative size of the content that can be allocated to a bin

When a "bin" is eligible to be merged is controlled by these configuration properties:

Minimum Number of Entries <-- If at end of thread execution (after all FlowFiles from inbound connection have been allocated to one or more bins) the number of FlowFiles allocated to a bin meets this min and meets configured min group size, the FlowFiles in that bin will be merged.

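Minimum Group Size <-- Same as above: the cumulative content size allocated to the bin must meet this min, and the configured Minimum Number of Entries must be met as well, before the bin will be merged.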

Max Bin Age <-- A bin that has not reached or exceeded both above min values will merge once the bin has had FlowFiles in it for this amount of time

Maximum Number of Bins <-- If FlowFiles have been allocated to every bin and another bin is needed, the oldest bin will be forced to merge to free a bin.

It is possible that one or both min values are never reached because a max setting is reached first. In that case no additional FlowFiles can be allocated to that bin, and the only things that will force it to merge are "Max Bin Age" expiring or NiFi running out of free bins.
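To make the interaction between the min and max settings concrete, here is a minimal Python sketch of the rules described above. It is a simplified model for illustration only, not NiFi's actual implementation; the names `Bin`, `MergeConfig`, `has_room` and `is_ready` are hypothetical, and it ignores Correlation Attribute Name and the Maximum Number of Bins eviction.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MergeConfig:
    # Hypothetical stand-ins for the MergeContent properties discussed above.
    min_entries: int = 1
    max_entries: int = 1000
    min_group_size: int = 0                # bytes
    max_group_size: Optional[int] = None   # bytes, optional
    max_bin_age: Optional[float] = None    # seconds, optional


@dataclass
class Bin:
    created_at: float                      # time the first FlowFile arrived
    sizes: List[int] = field(default_factory=list)

    @property
    def entries(self) -> int:
        return len(self.sizes)

    @property
    def total_size(self) -> int:
        return sum(self.sizes)


def has_room(b: Bin, size: int, cfg: MergeConfig) -> bool:
    """True if one more FlowFile of `size` bytes fits under the max settings."""
    if b.entries + 1 > cfg.max_entries:
        return False
    if cfg.max_group_size is not None and b.total_size + size > cfg.max_group_size:
        return False
    return True


def is_ready(b: Bin, cfg: MergeConfig, now: float) -> bool:
    """True if the bin is eligible to merge: both mins met, or Max Bin Age reached."""
    if b.entries >= cfg.min_entries and b.total_size >= cfg.min_group_size:
        return True
    return cfg.max_bin_age is not None and now - b.created_at >= cfg.max_bin_age


# A bin that hits Maximum Number of Entries before Minimum Group Size.
cfg = MergeConfig(min_entries=1, max_entries=3, min_group_size=1000, max_bin_age=60)
stuck = Bin(created_at=0.0, sizes=[100, 100, 100])
```

In this example the bin is full (three entries) at only 300 bytes, so `has_room(stuck, 100, cfg)` is false and `is_ready(stuck, cfg, now=10)` is also false. The bin only becomes eligible once `now` reaches the 60 second Max Bin Age.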

As far as bin max values go, NiFi does not really care about content size, because it streams the merged content into a new FlowFile and does not hold that content in memory. NiFi can, however, experience Out Of Memory (OOM) conditions if Maximum Number of Entries is set too high, since all the attributes of every FlowFile currently allocated to bin(s) are held in heap memory. NiFi's allocated heap memory is set in the bootstrap.conf configuration file (the -Xms and -Xmx Java arguments), not in nifi.properties. As a rule of thumb, Maximum Number of Entries should be limited to about 10000, but this varies based on available memory and on the number and size of the attributes on your FlowFiles. You can use multiple MergeContent processors in series (one after another) to merge already merged FlowFiles into even larger merged FlowFiles if desired.
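As a rough back-of-the-envelope aid for the "processors in series" idea, the sketch below counts how many MergeContent stages would be needed to bring a given number of FlowFiles down to a single merged FlowFile. The function name `merge_stages` is hypothetical, and it assumes the idealized case where every bin fills to the same entry cap.

```python
import math


def merge_stages(total_flowfiles: int, max_entries_per_bin: int) -> int:
    """Number of MergeContent processors in series needed to reach one FlowFile.

    Assumes each stage merges up to `max_entries_per_bin` FlowFiles per bin
    and that bins always fill completely (an idealization).
    """
    if total_flowfiles < 1 or max_entries_per_bin < 2:
        raise ValueError("need at least 1 FlowFile and a cap of at least 2")
    stages = 0
    remaining = total_flowfiles
    while remaining > 1:
        # Each stage reduces the FlowFile count by up to the per-bin cap.
        remaining = math.ceil(remaining / max_entries_per_bin)
        stages += 1
    return stages


# 1,000,000 FlowFiles with a cap of 10,000 per bin:
# stage 1 yields 100 merged FlowFiles, stage 2 merges those into 1.
print(merge_stages(1_000_000, 10_000))  # prints 2
```

Each stage only ever holds the attributes of up to 10000 FlowFiles per bin in heap, which is the point of chaining the processors instead of raising Maximum Number of Entries.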

Hope this helps with understanding the MergeContent processor,

Matt
