Keep MiNiFi Backpressure Data to Disk


I'm sending data from MiNiFi to a NiFi cluster. Is there any way to keep MiNiFi's backpressured data on disk rather than in memory? I've also disabled the swap feature to improve performance, and I'm worried about MiNiFi consuming a lot of memory.

NiFi version: 1.8.0
MiNiFi version: 0.5.0 (Java)

1 ACCEPTED SOLUTION


@Arash 

A FlowFile consists of two parts:
- FlowFile content - Content resides on disk in the content repository, not in heap memory. Some components may need to load content into memory to do their work.
- FlowFile attributes/metadata - FlowFiles actively queued in a connection have their attributes/metadata held in heap memory. Swapping is the only mechanism that can move this attribute/metadata data out of heap and onto disk.

Keep in mind that MiNiFi only starts swapping FlowFiles to disk once the number of FlowFiles queued in a single connection reaches the configured swap threshold (default 20,000). Swap files are written in batches of 10,000. In a smoothly running flow there should therefore be very little swapping of FlowFile attributes/metadata, if any; it should only happen during bursts of data.
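To make the swap arithmetic concrete, here is a rough sketch in Python. It is a simplified model and not MiNiFi's actual implementation: it assumes FlowFiles beyond the swap threshold are moved to disk only in full batches of 10,000, and the function name `swap_state` is made up for illustration.

```python
def swap_state(queued, swap_threshold=20_000, batch_size=10_000):
    """Simplified model (an assumption, not MiNiFi source code):
    FlowFiles beyond the swap threshold are written to disk in full
    batches; anything short of a full batch stays in heap."""
    excess = max(0, queued - swap_threshold)
    swap_files = excess // batch_size      # only complete batches become swap files
    swapped = swap_files * batch_size
    return {
        "in_heap": queued - swapped,       # attributes/metadata still held in heap
        "swapped": swapped,                # attributes/metadata moved to disk
        "swap_files": swap_files,
    }


if __name__ == "__main__":
    for queued in (5_000, 20_000, 25_000, 30_000, 45_000):
        print(queued, swap_state(queued))
```

Under this model a connection holding 30,000 FlowFiles would keep 20,000 in heap and push one batch of 10,000 to a swap file, while a connection that stays below the threshold never swaps at all.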

To keep heap usage down, limit the backpressure object threshold on your connections. The default is 10,000, which means a connection would normally never accumulate enough FlowFiles to trigger a swap file anyway. Note that backpressure is a soft limit: if a source processor is allowed to execute because the downstream connection is not yet applying backpressure, and that single execution produces 30,000 FlowFiles, then all 30,000 are placed on the downstream connection, which would result in swap files being created.
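The soft-limit behaviour can be sketched the same way. This is only an illustration of the scheduling rule described here, assuming backpressure is checked before a processor runs and never during a run; `run_source` is a hypothetical helper, not a NiFi or MiNiFi API.

```python
def run_source(queue_size, produced_per_run, backpressure_threshold=10_000):
    """Return the downstream queue size after one scheduling attempt.

    Assumption for illustration: the threshold is only checked before
    the source processor executes, so a single execution can overshoot it.
    """
    if queue_size >= backpressure_threshold:
        # Backpressure is applied: the source processor is not scheduled.
        return queue_size
    # Backpressure not applied yet: the whole output lands on the queue.
    return queue_size + produced_per_run


if __name__ == "__main__":
    size = 0
    for attempt in range(3):
        size = run_source(size, produced_per_run=30_000)
        print(f"after attempt {attempt + 1}: {size} FlowFiles queued")
```

The first attempt puts all 30,000 FlowFiles on the connection even though the threshold is 10,000; later attempts are blocked until the queue drains below the threshold.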

When you build the dataflow in NiFi that you will run on your MiNiFi agent, keep the above in mind and check the embedded documentation for the components you use. The embedded docs include a resource considerations section for each component that has known impacts on heap memory or CPU. Commonly used processors that merge or split FlowFiles can have an impact on heap memory if not configured wisely.

Hope this helps remove some concern and provide useful insight.
If you found this helpful, please take a moment to log in and click accept on this solution.
Matt
