Created 05-30-2016 01:29 PM
If I am not mistaken, this property means that the attributes of the queued flow files will be swapped to disk instead of being kept in memory. In this case, it happens when more than 20,000 flow files are queued in a connection.
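For reference, that threshold is set in nifi.properties; the line below shows the shipped default (adjust it to your own install if needed):

nifi.queue.swap.threshold=20000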
Hope this helps.
Created 05-30-2016 01:42 PM
Hi @Pierre Villard,
Thanks for the inputs. For example, I have a GetFile processor and a PutFile processor. Let's say GetFile has 200,000 (2 lakh) files and I start both at the same time. For some reason, after processing 100 files, PutFile goes into an error state and stops accepting input, so GetFile keeps writing data into the queue continuously. How long will the queue hold the data? Is there any loss of data once it crosses 20,000 flow files? Say my PutFile error is only resolved after one week; will the data still be in the queue? Thanks for the response.
Created 05-30-2016 01:50 PM
Swap is only about the information stored in JVM memory. Flow files will be stored in the queue as long as you have available space on disk. However, you may want to have a look at the back pressure feature:
You can apply back pressure on a specific connection; it will cause the processor that is the source of the connection to stop being scheduled to run until the queue clears out. However, data will still queue up in that processor's incoming connections. So, to force back pressure to propagate all the way back to the source, you would need to configure each of the connections in the flow to have back pressure applied.
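To illustrate with the GetFile/PutFile example above (the threshold value here is only an example, not a default mentioned in this thread):

GetFile --> [queue, back pressure threshold = 10,000 flow files] --> PutFile

If PutFile stops consuming, the queue grows to 10,000 flow files and NiFi stops scheduling GetFile, so nothing new is pulled in until the queue drains back below the threshold. With more processors in between, each connection needs its own back pressure settings for the effect to reach all the way back to the source.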
Created 05-30-2016 02:04 PM
Thanks, and sorry for asking so many questions. How do I set up back pressure on the source processor?
Created 05-30-2016 02:14 PM
When you have a connection between two processors, you can right-click on it to configure it. Then, in the Settings tab, you can configure back pressure:
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#settings
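For example, the two back pressure fields in the connection's Settings tab would look something like this (the values are only an illustration):

Back Pressure Object Threshold: 10000
Back Pressure Data Size Threshold: 1 GB

Once either limit is reached, the upstream processor is no longer scheduled until the queue drops back below it.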
Created 05-30-2016 02:22 PM
Hello,
Pierre is correct about what the swap threshold is used for. For speed and efficiency, NiFi holds all the FlowFile attributes associated with each FlowFile in JVM memory. In cases where queues build up, this can result in considerable memory usage, so NiFi has established a default swapping threshold of 20,000 FlowFiles per connection. What this means is that once a queue reaches 30,000 FlowFiles, 10,000 will be swapped out to disk. The 20,000 that are next to be worked on, based on the connection's prioritization scheme, are left in memory. NiFi will continue to swap 10,000 FlowFiles at a time to disk as the queue continues to grow. Keep in mind that FlowFiles swapped out must be swapped back in before they can be worked on by the destination processor.

NiFi does not throw away any data unless expiration has been set on a connection. As long as you have sufficient disk space to hold the data, it will continue to queue. I suggest reading through this article if you have not already: ( https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-... )

That being said, there are dangers in allowing your disks to fill to 100% capacity, so as Pierre mentioned you should be setting back pressure throughout your dataflow to trigger upstream processors to stop and eventually stop pulling in new data.

Thanks,
Matt
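For completeness, the swapping behaviour described above can be tuned through a few entries in nifi.properties; the values below are, as far as I recall, the shipped defaults for this era of NiFi, so verify them against your own install:

nifi.queue.swap.threshold=20000
nifi.swap.in.period=5 sec
nifi.swap.in.threads=1
nifi.swap.out.period=5 sec
nifi.swap.out.threads=4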
Created 05-31-2016 05:04 AM
Hi Clark & Pierre,
Please let me know the swap-in/swap-out location for the queue. I mean, where will the flow files be stored when the queue holds more than the threshold value?
Created 05-31-2016 06:17 AM
Be careful: swap is not about the flow files themselves, it is about flow file attributes (it is not flow file content). Flow file content is written to the content repository; have a look here: https://nifi.apache.org/docs/nifi-docs/html/overview.html#nifi-architecture. Regarding swap, it depends on the implementation, but the default is to swap into the FlowFile repository.
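A minimal sketch of the relevant nifi.properties entries, assuming the default locations (your paths may differ):

nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager

With the default FileSystemSwapManager, swapped-out FlowFile attributes are written to disk under the FlowFile repository, while the actual content always stays in the content repository.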