Created 02-20-2018 01:31 PM
Hi
I have observed a confusing pattern over the last few days to do with the amount of data queued on a connection.
Every day, at the start of the day, we have about 4 million flow files in a particular queue, with the total data shown on the queue as approximately 70 GB. As the day goes on, the number of flow files in the queue falls as the queue is processed, but the total size of the queue rises to over 100 GB (it eventually starts dropping again, though).
This behaviour is not what I expected, and I can't find anything about it in the user guide. My working theory is that the total might also include archived content (we have retention set to 12 hours) and/or content claims. However, I can't find confirmation of this.
Is anyone able to shed some light on this behaviour for me?
Thanks
Richard
Created 02-20-2018 07:40 PM
The queue stats on a connection reflect the number of FlowFiles and the cumulative size of those queued FlowFiles' content. They do not include any archived content. I don't know your throughput rates or how FlowFile content sizes vary in your dataflow; either could explain what you are seeing.
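As a rough illustration of what those two numbers mean, here is a toy model, not NiFi's API: the `FlowFile` class and `queue_stats` function are hypothetical names made up for this sketch. The connection's figures are just a count and a sum over the FlowFiles currently queued, so archived content never enters the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowFile:
    """Hypothetical stand-in for a queued FlowFile; only its content size matters here."""
    uuid: str
    content_size: int  # bytes of content this FlowFile references


def queue_stats(queued: list) -> tuple:
    # Only FlowFiles currently sitting in the connection are counted.
    # Archived content lives in the content repository and is never part of this sum.
    return len(queued), sum(ff.content_size for ff in queued)


queue = [FlowFile("a", 1_000), FlowFile("b", 2_500), FlowFile("c", 500)]
print(queue_stats(queue))  # (3, 4000)
```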
A NiFi Content claim cannot be archived until there are no active FlowFiles anywhere in your dataflow pointing to that claim.
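The archiving rule can be pictured as reference counting. The sketch below is a hypothetical model (the `ContentClaims` class and its methods are invented for illustration and are not NiFi's content repository API). It assumes several active FlowFiles can point at the same claim, and a claim only becomes eligible for the archive once that count reaches zero.

```python
from collections import Counter


class ContentClaims:
    """Toy reference counter: claim id -> number of active FlowFiles pointing at it.

    Hypothetical model for illustration only; not NiFi's real implementation.
    """

    def __init__(self):
        self._refs = Counter()

    def reference(self, claim_id: str) -> None:
        # An active FlowFile starts pointing at this claim.
        self._refs[claim_id] += 1

    def release(self, claim_id: str) -> None:
        # An active FlowFile stops pointing at this claim (e.g. it was dropped).
        if self._refs[claim_id] <= 0:
            raise ValueError(f"claim {claim_id!r} has no active references")
        self._refs[claim_id] -= 1

    def archivable(self, claim_id: str) -> bool:
        # A claim may be archived only once no active FlowFile references it.
        return self._refs[claim_id] == 0


claims = ContentClaims()
claims.reference("claim-1")  # FlowFile A
claims.reference("claim-1")  # FlowFile B shares the same claim
claims.release("claim-1")    # A is done
print(claims.archivable("claim-1"))  # False: B still references it
claims.release("claim-1")    # B is done
print(claims.archivable("claim-1"))  # True
```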
Perhaps some screenshots will help make sure we are talking about the same thing when you say "queue".
Thank you,
Matt
Created 02-20-2018 08:37 PM
Hi @Matt Clarke
Thanks for taking the time to answer my question and confirm that the size of the queued content doesn't include any archived content.
I've taken a closer look at the status history of the connection (queue) over the past 24 hours, and I can see that the nature of the incoming data varies with the time of day. Earlier in the day, a large number of small flow files pass into the queue, and they can be processed rapidly. As the day goes on, we start to see much larger flow files. I think this explains why the number of flow files decreases while the size of the data increases. I hadn't expected this until I looked at the status history closely.
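A quick back-of-the-envelope check shows how the count can fall while the total size rises. Only the 4 million FlowFiles and roughly 70 GB come from the thread; the afternoon mix is a hypothetical example, the `snapshot` helper is made up for this sketch, and sizes use decimal GB for simplicity.

```python
def snapshot(buckets):
    """Return (FlowFile count, total size in GB) for (count, bytes_per_flowfile) pairs."""
    count = sum(n for n, _ in buckets)
    total_bytes = sum(n * size for n, size in buckets)
    return count, total_bytes / 1e9


# Morning: ~4 million small FlowFiles totalling ~70 GB (about 17.5 KB each on average).
morning = [(4_000_000, 17_500)]

# Afternoon (hypothetical): most small files have been processed,
# but a few thousand 50 MB FlowFiles have arrived in the meantime.
afternoon = [(1_000_000, 17_500), (2_000, 50_000_000)]

print(snapshot(morning))    # (4000000, 70.0)
print(snapshot(afternoon))  # (1002000, 117.5)
```

Far fewer FlowFiles, yet a larger total, because a handful of large files outweighs millions of small ones.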
Thanks again for helping me get to the bottom of this!
Richard