Why is the total size of a NiFi queue increasing while the number of flowfiles is decreasing?

Hi

I have observed a confusing pattern over the last few days in the amount of data queued on a connection.

Every day, at the start of the day, we have about 4 million flow files in a particular queue, with the total data shown on the queue as approximately 70 GB. As the day goes on, the number of flow files in the queue falls as the queue is processed, but the total size of the queue rises to over 100 GB (it eventually starts dropping again, though).

This behaviour is not what I expect, and I can't find anything about it in the user guide. My working theory is that the total might also include archived content (we have retention set to 12 hours) and/or content claims. However, I can't find confirmation of this.

Is anyone able to shed some light on this behaviour for me?

Thanks

Richard

1 ACCEPTED SOLUTION

Master Mentor

@Richard Corfield

The queue stats on a connection reflect the number of queued FlowFiles and the cumulative size of those FlowFiles' content. They do not include any archived content. I do not know your throughput rates or how FlowFile content sizes vary in your dataflow, either of which may explain what you are seeing.

A NiFi content claim cannot be archived until no active FlowFile anywhere in your dataflow still points to that claim.

Perhaps some screenshots will help make sure we are talking about the same thing when you say "queue".

Thank you,

Matt

Hi @Matt Clarke

Thanks for taking the time to answer my question and confirm that the size of the queued content doesn't include any archived content.

I've taken a closer look at the status history of the connection (queue) over the past 24 hours, and I can see that the nature of the incoming data varies with the time of day. Earlier in the day, a large number of small flow files pass into the queue, and they are processed rapidly. As the day goes on, we start to see much larger flow files. I think this explains why the number of flow files decreases while the total size of the data increases. I hadn't expected this until I looked closely at the status history.
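For anyone who wants to see the effect in isolation, here is a minimal toy simulation in Python. It is not NiFi code, and every number in it (file sizes, arrival and drain rates) is made up purely for illustration; it also assumes the queue is drained oldest-first. It only shows that a queue's count can fall while its total size rises when small items leave and larger items arrive.

```python
from collections import deque

KB, MB = 1024, 1024 * 1024


def queue_stats(queue):
    """Return (item count, total bytes) for a queue of content sizes."""
    return len(queue), sum(queue)


# Hypothetical start of day: many small FlowFiles are already queued.
queue = deque([16 * KB] * 4000)

history = [queue_stats(queue)]
for hour in range(3):
    # Assumed drain rate: 1,000 items per hour, taken oldest-first.
    for _ in range(min(1000, len(queue))):
        queue.popleft()
    # Assumed arrivals: 100 much larger items per hour join the tail.
    queue.extend([1 * MB] * 100)
    history.append(queue_stats(queue))

for count, total in history:
    print(f"{count:>5} items, {total / MB:8.1f} MiB")
```

With these made-up rates, each hour removes 1,000 small items but adds 100 items that are 64 times larger, so the count drops every hour while the total size climbs.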

Thanks again for helping me get to the bottom of this!

Richard