NiFi back pressure thresholds

Explorer

My back pressure threshold for the number of objects is the default 10,000, and the data size threshold is 1 GB. However, I saw the number of objects in the queue go far beyond 10,000 even though the data size was below 1 GB. Does this mean back pressure is applied only if both the number of objects and the data size reach their respective thresholds?

Another question: suppose a Kafka consumer processor is connected to a split processor. If the split processor reaches the back pressure thresholds, will the Kafka consumer stop consuming messages?

Thanks,

Mark

1 ACCEPTED SOLUTION

Super Mentor

@Mark Lin @mark juchems

The configurable backpressure thresholds (object and size) on a connection are soft limits.

So a backpressure object threshold of 10,000 (default) means that the NiFi controller will not schedule the feeding processor to run if the object count has reached or exceeded 10,000 queued FlowFiles.

So let's say there are 9,999 queued objects. NiFi would allow the preceding processor to be scheduled. When that processor executes its code, it executes with no regard for destination queue sizes. That means if a single execution of that processor thread produces 1,000,000 FlowFiles, all 1,000,000 FlowFiles will be added to the downstream connection queue. Now that the queue holds 1,009,999 FlowFiles, the preceding processor will not be scheduled again until the queue drops below 10,000.
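
The scenario above can be sketched as a toy model. This is not NiFi's actual scheduler code: `Connection` and `try_run_upstream` are hypothetical names, and the only behavior modeled is that the threshold is checked before a run, never during it.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Toy stand-in for a NiFi connection queue (hypothetical, not a NiFi class)."""
    object_threshold: int = 10_000
    queued: int = 0

    def backpressure_engaged(self) -> bool:
        # The threshold is only consulted when deciding whether to schedule.
        return self.queued >= self.object_threshold


def try_run_upstream(conn: Connection, flowfiles_per_run: int) -> bool:
    """Attempt one execution of the upstream processor; return True if it ran."""
    if conn.backpressure_engaged():
        return False  # processor is not scheduled
    # Once running, the processor ignores the queue size entirely.
    conn.queued += flowfiles_per_run
    return True


conn = Connection(queued=9_999)
first = try_run_upstream(conn, 1_000_000)   # allowed: 9,999 < 10,000
second = try_run_upstream(conn, 1_000_000)  # blocked: queue is over the limit
print(first, conn.queued, second)  # True 1009999 False
```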

The same soft-limit concept applies to the back pressure size threshold setting on a connection as well.
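
As a minimal sketch of how the two thresholds combine, the check below assumes back pressure engages as soon as either threshold is reached, not only when both are. The function name and parameters are illustrative and not part of NiFi's API.

```python
GB = 1024 ** 3  # bytes in one gigabyte (binary), used for the size threshold


def backpressure_engaged(queued_count: int,
                         queued_bytes: int,
                         count_threshold: int = 10_000,
                         size_threshold: int = 1 * GB) -> bool:
    """Return True when either soft limit has been reached or exceeded."""
    return queued_count >= count_threshold or queued_bytes >= size_threshold


# Count threshold reached while the data size is still small.
print(backpressure_engaged(10_000, 50 * 1024))
# Few objects, but their combined size exceeds the size threshold.
print(backpressure_engaged(12, 2 * GB))
# Both values under their thresholds, so the upstream processor may run.
print(backpressure_engaged(9_999, 512 * 1024 ** 2))
```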

Thank you,

Matt

When an "Answer" addresses/solves your question, please select "Accept" beneath that answer. This encourages user participation in this forum.

11 REPLIES

New Contributor

How does back pressure apply to nodes in a cluster? If the threshold is 10,000 objects, does that mean the queue in total cannot reach 10,000 objects, or does the threshold apply to each node individually?

I have the same question for the ControlRate processor: does it rate limit based on per-node statistics or on the total statistics across the cluster?

Marfill,

The threshold applies per node, so across the cluster back pressure is applied when the total number of FlowFiles in a given queue has reached (number of nodes × the per-node limit). For example, if you have a cluster of 3 nodes and the threshold is set to 10,000, then back pressure will be applied when the total number of FlowFiles reaches 30,000, and so on.
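
The arithmetic above can be written out as a short sketch. It assumes each node checks the threshold against its own local queue only; both helper functions are hypothetical and exist purely for illustration.

```python
from typing import List


def cluster_total_at_full_backpressure(node_count: int,
                                       per_node_threshold: int = 10_000) -> int:
    """Total queued FlowFiles at which every node has reached its own limit."""
    return node_count * per_node_threshold


def nodes_with_backpressure(queued_per_node: List[int],
                            per_node_threshold: int = 10_000) -> List[bool]:
    """Per-node view: each node compares only its local queue to the threshold."""
    return [queued >= per_node_threshold for queued in queued_per_node]


print(cluster_total_at_full_backpressure(3))       # 30000
# With an uneven distribution, only the busy node applies back pressure.
print(nodes_with_backpressure([10_000, 5, 0]))     # [True, False, False]
```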

Regarding ControlRate, I believe it is done on per-node statistics. For example, if you have a ControlRate processor that allows 1 FlowFile per hour and it is part of a load-balanced flow on a 3-node cluster, and you receive a total of 3 files for the first time, one on each node, then all 3 will be processed immediately.
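
To illustrate the per-node reading of that example, here is a toy rate limiter with one independent instance per node. It is only a sketch of the behavior described in this reply, not ControlRate's implementation; the class name, the fixed-window logic, and the `now` timestamps in seconds are all assumptions.

```python
class PerNodeRateLimiter:
    """Toy fixed-window limiter; each cluster node would own its own instance."""

    def __init__(self, max_per_window: int = 1, window_seconds: int = 3600):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.window_start = None  # set when the first FlowFile arrives
        self.count = 0

    def allow(self, now: float) -> bool:
        """Return True if a FlowFile arriving at time `now` may pass."""
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            # Start a fresh window and reset this node's local counter.
            self.window_start = now
            self.count = 0
        if self.count < self.max_per_window:
            self.count += 1
            return True
        return False


# Three nodes, each with its own limiter of 1 FlowFile per hour.
limiters = [PerNodeRateLimiter() for _ in range(3)]
released = [limiter.allow(now=0) for limiter in limiters]
print(released)  # [True, True, True]: one file per node, all pass at once

# A second file on node 0 ten minutes later is held back by that node only.
held = limiters[0].allow(now=600)
print(held)  # False
```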