Created 03-28-2018 04:03 PM
My back pressure threshold for number of objects is the default 10,000, while the data size threshold is 1 GB. However, I saw the number of objects in the queue go far beyond 10,000 even though the data size stayed below 1 GB. Does this mean back pressure is applied only if both the number of objects and the data size reach their respective thresholds?
Another question: suppose a Kafka consumer processor is connected to a split processor. If the connection to the split processor reaches its back pressure threshold, will the Kafka consumer stop consuming messages?
Thanks,
Mark
Created 03-28-2018 07:42 PM
The threshold is 10,000, but it is a soft limit, so the queue is given a little leeway beyond it.
Created 07-16-2018 07:15 PM
We are seeing the same thing, specifically with the ConsumeAzureEventHub processor. It seems to completely ignore both the size and object count settings. A simple PutFile works, but not this one. We have seen the queue go above 1,000,000 objects and over 1 GB in size.
Created 07-16-2018 07:37 PM
-
The configurable back pressure thresholds (object count and size) on a connection are soft limits.
So a back pressure object threshold of 10,000 (default) means that the NiFi controller will not schedule the feeding processor to run if the count of queued FlowFiles has reached or exceeded 10,000.
-
So let's say there are 9,999 queued objects. NiFi would allow the preceding processor to be scheduled. When that processor executes its code, it runs with no regard for the destination queue's size. That means if a single execution of that processor thread results in 1,000,000 FlowFiles being produced, all 1,000,000 FlowFiles will be added to the downstream connection queue. Now that the queue holds 1,009,999 FlowFiles, the preceding processor will not be scheduled again until the queue drops below 10,000.
-
The same soft-limit concept applies to the back pressure size threshold setting on a connection as well.
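To illustrate the soft-limit behavior, here is a minimal sketch in plain Java (my own simplification, not actual NiFi source): the threshold is consulted only when deciding whether to schedule the feeding processor, so a single run can overshoot it by any amount.

    import java.util.ArrayDeque;
    import java.util.Queue;

    class SoftLimitSketch {
        static final long OBJECT_THRESHOLD = 10_000; // default back pressure object threshold
        static final Queue<String> downstreamQueue = new ArrayDeque<>();

        // The scheduler checks the queue size only BEFORE each run.
        static boolean mayScheduleFeedingProcessor() {
            return downstreamQueue.size() < OBJECT_THRESHOLD;
        }

        // Once a run starts, everything it produces is enqueued,
        // no matter how far past the threshold that goes.
        static void runFeedingProcessor(int flowFilesProduced) {
            for (int i = 0; i < flowFilesProduced; i++) {
                downstreamQueue.add("flowfile");
            }
        }

        public static void main(String[] args) {
            runFeedingProcessor(9_999);            // 9,999 queued: still under the limit
            if (mayScheduleFeedingProcessor()) {   // so scheduling is allowed...
                runFeedingProcessor(1_000_000);    // ...and the queue jumps to 1,009,999
            }
            // Scheduling is now denied until the queue drains below 10,000 again.
            System.out.println("queued=" + downstreamQueue.size()
                    + ", schedulable=" + mayScheduleFeedingProcessor());
        }
    }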
-
Thank you,
Matt
-
When an "Answer" addresses/solves your question, please select "Accept" beneath that answer. This encourages user participation in this forum.
Created 07-16-2018 07:45 PM
Thanks for the reply, Matt. However, I am not sure you read my post entirely. The ConsumeAzureEventHub processor does not seem to EVER stop pumping more messages into the queue. Could it be written incorrectly?
We have it set to 10,000 and have never seen it stop rising.
Created 07-16-2018 07:59 PM
The ConsumeAzureEventHub processor was developed in the Apache community. From your description I did not realize it was growing non-stop. It sounds like it was written in such a way that it gets a thread upon initial execution and never releases that thread. If that is the case, it will continue to produce FlowFiles to the output queue regardless of the configured back pressure thresholds.
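To make that failure mode concrete, here is a hypothetical sketch (my own illustration, not the real ConsumeAzureEventHub code) of a processor that spawns its own long-lived consumer thread on its first trigger. Back pressure can only suppress future trigger invocations; it cannot touch the background thread, so the output keeps growing:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    class LongRunningConsumerSketch {
        private final BlockingQueue<String> outputQueue = new LinkedBlockingQueue<>();
        private volatile boolean started = false;

        // Called by the framework on its run schedule. Back pressure only
        // suppresses these calls; it cannot stop the thread started below.
        void onTrigger() {
            if (!started) {
                started = true;
                Thread consumer = new Thread(() -> {
                    while (!Thread.currentThread().isInterrupted()) {
                        outputQueue.add("message"); // ignores any threshold
                        try {
                            Thread.sleep(10); // stand-in for receiving from the event hub
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
                consumer.setDaemon(true);
                consumer.start();
            }
            // Nothing here depends on onTrigger() ever being called again.
        }

        public static void main(String[] args) throws InterruptedException {
            LongRunningConsumerSketch processor = new LongRunningConsumerSketch();
            processor.onTrigger();  // first (and only) scheduled invocation
            Thread.sleep(200);      // the framework has "stopped" calling us...
            System.out.println("queued anyway: " + processor.outputQueue.size());
        }
    }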
-
My suggestion would be to open an Apache Jira against that processor, explaining the issue you are seeing and sharing your processor configuration.
-
Thank you,
Matt
Created 07-16-2018 08:04 PM
Thanks Matt! I was wondering if something like that was possible. I think, given the way the Azure Event Hub client software works, it would be impossible to change this. It manages the threads, and they are long-running, so, as you say, they never release the first thread that starts them up. It would take a rewrite of the Event Hub listener logic to fix it.
Thanks for clearing this up!
Created 10-09-2018 08:59 AM
Hello Matt,
The source processor will be stopped when the size/number threshold is reached.
But do you know when the source processor is restarted?
Best regards
Abdou
Created 10-09-2018 01:03 PM
-
"Stopped" is probably not the correct word to use here. A processor that is started then executes based on the configured "run schedule". When Back pressure is being applied to a processor by one of the processors outgoing connections, the processor will no longer be scheduled to run. It is still started. As soon as back pressure is no longer being applied, the processor will begin executing again based on run schedule.
-
Thanks,
Matt
Created 10-09-2018 02:17 PM
Hello @Matt Clarke,
Thanks for your reply.
In the meantime I was debugging NiFi...
And you are right: the processor resumes executing as soon as the number/size of FlowFiles drops below the threshold (i.e. the queue is no longer full).
Back pressure is applied in NiFi 1.5 here (this piece of code is what made me understand it):

    private boolean isBackPressureEngaged() {
        return procNode.getIncomingConnections().stream()
                .filter(con -> con.getSource() == procNode)
                .map(con -> con.getFlowFileQueue())
                .anyMatch(queue -> queue.isFull());
    }
Thanks for your quick response.
Best Regards
Abdou