
NiFi back pressure thresholds

Explorer

My back pressure threshold for number of objects is the default 10,000, and the data size threshold is 1 GB. However, I saw the number of objects in the queue go far beyond 10,000 even though the data size was below 1 GB. Does this mean back pressure is applied only if both the number of objects and the data size reach their respective thresholds?

Another question: suppose a Kafka consumer processor is connected to a split processor. If the connection to the split processor reaches its back pressure thresholds, will the Kafka consumer stop consuming messages?

Thanks,

Mark


11 REPLIES

We are seeing the same thing, specifically with the ConsumeAzureEventHub processor. It seems to completely ignore both the size and the object count settings. A simple PutFile works, but not this one. We have seen the queue go to more than 1,000,000 objects and over 1 GB in size.

Master Mentor

@Mark Lin @mark juchems

-

The configurable back pressure thresholds (object and size) on a connection are soft limits.

So a back pressure object threshold of 10,000 (the default) means that the NiFi controller will not schedule the feeding processor to run if the object count has reached or exceeded 10,000 queued FlowFiles.

-

So let's say there are 9,999 queued objects. NiFi would allow the preceding processor to be scheduled. When that processor executes its code, it runs with no regard for destination queue sizes. That means if the execution of that processor thread results in 1,000,000 FlowFiles being produced in a single execution, all 1,000,000 FlowFiles will be added to the downstream connection queue. Now that the queue has 1,009,999 FlowFiles queued, the preceding processor will not be scheduled again until that queue drops below 10,000.

-

The same soft-limit concept applies to the back pressure size threshold setting on a connection as well.
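As a minimal illustration of that soft-limit behavior, here is a hypothetical Java sketch (the class and method names are invented; this is not NiFi's actual implementation). It assumes, as in NiFi, that reaching either threshold engages back pressure, and shows that the thresholds are consulted only before scheduling, not during execution:

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of "soft" back pressure thresholds; all names
// here are invented for illustration only.
class SoftLimitQueue {
    private final long objectThreshold;     // e.g. 10,000 objects
    private final long sizeThresholdBytes;  // e.g. 1 GB
    private final AtomicLong objectCount = new AtomicLong();
    private final AtomicLong totalBytes = new AtomicLong();

    SoftLimitQueue(long objectThreshold, long sizeThresholdBytes) {
        this.objectThreshold = objectThreshold;
        this.sizeThresholdBytes = sizeThresholdBytes;
    }

    // Reaching EITHER threshold engages back pressure.
    boolean isFull() {
        return objectCount.get() >= objectThreshold
                || totalBytes.get() >= sizeThresholdBytes;
    }

    // A processor that is already running enqueues without re-checking
    // the thresholds, which is why the queue can overshoot them.
    void enqueue(long flowFileBytes) {
        objectCount.incrementAndGet();
        totalBytes.addAndGet(flowFileBytes);
    }
}

class SchedulerSketch {
    // The threshold check happens only here, before execution begins.
    void maybeSchedule(Runnable processorTask, SoftLimitQueue downstream) {
        if (!downstream.isFull()) {
            processorTask.run(); // a single run may add 1,000,000 FlowFiles
        }
    }
}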

-

Thank you,

Matt

-

When an "Answer" addresses/solves your question, please select "Accept" beneath that answer. This encourages user participation in this forum.


Thanks for the reply, Matt. However, I am not sure you read my post entirely. The ConsumeAzureEventHub processor does not seem to EVER stop pumping more messages into the queue. Could it be written incorrectly?

We have it set to 10,000 and have never seen it stop rising.

Master Mentor

@mark juchems

The ConsumeAzureEventHub processor was developed in the Apache community. From your description I did not realize it was growing non-stop. It sounds like it was written in such a way that it gets a thread upon initial execution and never releases that thread. If that is the case, it will continue to produce FlowFiles to the output queue regardless of the configured back pressure thresholds.
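To illustrate why such a long-running thread would bypass back pressure, here is a hypothetical sketch (invented names; not the real ConsumeAzureEventHub code). Once the client library's own thread is started, the framework's pre-execution threshold check is never consulted again:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a processor whose client library owns a
// long-running thread; all names are invented for illustration.
class LongRunningConsumerSketch {
    private final Queue<byte[]> downstreamQueue = new ConcurrentLinkedQueue<>();
    private volatile boolean started;

    // The framework calls this per the run schedule, but only the first
    // call does anything; back pressure can only block these calls.
    void onTrigger() {
        if (started) {
            return;
        }
        started = true;
        Thread libraryThread = new Thread(() -> {
            while (true) {
                // The library's own loop enqueues forever, so the queue
                // grows past any configured threshold.
                downstreamQueue.add(receiveNextEvent());
            }
        });
        libraryThread.setDaemon(true);
        libraryThread.start();
    }

    private byte[] receiveNextEvent() {
        return new byte[1024]; // stand-in for a real network receive
    }
}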

-

My suggestion would be to open an Apache Jira against that processor explaining the issue it is having and sharing your processor configuration.

-

Thank you,

Matt


Thanks Matt! I was wondering if something like that was possible. I think that, given the way the Azure Event Hub software works, it would be impossible to change this. It manages the threads, and they are long-running, so, as you say, they never release the first thread that starts them up. It would take a rewrite of the Event Hub listener logic to fix it.

Thanks for clearing this up!

Explorer

Hello Matt,

The source processor will be stopped when the size/number threshold is reached.

But do you know when the source processor is restarted?

Best regards

Abdou

Master Mentor

@Abdou B.

-

"Stopped" is probably not the correct word to use here. A processor that is started then executes based on the configured "run schedule". When Back pressure is being applied to a processor by one of the processors outgoing connections, the processor will no longer be scheduled to run. It is still started. As soon as back pressure is no longer being applied, the processor will begin executing again based on run schedule.

-

Thanks,

Matt

Explorer

Hello @Matt Clarke,

Thanks for your reply.

In the meantime I was debugging NiFi...

And you are right: the processor will execute again as soon as the number/size of FlowFiles drops below the threshold (i.e. the queue is not full).

Back pressure is applied in NiFi 1.5 here (this piece of code made me understand it):

// Reports back pressure if any of the processor's self-looping
// connections (incoming connections whose source is the processor
// itself) has a full queue.
private boolean isBackPressureEngaged() {
    return procNode.getIncomingConnections().stream()
        .filter(con -> con.getSource() == procNode)
        .map(con -> con.getFlowFileQueue())
        .anyMatch(queue -> queue.isFull());
}

Thanks for your quick response.

Best Regards

Abdou