Concurrent task and Max Timer Driven Thread Count
Labels: Apache NiFi
Created ‎04-10-2025 11:33 AM
Hi
We are in the process of fine-tuning the performance of our NiFi nodes.
1. Regarding "Max Timer Driven Thread Count": the default value is 10. Is this value okay given that we have a 4-core CPU?
2. For "Concurrent tasks": how do we figure out the optimal value? Also, does this value have to be the same for all the processors in the flow?
Thanks
Created ‎04-11-2025 07:03 AM
@nifier
NiFi is flow-based programming, so tuning is directly related to the dataflows you build and the volumes of data you process. Optimal values come from testing your "program", the dataflow itself. Aside from NiFi's own settings, you also have to consider any other programs running on the same server as NiFi, since they will consume CPU resources as well.
"Max Timer Driven Thread Count":
This is the thread pool shared by all the timer-driven and cron-driven components you add to your NiFi canvas. The general guidance is to start with 4x the number of cores, so for you that would be 16. Then test and monitor your server's CPU load average while your dataflow(s) are running under expected loads. If the load average is very close to or exceeds your core count, back off the size of the thread pool.
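The starting point and the back-off check above can be sketched in a few lines. This is not NiFi code, just a hedged helper (function names are my own) that applies the "4x cores, then watch load average" rule on a Linux/macOS host:

```python
import os

def suggested_thread_pool_size(multiplier: int = 4) -> int:
    """Starting point from the general guidance: 4x the core count."""
    cores = os.cpu_count() or 1
    return cores * multiplier

def load_exceeds_cores() -> bool:
    """True if the 1-minute load average is at or above the core count,
    the sign that the thread pool should be reduced."""
    one_min_load, _, _ = os.getloadavg()  # POSIX only
    return one_min_load >= (os.cpu_count() or 1)

# On a 4-core machine this prints 16, matching the guidance above.
print(suggested_thread_pool_size())
```

The actual setting is applied in the NiFi UI under Controller Settings; the sketch only shows the arithmetic and the monitoring condition you would check against `uptime` or your metrics system.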
"Concurrent task"
This is a configurable setting on processor components that allows multiple concurrent executions of the same processor. It works in conjunction with the run schedule set on the processor. When a processor is "scheduled", it needs a thread from the thread pool. Assuming concurrent tasks is set higher than the default of 1, if there is still work to do at the next scheduled run time and the previous task is still executing or pending execution, the additional concurrent task allows another execution to be scheduled.
Scheduled is different from executing: you can have many scheduled tasks all waiting for an available thread from the thread pool. With most processors, execution takes only milliseconds, so working through the pool of scheduled tasks is fast and efficient.
The general guidance with concurrent tasks is to start with the default of 1, monitor your dataflows, and adjust in increments of 1 only where needed. Dataflow developers tend to make the mistake of setting some larger value from the start, which is a bad idea. Instead, watch your dataflows under load and find the processor furthest down the dataflow path that is developing an ever-increasing backlog of FlowFiles on its inbound connection. A growing backlog on a connection will eventually trigger backpressure controls (the connection turns red). Once backpressure kicks in, the upstream processor feeding that connection is no longer allowed to schedule until the downstream connection's backpressure is lifted, so a blocked processor will start queuing FlowFiles upstream of it. This can propagate backpressure all the way back to the start of the dataflow. So do NOT simply adjust concurrent tasks on all of these processors; instead, increase concurrent tasks only on the one furthest down the dataflow to relieve the backpressure there, which will naturally allow the upstream processors to be scheduled again.
The danger of setting large concurrent task values is that you end up with many more scheduled tasks all waiting for CPU time. If you set concurrent tasks high on a CPU-intensive processor, those tasks may consume all your CPU, preventing other processors from getting an opportunity to execute for long periods of time.
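That starvation effect is easy to see outside NiFi. The following is only an analogy in plain Python (the "processor" names are invented): a small shared thread pool stands in for the timer-driven thread pool, one "greedy" component submits many slow concurrent tasks, and a fast task submitted afterwards cannot run until a thread frees up:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # stands in for a small Max Timer Driven Thread Count
pool = ThreadPoolExecutor(max_workers=POOL_SIZE)

executed = []          # completion order of tasks
lock = threading.Lock()

def task(processor: str, slow: bool) -> None:
    # A slow task holds its thread for a long time, like a
    # CPU-intensive processor with high concurrent tasks.
    time.sleep(0.2 if slow else 0.01)
    with lock:
        executed.append(processor)

# Six slow tasks from one greedy component fill the FIFO queue first...
futures = [pool.submit(task, "GreedyProcessor", True) for _ in range(6)]
# ...so this quick task must wait for a free thread despite being fast.
futures.append(pool.submit(task, "LightProcessor", False))

for f in futures:
    f.result()
pool.shutdown()

# The light task finishes last even though it runs in ~10 ms.
print(executed)
```

The same dynamic is why the answer above recommends raising concurrent tasks in increments of 1 and only on the processor that is actually the bottleneck.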
NOTE: The embedded documentation for each NiFi processor has a resource considerations section that highlights whether the processor has MEMORY or CPU resource considerations.
Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created ‎04-14-2025 08:12 AM
Thanks @MattWho for your response.
