How to improve NiFi concurrency

Rising Star

I see a 'Concurrent Tasks' option under Scheduling in NiFi version 2.0. Here are my questions regarding concurrency.

1. I see this only at the individual processor level. Can it be set at the 'Process Group' level? In short, can I parameterize it?

2. How do I arrive at this number? What factors determine concurrency, e.g. memory, average data load, or other applications on the same box, such as Spark?

1 ACCEPTED SOLUTION

Master Mentor

@bala krishnan

1. "Concurrent tasks" is nothing new to NiFi. There is currently no capability to set concurrency at the process group level, and I am not sure that would be a good idea. I assume you are looking for a way to set a number of "concurrent tasks" that would then be applied to every processor within a process group? Some processors perform tasks that are more CPU intensive than others. For example, the CompressContent processor is CPU intensive: each concurrent task it is assigned consumes 100% of a CPU core for each file it compresses or decompresses. Adding too many "concurrent tasks" here could have a serious impact on the system hosting NiFi. The UpdateAttribute processor, on the other hand, typically has very little CPU impact. One concurrent task here can process batches of FlowFiles very rapidly, so multiple concurrent tasks are usually unnecessary and a waste of server resources.

2. There is no defined algorithm for how many concurrent tasks a processor should receive out of the gate. Concurrent tasks are assigned by testing and fine-tuning a dataflow using production data samples at production volumes. Evaluating your dataflow for bottlenecks while tracking system resource loads (CPU, memory, network, and disk I/O) can help you tune concurrent task settings appropriately. It is too often the case that users start off by assigning a high number of concurrent tasks rather than starting at the bottom. Remember that your system has only so much CPU to share. Assigning too many concurrent tasks to a single processor will hinder other processors that are looking for CPU time.

Along with setting "concurrent tasks" on individual processors, there are global maximum timer driven and event driven thread settings in NiFi (defaults are 10 and 5, respectively). These control the maximum number of threads NiFi will request from the server to fulfill the concurrent task requests from the NiFi processor components. These global values can be adjusted in "Controller Settings" (located via the hamburger menu in the upper right corner of the NiFi UI). A typical setting here is double to quadruple the number of CPU cores on your server. Setting excessive values here does not improve performance, as those threads just spend more time in CPU wait.
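As a rough illustration of the "double to quadruple the number of CPU cores" guideline, here is a minimal Python sketch. The function name `suggested_thread_range` is hypothetical and not part of NiFi; it only does the arithmetic, and the result is a starting point for testing rather than a recommended final value.

```python
import os


def suggested_thread_range(cores):
    """Return (low, high) as 2x and 4x the core count.

    Hypothetical helper: it only applies the rule of thumb from the
    post above and does not read or change any NiFi setting.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2 * cores, 4 * cores


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1.
    cores = os.cpu_count() or 1
    low, high = suggested_thread_range(cores)
    print(f"{cores} cores: try a max timer driven thread count between {low} and {high}")
```

Run it on the server that hosts NiFi, then enter a value from that range in the controller settings and adjust based on observed CPU load.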

Thanks,

Matt

3 REPLIES

Hi @Matt Clarke. I have a related question regarding concurrency. I have a dataflow with two connected processors (each with concurrent tasks = 1). When I set the number of threads for the whole instance to 1, the two processors still somehow manage to run concurrently, although I expect them to run sequentially. The first processor takes on average 2.5 seconds per input and the second takes on average 4.5 seconds. I gave it 100 inputs and expected it to finish in around 700 seconds (i.e., sequential execution), but it finishes in 480 seconds, which suggests that each processor is using a separate thread and they do not wait on each other. Am I missing something here?
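The timing in the question above can be checked with a quick calculation. This is an idealized sketch only: it assumes fixed per-input times, one input per task, and no scheduling overhead, and the function names are hypothetical. It compares fully sequential execution with a two-stage pipeline in which both processors overlap.

```python
def sequential_seconds(n, t1, t2):
    """Total time if each input passes through both processors before the next starts."""
    return n * (t1 + t2)


def pipelined_seconds(n, t1, t2):
    """Total time if the two processors overlap like a two-stage pipeline.

    The first input needs t1 + t2; after that the slower stage
    sets the pace for the remaining n - 1 inputs.
    """
    if n == 0:
        return 0.0
    return (t1 + t2) + (n - 1) * max(t1, t2)


if __name__ == "__main__":
    n, t1, t2 = 100, 2.5, 4.5
    print("sequential:", sequential_seconds(n, t1, t2))  # 700.0
    print("pipelined: ", pipelined_seconds(n, t1, t2))   # 452.5
```

Under these assumptions, the observed 480 seconds sits much closer to the pipelined estimate (452.5 s) than to the sequential one (700 s), which is consistent with the two processors overlapping.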

Master Mentor

@Tarek Elgamal
I assume you are referring to the "Max Timer Driven Thread Count" setting?

That setting controls the maximum number of threads that can execute at one time. It does not guarantee any order of execution. NiFi's controller in the background does not operate under this thread pool. Both processors will be scheduled to run based on their configured run schedules. Their concurrent tasks are then placed in a request queue, waiting for one of the threads from that pool to service them. This way, every processor eventually gets a chance to run its code. Also keep in mind that some processors work on batches of FlowFiles while others process one FlowFile per task. It is also hard to say that each processed FlowFile will take the same amount of time to complete an operation; that really depends on the processor and what it is designed to do.
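The idea of tasks from several processors queuing for a shared, size-limited thread pool can be sketched with a toy model. This uses Python's `concurrent.futures.ThreadPoolExecutor`, not NiFi's actual scheduler, and the names `run_toy_flow`, "A", and "B" are hypothetical. It only shows that the pool size caps how many tasks run at once while every queued task still gets serviced.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_toy_flow(pool_size, tasks_per_processor=5, work_seconds=0.01):
    """Queue tasks from two pretend processors onto one shared pool.

    Returns (peak number of tasks running at once, list of processor
    names in the order their tasks finished).
    """
    lock = threading.Lock()
    running = 0
    peak = 0
    finished = []

    def task(name):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(work_seconds)  # stand-in for the processor's real work
        with lock:
            running -= 1
            finished.append(name)

    # Leaving the "with" block waits for all queued tasks to complete.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(tasks_per_processor):
            pool.submit(task, "A")
            pool.submit(task, "B")
    return peak, finished


if __name__ == "__main__":
    for size in (1, 3):
        peak, finished = run_toy_flow(size)
        print(f"pool size {size}: peak concurrency {peak}, {len(finished)} tasks finished")
```

In this toy model, the peak concurrency never exceeds the pool size, and tasks from both processors all complete regardless of how small the pool is.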

Thanks,

Matt