How to increase concurrency in DataFlow / NiFi?

Super Collaborator

In DataFlow/NiFi, I see that I can set a value for "concurrent tasks" on most processors (I'm specifically using an ExecuteStreamCommand processor in this example).

Suppose I set this value to "10". When the processor starts, it will pull 10 flow files from the queue (assuming there are at least 10) and process them simultaneously (good).

However, suppose 9 of these files finish very quickly, and 1 of these flow files takes significantly longer than the others: it does not appear to pull 9 more from the queue.

Rather, it will run the single instance until the final flow file is completely processed, and THEN pull 10 more from the queue.

Is there a way to tell the processor to be more efficient about this?

In other words: Can I get NiFi to try to keep the number of running instances close to 10, rather than just start 10 at once and wait for all of them to finish?


Cloudera Employee

@Zack Riesland It may appear that the processors are continually running, and in some cases the idle time is small enough that, for all intents and purposes, they are. In reality, a processor is scheduled, and each time it is scheduled the "max concurrent tasks" are created and executed for it. Today NiFi does not interleave the scheduling to keep the number of concurrent tasks close to the configured value.
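
To put rough numbers on the difference being asked about, here is a toy model in Python. It is not NiFi code and does not reproduce NiFi's scheduler. It only compares the two behaviors described in this thread: starting a full batch and waiting for the slowest task, versus refilling a slot as soon as it frees up. The function names and the workload (durations in seconds) are made up for illustration.

```python
import heapq

def batch_makespan(durations, slots):
    """Total time when `slots` tasks start together and the next
    batch waits until the slowest task of the current batch ends."""
    total = 0
    for i in range(0, len(durations), slots):
        total += max(durations[i:i + slots])
    return total

def refill_makespan(durations, slots):
    """Total time when a freed slot immediately takes the next queued item."""
    free_at = [0] * slots  # min-heap of times at which each slot becomes free
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available slot
        heapq.heappush(free_at, start + d)  # slot is busy until start + d
    return max(free_at)

# Hypothetical queue: 30 flow files, every tenth one takes 60 s, the rest 1 s.
durations = ([60] + [1] * 9) * 3

print(batch_makespan(durations, 10))   # 180
print(refill_makespan(durations, 10))  # 62
```

With this made-up workload the batch-and-wait model takes about three times as long, because nine slots sit idle while each slow flow file finishes.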

Super Collaborator

That's what I figured. Thanks.