How to improve NiFi concurrency

Rising Star

I see a 'Concurrent Tasks' option under Scheduling in NiFi 2.0. Here are my questions regarding concurrency:

1. I only see this at the individual processor level. Can it be set at the Process Group level? In short, can I parameterize it? (A scripted approach is sketched below.)

2. How do I arrive at this number? What are all the factors that decide concurrency, e.g. memory, average data load, and other applications on the same box such as Spark?
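
For the "parameterize it" part, Concurrent Tasks is configured per processor, so one common workaround is to script the value across every processor in a group through the NiFi REST API. Below is a minimal sketch in Python, assuming an unsecured dev instance and placeholder URL/IDs; the endpoints and the concurrentlySchedulableTaskCount field come from the standard /nifi-api (ProcessorConfigDTO), but verify them against your NiFi version and add TLS/authentication as needed.

```python
# Hedged sketch: apply one Concurrent Tasks value to every processor in a
# process group via the NiFi REST API. NIFI_URL and GROUP_ID are placeholders;
# this assumes an unsecured dev instance (add auth/TLS for a real cluster).
import requests

NIFI_URL = "http://localhost:8080/nifi-api"   # adjust for your instance
GROUP_ID = "your-process-group-id"            # placeholder process group ID
CONCURRENT_TASKS = 2                          # value to apply

def set_group_concurrency(group_id: str, tasks: int) -> None:
    """Set Concurrent Tasks on each processor in the given process group."""
    listing = requests.get(f"{NIFI_URL}/process-groups/{group_id}/processors")
    listing.raise_for_status()
    for proc in listing.json()["processors"]:
        body = {
            "revision": proc["revision"],      # NiFi requires the current revision
            "component": {
                "id": proc["component"]["id"],
                "config": {"concurrentlySchedulableTaskCount": tasks},
            },
        }
        resp = requests.put(
            f"{NIFI_URL}/processors/{proc['component']['id']}", json=body
        )
        resp.raise_for_status()

if __name__ == "__main__":
    set_group_concurrency(GROUP_ID, CONCURRENT_TASKS)
```

This only centralizes the update; each processor still carries its own Concurrent Tasks value, and the right number per processor still depends on the factors asked about in question 2.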

3 REPLIES



Hi @Matt Clarke. I have a related question regarding concurrency. I have a dataflow with two connected processors (each with Concurrent Tasks = 1), but when I set the number of threads for the whole instance to 1, the two processors still manage to run concurrently, although I expect them to run sequentially. The first processor takes on average 2.5 seconds per input and the second takes on average 4.5 seconds. I gave it 100 inputs and expected it to finish in around 700 seconds (i.e., sequential execution), but it still manages to finish in 480 seconds, which suggests that each processor is using a separate thread and they do not wait on each other. Am I missing something here?
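
To make the arithmetic explicit, here is a rough comparison of the two models implied above, using the quoted averages and ignoring scheduling overhead, so these are idealized estimates rather than measurements:

```python
# Back-of-the-envelope comparison for 100 inputs, using the averages from the
# post (2.5 s and 4.5 s per FlowFile). Scheduling overhead and per-item
# variance are ignored, so these are only estimates.
N = 100            # FlowFiles
T1, T2 = 2.5, 4.5  # average seconds per FlowFile for processor 1 and 2

# One shared thread, strictly one task at a time: every item pays both costs.
sequential = N * (T1 + T2)   # 700.0 s

# One thread per processor (a pipeline): total time is dominated by the
# slower stage, plus the time for the first item to reach it.
pipelined = T1 + N * T2      # 452.5 s

print(f"strictly sequential: {sequential:.1f} s")
print(f"pipelined (2 threads): {pipelined:.1f} s")
```

The observed ~480 s is much closer to the pipelined estimate than to the sequential one, which is what prompts the question.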

Master Mentor

@Tarek Elgamal
I assume you are referring to the "Max Timer Driven Thread Count" setting?

That setting controls the maximum number of threads that can execute at one time; it does not guarantee any order to the execution of those threads. NiFi's controller in the background does not operate under this thread pool. Both processors will be scheduled to run based on their configured run schedule, and those concurrent tasks then get stacked in a request queue, waiting on one of the threads from the pool to service them. This way, every processor eventually gets a chance to run its code. Also keep in mind that some processors work on batches of FlowFiles while others process one FlowFile per task, and it is hard to say that each FlowFile will take the same amount of time to complete an operation; it really depends on the processor and what it is designed to do.

Thanks,

Matt
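
As an illustration of the queue-and-pool model described in this reply, here is a small NiFi-free toy (the names and timings are made up): two "processors" keep submitting tasks to a pool with a single worker thread, and their tasks interleave in the shared queue even though only one task ever executes at a time.

```python
# Toy illustration (not NiFi code) of a fixed thread pool servicing a shared
# task queue. With max_workers=1 only one task runs at a time, but the tasks
# of the two "processors" interleave instead of one processor finishing all
# of its work before the other starts.
import time
from concurrent.futures import ThreadPoolExecutor

def processor_task(name: str, seconds: float) -> str:
    time.sleep(seconds)   # stand-in for the processor's per-task work
    return f"{name} finished a task at t={time.perf_counter() - start:.2f}s"

start = time.perf_counter()
pool = ThreadPoolExecutor(max_workers=1)   # like Max Timer Driven Thread Count = 1

futures = []
for _ in range(3):                                            # each scheduling tick
    futures.append(pool.submit(processor_task, "ProcA", 0.25))  # queues one task
    futures.append(pool.submit(processor_task, "ProcB", 0.45))  # per processor

for f in futures:
    print(f.result())
pool.shutdown()
```

With a pool of one, the toy's total wall time is roughly the sum of all task times; real NiFi timings diverge from such a sum because of run schedules, batching of FlowFiles per task, and per-FlowFile variance, as noted in the reply above.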