Run Schedule - New task execution while previous is still running

New Contributor

Hello,

I have a processor with the following scheduling configuration:

Run Schedule: 30 sec
Run Duration: 0 ms
Concurrent Tasks: 1

In this case, assuming each task takes more than 30 seconds to complete once it starts, can two (or more) tasks of this processor run at the same time? For example:

0 s: Task 1 starts
30 s: Task 2 starts (Task 1 is still running)

Is there any way to prevent overlapping task runs, i.e., have Task 2 start at 30 s only if Task 1 (started at 0 s) has completed?

Thanks!

Re: Run Schedule - New task execution while previous is still running

Master Guru

@AnkushKoul 

By configuring only 1 concurrent task, you are effectively forcing each task to complete before the next one can execute.
With Run Schedule set to "30 sec", NiFi will only schedule this component to execute every 30 seconds. So if task 1 takes only 20 seconds to complete, task 2 will not start until 10 seconds later.
If you set Run Schedule to the default of 0 sec, NiFi schedules this component to execute as often as possible, so task 2 will execute as soon as task 1 completes.
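A toy model can make this arithmetic concrete. The sketch below is not NiFi's scheduler code; it assumes, as described above, that with 1 concurrent task a new run may start once Run Schedule has elapsed since the previous run started, but never before the previous run has finished. The function name `start_times` is hypothetical, and contention for the shared thread pool is ignored.

```python
def start_times(durations, run_schedule):
    """Start time of each run of a processor with 1 concurrent task.

    Toy model (assumption, not NiFi code): the next run starts once
    `run_schedule` seconds have passed since the previous run *started*,
    but never before the previous run has *finished*, because the single
    concurrent task is still busy. Thread-pool contention is ignored.
    """
    starts = []
    prev_start = prev_end = None
    for duration in durations:
        if prev_start is None:
            start = 0.0  # first run starts immediately
        else:
            start = max(prev_start + run_schedule, prev_end)
        starts.append(start)
        prev_start, prev_end = start, start + duration
    return starts


print(start_times([20, 20, 20], run_schedule=30))  # [0.0, 30.0, 60.0]
print(start_times([20, 20, 20], run_schedule=0))   # [0.0, 20.0, 40.0]
```

With 20-second tasks, a 30 sec Run Schedule leaves a 10-second gap after each run, while 0 sec starts each run as soon as the previous one ends.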

You can think of concurrent tasks as a way to parallelize execution within a single component: instead of having two processors, you have one processor with 2 concurrent tasks. Each task is scheduled independently of (in parallel with) the other concurrent task(s), and each concurrent task works on different FlowFile(s) from the inbound connection(s). Some components do not support multiple concurrent tasks (their source code limits them to 1).
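The effect of concurrent tasks on throughput can be sketched with a small simulation. This is a hypothetical model, not NiFi code: it assumes Run Schedule is 0 sec, that each concurrent task independently picks up the next queued FlowFile as soon as it is free, and it ignores the shared thread pool. The name `finish_time` is illustrative.

```python
import heapq


def finish_time(flowfile_durations, concurrent_tasks):
    """Time at which a queue of FlowFiles is drained (Run Schedule 0 sec).

    Toy model (assumption): each concurrent task is an independent worker
    that takes the next queued FlowFile as soon as it is free.
    """
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be >= 1")
    # Min-heap of the times at which each concurrent task becomes free.
    free_at = [0.0] * concurrent_tasks
    heapq.heapify(free_at)
    end = 0.0
    for duration in flowfile_durations:
        start = heapq.heappop(free_at)  # earliest-free concurrent task
        done = start + duration
        end = max(end, done)
        heapq.heappush(free_at, done)
    return end


print(finish_time([10, 10, 10, 10], concurrent_tasks=1))  # 40.0
print(finish_time([10, 10, 10, 10], concurrent_tasks=2))  # 20.0
```

In this model, doubling concurrent tasks halves the time to drain four equal FlowFiles, which is the "two processors in one" effect described above.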

So it sounds to me like you want tasks to kick off one after another as fast as possible. In that case, leave Run Schedule at 0 sec and Concurrent Tasks at 1.

If you found this answer addressed your question, please take a moment to accept the answer.

Hope this helps,

Matt

Re: Run Schedule - New task execution while previous is still running

New Contributor
Thanks Matt!

Since NiFi schedules the component to execute every 30 seconds when Run Schedule is set to "30 sec", what happens if task 1 takes, say, 40 seconds to complete (longer than the schedule interval)? When will the second execution (task 2) happen?

Re: Run Schedule - New task execution while previous is still running

Master Guru

@AnkushKoul 

Since you only have 1 concurrent task configured, another thread cannot be started while that concurrent task's thread is in use. So even with a Run Schedule of 0 sec, another task can't start until the thread tied to that concurrent task is released, making it possible for another execution to happen. With a Run Schedule of 30 sec, the processor is allowed to execute again 30 seconds later only if one of its concurrent tasks is not already in use. Setting 30 seconds can therefore create an artificial delay in your dataflow when tasks take less than 30 seconds to complete.
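The "one concurrent task means no overlap" behavior can be modeled as a semaphore with a single permit. The class below is a hypothetical illustration built on Python's `threading.BoundedSemaphore`, not how NiFi implements concurrent tasks: a trigger that fires while the only slot is busy simply does not start a task.

```python
import threading
from typing import Callable, Optional


class ConcurrentTaskSlots:
    """Toy model of a processor's Concurrent Tasks setting (assumption,
    not NiFi's implementation): a fixed number of slots, each running at
    most one task at a time."""

    def __init__(self, concurrent_tasks: int = 1) -> None:
        self._slots = threading.BoundedSemaphore(concurrent_tasks)

    def try_trigger(self, work: Callable[[], object]) -> Optional[threading.Thread]:
        """Start `work` in a new thread if a slot is free, else return None."""
        # Acquire in the caller's thread so the slot is already held before
        # the worker starts; no second trigger can slip in between.
        if not self._slots.acquire(blocking=False):
            return None

        def run() -> None:
            try:
                work()
            finally:
                self._slots.release()  # slot is free for the next trigger

        thread = threading.Thread(target=run)
        thread.start()
        return thread


slots = ConcurrentTaskSlots(concurrent_tasks=1)
task1_may_finish = threading.Event()
task1 = slots.try_trigger(task1_may_finish.wait)  # long-running task 1
task2 = slots.try_trigger(lambda: None)           # schedule fires again
print(task2 is None)                              # True: no overlap
task1_may_finish.set()
task1.join()
task3 = slots.try_trigger(lambda: None)           # slot released
task3.join()
print(task3 is not None)                          # True
```

The `Event` stands in for a task that runs longer than the Run Schedule interval; the second trigger is refused until task 1 releases its slot.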

Note: While the processor is executing a task, you will see a small number (the count of active threads) displayed in the upper right corner of the processor.

Re: Run Schedule - New task execution while previous is still running

New Contributor

Thanks Matt for your helpful and prompt response!

So if I understand you correctly, with a Run Schedule of 30 seconds and 1 concurrent task configured, if the first execution takes 40 seconds, then the second execution will start only at 60 seconds, and the processor will be idle from t = 40 s to t = 60 s. Is this correct?

Re: Run Schedule - New task execution while previous is still running

Master Guru

@AnkushKoul 

Since 30 seconds have already passed since the last execution started, the processor is eligible to be scheduled immediately once a thread becomes available. So the second execution would not wait until 60 seconds. This setting is the minimum wait between executions.
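The two readings of Run Schedule discussed in this exchange can be written as two small functions. Both are hypothetical models, not NiFi code: `next_start_minimum_wait` follows the "minimum wait between executions" description above, while `next_start_fixed_grid` encodes the fixed 0/30/60-second grid assumed in the question. Neither accounts for waiting on a free thread.

```python
import math


def next_start_minimum_wait(prev_start, prev_end, run_schedule):
    """Run Schedule as a minimum wait since the previous start: run again
    as soon as the interval has elapsed and the single concurrent task
    is free (the behavior described in the answer)."""
    return max(prev_start + run_schedule, prev_end)


def next_start_fixed_grid(prev_end, run_schedule):
    """The alternative reading from the question (not what the answer
    describes): runs only on a fixed grid of multiples of run_schedule,
    so an overrun skips to the next grid point. Requires run_schedule > 0."""
    return math.ceil(prev_end / run_schedule) * run_schedule


# First run: starts at 0 s and takes 40 s.
print(next_start_minimum_wait(0, 40, 30))  # 40
print(next_start_fixed_grid(40, 30))       # 60
```

For a 40-second task with a 30-second Run Schedule, the first model starts the next run at 40 s with no idle gap, while the grid model would leave the processor idle from 40 s to 60 s.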

Other factors come into play that can affect component execution scheduling. NiFi hands out threads to processors from the Max Timer Driven Thread Count resource pool, set via Controller Settings under the global menu in the upper right corner. Naturally, you will have more components on your canvas than the size of this resource pool, which should initially be set to only 2-4 times the number of cores on a single node, since the setting applies per node. NiFi hands these available threads out to processors requesting CPU time to execute. Most component threads execute in the range of milliseconds, but some can be more resource intensive and take longer to complete.

Before increasing this resource pool, you should monitor the CPU impact/usage with all your dataflows running. Then make small increments if spare resources exist.
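How a shared thread pool can delay an otherwise eligible processor can be sketched with another toy simulation. The names `actual_start_times` and `suggested_initial_pool_size` are hypothetical; the model assumes eligible tasks are served in ready-time order by the first free thread in a pool of Max Timer Driven Thread Count threads, which is a simplification of NiFi's real scheduling.

```python
import heapq


def suggested_initial_pool_size(cores_per_node):
    """Initial Max Timer Driven Thread Count range suggested in the answer:
    2-4x the cores of a single node (the setting applies per node)."""
    return 2 * cores_per_node, 4 * cores_per_node


def actual_start_times(requests, pool_size):
    """Toy model (assumption, not NiFi's scheduler).

    `requests` holds (ready_time, duration) pairs for tasks whose processors
    are eligible to run. Each task starts at its ready time or when a thread
    in the shared pool frees up, whichever is later. Tasks are served in
    ready-time order, and start times are returned in that order.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be >= 1")
    free_at = [0.0] * pool_size  # min-heap: when each pooled thread is free
    starts = []
    for ready, duration in sorted(requests):
        thread_free = heapq.heappop(free_at)  # earliest available thread
        start = max(float(ready), thread_free)
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


print(suggested_initial_pool_size(8))                       # (16, 32)
# A 50 s task holds the only thread, so a task ready at 30 s must wait.
print(actual_start_times([(0, 50), (30, 5)], pool_size=1))  # [0.0, 50.0]
print(actual_start_times([(0, 50), (30, 5)], pool_size=2))  # [0.0, 30.0]
```

This is why "once a thread becomes available" matters: an eligible processor can still wait on a busy pool, and enlarging the pool only helps if the node has CPU to spare.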

Hope this answers your questions. If so, please take a moment to accept the answer(s) that helped.
Matt
