What are the ideal Run Duration and Run Schedule settings for NiFi processors?

Rising Star

Hi Team,

@Matt Clarke,@matt burgess

I am using the PutSQL processor to execute a COPY command against a Redshift database. This processor usually processes more than 20 FlowFiles, which are merged files containing JSON objects. I have configured it as shown in the attached image. Have I configured Run Duration and Run Schedule correctly? My NiFi canvas has more than 2000 processors and I am facing UI slowness issues. Please suggest how to decide these settings in order to get high throughput with low resource consumption.

Thanks in advance!!

Sri

91403-putsql-sceduling-confi.png

1 ACCEPTED SOLUTION

Super Mentor

@sri chaturvedi

- You are only going to benefit from setting the run duration to 50 ms if processing each incoming FlowFile in the PutSQL processor takes only a fraction of that 50 ms.

Details on "Run duration" and how it works can be found here:

https://community.hortonworks.com/articles/221807/understanding-nifi-processors-run-duration-functio...

---------

When you set a run duration on a lot of processors, each of them, once it gets a thread, may hold that CPU thread for longer than needed. This means that other processors may end up waiting longer for a thread.

-
Consider this example:

Your PutSQL happens to take 10 ms to execute the put of one FlowFile. With a 50 ms run duration, it would put 5 FlowFiles within a single thread execution. What happens if the incoming connection queue has only 1 FlowFile at the time of execution? The processor holds that thread for 40 ms longer than needed. That is 40 ms of CPU time not available to another processor.
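The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the model described above, not NiFi code: the function name and its parameters are hypothetical, and it assumes every FlowFile takes the same time to process.

```python
def run_duration_budget(run_duration_ms, per_flowfile_ms, queued_flowfiles):
    """Return (processed, busy_ms, idle_ms) for one thread execution.

    Hypothetical helper: models the example above, where the thread is
    held for the whole run duration even if the queue runs dry.
    """
    # How many FlowFiles fit into one run duration window.
    capacity = run_duration_ms // per_flowfile_ms
    # The processor cannot process more FlowFiles than are queued.
    processed = min(capacity, queued_flowfiles)
    busy_ms = processed * per_flowfile_ms
    # Time the thread is held without doing useful work.
    idle_ms = run_duration_ms - busy_ms
    return processed, busy_ms, idle_ms


# Only 1 FlowFile queued: 10 ms of work, 40 ms held for nothing.
print(run_duration_budget(50, 10, 1))   # (1, 10, 40)
# Sustained flow: the full 50 ms window is used for 5 FlowFiles.
print(run_duration_budget(50, 10, 20))  # (5, 50, 0)
```

Plugging in your own measured per-FlowFile time and typical queue depth gives a rough idea of whether a given run duration would be mostly used or mostly wasted.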

-

Since there is some time overhead in starting and stopping threads, run duration is very useful when you have a high, sustained dataflow. It can actually decrease performance when used in a dataflow that does not have a high volume of FlowFiles. (High volume is relative to the processor's designed task.)

-

----------

The concurrent tasks setting dictates how many executions of a processor can run in parallel.

-

Since your canvas has 2000 processors, you need to understand that all of these processors cannot execute at the exact same time. There is only so much CPU available, and NiFi has a configurable thread pool size. This means that many processors may just be waiting in line for their chance to get time on the CPU.
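To get a feel for the "waiting in line" effect, here is a rough back-of-the-envelope sketch in Python. It is a deliberately simplified model, not how the NiFi scheduler actually works: it assumes every processor has work at the same moment and each one holds a thread for the same amount of time. The pool size of 10 and the 50 ms hold time are illustrative values only.

```python
import math


def waves_needed(processors_with_work, pool_size):
    """Number of rounds needed to give every processor one turn on a thread."""
    return math.ceil(processors_with_work / pool_size)


def worst_case_wait_ms(processors_with_work, pool_size, hold_ms):
    """Wait for the last processor in line, assuming equal hold times."""
    # The last processor runs in the final wave, after all earlier waves finish.
    return (waves_needed(processors_with_work, pool_size) - 1) * hold_ms


# 2000 processors competing for an assumed pool of 10 threads,
# each holding its thread for an assumed 50 ms.
print(waves_needed(2000, 10))            # 200
print(worst_case_wait_ms(2000, 10, 50))  # 9950
```

In a real flow most processors have nothing to do at any given moment, so the wait is usually far shorter, but the sketch shows why long thread hold times on many processors add up.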

-

Details on processor Concurrent task setting recommendations can be found here:
https://community.hortonworks.com/articles/221808/understanding-nifi-max-thread-pools-and-processor....

-

-----------

-

You also mentioned NiFi UI slowness. It may be unrelated to anything above:
https://community.hortonworks.com/articles/184786/hdfnifi-improving-the-performance-of-your-ui.html

-

Thank you,

Matt

-

If you found that this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.

Rising Star

Thanks a lot, Matt, for such a detailed and outstanding explanation. Does this mean that 4 concurrent tasks and a run duration of 50 ms will make the processor occupy CPU time for 200 ms, that is, concurrent tasks multiplied by run duration?

Thanks,

Sri

Super Mentor

@sri chaturvedi

Yes, potentially, if there are enough inbound FlowFiles to trigger the processor to run 4 times concurrently.
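As a small illustration of that "potentially", the multiplication can be written out in Python. The function below is hypothetical and only models the numbers in this thread. Note that the result is total thread time spread across threads running in parallel, not 200 ms of wall-clock time.

```python
def thread_time_ms(concurrent_tasks, run_duration_ms, tasks_triggered):
    """Total thread time held by one processor in a scheduling round.

    Hypothetical helper: tasks_triggered is how many concurrent executions
    the inbound FlowFiles actually cause, capped by the configured setting.
    """
    active = min(concurrent_tasks, tasks_triggered)
    return active * run_duration_ms


# Enough inbound FlowFiles to start all 4 concurrent tasks.
print(thread_time_ms(4, 50, 4))  # 200
# Only enough work to trigger a single task.
print(thread_time_ms(4, 50, 1))  # 50
```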