Created on 03-03-2020 05:44 AM - last edited on 03-03-2020 03:55 PM by ask_bill_brooks
Hi,
I have an ExecuteStreamCommand processor which executes a Python script. The script takes a long time to execute (about 5 minutes). So I increased the number of concurrent tasks from 1 to 4 and then to 8, but this had no impact on the performance. I have an 8-core Intel i9 Mac with 32 GB RAM. I read that typically the number of concurrent tasks is roughly equal to 2 or 4 times the cores. Could you let me know why there is no improvement? How can I improve the performance?
Thanks
Ganesh
Created 03-03-2020 01:14 PM
The following statement is not accurate:
"I read that typically the number of concurrent tasks is roughly equal to 2 or 4 times the cores"
The general recommendation is that the "Max Timer Driven Thread Count" is set to 2 to 4 times the number of cores. This setting is all relative to the other processes running on your server (or your Mac in this case).
The "Max Timer Driven Thread Count" setting establishes the max number of threads that can be handed out to requesting components that want to execute. (This is a soft limit; there are some scenarios where a thread can be obtained even when the number of active threads has reached this configured max count.)
The "Max Timer Driven Thread Count" is configured under the NiFi Global Menu --> Controller Settings --> General (tab). When you adjust this value, monitor your CPU usage and adjust accordingly.
Keep in mind that adding additional concurrent tasks to your processor will not improve the processing of a single FlowFile. The concurrency allows the processor to work on different FlowFiles pulled from the inbound connection queue concurrently. In the case of the ExecuteStreamCommand processor, the ability to execute the same command concurrently also depends on the command you are executing.
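To make the point about concurrency concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions: every FlowFile takes the same time, the FlowFiles are independent, and enough threads and CPU are actually available for each concurrent task. The function name `estimated_minutes` and its parameters are made up for illustration and are not part of NiFi.

```python
import math


def estimated_minutes(flowfiles: int, concurrent_tasks: int,
                      minutes_per_flowfile: float) -> float:
    """Rough wall-clock estimate for draining a queue of FlowFiles.

    Simplified model (an assumption, not NiFi's scheduler): each concurrent
    task handles one FlowFile at a time, so the queue drains in "waves" of
    at most `concurrent_tasks` FlowFiles.
    """
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    if flowfiles <= 0:
        return 0.0
    waves = math.ceil(flowfiles / concurrent_tasks)
    return waves * minutes_per_flowfile


# One queued FlowFile: extra concurrent tasks have nothing to work on.
# Eight queued FlowFiles: extra concurrent tasks shorten the total time.
for queued in (1, 8):
    for tasks in (1, 4, 8):
        print(queued, tasks, estimated_minutes(queued, tasks, 5.0))
```

With a single FlowFile in the queue the estimate stays at the per-FlowFile time no matter how many concurrent tasks are configured, which matches the behavior described in the question.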
A small number is displayed in the upper right corner of the processor, showing the number of threads actively in use by that processor at the time of the last browser refresh (the NiFi browser auto-refresh default is every 30 seconds).
Hope this helps,
Matt
Created 03-04-2020 06:11 AM
Mattwho,
Thanks for your comments. After reading your reply, I spent a lot of time thinking it over. I saw that 8 threads were created, but there was no performance improvement because all the threads were doing the same thing: each one executed the script on all the files that had been unpacked. Following your comment that the threads actually operate on individual FlowFiles, I changed the code so that it accepts the FlowFile as its input, and the FlowFiles are now processed using multiple threads.
This improved the performance by 30%. The time taken dropped from 13 mins to 3-4 mins.
So many thanks for your comments, I now understand how to use concurrent tasks.
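For anyone landing on this thread later, below is a minimal sketch of what a script restructured this way might look like. It is not the actual script from this thread. It relies on the fact that ExecuteStreamCommand, by default, pipes the incoming FlowFile's content to the command's standard input and captures standard output as the content of the outgoing FlowFile. The helper names `process_flowfile` and `run` are hypothetical, and the upper-casing is just a placeholder for the real per-file work.

```python
from typing import BinaryIO


def process_flowfile(data: bytes) -> bytes:
    """Hypothetical stand-in for the real per-file work.

    It only touches the bytes it is given, so several copies of the
    script can run side by side on different FlowFiles.
    """
    return data.upper()


def run(in_stream: BinaryIO, out_stream: BinaryIO) -> None:
    """Read one FlowFile's content, process it, and write the result."""
    data = in_stream.read()
    out_stream.write(process_flowfile(data))
    out_stream.flush()


# In the deployed script, finish with:
#     import sys
#     run(sys.stdin.buffer, sys.stdout.buffer)
# so that the content comes from stdin and the result goes to stdout.
```

Because each invocation handles only the one FlowFile it receives, raising the concurrent tasks on the processor lets several FlowFiles be processed at the same time, provided the files are split into separate FlowFiles upstream.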