Increasing concurrent tasks not improving performance

Explorer

Hi,

   I have an ExecuteStreamCommand processor which executes a Python script. The script takes a long time to execute (~5 mins), so I increased the number of concurrent tasks from 1 to 4 and then to 8, but this had no impact on the performance. I have an 8-core Intel i9 Mac with 32 GB RAM. I read that typically the number of concurrent tasks is roughly equal to 2 or 4 times the cores. Could you let me know why there is no improvement? How can I improve the performance?

 

Thanks

Ganesh

Accepted Solutions
Re: Increasing concurrent tasks not improving performance

Master Guru

@TVGanesh 

 

The following statement is not accurate:
"I read that typically the number of concurrent tasks is roughly equal to  2 or 4 times the cores"

 

The general recommendation is that the "Max Timer Driven Thread Count" be set to 2 to 4 times the number of cores.  The right value is relative to the other processes running on your server (or your Mac in this case).

The "Max Timer Driven Thread Count" setting establishes the maximum number of threads that can be handed out to components requesting to execute. (This is a soft limit; there are some scenarios where a thread can be obtained even when the number of active threads has reached the configured maximum.)

The "Max Timer Driven Thread Count" is configured under the NiFi Global Menu --> Controller Settings --> General (tab).  When you adjust this value, monitor your CPU usage and adjust accordingly.
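
As a rough illustration of the 2x to 4x rule of thumb above, here is a minimal Python sketch. The function name timer_driven_thread_range is hypothetical and not part of NiFi; it only does the arithmetic. Note that os.cpu_count() reports logical CPUs, which on a hyper-threaded machine may be higher than the number of physical cores.

```python
import os


def timer_driven_thread_range(cores):
    """Return the (low, high) range suggested by the 2x to 4x rule of thumb.

    Hypothetical helper for illustration only; NiFi does not provide it.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2 * cores, 4 * cores


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to 1.
    cores = os.cpu_count() or 1
    low, high = timer_driven_thread_range(cores)
    print(f"{cores} CPUs: try a Max Timer Driven Thread Count between {low} and {high}")
```

For an 8-core machine this gives a starting range of 16 to 32; the value that works best still depends on what else is running on the host.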

Keep in mind that adding additional concurrent tasks to your processor will not improve the processing of a single FlowFile.  The concurrency allows the processor to work on different FlowFiles pulled from the inbound connection queue concurrently.  In the case of the ExecuteStreamCommand processor, the ability to execute the same command concurrently also depends on the command you are executing.
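
To make this concrete, here is a minimal sketch of a script shaped so that each invocation handles exactly one FlowFile. It assumes the processor is left at its default behaviour of passing the incoming FlowFile's content to the command's standard input and capturing its standard output. The transform function is a hypothetical placeholder for the real work.

```python
import sys


def transform(data: bytes) -> bytes:
    """Hypothetical placeholder for the real per-FlowFile work."""
    return data.upper()


def main(stdin=None, stdout=None):
    # Default to the process's binary stdin/stdout; parameters exist for testing.
    if stdin is None:
        stdin = sys.stdin.buffer
    if stdout is None:
        stdout = sys.stdout.buffer
    # Read the content of one FlowFile, process it, and write the result back.
    stdout.write(transform(stdin.read()))
    stdout.flush()


if __name__ == "__main__":
    main()
```

With a script like this, 8 concurrent tasks can run 8 independent copies of the command, each on a different queued FlowFile. If the script instead ignores its input and scans the same set of files every time, extra concurrent tasks only repeat the same work.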

 

A small number will be displayed in the upper right corner of the processor illustrating the number of currently active threads in use by that processor at time of last browser refresh (NiFi browser auto refresh default is every 30 seconds).

Hope this helps,
Matt

Re: Increasing concurrent tasks not improving performance

Explorer

Mattwho,

            Thanks for your comments. After reading your mail, I spent a lot of time thinking. I saw that 8 threads were created, but there was no performance improvement because all the threads were doing the same thing: executing the script on all the files that were unpacked. Following your comment that the threads actually operate on the FlowFiles, I changed the code so that it accepts the FlowFiles as one of its inputs and processes them using multiple threads.

This improved the performance considerably. The time taken dropped from 13 mins to 3-4 mins, i.e. to roughly 30% of the original run time.
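
For readers wondering what "processes using multiple threads" might look like, here is a minimal sketch using Python's standard concurrent.futures module. It is not the actual script from this thread; process_item and process_all are hypothetical names, and the per-item work is a stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(item: str) -> str:
    """Hypothetical stand-in for the work done on one input item."""
    return item.strip().lower()


def process_all(items, max_workers=8):
    """Process the items on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in the same order as the inputs.
        return list(pool.map(process_item, items))


if __name__ == "__main__":
    print(process_all([" First.TXT ", "SECOND.txt"]))
```

Threads in Python mainly help when the work is I/O-bound (reading files, calling services). For CPU-bound work, swapping in ProcessPoolExecutor from the same module is usually the better fit.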

 

So many thanks for your comments, I now understand how to use concurrent tasks.
