
NiFi tuning for a high number of tasks

Frequent Visitor

Hello,

We are using a NiFi cluster with 5 nodes. Each node is a machine with 48 cores and 280-300 GiB of available memory. The issue we currently have is NiFi having trouble keeping up when a high number of tasks is required (1,000,000+ tasks). The files are eventually transferred, but errors pop up in the meantime that seem to be irrelevant and more related to the flow trying to keep up. It's a simple flow: GetFile > UpdateAttribute > PutSFTP > LogMessage. I have increased concurrent tasks and batch sizes, but this has little to no effect.

1 ACCEPTED SOLUTION

Master Mentor

@jfs912 

You should not configure your NiFi with a larger-than-necessary heap. Doing so just leads to very long stop-the-world garbage collection events. The simple flow you have described would use very little heap memory.

  • So you have a 5 node cluster and each node has files in some local directory that it is pulling from?
  • Is that local directory mounted on all nodes, or does each node have its own set of files in a local directory from which GetFile is pulling?
  • Are you seeing backpressure being applied on any of the connections between your processors? When a connection applies backpressure, NiFi will not schedule the upstream processor until that backpressure is relieved (see the sketch after this list).
  • If you can tolerate some latency in your dataflow, you can get better throughput performance with some processors by increasing the Run Duration as well.
  • Dataflow design best practices can also improve performance and give better load distribution across all the nodes in your cluster. You want to minimize, as much as possible, one node doing the bulk of the workload.
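
To make the backpressure check concrete, here is a minimal sketch that polls the NiFi REST API for connection statistics. It assumes an unsecured node reachable at http://localhost:8080; the endpoint path and the JSON field names (queuedCount, percentUseCount, percentUseBytes) reflect recent NiFi releases and should be verified against your version's /nifi-api documentation, and the 80% threshold is only an illustrative cut-off.

# Minimal sketch: list connections whose queues are near their backpressure
# limits by polling the NiFi status REST API. The endpoint path and the JSON
# field names used here are assumptions based on recent NiFi releases; verify
# them against your version and add authentication if your cluster is secured.
import requests

NIFI_API = "http://localhost:8080/nifi-api"   # assumption: unsecured node

def connections_near_backpressure(threshold_pct=80):
    resp = requests.get(
        f"{NIFI_API}/flow/process-groups/root/status",
        params={"recursive": "true"},
        timeout=30,
    )
    resp.raise_for_status()
    snapshot = resp.json()["processGroupStatus"]["aggregateSnapshot"]

    hot = []
    for entry in snapshot.get("connectionStatusSnapshots", []):
        conn = entry["connectionStatusSnapshot"]
        pct = max(conn.get("percentUseCount") or 0, conn.get("percentUseBytes") or 0)
        if pct >= threshold_pct:
            hot.append((conn.get("name") or conn.get("id"), conn.get("queuedCount"), pct))
    return hot

if __name__ == "__main__":
    for name, queued, pct in connections_near_backpressure():
        print(f"{name}: {queued} FlowFiles queued, {pct}% of the backpressure threshold")

Any connection sitting at or near 100% here explains why the processor feeding it stops being scheduled; in the flow described above, that would typically be the connection in front of PutSFTP.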


Adjusting concurrent tasks has multiple elements to it.

  1. What is the current CPU load average on each of your 5 servers? You first need to know whether there is capacity to run more parallel threads (see the first sketch after this list).
  2. How large is the configured Timer Driven Thread pool in NiFi? All concurrent tasks used by processor components come from this configured thread pool, so if the pool is small, adding more concurrent tasks to processors will improve nothing (see the second sketch after this list). The ability to increase the size of this thread pool depends on each node's CPU load average. The thread pool is also applied per node, so when it is set to 10, that is 10 threads on each node, or 50 threads across your 5 node cluster.
  3. If the CPU load average is not high and you increase the size of the Timer Driven Thread pool, you'll want to make small incremental changes to the concurrent tasks on processors and monitor the impact on CPU load average.
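
For item 1, this small stand-alone script (plain Python standard library, nothing NiFi-specific) compares a node's load average to its core count; run it on each of the 5 servers. The 0.7 cut-off is only an illustrative rule of thumb, not a NiFi setting.

# Quick check of CPU headroom on one node: compare the 1/5/15-minute load
# averages to the number of cores. Load averages consistently at or above the
# core count mean there is no spare capacity for more parallel threads.
import os

def cpu_headroom():
    cores = os.cpu_count()                    # e.g. 48 on the nodes described above
    load1, load5, load15 = os.getloadavg()    # Unix/Linux only
    print(f"cores={cores}  load avg: 1m={load1:.1f}  5m={load5:.1f}  15m={load15:.1f}")
    if load15 < cores * 0.7:                  # illustrative threshold, not a NiFi rule
        print("Headroom available: growing the Timer Driven Thread pool may help.")
    else:
        print("Little headroom: more threads or concurrent tasks will not help.")

if __name__ == "__main__":
    cpu_headroom()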
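
For item 2, the toy model below (again plain Python, not NiFi code) shows why raising a processor's concurrent tasks cannot help once the shared Timer Driven Thread pool is the limit: work submitted beyond the pool size simply waits its turn.

# Toy illustration of a shared thread pool: 50 requested "concurrent tasks"
# submitted to a 10-thread pool still run at most 10 at a time; the rest queue.
# In NiFi, the per-node Timer Driven Thread pool plays the role of the executor
# and every processor's concurrent tasks draw from it.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 10         # like Max Timer Driven Thread Count on one node
CONCURRENT_TASKS = 50  # like the total concurrent tasks configured on processors

running = 0
peak = 0
lock = threading.Lock()

def task(_):
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)   # pretend to do some work
    with lock:
        running -= 1

with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    list(pool.map(task, range(CONCURRENT_TASKS)))

print(f"Requested {CONCURRENT_TASKS} parallel tasks, but at most {peak} ever ran at once.")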

Please help our community grow and thrive. If you found that any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to log in and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt


3 REPLIES

Community Manager

@jfs912 Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our NiFi experts @MattWho and @SAMSAL, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Senior Community Moderator



Community Manager

@jfs912 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.


Regards,

Diana Torres,
Senior Community Moderator

