Apache NiFi is not utilizing all CPU cores

New Contributor

We have set Maximum Timer Driven Thread Count to 80 and Maximum Event Driven Thread Count to 120 in the NiFi settings. We have a 4-node cluster with 64 cores.

Diagnostics show an average usage of only 5 cores. Additionally, data is queuing up and being processed slowly.

The issue is that CPU utilization does not go beyond a certain percentage.

What further configuration can help utilize the CPUs and increase the computing power?

1 ACCEPTED SOLUTION

Master Mentor

@knighttime 
You should not configure any of your NiFi processors to use the Event Driven scheduling strategy. It was never moved from experimental to production ready. Advances in the Timer Driven scheduling strategy have made it more efficient, so Event Driven is pretty much deprecated at this point. If you are not using Event Driven scheduling on any processor component in your NiFi, you should not set a large "Maximum Event Driven Thread Count" pool (the default is 5, but I recommend setting it to 1). While you can increase the maximum while NiFi is running, reducing it requires a NiFi restart.

Now, when it comes to the "Maximum Timer Driven Thread Count" pool: this setting applies per node, so you can create a large pool across your 4-node cluster (80 threads x 4 nodes). You then configure concurrent tasks on individual processors to scale concurrency on each processor component. Also keep in mind that many processors execute only to check their inbound connection queues, and those threads may be active for just microseconds before being released back to the thread pool, so full thread utilization may be difficult to see in the status bar of your NiFi.
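To make the per-node arithmetic concrete, here is a minimal sketch. The helper name `timer_driven_pool` is made up for illustration, and `cores_per_node=64` is an assumption: the question does not say whether the 64 cores are per node or for the whole cluster.

```python
def timer_driven_pool(threads_per_node: int, nodes: int, cores_per_node: int) -> dict:
    """Summarize Timer Driven thread pool capacity for a cluster.

    The pool size is configured once but applies to each node separately,
    so the cluster-wide total is threads_per_node * nodes.
    """
    return {
        "threads_per_node": threads_per_node,
        "cluster_threads": threads_per_node * nodes,
        # Ratio of pool threads to cores on a single node.
        "threads_per_core": threads_per_node / cores_per_node,
    }


# Values from the question; cores_per_node is an assumption (see above).
summary = timer_driven_pool(threads_per_node=80, nodes=4, cores_per_node=64)
print(summary)
```

A large pool only sets an upper bound. How many of those threads are actually used depends on the concurrent tasks configured on each processor and on how long each task runs.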

A tip about the concurrent tasks setting on processors: setting high concurrent tasks across many processors can be worse for overall performance than leaving everything set at 1. Start in the basement (1 concurrent task) and slowly increment concurrent tasks on processors as needed.
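The "start at 1 and increment slowly" approach can be sketched as a simple loop. This is only an illustration, not a NiFi API: `measure_throughput` is a hypothetical callback standing in for whatever you observe on the processor after changing its concurrent tasks by hand, and the 5% gain threshold and the sample numbers are made up.

```python
from typing import Callable


def tune_concurrent_tasks(
    measure_throughput: Callable[[int], float],
    max_tasks: int = 8,
    min_gain: float = 0.05,
) -> int:
    """Start at 1 concurrent task and add one at a time.

    Stops as soon as one more task fails to improve throughput by at least
    `min_gain` (a fraction, e.g. 0.05 for 5%) and returns the last count
    that was still worth keeping.
    """
    tasks = 1
    best = measure_throughput(tasks)
    while tasks < max_tasks:
        candidate = measure_throughput(tasks + 1)
        if candidate < best * (1 + min_gain):
            break  # the extra task did not pay off; keep the current setting
        tasks += 1
        best = candidate
    return tasks


# Made-up throughput readings per concurrent-task count, for illustration only.
fake_measurements = {1: 100.0, 2: 190.0, 3: 260.0, 4: 265.0, 5: 240.0}
chosen = tune_concurrent_tasks(lambda n: fake_measurements[n], max_tasks=5)
print(chosen)
```

With these sample readings the loop stops at 3, because going from 3 to 4 tasks gains less than 5%. In practice you would repeat this one processor at a time, starting with the one that has the largest backlog on its inbound connection.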

You mention data queueing up, but it is difficult to say why or to provide guidance without seeing your dataflow, knowing which processors have data backed up in their inbound connections, and knowing how those processors are configured.

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,

Matt


2 REPLIES

Community Manager

@knighttime Has your issue been resolved?  If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. 

 

 

 

Cy Jervis, Manager, Community Program