Support Questions

knighttime · 09-26-2022

We have set NiFi's Maximum Timer Driven Thread Count to 80 and Maximum Event Driven Thread Count to 120. We have a 4-node cluster with 64 cores.

Diagnostics show an average usage of only 5 cores. Meanwhile, data is queuing up and processing slowly.

The issue is that CPU utilization does not go beyond a certain percentage.

What further configuration can help utilize the CPUs and increase processing power?

MattWho · 09-26-2022

@knighttime
You should not be configuring any of your NiFi processors to use the Event Driven scheduling strategy. It was never moved from an experimental method to production ready. Advances in the Timer Driven scheduling strategy have made it more efficient, so Event Driven is pretty much deprecated at this point in time. If you are not using Event Driven scheduling on any processor component in your NiFi, you should not be setting a large "Maximum Event Driven Thread Count" pool (the default is 5, but I recommend setting it to 1). While you can increase the maximum while NiFi is running, reducing it will require you to restart your NiFi.

Now, when it comes to the "Maximum Timer Driven Thread Count" pool: you can create a large pool, and it is per node in your 4-node cluster (80 threads x 4). You then configure concurrent tasks on your individual processors to scale concurrency on each processor component. Also keep in mind that many processors execute only to check their inbound connection queues, and those threads may be active for just microseconds before being released back to the thread pool. So it may be difficult to actually see full thread utilization in the status bar of your NiFi UI.
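To make the per-node arithmetic above concrete, here is a minimal Python sketch. The function names are made up for illustration and are not part of NiFi. The question does not say whether the 64 cores are per node or across the whole cluster, so both readings are shown as assumptions.

```python
def cluster_thread_capacity(threads_per_node: int, nodes: int) -> int:
    """Total Timer Driven threads across the cluster; the pool is per node."""
    return threads_per_node * nodes


def threads_per_core(threads_per_node: int, cores_per_node: int) -> float:
    """How many pool threads each core on a single node would have to serve."""
    return threads_per_node / cores_per_node


# Values from the question: pool of 80 on each of 4 nodes.
total_threads = cluster_thread_capacity(80, 4)

# Assumption A: 64 cores on every node.
ratio_if_64_per_node = threads_per_core(80, 64)

# Assumption B: 64 cores in total, i.e. 16 per node.
ratio_if_64_total = threads_per_core(80, 64 // 4)

print(total_threads, ratio_if_64_per_node, ratio_if_64_total)
```

Under either reading, the pool size only sets an upper bound. How many threads are actually busy depends on the concurrent tasks configured on each processor.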

Tips about the concurrent tasks setting on processors: setting high concurrent tasks across many processors can be worse for overall performance than leaving everything set at 1. Start in the basement (1 concurrent task) and slowly increment concurrent tasks on processors as needed.
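The "start at 1 and increment slowly" advice can be sketched as a simple loop. This is only an illustration of the tuning procedure, not something NiFi does automatically. `measure_backlog` is a hypothetical callback standing in for whatever you observe after each change (for example, the queued FlowFile count on the processor's inbound connection), and `max_tasks` is an arbitrary cap chosen for the example.

```python
from typing import Callable


def tune_concurrent_tasks(
    measure_backlog: Callable[[int], int],
    start: int = 1,
    max_tasks: int = 8,
) -> int:
    """Raise concurrent tasks one step at a time.

    Stops when the backlog clears, when one more task no longer
    reduces the backlog, or when the cap is reached.
    """
    tasks = start
    backlog = measure_backlog(tasks)
    while backlog > 0 and tasks < max_tasks:
        next_backlog = measure_backlog(tasks + 1)
        if next_backlog >= backlog:
            break  # no improvement: the bottleneck is elsewhere
        tasks += 1
        backlog = next_backlog
    return tasks


# Hypothetical observations: backlog seen at each concurrent-task setting.
observed = {1: 1000, 2: 400, 3: 0, 4: 0}
print(tune_concurrent_tasks(lambda n: observed[n], max_tasks=4))
```

The early exit matters: if adding a task does not shrink the queue, the limit is probably somewhere else (disk, network, a downstream processor), and more threads would only add contention.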

You mention data queueing up, but it is difficult to tell you why or provide guidance without seeing your dataflow and knowing which processors have data backed up to them in their inbound connections and the configuration of those processors.
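As a rough illustration of how to narrow down where data is backing up, the sketch below ranks connections by queued FlowFiles. The `ConnectionSnapshot` type and its field names are hypothetical and not NiFi's own data model. You would fill it in by hand from the queue counts shown on each connection in the UI.

```python
from typing import List, NamedTuple


class ConnectionSnapshot(NamedTuple):
    destination: str        # processor the connection feeds (hypothetical field)
    queued_flowfiles: int   # FlowFiles currently waiting in the queue


def most_backed_up(
    snapshots: List[ConnectionSnapshot], top: int = 3
) -> List[ConnectionSnapshot]:
    """Return the connections with the largest non-empty queues, largest first."""
    backed_up = [s for s in snapshots if s.queued_flowfiles > 0]
    return sorted(backed_up, key=lambda s: s.queued_flowfiles, reverse=True)[:top]


# Made-up numbers purely for illustration.
sample = [
    ConnectionSnapshot("ProcessorA", 0),
    ConnectionSnapshot("ProcessorB", 12000),
    ConnectionSnapshot("ProcessorC", 350),
]
for snap in most_backed_up(sample):
    print(snap.destination, snap.queued_flowfiles)
```

The processors at the top of this list are the ones whose configuration (concurrent tasks, run schedule) is worth sharing when asking for help.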

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,

Matt

cjervis · 10-10-2022

@knighttime Has your issue been resolved? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.

Cy Jervis, Manager, Community Program
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Cloudera Community

Support Questions

Apache NiFi is not utilizing all CPU cores
