Created on 03-20-2018 07:49 AM - edited 09-16-2022 05:59 AM
We are using NiFi in cluster mode. Each node is running on a 40 core machine. We have configured both NiFi settings, Maximum Timer Driven Thread Count and Maximum Event Driven Thread Count, to 50 each. There are multiple computation-intensive flows processing both real-time continuous data and batch data. However, the CPU utilization does not go beyond 20%.
What further configuration can help utilize the CPUs and increase the computation power?
Created 03-20-2018 10:01 AM
Setting the optimal value of max thread count depends on your use cases and what processors you are using (CPU intensive like the convert processors or IO intensive like the put/get processors). I've seen better usage of my hardware by having a thread count of around 2x the number of cores. I've seen some clusters with 3x the number of cores. I think you can go beyond 50 in your case and monitor the behavior. The best thing to do is to proceed in an incremental manner.
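As a rough illustration of the incremental approach, here is a minimal Python sketch that lists thread count values to try one at a time, up to a multiple of the core count. The function name `candidate_thread_counts` and the step size of 20 are assumptions made for this example and are not NiFi settings; only the 40 cores, the current value of 50, and the 2x to 3x guideline come from this thread.

```python
def candidate_thread_counts(cores, current, max_multiplier=3, step=20):
    """List increasing thread count values to try, one at a time.

    The ceiling is cores * max_multiplier (the 3x upper guideline above).
    The step size is an arbitrary choice for illustration.
    """
    ceiling = cores * max_multiplier
    if current >= ceiling:
        return []  # already at or above the ceiling, nothing left to try
    candidates = list(range(current + step, ceiling, step))
    candidates.append(ceiling)  # always finish on the ceiling itself
    return candidates


# 40 cores per node, currently set to 50
print(candidate_thread_counts(40, 50))  # [70, 90, 110, 120]
```

After each change, watch CPU utilization and queue sizes for a while before moving to the next value.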
I hope this helps.
Abdelkrim
Created 03-20-2018 11:52 AM
Just to add to the excellent answer above: using the "Event Driven" scheduling strategy on any NiFi processor component is not recommended. The Event Driven strategy is considered experimental, so there is no need to configure a thread resource pool under "Max Event Driven Thread Count". I recommend setting this value back to its default of 5. Reducing the size of the event driven thread pool will require a NiFi restart. (Increasing it can be performed without a restart.)
Max Timer Driven Thread count can be increased and decreased without a NiFi restart.
Does your NiFi canvas show your dataflow using all the threads provided? It is best to access the "Cluster" UI and observe how the thread pool on each node is being utilized. Do you observe any node where the thread usage is close to your "Max Timer Driven Thread Count"?
Are you seeing bottlenecks in your dataflow (Queued up data)?
Thanks,
Matt
Created 03-21-2018 04:59 AM
Hi @Matt Clarke, the back pressure is high for the CPU intensive processors, whereas the CPU utilization is only 20%. The processor load average is at most 5.
Created 03-27-2018 12:33 PM
You currently have 120 set as your "Maximum Timer Driven Thread Count". Multiply that by the number of nodes in your NiFi cluster to get the maximum number of usable threads across your cluster.
Then look at the info bar across the top of your canvas. Does it look like your dataflow is using all the threads you have allocated? You may need to make adjustments to your processor configurations to maximize the thread usage.
Look for where you have bottlenecks in your dataflow (queues built up in front of processors). What kinds of processors are reading from these built-up queues? How have they been configured?
Just because you allocated more available threads does not mean NiFi processors are going to automatically start using them or even be allowed to use them.
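The multiplication and the thread usage check described above can be sketched in a few lines of Python. The node count of 3, the node names, and the active thread counts below are hypothetical values chosen for illustration; the thread does not state the real cluster size. In practice the active thread numbers would be read off the "Cluster" UI or the info bar on the canvas.

```python
def cluster_thread_capacity(max_timer_driven_threads, node_count):
    """Maximum usable timer driven threads across the whole cluster."""
    return max_timer_driven_threads * node_count


def pool_utilization(active_threads_per_node, max_timer_driven_threads):
    """Fraction of each node's thread pool that is currently in use."""
    return {
        node: active / max_timer_driven_threads
        for node, active in active_threads_per_node.items()
    }


max_threads = 120  # the value set in this thread
nodes = 3          # hypothetical cluster size

print(cluster_thread_capacity(max_threads, nodes))

# Hypothetical active thread counts as observed per node
observed = {"node1": 30, "node2": 24, "node3": 12}
for node, fraction in pool_utilization(observed, max_threads).items():
    print(f"{node}: {fraction:.0%} of the pool in use")
```

If the fractions stay well below 100% while queues are backing up, the allocated threads are not being requested by the processors, which points at the configuration of the processors rather than the size of the pool.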
Created 03-21-2018 04:58 AM
Thanks @Abdelkrim Hadjidj, we are trying to increment it and check.
Created 03-23-2018 06:00 PM
Any feedback on your issue?
Created 03-27-2018 10:55 AM
It doesn't change the CPU utilization much. We increased the Maximum Timer Driven Thread Count to 3x, but the CPU utilization does not go beyond 20%.