I am a new user of NiFi trying to integrate it for one of our use cases. I have set up a 3-node NiFi cluster (26 cores, 128 GB RAM). I am stress testing my flow and cannot reach the throughput I require, even though I am following the best-practices articles. With the simple flow I have, I cannot scale beyond 7 Mbps of transfer. Can anyone suggest how to increase the throughput?
FlowFiles are getting stuck transferring between processGroupPort -> OutputPort.
I have tried various combinations of backpressure, thread count, and batch size in the RPG. The maximum I could achieve was 8 Mbps. I have seen various use cases where users achieved much higher throughput than this.
A few settings I have changed in nifi.conf:
- backpressure threshold = 10000
- Java heap = 20 GB
- Timer Driven Thread Count (in Controller Settings) = 500
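For reference, the JVM heap is normally set in conf/bootstrap.conf rather than in nifi.conf; a minimal sketch of the relevant lines, assuming you really want a 20 GB heap (very large heaps can lead to long GC pauses, so this value is worth validating):

```properties
# conf/bootstrap.conf -- JVM heap settings (sketch; the 20g values are this post's assumption)
java.arg.2=-Xms20g
java.arg.3=-Xmx20g
```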
Can you guys please help me configure the flow for optimal performance?
I would suggest flipping the design of your Site-to-Site dataflow.
Instead of using a "pull" design:
"<dataflow>" --> "Remote output port" . --> "Remote Process Group (RPG)" --> "<dataflow>"
"<dataflow>" --> "Remote Process Group (RPG)" ---> "Remote Input Port" --> "<dataflow>" - Every Node in a NiFi cluster is running its own copy of the flow. That means that an RPG on one node which is pulling data from a remote port on another system has no idea how many other nodes may be doing the same. Each node only knows via collected S2S details that the target consists of x num nodes. So there is no distributed pooling strategy amongst all the nodes.
With a push model RPG --> "Remote input port". The sending nodes know how many nodes in target cluster and can construct a better load distribution strategy.
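Note that for a push design to work, every node in the receiving cluster must expose a Site-to-Site input endpoint in its nifi.properties. A minimal sketch, assuming raw-socket S2S; the hostname and port number here are just example values:

```properties
# nifi.properties on each node of the *receiving* cluster (sketch)
nifi.remote.input.host=node1.example.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=10000
nifi.remote.input.http.enabled=true
```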
Also take a look at the following article for additional RPG tuning: