Member since: 01-27-2025
Posts: 2
Kudos Received: 0
Solutions: 0
01-27-2025 06:53 AM
@MattWho Hi Matt, I now understand the load balancing behavior, thanks! My data flow is a typical ETL process. The heaviest work is counting and analyzing time series data, for example an ExecuteStreamCommand processor running a Python script that reads time series data from a database and performs statistical analysis. I configured concurrent tasks to be close to the number of CPU cores in the cluster. Currently, the time is split roughly evenly between I/O and CPU. Previously, a single-node NiFi took about 20 minutes to process a batch of data received over half an hour, which was close to the limit and left little room for more data analysis. I've tried to solve this by using Redis and by scaling the database and the NiFi cluster. That does help, but the slowest node becomes the bottleneck of the whole system. Given the current load balancing logic, if the relative processing power of the three nodes can be estimated, say 1:2:2, is there any way to distribute flow files to the nodes according to that ratio? Also, it might be a good idea to upgrade NiFi. I have thousands of processors in my data flow, so which version would be the best upgrade target from NiFi 1.12.1, balancing upgrade difficulty against cluster performance? BR, Sean
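To make the 1:2:2 idea concrete, here is a minimal Python sketch of a smooth weighted round-robin selector. It only illustrates the distribution ratio being asked about; it is not a NiFi feature or a NiFi API, and the node names are hypothetical placeholders.

```python
from collections import Counter
from itertools import islice


def weighted_round_robin(weights):
    """Yield node names forever, in proportion to their integer weights.

    Smooth weighted round-robin: on each step every node's running score
    grows by its weight, the highest-scoring node is picked, and that
    node's score is reduced by the sum of all weights.
    """
    current = {node: 0 for node in weights}
    total = sum(weights.values())
    while True:
        for node, weight in weights.items():
            current[node] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


if __name__ == "__main__":
    # Hypothetical node names; the 1:2:2 weights are the estimated ratio.
    weights = {"node-slow": 1, "node-fast-a": 2, "node-fast-b": 2}
    picks = list(islice(weighted_round_robin(weights), 10))
    print(picks)
    print(Counter(picks))
```

Over any run of 5 consecutive picks the slow node is chosen once and each faster node twice, so the picks are interleaved rather than sent in bursts.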
01-27-2025 12:38 AM
Hi all, To improve NiFi's throughput, I expanded my standalone NiFi into a three-node cluster and configured my data flow accordingly. It works now, but load balancing doesn't behave as I expected. The load balancing strategy is set to round robin, and at the beginning all flow files appear to be distributed across the three nodes. Later, however, I noticed that the cluster slowed down dramatically, with only one node still processing and the other two idle. I would have expected the unfinished flow files in the queue to be redistributed to the other nodes. 192.168.1.200 is the master node and the slowest node in the cluster. My NiFi version is 1.12.1. Is this the expected behavior of round robin in this version? I want to maximize throughput, so please tell me how I can do that.
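One possible reading of the symptom, sketched as a small Python calculation: if round robin gives each node an equal share and queued flow files are not moved again afterwards, the batch finishes only when the slowest node does. The relative speeds (1, 2, 2) and the batch size of 300 below are made-up illustrative numbers, not measurements from this cluster.

```python
def batch_finish_time(total_files, speeds, weights):
    """Return the time at which the last node finishes its share.

    speeds  -- files each node processes per time unit (assumed values)
    weights -- integer share of the batch sent to each node
    """
    weight_sum = sum(weights)
    return max(
        total_files * weight / (weight_sum * speed)
        for speed, weight in zip(speeds, weights)
    )


if __name__ == "__main__":
    speeds = [1, 2, 2]  # assumed relative speeds, slow node first
    # Equal shares, as plain round robin would give.
    print(batch_finish_time(300, speeds, [1, 1, 1]))
    # Shares proportional to the assumed speeds.
    print(batch_finish_time(300, speeds, [1, 2, 2]))
```

Under these assumptions the equal split takes 100 time units because the slow node works through its 100 files alone while the faster nodes sit idle after 50, whereas the proportional split finishes in 60.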
Labels:
- Apache NiFi