Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

load balancing in nifi

avatar
Rising Star

Please share best practices to achieve load balance in NiFi Cluster.

How to configure for auto scaling and configuratoin based manual scaling?

Thanks

1 ACCEPTED SOLUTION

avatar
Super Mentor

@spdvnz

When NiFi dataflows are designed so that they are pulling data from source systems, load-balancing and auto-scaling can be handled fairly easily through the use primary node and remote process groups.

For example:

A ListSFTP processor can be configured to run on the primary node only. It would be responsible for producing a 0 byte FlowFile for every File returned in the listing. Those 0 byte FlowFiles are then routed to a Remote Process Group (RPG) which is configured to point back at the same NiFi cluster. The RPG would handle smart load-balancing of data across all currently connected nodes in the cluster (If additional nodes are added or nodes are removed, the RPG auto-scales accordingly). On the receiving side of the RPG (remote input port) the 0 byte FlowFile would be routed to a FetchSFTP processor that will actually retrieve the content for the FlowFile. There are many list and fetch processors. They generally exist where the process is not exactly cluster friendly. (Not a good idea having GetSFTP running on every node as they would all complete for the same data.)

If your only options is to have your source system push data to your NiFi cluster, those source systems would need to know the currently available nodes at all times. One option is use MiNiFi on each of your source systems that can make use of NiFi's Site-To-Site (Remote Process Groups) protocol to send data in a smart load-balanced way to all the NiFi nodes currently connected in the target cluster.

Another option:

https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

Thanks,

Matt

View solution in original post

1 REPLY 1

avatar
Super Mentor

@spdvnz

When NiFi dataflows are designed so that they are pulling data from source systems, load-balancing and auto-scaling can be handled fairly easily through the use primary node and remote process groups.

For example:

A ListSFTP processor can be configured to run on the primary node only. It would be responsible for producing a 0 byte FlowFile for every File returned in the listing. Those 0 byte FlowFiles are then routed to a Remote Process Group (RPG) which is configured to point back at the same NiFi cluster. The RPG would handle smart load-balancing of data across all currently connected nodes in the cluster (If additional nodes are added or nodes are removed, the RPG auto-scales accordingly). On the receiving side of the RPG (remote input port) the 0 byte FlowFile would be routed to a FetchSFTP processor that will actually retrieve the content for the FlowFile. There are many list and fetch processors. They generally exist where the process is not exactly cluster friendly. (Not a good idea having GetSFTP running on every node as they would all complete for the same data.)

If your only options is to have your source system push data to your NiFi cluster, those source systems would need to know the currently available nodes at all times. One option is use MiNiFi on each of your source systems that can make use of NiFi's Site-To-Site (Remote Process Groups) protocol to send data in a smart load-balanced way to all the NiFi nodes currently connected in the target cluster.

Another option:

https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

Thanks,

Matt