NiFi Cluster with Remote Process Group

Contributor

I am working with a NiFi cluster. The tutorial suggests using a Remote Process Group with a cluster for load balancing.

But to use a Remote Process Group, we have to provide the URL of another NiFi instance, and that URL is hardcoded. What happens if that NiFi instance goes down in the middle of workflow execution? Does the Remote Process Group keep working in that situation, given that the load-balancing concept is implemented in the Remote Process Group?

1 ACCEPTED SOLUTION

Super Mentor
@Gaurav Jain

The URL provided when adding the Remote Process Group (RPG) to your canvas only needs to be reachable when the RPG is first added. Once a successful connection is established, the target instance returns a list of currently connected cluster nodes, and the source instance with the RPG records those hosts in peer files. From that point forward, the RPG constantly refreshes its list of available nodes; it will not only load-balance across those nodes but can also use any one of them to get an updated status. If your source instance of NiFi has trouble getting a status update from any of the nodes, it will still attempt load-balanced, failover delivery of data to the last known set of nodes until it succeeds in retrieving an updated list.
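For anyone who wants to drive this from code rather than the canvas, here is a minimal sketch using NiFi's Site-to-Site Java client (the nifi-site-to-site-client library), which speaks the same protocol the RPG does. The URL, port name, and payload below are hypothetical:

```java
import java.util.Collections;

import org.apache.nifi.remote.Transaction;
import org.apache.nifi.remote.TransferDirection;
import org.apache.nifi.remote.client.SiteToSiteClient;

public class SiteToSiteSend {
    public static void main(String[] args) throws Exception {
        // The URL only needs to be reachable on the first connection; after
        // that the client learns the current cluster peers and load-balances
        // across them, just as the RPG does.
        try (SiteToSiteClient client = new SiteToSiteClient.Builder()
                .url("http://nifi-node1.example.com:8080/nifi") // hypothetical node
                .portName("From Source")                        // hypothetical remote Input Port
                .build()) {

            Transaction transaction = client.createTransaction(TransferDirection.SEND);
            if (transaction == null) {
                throw new IllegalStateException("No peers available to accept data");
            }
            transaction.send("example payload".getBytes(),
                    Collections.singletonMap("filename", "example.txt"));
            transaction.confirm();   // verify the checksum with the peer
            transaction.complete();  // commit the transfer
        }
    }
}
```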

In addition, NiFi allows you to specify multiple URLs in the RPG when you create it. Simply provide a comma-separated list of URLs for nodes in the same target cluster. This does not change how the RPG works; it will still constantly retrieve a fresh listing of available nodes. It does, however, let the target cluster scale up or down without affecting your Site-To-Site (S2S) functionality.
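For illustration, the URL field of the RPG would then contain something like this (hostnames hypothetical):

```
http://nifi-node1.example.com:8080/nifi,http://nifi-node2.example.com:8080/nifi
```

Any one of the listed nodes that is reachable can supply the initial peer list.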

Thanks,

Matt


11 REPLIES

Super Mentor

@Gaurav Jain

NiFi does not redistribute FlowFiles between nodes behind the scenes at this time. Any redistribution of FlowFiles between nodes in a cluster has to be done programmatically through your dataflow design, via components (processors such as PostHTTP feeding ListenHTTP, or an RPG) that push FlowFiles to other nodes.
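As a rough sketch of the PostHTTP-to-ListenHTTP pattern mentioned above: assuming the target node runs a ListenHTTP processor with Listening Port 9999 and the default Base Path of contentListener (both hypothetical here), any HTTP POST to that endpoint becomes a FlowFile on that node:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PushToListenHttp {
    public static void main(String[] args) throws Exception {
        // Hypothetical target: another node's ListenHTTP processor
        // (Listening Port 9999, Base Path "contentListener").
        URL url = new URL("http://nifi-node2.example.com:9999/contentListener");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);

        byte[] body = "payload that becomes a FlowFile on the target node"
                .getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }

        // ListenHTTP answers 200 once the content has been received.
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

Inside NiFi itself you would normally just wire a PostHTTP (or InvokeHTTP) processor to the other node's ListenHTTP URL rather than writing this by hand.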

Thanks,

Matt

Contributor

So this means that if one node is performing some transformation on a FlowFile, and that node goes down in the middle of it, its overall functionality is transferred to another node?