
NiFi - RPG & Workload Distribution between clustered nodes, limitations

Contributor

Hello,

First time posting here, and I really appreciate everyone's posts that kept me moving along from when I had nothing working to where we are now.

To begin, our team has set up flows on a standalone instance. Everything works, and we have now created a cluster to leverage more processing power. Our issue is that our source can only be accessed by the primary node, because we cannot guarantee unique data across all five nodes in the cluster.

After the data has been pulled by the primary node, we would like to distribute the work across all nodes in the cluster, since, as I understand it, the files technically exist only on the primary node. So we run the data through a Remote Process Group (RPG), which distributes the data, and all looks good.
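For context, here is a minimal sketch (Python against the NiFi REST API; the URL and processor id are placeholders and authentication is omitted, so treat it as an illustration rather than our exact setup) of pinning the source processor to the primary node so that only one node pulls the data:

```python
import requests

NIFI_URL = "http://nifi-host:8080/nifi-api"              # placeholder
PROCESSOR_ID = "replace-with-listsftp-processor-uuid"    # placeholder

# Fetch the current processor entity; its revision is required for the update.
entity = requests.get(f"{NIFI_URL}/processors/{PROCESSOR_ID}").json()

update = {
    "revision": entity["revision"],
    "component": {
        "id": PROCESSOR_ID,
        "config": {"executionNode": "PRIMARY"},  # schedule on the primary node only
    },
}
resp = requests.put(f"{NIFI_URL}/processors/{PROCESSOR_ID}", json=update)
resp.raise_for_status()
print("Execution node:", resp.json()["component"]["config"]["executionNode"])
```

This is the same setting the "Execution: Primary node" option on the processor's Scheduling tab controls.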

Now herein lies our issue. We are trying to build out a platform for a large group of people. If we must create RPGs on the root canvas, managing permissions and keeping things organized will get very messy. I can understand the concern for visibility of site-to-site (S2S) traffic between separate instances, but requiring it between nodes of the same cluster seems very cumbersome.

Do we have other options we are not aware of for distributing files between nodes of the same cluster? Additionally, when adding an input/output port anywhere below the root canvas, is it expected that the send/receive data policy is grayed out, so that we are unable to grant permissions on these ports?

Thanks

1 ACCEPTED SOLUTION


@Zack Atkinson

One way to eliminate multiple RPGs on the cluster is to have the standalone node ingest all of the data and then use a single RPG between the standalone node and the cluster to distribute the data across the cluster's nodes.

Then, on the cluster, you can set up Process Groups for different groups of users and route to each group whatever data it needs from the data being sent to the cluster.
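As a rough illustration of that layout (a minimal sketch using Python and the NiFi REST API; the URL, port name, and team names are placeholders, authentication is omitted, and exact request bodies can vary by NiFi release), this creates the single root-canvas input port the standalone node's RPG would target, plus a process group per team:

```python
import requests

NIFI_URL = "http://nifi-cluster-node:8080/nifi-api"   # placeholder: any cluster node

# "root" is accepted as an alias for the root process group id.
root_id = requests.get(f"{NIFI_URL}/process-groups/root").json()["id"]

# One input port on the root canvas: the single target for the standalone node's RPG.
port = requests.post(
    f"{NIFI_URL}/process-groups/{root_id}/input-ports",
    json={
        "revision": {"version": 0},
        "component": {"name": "from-standalone", "position": {"x": 0.0, "y": 0.0}},
    },
).json()
print("Created input port:", port["id"])

# One process group per team, so permissions can be managed per group below the root.
for idx, team in enumerate(["team-a", "team-b"]):
    pg = requests.post(
        f"{NIFI_URL}/process-groups/{root_id}/process-groups",
        json={
            "revision": {"version": 0},
            "component": {"name": team, "position": {"x": 0.0, "y": 200.0 * (idx + 1)}},
        },
    ).json()
    print("Created process group:", pg["id"])
```

Keeping a single entry point on the root canvas means only one RPG and one port policy to manage, while each team's permissions stay scoped to its own process group.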


2 REPLIES

Rising Star

@Zack Atkinson Did you find a solution to this? I'm now in the same situation you described (single cluster), and I find the explicit round trip through an RPG on the root canvas very hard to manage, and not elegant. All of this just to load-balance FlowFiles after ListSFTP/Fetch.
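For completeness, the round trip being described looks roughly like this when scripted (a minimal sketch with Python and the NiFi REST API; URLs are placeholders, authentication is omitted, and field names such as targetUris may differ between releases): a ListSFTP pinned to the primary node feeds an RPG on the root canvas that points back at the same cluster, and an input port on the other side feeds FetchSFTP running on all nodes.

```python
import requests

NIFI_URL = "http://nifi-node1:8080/nifi-api"     # placeholder: any node's API
CLUSTER_UI_URL = "http://nifi-node1:8080/nifi"   # placeholder: S2S target of the RPG

root_id = requests.get(f"{NIFI_URL}/process-groups/root").json()["id"]

# Create the self-referencing RPG on the root canvas; it points back at the
# same cluster so FlowFiles listed on the primary node fan out to every node.
rpg = requests.post(
    f"{NIFI_URL}/process-groups/{root_id}/remote-process-groups",
    json={
        "revision": {"version": 0},
        "component": {
            "targetUris": CLUSTER_UI_URL,
            "position": {"x": 0.0, "y": 0.0},
        },
    },
).json()
print("Created self-referencing RPG:", rpg["id"])

# Remaining wiring (done in the UI or with further calls): connect
# ListSFTP (primary node only) -> this RPG, and a root-canvas Input Port ->
# FetchSFTP running on all nodes, to complete the round trip.
```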