Nifi and Clustering

Contributor

Hi,

I have two questions:

1. My understanding is that if I have two servers and want to route files on a given condition, so that only a particular node on one of the servers processes those files, I should use Site-to-Site with a Remote Process Group (RPG). Is this correct?

2. I have tried to set up an RPG. First I tried going from a NiFi Cluster Manager (NCM) instance to a remote NiFi instance, then from one NCM to another NCM. I got the same error both times: a java.net.ConnectException. I am able to set up the input port on the remote instance and have both clusters running on the servers, but the first NCM is unable to pass files to the second. Could this be due to a port issue in nifi.properties? I have followed the nifi.properties setup instructions pretty closely. Also, just for clarification: in nifi.properties on the remote instance (the node receiving data), is the input.remote.socket port on the manager supposed to be different from the child node's? Any advice would be appreciated.

Thanks.

K

1 ACCEPTED SOLUTION

Regarding the first question, about wanting to distribute data from a given node to another node: Site-to-Site is meant for sending data from one cluster to another cluster through explicit ports (named input/entry points). It then takes care of load balancing and failover. At present, Site-to-Site does not support sending data to a limited subset of nodes based on some defined criteria (partitioning), though this is an interesting idea and something that has been talked about.

However, as you've described your case thus far, you might find that simply using PostHTTP on the sending node(s) and ListenHTTP on the listening node(s) is sufficient. With PostHTTP you address a specific recipient, so you know that only that node is getting the data of interest. You could then use Site-to-Site for other data that can be spread more generally throughout the cluster.
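
Before wiring PostHTTP to a ListenHTTP processor, it can help to confirm from outside NiFi that the listener on the one node you care about accepts data. The following is only a minimal sketch in Python, not part of NiFi itself: the helper names `listener_url` and `post_bytes` are hypothetical, the host and port are placeholders for whatever you configured on ListenHTTP, and `contentListener` is assumed to be the processor's default Base Path (adjust it if you changed that property).

```python
import urllib.request


def listener_url(host, port, base_path="contentListener"):
    """Build the URL of an HTTP listener running on one specific node."""
    return f"http://{host}:{port}/{base_path.strip('/')}"


def post_bytes(url, payload, timeout=10.0):
    """POST raw bytes to the URL and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    # urlopen raises an exception for connection failures and HTTP error codes
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `post_bytes(listener_url("node2.example.com", 9999), b"test")` would send a small test payload to a listener on the hypothetical host `node2.example.com`. Because the URL names a single host, only that node receives the data, which is the property the PostHTTP approach relies on.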


5 REPLIES

Guru

Really, if you have two questions, you should ask them as two separate questions on this site (we're going for a Q&A style, not a forum or mailing-list style).

Guru

What you are looking for is primary node scheduling, not Remote Process Groups. If you have a NiFi cluster, one of the nodes is designated as the primary. You can then schedule certain processors to run only on the primary node in the Scheduling tab.

To answer your second question, you will need the NCM on one port, the DFMs on another port, and the remote socket port (input.remote.socket) set to something else again. The way the Site-to-Site protocol works is that the control channel is established with the NCM over the same port as the web UI and API calls. The data channel is then established directly to the DFMs over the port specified in the input.remote.socket settings (note that you may also want to specify the listening host name there). In the connecting RPG, you then point the connection at the address of the NCM and the port of the UI, that is, the exact address you would go to in order to change the flow on the server end of the RPG.
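
Since a java.net.ConnectException generally means the TCP connection itself could not be opened, two quick checks outside NiFi may narrow things down: whether two port settings in nifi.properties share a value, and whether the target port is reachable from the sending machine. The sketch below is a rough illustration in Python, not a NiFi tool. The function names are hypothetical, it treats every key ending in `.port` as a port setting, and it assumes the thread's `input.remote.socket` is shorthand for the `nifi.remote.input.socket.port` key.

```python
import socket
from collections import defaultdict


def find_port_collisions(properties_text):
    """Return {port: [keys]} for each port value used by more than one '*.port' key."""
    ports = defaultdict(list)
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not key=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.endswith(".port") and value:
            ports[value].append(key)
    return {port: keys for port, keys in ports.items() if len(keys) > 1}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names
        return False
```

Passing the contents of one instance's nifi.properties to `find_port_collisions` lists any port number that appears under more than one key in that file. Running `can_connect` from the sending machine against the remote NCM's web UI port, and then against each node's remote input socket port, shows whether a firewall or a wrong host/port is the more likely cause of the exception.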

Contributor

Thanks for your explanation, and sorry, I'm new to the site; I will ask Q&A style in the future. I am trying to figure out how to distribute data between instances, which is why I thought the RPG approach might be the way to do it. Is this true?

Contributor
@jwitt

Awesome! It worked! Thanks so much