
How to send FlowFiles to other NiFi instances in order?

Contributor

I have a NiFi cluster (two instances). I generate URLs using "GenerateFlowFile", then split them using "SplitText". For example:

-> "GenerateFlowFile" is generate this;
url1
url2
url3
url4

-> In "SplitText"(split count 2);
url1
url2
---------
url3
url4

-> I want to send these two FlowFiles to other NiFi instances in order. For this I am using round robin, but I always want them delivered in the same order. What I want:
url1, url2 ---> node1
url3, url4 ---> node2

url1, url2 ---> node1
url3, url4 ---> node2

url1, url2 ---> node1
url3, url4 ---> node2

-> But the data is actually being sent this way:

url1, url2 ---> node1
url3, url4 ---> node2

url3, url4 ---> node1
url1, url2 ---> node2

How can I send the same data to the same NiFi node every time?

1 ACCEPTED SOLUTION

Master Mentor
@Adam J

The Remote Process Group (RPG) was not designed with any logic to make sure specific FlowFiles go to one node versus another. It was designed simply to build a delivery model based on load across the target NiFi cluster's nodes. That delivery model can change each time the latest cluster status is retrieved.

-

If you need to be very specific as to which node gets a specific FlowFile, your best bet is to use a direct delivery dataflow design.

The best option here is to have your SplitText processor feed a RouteOnContent processor that routes the split containing url1/url2 to one connection and the split containing url3/url4 to another connection. Each of these connections would feed a different PostHTTP processor (this processor can be configured to send as a FlowFile). One would be configured to send to a ListenHTTP processor on node 1, and the other would point at the same ListenHTTP processor on node 2.
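A rough configuration sketch of that design (the hostnames, port, and dynamic property names below are illustrative assumptions, not values from this thread):

    RouteOnContent
        Match Requirement        = content must contain match
        node1 (dynamic property) = url1      <- regex; creates a "node1" relationship
        node2 (dynamic property) = url3      <- regex; creates a "node2" relationship

    PostHTTP (one instance per target node)
        URL              = http://node1.example.com:9999/contentListener
        Send as FlowFile = true

    ListenHTTP (running on each target node)
        Listening Port   = 9999
        Base Path        = contentListener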
-

You may want to think about this setup from an HA standpoint. If you lose either node 1 or node 2, the FlowFiles destined for that node will just queue up and not transfer until the node is back online, while the other URLs continue to transfer.

-

Something else you may want to look into is the new load-balanced connections capability introduced in NiFi 1.8:
https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster

-

There is a "Partition by Attribute" option with this new feature which would make sure flowfiles with matching attribute go to same node. While you still can't specify a specific node, it does allow similar flowfiles to get moved to same node. if node goes down you don't end up with an outage, but files with matching attributes will stay together going to different node that is still available.

-

Thanks,

Matt

-

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.


3 REPLIES

Contributor

Hi @Adam J,

Not sure if I got you right, but maybe you can enable Site-to-Site communication.

To do that, use Ambari to update the required configuration. In Ambari, the property below can be found at http://<AMBARI_NODE>:8080/#/main/services/NIFI/configs.

  • Change:
        nifi.remote.input.socket.port=
    to:
        nifi.remote.input.socket.port=10000
  • Restart NiFi via Ambari

Now you should be ready to create the flow. To do this, do the following:

  1. The first thing to do is set up an Input Port. This is the port that the other NiFi will send data to. Drag the Input Port icon to the canvas and call it "From NiFi URL1".
  2. Now that the Input Port is configured, you need somewhere for the data to go once it is received. In this case, keep it very simple and use a processor to route the content depending on url1, url2, url3, url4.
  3. Now that you have the Input Port and the processor to handle the data, the receiving side is ready.
  4. Add a Remote Process Group to the canvas (see the sketch after this list):
    • For the URL, copy and paste the URL for the NiFi UI from your browser.
    • Connect the route-content processor to the Remote Process Group.
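Putting the pieces together, the flow would look roughly like this (the processor chain and the port in the URL are illustrative assumptions):

    Sending side:
        GenerateFlowFile -> SplitText -> RouteOnContent -> Remote Process Group
                                                           (URL = http://<target-nifi-host>:8080/nifi)
    Receiving side:
        Input Port "From NiFi URL1" -> downstream processors

The Remote Process Group uses the Site-to-Site port enabled above (nifi.remote.input.socket.port=10000) to deliver FlowFiles to the named Input Port.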

I hope this helps!

Cheers,


Contributor

"Partition by Attribute" was great for me! Thank you Matt!