MiniFi to NiFi connection through load balancer?

Explorer

Does anyone have recommendations about how to use MiniFi with NiFi in a real production environment? For instance, I need to deploy MiniFi on a number of edge nodes outside of my network (in customer data centers), and I want to point MiniFi at my NiFi cluster in AWS. I DO NOT want MiniFi to point at the individual NiFi cluster members (that would be poor network design), since I may swap out NiFi clusters (Blue/Green). The MiniFi configuration on the edge nodes should almost never have to change.

What I would like to do is add a load balancer in front of the NiFi nodes to receive the MiniFi traffic, not so much for the load balancing as to hide the NiFi nodes' DNS names from MiniFi. That way, when I switch out a NiFi cluster (let's say to update the NiFi version), I can just repoint the DNS name from the original load balancer to a new load balancer in front of the new cluster.

The client cert/SSL design of NiFi makes this difficult, and the recently added host header checking (1.5) even more difficult. Does anyone have this working properly?

Thanks

1 ACCEPTED SOLUTION

Super Collaborator

Site-to-site (S2S) already load balances internally. When you connect to a remote NiFi over S2S and provide the URL of a node in that cluster, the first call S2S makes is to the service discovery API. It receives the cluster topology and the addresses of all nodes in the cluster. The S2S client then uses that information to pick a node that is under the least load and pushes data to that node. If one of your nodes is down, S2S should pick a different node in the cluster to push data to. For upgrades, you can do a rolling upgrade, one node at a time, and you should mostly be fine.
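The selection step described above can be sketched roughly as follows. This is only an illustration of the idea, not NiFi's actual client code: the `Peer` class, the `queued_flowfiles` load indicator, the `pick_peer` and `fake_discover` functions, and the host names are all hypothetical. A real client may also spread data across several peers weighted by load rather than always choosing a single one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Peer:
    hostname: str
    port: int
    queued_flowfiles: int  # hypothetical load indicator reported per node


def pick_peer(bootstrap_url: str, discover: Callable[[str], List[Peer]]) -> Peer:
    """Ask the bootstrap URL for the peer list, then choose the least-loaded peer."""
    peers = discover(bootstrap_url)
    if not peers:
        raise RuntimeError("discovery returned no peers")
    return min(peers, key=lambda p: p.queued_flowfiles)


def fake_discover(url: str) -> List[Peer]:
    # Stand-in for the discovery call; a real client would query the cluster.
    return [
        Peer("nifi-1.internal", 10443, 120),
        Peer("nifi-2.internal", 10443, 15),
        Peer("nifi-3.internal", 10443, 60),
    ]


chosen = pick_peer("https://nifi-1.internal:8443/nifi", fake_discover)
print(chosen.hostname)
```

If a node is down and therefore missing from the discovered list, the same function simply picks among the remaining peers.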


6 REPLIES

Explorer

So I think you are saying that NiFi will direct S2S traffic around the load balancer that is meant to abstract the individual nodes (i.e. straight to a specific node's internal IP), similar to the behavior of Kafka? This is problematic. Single-node migration is probably not a good option, since this is being managed through HDF. How do I abstract the NiFi cluster members from the 800 MiniFi edge nodes, which can almost never be updated?
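To make the concern concrete, here is a small sketch of which hosts an edge agent ends up needing to reach, depending on what the cluster advertises during discovery. The function `hosts_contacted` and all host names are hypothetical, and whether the nodes can be made to advertise the balancer's address depends on NiFi configuration that is not shown here.

```python
from typing import List, Set


def hosts_contacted(bootstrap_host: str, advertised_peers: List[str]) -> Set[str]:
    """Hosts the client must reach: the bootstrap address plus every advertised peer."""
    return {bootstrap_host, *advertised_peers}


lb = "nifi.example.com"  # hypothetical DNS name of the load balancer

# Case 1: nodes advertise their own internal names, so data bypasses the balancer.
direct = hosts_contacted(lb, ["nifi-1.internal", "nifi-2.internal"])

# Case 2: nodes advertise the balancer's name, so all traffic stays behind it.
proxied = hosts_contacted(lb, [lb, lb])

print(sorted(direct))
print(sorted(proxied))
```

Only in the second case does the edge configuration stay valid when the cluster behind the balancer is replaced.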

Super Collaborator

@Michael Nacey Which MiniFi agent are you using? I don't think there is anything against load balancers, but in the past I have had some users complain about the performance of the load balancer itself. What kind of messages is the MiniFi agent forwarding to NiFi? Are these infrequent large files or datasets, or very frequent small messages?

Explorer

We are using the standard Java MiniFi agent to query a database and send changed records to NiFi over S2S (frequent small updates). The main reason for the LB is not necessarily to balance the S2S load. It is to:

1. Hide the individual NiFi nodes from the MiniFi agent (networking best practice).

2. Make sure that the S2S host specified in the MiniFi config is not a single point of failure (for example, if that node happens to be down).

3. Keep the NiFi nodes in a private subnet (private security zone) and expose only the minimum required in a public zone.

In addition, all traffic comes in through a "transit VPC", meaning it has passed through (before reaching NiFi):

1. An edge balancer (balancing firewalls)

2. A firewall

3. A second balancer (balancing WAFs)

4. A WAF/RP that inspects traffic for funny business and routes it accordingly

It's mostly like this: https://www.draw.io/#LNifi.xml

Explorer

So far I have been able to get this working. Traffic flows fine through the final NLB, but we still want to do more thorough load testing. I have put together a post that explains the setup:

https://everymansravings.wordpress.com/2018/07/27/apache-nifi-behind-an-aws-load-balancer-w-minifi/

Explorer

I see that some settings have been added for using S2S through a reverse proxy. I will try those out.