Load Balance NiFi Cluster

Contributor

This may have been discussed in other threads, but there is something I don't understand and an issue you may be able to help me with. I have a 3-node NiFi cluster. I'd like to schedule ListFile on the Primary Node, then send the flowfiles via a Remote Process Group to the same cluster, and in that way load balance the processing.

In my Remote Process Group I can set the cluster URL to http://10.10.10.175:8080/nifi, which is my nifi.node1.

However, what happens if nifi.node1 goes down? To cover that case, I installed HAProxy on another machine to serve as a load balancer for my cluster. I then configured the Remote Process Group to point to the load balancer at http://10.10.10.174:8080/nifi, but I am getting the error "Unable to refresh group's peers due to unable to communicate with remote NiFi cluster..."

[Attached screenshot: 10391-2016-12-16-17-10-07.jpg]

I am able to work with the NiFi web UI via the load balancer, but the Remote Process Group doesn't work.

Is using a load balancer the right approach to avoid a single point of failure in case the node I configured in the Remote Process Group goes down? Any idea what might be causing this error?

Here's my HAProxy configuration:

global
   log /dev/log local0
   log /dev/log local1 notice
   chroot /var/lib/haproxy
   stats socket /run/haproxy/admin.sock mode 660 level admin
   stats timeout 30s
   user haproxy
   group haproxy
   daemon


defaults
   log global
   mode http
   option httplog
   option dontlognull
   timeout connect 5000
   timeout client 50000
   timeout server 50000


frontend http_front
   bind *:8080
   reqadd X-Forwarded-Proto:\ http
   capture request header origin len 128
   http-response add-header Access-Control-Allow-Origin %[capture.req.hdr(0)] if { capture.req.hdr(0) -m found }
   rspadd Access-Control-Allow-Headers:\ Origin,\ X-Requested-With,\ Content-Type,\ Accept  if { capture.req.hdr(0) -m found }
   stats uri /haproxy?stats
   default_backend http_back


backend http_back
   balance roundrobin
   server nifi1 10.10.10.175:8080 check
   server nifi2 10.10.10.176:8080 check
   server nifi3 10.10.10.177:8080 check


1 ACCEPTED SOLUTION

Master Guru

You shouldn't need an additional load balancer. The URL you enter in the RPG is only used for the initial connection, to learn about the nodes in the cluster; from there the RPG talks directly to all nodes. So the main failure case is your NiFi restarting at a time when the node behind the RPG's URL happens to be down. This will be addressed in an upcoming release that will allow you to enter multiple URLs: https://issues.apache.org/jira/browse/NIFI-3026
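To illustrate the multiple-URL idea, here is a minimal Python sketch. It is not NiFi's implementation: `first_reachable` and the `is_up` callback are hypothetical names, the "down" node is simulated with a set, and a real client would attempt an actual connection instead.

```python
from typing import Callable, Iterable, Optional


def first_reachable(urls: Iterable[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first bootstrap URL that responds, or None if all are down."""
    for url in urls:
        if is_up(url):
            return url
    return None


# Assumed cluster state for the example: node1 is down, node2 and node3 are up.
up_nodes = {"http://10.10.10.176:8080/nifi", "http://10.10.10.177:8080/nifi"}
bootstrap_urls = [
    "http://10.10.10.175:8080/nifi",
    "http://10.10.10.176:8080/nifi",
    "http://10.10.10.177:8080/nifi",
]

# The simulated reachability check just looks the URL up in the set above.
chosen = first_reachable(bootstrap_urls, lambda u: u in up_nodes)
print(chosen)
```

With more than one candidate URL, the initial connection only fails when every listed node is down at the same moment.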

8 REPLIES 8

Contributor

Thank you Bryan! But what if the specific node that I entered in the RPG goes down and is not available? How will it find the other nodes if the one in the RPG is down?

Thank you!

Master Guru

When you start the RPG, it goes to the URL you entered and asks once for the info of all the cluster nodes; after that it talks to all the nodes directly. So it doesn't matter if that node goes down while the RPG is running, because it already knows about all the nodes. It only matters if the node is down at the moment you start the RPG, which could be when your NiFi instance restarts or when the user clicks start in the UI.
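The behavior described above can be modeled with a small Python sketch. This is a toy model under stated assumptions, not NiFi code: the class `RemoteGroupSketch`, the `cluster` dictionary, and the plain round-robin choice of peer are all made up for illustration.

```python
from itertools import cycle
from typing import Dict, List


class RemoteGroupSketch:
    """Toy model: learn peers once from a bootstrap node, then send directly."""

    def __init__(self, bootstrap: str, cluster: Dict[str, bool]):
        # cluster maps node address -> whether it is currently up (assumed data).
        if not cluster.get(bootstrap, False):
            raise ConnectionError(f"cannot reach bootstrap node {bootstrap}")
        # One-time discovery: remember every node the cluster reported.
        self.peers: List[str] = sorted(cluster)
        self._cluster = cluster
        self._next = cycle(self.peers)

    def send(self, flowfile: str) -> str:
        """Pick the next live peer round-robin and return its address.

        The flowfile argument is only a label here; nothing is transmitted.
        """
        for _ in range(len(self.peers)):
            peer = next(self._next)
            if self._cluster[peer]:
                return peer
        raise ConnectionError("no peers available")


cluster = {"10.10.10.175": True, "10.10.10.176": True, "10.10.10.177": True}
rpg = RemoteGroupSketch("10.10.10.175", cluster)

cluster["10.10.10.175"] = False  # the bootstrap node goes down after startup
targets = [rpg.send(f"file-{i}") for i in range(4)]
print(targets)
```

In this model the running group keeps delivering to the two remaining nodes, while constructing a new `RemoteGroupSketch` against the downed address raises `ConnectionError`, which mirrors the start-time failure case.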

Contributor

Thank you! All is clear now

Super Guru
@Michael Kalika

Rather than going down this path, since it is still the same cluster, why not leverage an "Isolated Processor"? Run ListFile on the primary node and then load balance. See "Isolated Processor" under the following link.

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#clustering

Contributor

Thank you! This is exactly what I want to do: run ListFile on the Primary Node and then load balance to the same cluster. I read in a few posts here that an RPG is the way to load balance, but my concern is with the URL that I enter in the RPG. What if that node goes down?

Master Guru

The flow in the picture is the correct flow for load balancing, and it is what was meant by "with the proper dataflow configuration - load-balance it across the rest of the nodes" in the admin guide link above.

New Contributor

Thanks. In the above solution, how will the external system (a syslog source) know which NiFi node to send its messages to? Will it use the ZK URL or the Primary Node itself? In the latter case, how can the external source be made aware of a Primary Node failover?