Created on 12-16-2016 03:17 PM - edited 08-18-2019 05:12 AM
This may have been discussed in other threads, but there is something I misunderstand and an issue that maybe you can help me with. I have a 3-node NiFi cluster and I'd like to schedule ListFile on the Primary Node, then send the flowfiles via a Remote Process Group to the same cluster and thereby load balance the processing.
In my Remote Process Group I can set cluster URL as http://10.10.10.175:8080/nifi which is my nifi.node1.
However, what happens if nifi.node1 goes down? For that purpose I took another machine and installed HAProxy on it to serve as a load balancer for my cluster. Then I configured the Remote Process Group to point to the LB via http://10.10.10.174:8080/nifi, but I am getting the error "Unable to refresh group's peers due to unable to communicate with remote NiFi cluster..."
I am able to work with NiFi Web UI via load balancer, but Remote Process Group doesn't work.
Is using a load balancer the right approach to avoid a single point of failure in case the node that I configured in the Remote Process Group goes down? Any idea what might be the cause of this error?
Here's my HAProxy configuration:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend http_front
    bind *:8080
    reqadd X-Forwarded-Proto:\ http
    capture request header origin len 128
    http-response add-header Access-Control-Allow-Origin %[capture.req.hdr(0)] if { capture.req.hdr(0) -m found }
    rspadd Access-Control-Allow-Headers:\ Origin,\ X-Requested-With,\ Content-Type,\ Accept if { capture.req.hdr(0) -m found }
    stats uri /haproxy?stats
    default_backend http_back

backend http_back
    balance roundrobin
    server nifi1 10.10.10.175:8080 check
    server nifi2 10.10.10.176:8080 check
    server nifi3 10.10.10.177:8080 check
Created 12-16-2016 03:27 PM
You shouldn't need to use an additional load balancer. The URL you enter in the RPG is only used for the initial connection to learn about the nodes in the cluster; from there the RPG talks directly to all nodes. So the main failure case is if your NiFi restarted at a time when the node behind the URL in the RPG happened to be down. This will be addressed in an upcoming release, which will allow you to enter multiple URLs: https://issues.apache.org/jira/browse/NIFI-3026
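To illustrate the multiple-URL idea, here is a minimal Python sketch of trying several bootstrap URLs in order until one answers. It is not NiFi code: `bootstrap_peers` and `make_fake_fetch` are hypothetical names, the fetch function is a stand-in for whatever call the RPG really makes, and the node addresses are simply the ones from the HAProxy backend list in the question.

```python
from typing import Callable, Iterable, List, Sequence

# Addresses borrowed from the question's HAProxy backend list (illustrative only).
CLUSTER_NODES = ["10.10.10.175:8080", "10.10.10.176:8080", "10.10.10.177:8080"]


def bootstrap_peers(urls: Sequence[str],
                    fetch_peers: Callable[[str], List[str]]) -> List[str]:
    """Return the peer list from the first bootstrap URL that responds."""
    errors = []
    for url in urls:
        try:
            return fetch_peers(url)
        except ConnectionError as exc:
            # Remember the failure and fall through to the next URL.
            errors.append(f"{url}: {exc}")
    raise ConnectionError("no bootstrap URL reachable: " + "; ".join(errors))


def make_fake_fetch(down_urls: Iterable[str]) -> Callable[[str], List[str]]:
    """Build a stand-in fetch function that fails for the given URLs."""
    down = set(down_urls)

    def fetch(url: str) -> List[str]:
        if url in down:
            raise ConnectionError("connection refused")
        return list(CLUSTER_NODES)

    return fetch


if __name__ == "__main__":
    urls = ["http://10.10.10.175:8080/nifi", "http://10.10.10.176:8080/nifi"]
    # Simulate node1 being down at startup; the second URL still answers.
    print(bootstrap_peers(urls, make_fake_fetch({urls[0]})))
```

With only one URL configured, the same simulated outage would raise the error immediately, which is the startup failure case described above.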
Created 12-16-2016 03:31 PM
Thank you Bryan! But what if the specific node that I entered in the RPG goes down and is not available? How will it find the other nodes if the one in the RPG is down?
Thank you!
Created 12-16-2016 03:38 PM
When you start the RPG, it goes to the URL you entered and asks it one time for the info of all the cluster nodes; after that it talks to all the nodes directly. So it doesn't matter if that node goes down while the RPG is running, because it already knows about all the nodes. It only matters if the node is down at the moment you start the RPG, which could be when your NiFi instance restarts or when the user clicks start in the UI.
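The behavior described above can be sketched as a small Python simulation. This is only a model of the idea, not the RPG's actual implementation: `PeerCacheSketch`, its methods, and the `fetch_peers` / `is_up` callbacks are all hypothetical, and plain round-robin is used as a simplification of however NiFi really picks a peer.

```python
import itertools
from typing import Callable, Dict, Iterable, List


class PeerCacheSketch:
    """Toy model: learn the peers once at start, then talk to them directly."""

    def __init__(self) -> None:
        self.peers: List[str] = []

    def start(self, bootstrap_url: str,
              fetch_peers: Callable[[str], List[str]]) -> None:
        # The bootstrap URL is contacted only here, at start time.
        self.peers = list(fetch_peers(bootstrap_url))

    def distribute(self, flowfiles: Iterable[str],
                   is_up: Callable[[str], bool]) -> Dict[str, str]:
        # Only peers that are currently reachable receive flowfiles.
        live = [peer for peer in self.peers if is_up(peer)]
        if not live:
            raise ConnectionError("no known peer is reachable")
        # Simplification: hand out flowfiles round-robin over the live peers.
        return dict(zip(flowfiles, itertools.cycle(live)))


if __name__ == "__main__":
    nodes = ["10.10.10.175:8080", "10.10.10.176:8080", "10.10.10.177:8080"]
    rpg = PeerCacheSketch()
    # node1 is up at start, so the peer list is learned successfully.
    rpg.start("http://10.10.10.175:8080/nifi", lambda url: nodes)
    # Later node1 goes down; the cached peer list still covers the other nodes.
    down = {"10.10.10.175:8080"}
    plan = rpg.distribute(["a.txt", "b.txt", "c.txt", "d.txt"],
                          lambda peer: peer not in down)
    for name, peer in plan.items():
        print(name, "->", peer)
```

In this model, losing the bootstrap node after `start` only shrinks the set of live peers; the flowfiles keep flowing to the remaining two nodes.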
Created 12-16-2016 04:10 PM
Thank you! All is clear now
Created 12-16-2016 03:30 PM
Rather than going down this path, since it is still just the same cluster, why not leverage "Isolated Processor"? Run ListFile on the primary node and then load balance. See "Isolated Processor" under the following link.
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#clustering
Created 12-16-2016 03:34 PM
Thank you! This is exactly what I want to do: run ListFile on the Primary Node and then load balance to the same cluster. I read in a few posts here that the RPG is the way to load balance, but my concern is with the URL that I will enter in the RPG. What if that node goes down?
Created 12-16-2016 03:40 PM
The flow in the picture is the correct flow for load balancing, and it is what was meant by "with the proper dataflow configuration - load-balance it across the rest of the nodes" in the admin guide link above.
Created 01-23-2018 07:02 AM
Thanks. In the above solution, how will the external system (syslog source) identify which NiFi node to send the messages to? Will it use the ZK URL or the Primary Node itself? In the latter case, how can the external source be made aware of a PN failover?