Created 02-16-2017 04:58 PM
I'm testing out NiFi site-to-site in a secure environment. I have a cluster of 3 nodes (call them server1, server2, server3) and two standalone "client" NiFi instances that need to talk to the cluster over site-to-site (call them client1, client2). The server cluster uses Kerberos for user authentication and 2-way SSL with the clients for authentication and authorization. This all works fine if I give the clients one of the server names directly in the RemoteProcessGroup (e.g. https://server1.local:9443/nifi). The internal cluster load balancing works and all certs seem to be behaving.
What I'd like to do is put a proxy between the clients and the cluster, because in the real environment they'll be in different networks with only one VIP available for the proxy. I've tried putting HAProxy in between on another server (called proxy.local), doing SSL passthrough by running HAProxy in tcp mode. The connection gets through, but then the RPG complains that the returned certificate (e.g. server2.local) doesn't match the hostname (proxy.local). I then tried giving all the servers the same certificate, all using the CN proxy.local, but then NiFi complains because its intra-cluster communication fails with (this is from server1.local's log):
2017-02-16 10:03:34,873 WARN [Replicate Request Thread-2] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/current-user to server2.local:9443 due to {} com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: HTTPS hostname wrong: should be <server2.local>
And in a final attempt that was surely doomed to failure, I changed nifi.properties on all servers to refer to themselves as proxy.local, and similarly changed their authorizers.xml to have only a single Node Identity of proxy.local. That failed, predictably.
So, should this work? Is this an artifact of HAProxy usage that wouldn't show up on a VIP provided by say an F5 device? Am I missing something fundamental?
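For reference, the kind of HAProxy passthrough configuration described above would look roughly like this (a minimal sketch only; the bind port and the backend server list are assumptions based on the hostnames and ports mentioned in this thread):

defaults
    mode tcp
    timeout connect 5s
    timeout client  1m
    timeout server  1m

# tcp mode = plain TCP/TLS passthrough, no TLS termination at the proxy
frontend nifi_s2s
    bind *:9443
    default_backend nifi_cluster

backend nifi_cluster
    balance roundrobin
    server server1 server1.local:9443 check
    server server2 server2.local:9443 check
    server server3 server3.local:9443 check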
Created 02-16-2017 05:10 PM
Hi @Oliver Meyn,
I've never tried it with HAProxy, but I have configured Site-to-Site through a Squid proxy with TLS. It didn't require much in the way of special configuration; here's what I had in my config file (this is for dev purposes, you would probably want to lock it down quite a bit):
http_access allow all
# Choose the port you want. Below we set it to the default 3128.
http_port 3128
You'll definitely want to use HTTP site-to-site and configure a keystore and truststore for the client.
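On the client, that boils down to the standard security properties in nifi.properties, along these lines (a minimal sketch; the paths and passwords are placeholders, and the proxy host/port themselves are configured on the Remote Process Group if your NiFi version exposes HTTP proxy settings there):

# Client nifi.properties - security section (paths and passwords are placeholders)
nifi.security.keystore=/opt/nifi/conf/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=changeit
nifi.security.keyPasswd=changeit
nifi.security.truststore=/opt/nifi/conf/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=changeit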
Thanks,
Bryan
Created 02-16-2017 06:00 PM
Hi @Oliver Meyn,
An option to resolve your issue is to define SANs (Subject Alternative Names) in your certificates, so that each certificate keeps the node's FQDN as its CN (as you did initially) but also lists both the node's FQDN and the proxy/load balancer's FQDN as DNS SANs.
Have a look here - https://issues.apache.org/jira/browse/NIFI-3331
This will be possible with the TLS toolkit in the next release. This way you won't have any error regarding hostname mismatch.
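Until the toolkit supports it, such a certificate can be generated by hand; here is a minimal keytool sketch for one node (alias, keystore name, validity and passwords are placeholders):

keytool -genkeypair -alias nifi-server1 \
  -keyalg RSA -keysize 2048 -validity 730 \
  -dname "CN=server1.local, OU=NIFI" \
  -ext "SAN=dns:server1.local,dns:proxy.local" \
  -keystore server1-keystore.jks -storetype JKS \
  -storepass changeit -keypass changeit

# If you sign this with a CA, pass the same -ext "SAN=..." again when
# generating the CSR and/or signing it, otherwise the SANs get dropped.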
Hope this helps.
Created 02-17-2017 09:44 PM
Thanks very much @Pierre Villard - TIL about DNS SANs, and what a tricky business it is to properly generate certs that contain them! Looking forward to the upgraded tls-toolkit. I've accepted the answer because it solves my 2-way auth problem, but unfortunately the files aren't yet flowing. The client NiFi connects through the proxy, but it first failed with "Unable to validate peers" on the remote group because (from the logs) it was trying to reach the cluster directly on server1.local:9443, whereas the proxy address is proxy.local:5678. As a hack I changed the proxy's port, and then at least the client could connect. But flowfiles just queue up in front of the RPG now, and I don't know what's wrong. The client logs show:
2017-02-17 16:07:28,523 INFO [Timer-Driven Process Thread-7] o.apache.nifi.remote.client.PeerSelector New Weighted Distribution of Nodes:
PeerStatus[hostname=server1.local,port=9443,secure=true,flowFileCount=0] will receive 50.0% of data
PeerStatus[hostname=server2.local,port=9443,secure=true,flowFileCount=0] will receive 50.0% of data
which strikes me as highly suspicious. I reduced the cluster by one node from my original question, so only two peers are expected, but the fact that the client logs refer to the cluster members by name suggests it will try to speak to the cluster directly. Any insight?
Created 02-20-2017 08:58 AM
Hi @Oliver Meyn
As you suspected, NiFi Site-to-Site requires direct peer-to-peer communication. Your log suggests that the client was able to retrieve the remote cluster topology (server1.local and server2.local) by sending a request through the proxy, but it can't talk to those nodes directly.
I haven't tried putting a reverse proxy such as HAProxy in front of NiFi Site-to-Site, because Site-to-Site handles load distribution by itself. If your client needs to go through a proxy server because of a restricted firewall or network, then I'd recommend using a forward proxy with the HTTP transport protocol, as Bryan answered.
Having said that, setting 'nifi.remote.input.host=proxy.local' in nifi.properties on each node in the remote cluster might work, as it advertises every node as 'proxy.local' for Site-to-Site communication, so further communication goes through the proxy. But I can't guarantee that this works.
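A minimal sketch of what that would look like on each cluster node (untested, per the caveat above; leave your other Site-to-Site properties as they are):

# nifi.properties on server1.local and server2.local (untested sketch)
# Advertise the proxy's hostname to Site-to-Site clients instead of the node's own FQDN
nifi.remote.input.host=proxy.local
nifi.remote.input.secure=true
# keep nifi.remote.input.socket.port / nifi.remote.input.http.enabled as already configured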
Thanks,
Koji
Created 02-22-2017 02:32 PM
Thanks for confirming my suspicions @kkawamura - looks like I'll have to find another way.