Created on 03-13-2017 11:26 AM - edited 08-17-2019 01:51 PM
NiFi Site-to-Site (S2S) is a communication protocol used to exchange data between NiFi instances or clusters. This protocol is useful for use cases where geographically distributed clusters need to communicate.
S2S provides several benefits such as scalability, security, load balancing and high availability. More information can be found here
NiFi can be secured by enabling SSL and requiring users and nodes to authenticate with certificates. However, in some scenarios, customers have secured and unsecured NiFi clusters that need to communicate. The objective of this tutorial is to show two approaches to achieve this. Whether secured and unsecured NiFi clusters should coexist in the same application is outside the scope of this tutorial.
Let's assume that we have already installed an unsecured HDF cluster (Cluster2) that needs to send data to a secured cluster (Cluster1).
Cluster1 is a 3-node NiFi cluster with SSL enabled: hdfcluster0, hdfcluster1 and hdfcluster2. We can see HTTPS in the URLs as well as the connected user 'ahadjidj'.
Cluster2 is also a 3-node NiFi cluster, but without SSL enabled: hdfcluster20, hdfcluster21 and hdfcluster22.
The easiest way to get data from Cluster2 to Cluster1 is to use a pull method. In this approach, Cluster1 uses a Remote Process Group (RPG) to pull data from Cluster2. We configure the RPG to use HTTP, and no special configuration is required. However, data will travel unencrypted over the network. Let's see how to implement this.
We should see flow files coming from the RPG and buffering before the PutFile processor.
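As a sanity check for the pull approach, the sketch below queries the site-to-site details that an RPG reads from the remote cluster and lists the ports it exposes. It assumes the NiFi REST endpoint `/nifi-api/site-to-site` and the `controller` fields shown in the code; the function names are illustrative, and port 9090 in the example comment is an assumption, so use whatever HTTP port Cluster2 actually listens on.

```python
import json
from urllib.request import urlopen


def site_to_site_url(host, port, secure=False):
    """Build the URL an RPG uses to discover site-to-site details."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/nifi-api/site-to-site"


def summarize_s2s(details):
    """Extract the fields of interest from the site-to-site details JSON."""
    controller = details.get("controller", {})
    return {
        "secure": controller.get("siteToSiteSecure"),
        "input_ports": [p["name"] for p in controller.get("inputPorts", [])],
        "output_ports": [p["name"] for p in controller.get("outputPorts", [])],
    }


def fetch_s2s(host, port, timeout=10):
    """Fetch and summarize the details from an unsecured node (plain HTTP)."""
    with urlopen(site_to_site_url(host, port), timeout=timeout) as resp:
        return summarize_s2s(json.load(resp))


# Example (port 9090 is an assumption, adjust to your environment):
# print(fetch_s2s("hdfcluster20", 9090))
```

If the output port created on Cluster2 does not appear under `output_ports`, the RPG in Cluster1 will not be able to connect to it either.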
The first approach was easy to configure, but data was sent unencrypted over the wire. If we want to leverage SSL and encrypt data in transit between the two clusters, we need to generate and use certificates for each node in Cluster2. The only difference from a fully secured cluster is that we don't activate SSL on Cluster2 itself.
I assume that you already know how to generate certificates for the CA and the nodes and add them to the truststore/keystore. Otherwise, there are several HCC articles that explain how to do it.
We need to configure Cluster2 with its certificates.
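As a rough sketch of what this configuration might look like, the helper below renders the `nifi.properties` entries that give an unsecured node a keystore and truststore while leaving its own site-to-site input unsecured. The file paths and passwords in the usage loop are placeholders (assumptions), not values from this setup, and the helper name is illustrative.

```python
def s2s_client_properties(keystore, truststore, keystore_password,
                          truststore_password, store_type="JKS"):
    """Render nifi.properties lines giving an unsecured node a client identity."""
    props = {
        "nifi.security.keystore": keystore,
        "nifi.security.keystoreType": store_type,
        "nifi.security.keystorePasswd": keystore_password,
        "nifi.security.keyPasswd": keystore_password,
        "nifi.security.truststore": truststore,
        "nifi.security.truststoreType": store_type,
        "nifi.security.truststorePasswd": truststore_password,
        # SSL is not activated on Cluster2 itself.
        "nifi.remote.input.secure": "false",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical paths and passwords, one rendering per Cluster2 node.
for host in ["hdfcluster20", "hdfcluster21", "hdfcluster22"]:
    print(f"# {host}")
    print(s2s_client_properties(
        f"/etc/nifi/conf/{host}-keystore.jks",
        "/etc/nifi/conf/truststore.jks",
        "changeit-keystore",
        "changeit-truststore",
    ))
```

Rendering the lines per host makes it easier to keep the three nodes consistent, since each node needs its own keystore but can share the same truststore.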
In Cluster1, add an input port (toCluster1) and connect it to a PutFile processor.
Use a GenerateFlowFile processor to generate data in Cluster2 and an RPG to push data to Cluster1. Here we use HTTPS addresses when configuring the RPG.
Cluster2 should be able to send data to Cluster1 via the toCluster1 input port. However, the RPG shows a Forbidden error.
This error is triggered because the nodes belonging to Cluster2 are not authorized to access Cluster1 resources. To solve the problem, apply the following configuration:
1) Go to the Users menu in Cluster1 and add a user for each node in Cluster2.
2) Go to the Policies menu in Cluster1 and add each Cluster2 node to the "retrieve site-to-site details" policy.
At this point, the RPG in Cluster2 is working; however, the input port is not visible yet.
3) The last step is to edit the input port policy in Cluster1 to authorize the Cluster2 nodes to send data through S2S. Select the toCluster1 input port and click the key icon to edit its policies. Add the Cluster2 nodes to the list.
4) Now go back to Cluster2 and connect the GenerateFlowFile processor to the RPG. The input port should be visible and data starts flowing "securely" 🙂
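The authorizations from steps 1 to 3 can be summarized as a checklist. The sketch below derives, for each Cluster2 node, the user identity and the two policies it needs on Cluster1. It assumes node identities are certificate DNs of the form `CN=<host>, OU=NIFI` (adjust the template to match the certificates you generated) and uses a placeholder for the UUID of the toCluster1 input port. `/site-to-site` and `/data-transfer/input-ports/<id>` are the resources behind the "retrieve site-to-site details" and "receive data via site-to-site" policies.

```python
CLUSTER2_NODES = ["hdfcluster20", "hdfcluster21", "hdfcluster22"]


def required_policies(nodes, input_port_id, dn_template="CN={host}, OU=NIFI"):
    """Return (identity, resource, action) rows to grant on Cluster1."""
    rows = []
    for host in nodes:
        # Step 1: the user identity to create (DN format is an assumption).
        identity = dn_template.format(host=host)
        # Step 2: lets the RPG read site-to-site details (clears the Forbidden error).
        rows.append((identity, "/site-to-site", "read"))
        # Step 3: lets the node send data through the input port.
        rows.append((identity, f"/data-transfer/input-ports/{input_port_id}", "write"))
    return rows


# "<toCluster1-port-uuid>" is a placeholder for the real input port ID.
for identity, resource, action in required_policies(CLUSTER2_NODES, "<toCluster1-port-uuid>"):
    print(f"{identity}: {action} on {resource}")
```

A missing row of the first kind matches the Forbidden error on the RPG, and a missing row of the second kind matches the input port not being visible from Cluster2.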