The standard solution

Let's say you want to collect log messages from an edge cluster with NiFi and push them to a central NiFi cluster via the Site-to-Site (S2S) protocol. This is exactly what NiFi is designed for, and it results in a simple flow setup like this:

  • A processor that tails the log file,
  • which sends its flowfiles to a Remote Process Group (RPG) configured with the FQDN-based URL of the central NiFi cluster.
  • On the central NiFi cluster an Input Port is defined,
  • and from that Input Port the rest of the flow does its thing with the incoming flowfiles: filtering, transformations, and eventually sinking them into Kafka, HDFS or Solr.
  • The NiFi S2S protocol is used for the connection between the edge NiFi cluster and the central NiFi cluster,
  • which PUSHES the flowfiles from the edge cluster to the central NiFi cluster.

And now with a firewall blocking incoming connections in between

This standard setup, however, assumes the central NiFi cluster has a public FQDN and accepts incoming connections from the edge. But what if a firewall blocks incoming connections to the central cluster? Fear not! The flexibility of NiFi comes to the rescue once again. The solution is to move the initiation of the S2S connection from the edge NiFi to the central NiFi:

  • The Remote Process Group is defined on the central cluster,
  • which connects to an Output Port on the edge node,
  • so the edge NiFi node must have a public FQDN that the central cluster can reach (this is required!),
  • and instead of an S2S PUSH, the data is effectively PULLED from the edge NiFi cluster into the central NiFi cluster.
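The key point is that in S2S the Remote Process Group is the side that opens the connection, while the type of the remote port only decides which way the data flows. The toy Python model below sketches why the standard push setup fails behind the firewall and the reversed pull setup works. The `NiFiSite` class and `s2s_transfer` function are hypothetical illustrations, not part of any NiFi API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NiFiSite:
    """A NiFi cluster in this toy model (hypothetical, not a NiFi API)."""
    name: str
    accepts_inbound: bool  # False if a firewall blocks incoming connections


def s2s_transfer(rpg_site: NiFiSite, port_site: NiFiSite, port_type: str) -> str:
    """Describe an S2S transfer in this model.

    The RPG side always opens the connection, so the port side must accept
    inbound traffic. The port type decides the data direction: an Input Port
    means the RPG pushes, an Output Port means the RPG pulls.
    """
    if not port_site.accepts_inbound:
        raise ConnectionError(
            f"{rpg_site.name} cannot reach {port_site.name}: inbound connections blocked"
        )
    if port_type == "input":
        return f"PUSH {rpg_site.name} -> {port_site.name}"
    if port_type == "output":
        return f"PULL {port_site.name} -> {rpg_site.name}"
    raise ValueError(f"unknown port type: {port_type!r}")


edge = NiFiSite("edge", accepts_inbound=True)        # public FQDN
central = NiFiSite("central", accepts_inbound=False)  # behind the firewall

# Standard setup: RPG on the edge, Input Port on central -> blocked.
try:
    s2s_transfer(edge, central, "input")
except ConnectionError as err:
    print(err)  # prints: edge cannot reach central: inbound connections blocked

# Reversed setup: RPG on central, Output Port on the edge -> works.
print(s2s_transfer(central, edge, "output"))  # prints: PULL edge -> central
```

In both setups the log data ends up flowing from edge to central; only the side that initiates the connection changes, which is exactly what the firewall cares about.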

To be clear: this setup has the downside that the central NiFi cluster needs to know about every edge cluster. Not necessarily a big deal; it just means the flow on the central NiFi cluster must be updated whenever edge clusters or nodes are added. But if you can't get rid of the firewall blocking incoming connections, it does the job.
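Keeping the central flow in sync with the edge estate can be partly automated. A minimal sketch, assuming you can obtain two lists yourself (how is left open: an inventory file, an export of the central flow, etc.): the FQDNs of all edge nodes, and the target URLs configured on the central cluster's RPGs. The function name `missing_edge_nodes` and the example hostnames are hypothetical. Since an RPG URL field may hold several comma-separated URLs, each entry is split before comparing hostnames.

```python
from urllib.parse import urlparse


def missing_edge_nodes(edge_fqdns: list[str], rpg_target_uris: list[str]) -> list[str]:
    """Return edge FQDNs that no RPG on the central cluster points at yet."""
    covered = set()
    for uri in rpg_target_uris:
        # One RPG entry may contain several comma-separated URLs.
        for part in uri.split(","):
            host = urlparse(part.strip()).hostname  # hostname is lowercased
            if host:
                covered.add(host)
    return sorted(fqdn for fqdn in edge_fqdns if fqdn.lower() not in covered)


# Hypothetical inventory and RPG URLs.
edge_inventory = ["edge1.example.com", "edge2.example.com", "edge3.example.com"]
central_rpgs = [
    "https://edge1.example.com:9443/nifi",
    "https://edge2.example.com:9443/nifi",
]
print(missing_edge_nodes(edge_inventory, central_rpgs))  # ['edge3.example.com']
```

Any hostname reported here is an edge node whose logs the central cluster is not pulling yet, i.e. a place where a new RPG has to be added to the central flow.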

Example solution NIFI flow setup

Screenshot of the flow on the edge node, with a TailFile processor that sends its flowfiles to the Output Port named `logs`:

15288-flowedgenode.png

Screenshot of the flow on the central NiFi cluster, with a Remote Process Group pointed at the FQDN of the edge node and a connection from its `logs` output port to the rest of the flow:

15289-flowcentralnode.png

The configuration of the remote process group:

15290-remoteprocessgroupdetails.png

And the details of the `logs` connection:

15301-connectiondetails.png
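When the RPG does not show the `logs` port, it can help to check from the central cluster's network whether the edge node is reachable and actually exposes that port. NiFi serves site-to-site details at `/nifi-api/site-to-site`; the sketch below assumes the NiFi 1.x response shape `{"controller": {"outputPorts": [{"name": ...}, ...]}}` and an unsecured HTTP edge node. Verify the shape against your NiFi version; secured clusters additionally need a client certificate or token, which this sketch does not handle. The function names and the sample response are hypothetical.

```python
import json
import urllib.request


def exposed_output_ports(site_to_site_json: dict) -> set[str]:
    """Extract output port names from a /nifi-api/site-to-site response.

    Assumes the shape {"controller": {"outputPorts": [{"name": ...}, ...]}}.
    """
    controller = site_to_site_json.get("controller", {})
    return {p["name"] for p in controller.get("outputPorts", []) if "name" in p}


def fetch_site_to_site(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/nifi-api/site-to-site from an unsecured NiFi node."""
    url = base_url.rstrip("/") + "/nifi-api/site-to-site"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Abbreviated, hypothetical response, used instead of a live call:
sample = {"controller": {"inputPorts": [], "outputPorts": [{"name": "logs"}]}}
print("logs" in exposed_output_ports(sample))  # True

# Against a real edge node (hypothetical URL), run from the central side:
# ports = exposed_output_ports(fetch_site_to_site("http://edge1.example.com:8080"))
```

If the fetch itself times out from the central side, the problem is reachability of the edge FQDN rather than the flow configuration.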

Last update: 08-17-2019 12:58 PM