Support Questions

gunlomboy · ‎09-30-2025

Hi,

We have a NiFi cluster which spans two physical locations, Zone 1 and Zone 2. We are clustering across locations for ease of management. In each location we have hosts running NiFi that act as data collectors, enrich the data, and then send it to the main cluster via Site-to-Site (using Remote Process Groups).

My question is:

We want data originating in Zone 1 to only hit the NiFi cluster nodes in Zone 1. Using Site-to-Site, even if we only configure the Zone 1 nodes in the RPG, NiFi will report the presence of the Zone 2 nodes, so data originating from Zone 1 will cross the WAN and end up in Zone 2. Does NiFi have any concept of zones?

We currently achieve this by configuring a firewall rule to block outgoing traffic to Zone 1 nodes from Zone 2 (and vice-versa), but this produces warnings in the RPG.

Any thoughts or ideas would be appreciated. Thanks

MattWho · ‎10-01-2025

@gunlomboy

The NiFi Site-to-Site (S2S) protocol was not designed for the use case you are trying to achieve. The ability to provide a comma-separated list of hostnames in the Remote Process Group (RPG) exists simply to provide fault tolerance.

The RPG will attempt to use the first hostname provided to fetch the S2S details about the NiFi cluster from that host. These S2S details include information about every node in the target NiFi cluster (this is why the RPG still distributes across all nodes in the target cluster when only one hostname is provided). The extra hostnames provide fault tolerance: for example, if the first hostname you configured is down, the RPG will try the second host to fetch the S2S details.

The S2S details fetched from the contacted host will include things like hostnames of all nodes connected to cluster, RAW enabled status, Raw Port for each host, Secure enabled, individual target node workload, etc.
[Image: Site-to-Site protocol sequence]
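To illustrate the lookup described above, here is a minimal Python sketch of the idea, not the actual S2S client code. The function name `discover_peers`, the `fake_fetch` stand-in, the hostnames, the port, and the field names are all assumptions made up for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the details a node returns about the cluster.
# The field names are illustrative, not the actual S2S wire format.
Peer = Dict[str, object]


def discover_peers(bootstrap_hosts: List[str],
                   fetch_details: Callable[[str], List[Peer]]) -> List[Peer]:
    """Return the peer list from the first configured host that answers."""
    for host in bootstrap_hosts:
        try:
            return fetch_details(host)
        except ConnectionError:
            continue  # this host is down, try the next configured hostname
    raise ConnectionError("no configured host returned S2S details")


# Every node reports the whole cluster, regardless of which one is asked.
CLUSTER: List[Peer] = [
    {"hostname": "zone1-a", "port": 10443, "secure": True},
    {"hostname": "zone1-b", "port": 10443, "secure": True},
    {"hostname": "zone2-a", "port": 10443, "secure": True},
    {"hostname": "zone2-b", "port": 10443, "secure": True},
]


def fake_fetch(host: str) -> List[Peer]:
    """Simulated lookup: zone1-a is down, any other node answers."""
    if host == "zone1-a":
        raise ConnectionError(f"{host} is unreachable")
    return CLUSTER


# Only Zone 1 hostnames are configured in the (simulated) RPG.
peers = discover_peers(["zone1-a", "zone1-b"], fake_fetch)
peer_names = [str(p["hostname"]) for p in peers]
print(peer_names)
```

Even though only Zone 1 hostnames were configured, the returned list contains the Zone 2 nodes as well, because the configured hostnames are only used to find out who is in the cluster.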

By setting up firewall rules, you are not changing the S2S details, which are refreshed regularly, so the RPG will continue to build a distribution algorithm that includes all the nodes in the target cluster. This makes the transfer of FlowFiles very inefficient, because the RPG keeps trying to send to nodes it cannot reach.
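As a rough illustration of the cost, the sketch below counts how many send attempts land on a firewalled node. It assumes plain round-robin selection purely to keep the numbers easy to follow; the real RPG weights its choice by the workload of each target node, so the exact ratio will differ. Hostnames and the function name `count_blocked_attempts` are hypothetical.

```python
from itertools import cycle, islice
from typing import List, Set


def count_blocked_attempts(advertised: List[str],
                           reachable: Set[str],
                           attempts: int) -> int:
    """Cycle over every advertised peer and count attempts that hit a blocked node."""
    blocked = 0
    for peer in islice(cycle(advertised), attempts):
        if peer not in reachable:
            blocked += 1  # the firewall drops this connection attempt
    return blocked


# The S2S details still advertise all four nodes...
advertised_peers = ["zone1-a", "zone1-b", "zone2-a", "zone2-b"]
# ...but the firewall only lets the Zone 1 collectors reach Zone 1 nodes.
reachable_peers = {"zone1-a", "zone1-b"}

blocked_count = count_blocked_attempts(advertised_peers, reachable_peers, 100)
print(f"{blocked_count} of 100 attempts targeted a blocked node")
```

Under these simplified assumptions half of the attempts are wasted on nodes behind the firewall, and each of those shows up as a warning on the RPG.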

You might consider redesigning your dataflows to use the PostHTTP processor (one for each host in the target NiFi cluster, with failure routed to another PostHTTP), and then set up a ListenHTTP processor on your receiving NiFi cluster. The PostHTTP processor can be configured to "Send as FlowFile", so the NiFi cluster will still receive the FlowFile content and the associated FlowFile attributes/metadata, just like when an RPG sends FlowFiles. In front of your PostHTTP processors you could put a DistributeLoad processor to distribute the FlowFiles being sent to your different target nodes.

The downside to a setup like this is that if you add nodes to zone1 or zone2 of the target NiFi cluster, you'll need to update your dataflows to add more PostHTTP processors for those new target hosts. You also lose the benefit of the workload-based FlowFile distribution built into the S2S protocol. That being said, there are no other options, since you can't change the functionality of the S2S protocol.
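The routing idea can be modeled in a few lines of Python. This is only a sketch of the flow logic (round-robin distribution plus failure routed to the next target), not NiFi configuration. The URLs, port 8081, and the function names are assumptions for the example, and `post` is a stand-in for whatever actually delivers the FlowFile.

```python
from typing import Callable, Dict, List, Optional


def send_with_failover(targets: List[str], start: int,
                       post: Callable[[str], bool]) -> Optional[str]:
    """Try each target once, beginning at `start`; return the one that accepted."""
    for offset in range(len(targets)):
        target = targets[(start + offset) % len(targets)]
        if post(target):
            return target
    return None  # every target in the zone refused or was down


def route_flowfiles(count: int, targets: List[str],
                    post: Callable[[str], bool]) -> Dict[str, int]:
    """Distribute `count` FlowFiles round-robin, failing over to the next target."""
    delivered: Dict[str, int] = {}
    for i in range(count):
        accepted = send_with_failover(targets, i % len(targets), post)
        key = accepted if accepted is not None else "undelivered"
        delivered[key] = delivered.get(key, 0) + 1
    return delivered


# Hypothetical ListenHTTP endpoints on the Zone 1 nodes only.
ZONE1_TARGETS = [
    "http://zone1-a:8081/contentListener",
    "http://zone1-b:8081/contentListener",
]


def simulated_post(url: str) -> bool:
    """Pretend zone1-a is down and zone1-b accepts everything."""
    return "zone1-a" not in url


result = route_flowfiles(6, ZONE1_TARGETS, simulated_post)
print(result)
```

Because the target list is fixed per zone, nothing ever crosses the WAN, and the failover loop stands in for the failure relationship between the PostHTTP processors.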

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
