Created 02-01-2018 10:26 AM
Hi team,
We are using NiFi 1.2 with 2 nodes for clustering (let's say NodeA and NodeB). We are not able to understand how NiFi clustering works.
We created processors using the URL https://NodeA:9091/nifi/. If NodeA goes down, will the same processors be available if we access https://NodeB:9091/nifi/, or do we need to perform any manual task?
Regards
Laiju
Created 04-16-2018 06:53 PM
Here is a link to the documentation covering clustering for that version of NiFi: Clustering Configuration
Created 04-23-2018 01:39 PM
The most important thing to understand about NiFi's cluster architecture is that every node in the cluster runs with its own local copy of the flow.xml.gz (this file contains every configuration change any user has made through the NiFi UI: building flows on the canvas, adding reporting tasks, adding controller services, etc.).
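For illustration, one quick way to convince yourself of this is to hash the decompressed contents of the local flow.xml.gz on each node and compare the results. This is only a sketch: the path below assumes a default conf/ location, so adjust it to match your nifi.properties.

```python
# Hypothetical check: run this on each cluster node and compare the output.
# The path assumes the default flow.xml.gz location; adjust for your install.
import gzip
import hashlib

FLOW_PATH = "/opt/nifi/conf/flow.xml.gz"  # placeholder path

with gzip.open(FLOW_PATH, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# Identical digests on NodeA and NodeB mean both nodes carry the same flow.
print(f"{FLOW_PATH}: {digest}")
```

Hashing the decompressed content (rather than the .gz file itself) avoids false mismatches, since gzip headers can embed timestamps.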
-
Because of NiFi's HA control layer, a user can log in to any node in an active cluster and make changes on the canvas. The control layer takes care of replicating those changes to every node connected to the cluster.
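As a rough sketch of what that replication means in practice, you can ask each node's REST API for the root process group and compare what comes back. The node addresses and port below are the ones from the question; authentication is omitted and would be required on a secured cluster (token or client certificate), and TLS verification is disabled here only for brevity.

```python
# Sketch: fetch the root process group from each node and compare processor
# names. Both nodes should return the same flow. Authentication omitted.
import requests

NODES = ["https://NodeA:9091", "https://NodeB:9091"]

def processor_names(base_url):
    resp = requests.get(f"{base_url}/nifi-api/flow/process-groups/root",
                        verify=False)  # point verify at your CA bundle in practice
    resp.raise_for_status()
    flow = resp.json()["processGroupFlow"]["flow"]
    return sorted(p["component"]["name"] for p in flow["processors"])

names_per_node = {url: processor_names(url) for url in NODES}
print("Flows match:", len(set(map(tuple, names_per_node.values()))) == 1)
```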
-
Each node also runs with its own set of repositories (FlowFile, content, provenance, and database). Since NiFi does not currently have an HA data layer, should a node go down, the data currently being processed by that node will not be processed until the node is restarted. It is important that the FlowFile and content repositories (essential for data integrity) are protected by using RAID disk setups. It is actually easy to stand up an entirely new node that uses these same repositories and picks up where the old, dead node left off. There is no way to merge the contents of two nodes' repositories together, however.
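To illustrate that data locality, the cluster endpoint of the REST API reports each node's connection status along with how much data it is currently holding. This is a sketch against an unsecured or certificate-authenticated 1.x cluster; field names may vary slightly by version, so the queued value is read defensively.

```python
# Sketch: list each node's status and queued data, showing that FlowFiles
# live on the node that received them. Authentication omitted.
import requests

resp = requests.get("https://NodeA:9091/nifi-api/controller/cluster",
                    verify=False)  # supply a CA bundle / credentials in practice
resp.raise_for_status()

for node in resp.json()["cluster"]["nodes"]:
    print(f'{node["address"]}:{node["apiPort"]} '
          f'status={node["status"]} queued={node.get("queued", "n/a")}')
```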
-
Thank you,
Matt
Created 04-16-2018 08:08 PM
You do not need to do anything "special or manual" for the NiFi flow to run on the other machine in case of a node failure. NiFi employs a Zero-Master Clustering paradigm: each node in the cluster performs the same tasks on the data, but each operates on a different set of data.
So if a node fails, the other one has "sufficient information" to keep going.
You can have a more in-depth understanding here.
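To make the zero-master idea concrete, a client talking to the cluster can simply try the other node when one is unreachable, because either node can serve the full UI/API. The sketch below uses the two node URLs from the original question, leaves authentication out, and disables TLS verification only for brevity.

```python
# Sketch: prefer NodeA, fall back to NodeB if it is down. Either node holds
# the complete flow definition, so either can answer. Authentication omitted.
import requests

NODES = ["https://NodeA:9091", "https://NodeB:9091"]

def fetch_flow_status():
    for base_url in NODES:
        try:
            resp = requests.get(f"{base_url}/nifi-api/flow/status",
                                verify=False, timeout=5)
            resp.raise_for_status()
            return base_url, resp.json()
        except requests.RequestException:
            continue  # node unreachable, try the next one
    raise RuntimeError("no cluster node reachable")

node, status = fetch_flow_status()
print(f"answered by {node}: {status}")
```

Keep in mind, per Matt's answer above, that this only covers the control plane: data queued on the failed node waits until that node (or a replacement using its repositories) comes back.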