We currently run a single-node NiFi server on an AWS EC2 instance and are planning to move to a multi-node cluster architecture. We are in the development and testing stage. We need to set up active and passive nodes so that NiFi keeps working: whenever one server goes down, another should pick up the work. Can anyone help us move forward with this scenario, or suggest a reference architecture?
Within a NiFi cluster, all nodes are active; there is no active/passive mode. Each node in the cluster runs its own local copy of the dataflows on the NiFi canvas (stored in the flow.xml.gz and loaded in memory). A NiFi cluster has an elected cluster coordinator that is responsible for making sure every node in the cluster is running an identical copy of the dataflows. No matter which NiFi cluster node a user is logged in to, the cluster coordinator is responsible for replicating that user's requests to all nodes and confirming that every node responded to them.
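As a rough starting point, the cluster-related entries in each node's nifi.properties might look like the fragment below. This is a minimal sketch, not a complete or verified configuration: the hostnames, ports, and the three-node external ZooKeeper ensemble are placeholders I have assumed, and the exact set of required properties should be checked against the Admin Guide for your NiFi version.

```
# nifi.properties on one node (repeat on every node with its own address).
# All hostnames and port numbers below are assumed placeholders.

# Run this instance as a member of a cluster rather than standalone.
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node-1.example.internal
nifi.cluster.node.protocol.port=11443

# ZooKeeper ensemble used for cluster coordinator election (assumed external).
nifi.zookeeper.connect.string=zk-1.example.internal:2181,zk-2.example.internal:2181,zk-3.example.internal:2181

# How long to wait, or how many nodes to wait for, before electing the flow at startup.
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3
```

Every node gets the same ZooKeeper connect string, and only the node address changes per host, which matches the point above that all nodes are equal, active members.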
When it comes to NiFi FlowFiles (objects moving from component to component via connections in your dataflow(s)), each node in the NiFi cluster has its own set of local repositories and performs work only on the FlowFiles present on that specific node. NiFi nodes have no direct access to FlowFiles on other nodes. So if a node in your NiFi cluster goes down, the FlowFiles currently on that node will remain there until it is brought back online, at which time it will resume processing those FlowFiles where it left off. Each NiFi node in the cluster should be set up to use RAID to protect its content, FlowFile, and provenance repositories and avoid data loss from disk issues.
While a node is down and/or disconnected from the cluster, NiFi will prevent new changes from being made on the canvas. This keeps the flow from becoming out of sync with the currently disconnected node. While changes are blocked, the remaining nodes still in the cluster will continue processing existing and new FlowFiles.
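To see which nodes are currently connected, one option is to ask the NiFi REST API for the cluster status. The sketch below is illustrative only: it assumes the `/nifi-api/controller/cluster` endpoint returns a `cluster.nodes` list whose entries carry `address`, `apiPort`, and `status` fields, that the base URL is a placeholder for one of your nodes, and that the instance is unsecured (a secured cluster would also need a token or client certificate). The function names are my own.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_cluster(base_url: str) -> dict:
    """Fetch the cluster status document from one node (no authentication)."""
    url = base_url.rstrip("/") + "/nifi-api/controller/cluster"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def summarize_cluster(payload: dict) -> Dict[str, List[str]]:
    """Group node labels ("address:apiPort") by their reported status."""
    summary: Dict[str, List[str]] = {}
    for node in payload.get("cluster", {}).get("nodes", []):
        label = f"{node.get('address')}:{node.get('apiPort')}"
        summary.setdefault(node.get("status", "UNKNOWN"), []).append(label)
    return summary


def has_unavailable_node(payload: dict) -> bool:
    """Return True if any node reports a status other than CONNECTED."""
    nodes = payload.get("cluster", {}).get("nodes", [])
    return any(node.get("status") != "CONNECTED" for node in nodes)


if __name__ == "__main__":
    # Placeholder URL; replace with the address of one of your nodes.
    status = fetch_cluster("http://nifi-node-1.example.internal:8080")
    print(summarize_cluster(status))
    if has_unavailable_node(status):
        print("At least one node is not connected; canvas changes may be blocked.")
```

A check like this could run from a monitoring job so that you know when a node has dropped out and the FlowFiles on it are waiting for the node to come back.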
Hope this helps,