Created 01-28-2024 10:36 PM
Hi,
I am using a 3-node NiFi (Version: 1.23.2) cluster. If a node goes down, it is known that the flow can be directed to another node, but what about the data that is queued on the failed node? Is there any way to replicate the data from the failed node to another node, or will we have to wait until the node comes back up? Please explain how to handle this scenario.
Created 01-29-2024 03:36 AM
When a specific node is down, all new ingestion will be handled by the remaining nodes.
However, the data queued on the downed node will remain on that node until it comes back. If this is a hardware failure, such as a disk failure, and the node cannot be brought back, the stuck data will be lost.
Since NiFi is not a storage system but more of an in/out pipeline, the data pushed to NiFi should not be the only copy. Systems that push data to NiFi should retain the data intact for some buffer period (hours or days) until it has been processed.
NiFi does not provide any sort of backup/restore or real-time replication.
To avoid such data loss, keep the data at the source until NiFi has processed it and handed it off for further downstream processing.
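For illustration only, here is a minimal sketch (not a NiFi feature) of the kind of source-side retention a sending system might implement: each file is copied into a retention directory before being handed to a NiFi pickup directory, and retained copies are purged only after a buffer window. The paths and the retention window are hypothetical assumptions.

```python
# Hypothetical source-side retention sketch: keep an original copy of every
# file handed to NiFi until a buffer window has passed, so a NiFi node
# failure does not leave the only copy stuck on the failed node.
import shutil
import time
from pathlib import Path

OUTBOX = Path("/data/nifi_outbox")      # assumed: directory NiFi ingests from (e.g. ListFile/FetchFile)
RETAINED = Path("/data/nifi_retained")  # assumed: local retention copy kept by the source system
RETENTION_SECONDS = 3 * 24 * 3600       # assumed: 3-day buffer before originals are purged


def hand_off(src: Path) -> None:
    """Copy a file into the retention area, then move it into the NiFi outbox."""
    RETAINED.mkdir(parents=True, exist_ok=True)
    OUTBOX.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, RETAINED / src.name)    # retained copy survives a NiFi node failure
    shutil.move(str(src), OUTBOX / src.name)  # NiFi picks this copy up


def purge_expired() -> None:
    """Delete retained copies only after the buffer window has elapsed."""
    cutoff = time.time() - RETENTION_SECONDS
    for f in RETAINED.iterdir():
        if f.is_file() and f.stat().st_mtime < cutoff:
            f.unlink()
```

If a node is lost permanently, the retained copies can simply be re-dropped into the pickup directory.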
Hope this helps.
Thank you
Created 01-29-2024 05:44 AM
@PriyankaMondal
Just to add to what @ckumar provided, the NiFi repositories are not locked to a specific node. What I mean by that is that they can be moved to a new node, with "new" being the key word there.

A typical production NiFi setup uses protected storage for its flowfile_repository and content_repository(s), which hold all the FlowFile metadata and FlowFile content for all actively queued and archived FlowFiles on a node. To prevent loss of data, these repositories should be protected through the use of RAID storage or some other equivalent protected storage. The data stored in these repositories is tightly coupled to the flow.xml.gz/flow.json.gz that the cluster is running on every node.

Let's say you have a hardware failure; it may be faster to stand up a new server than to repair the failed hardware. You can simply move or copy the protected repositories to the new node before starting it. When the node starts and joins your existing cluster, it will inherit the cluster's flow.xml.gz/flow.json.gz and then begin loading the FlowFiles from those moved repositories into the connection queues. Processing will continue exactly where it left off on the old node.

There is no way to merge repositories together, so you cannot add the contents of one node's repositories to the already existing repositories of another node.
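As a rough illustration of that move/copy step (not an official procedure), the sketch below reads the repository locations from the old node's nifi.properties and copies them to the same relative locations under a freshly installed NiFi home before the new node is started. The installation paths are hypothetical assumptions, and it assumes the standard property names nifi.flowfile.repository.directory and nifi.content.repository.directory.default are in use on both nodes.

```python
# Rough sketch: copy a failed node's protected flowfile/content repositories
# into a freshly installed NiFi home before starting the replacement node.
# Paths are hypothetical; run only while both NiFi instances are stopped.
import shutil
from pathlib import Path

OLD_NIFI_HOME = Path("/mnt/old_node/nifi")  # assumed mount of the old node's protected storage
NEW_NIFI_HOME = Path("/opt/nifi")           # assumed install path on the replacement node

REPO_KEYS = [
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.default",
]


def read_property(props_file: Path, key: str) -> str:
    """Return the value of a key=value entry in nifi.properties."""
    for line in props_file.read_text().splitlines():
        if line.startswith(key + "="):
            return line.split("=", 1)[1].strip()
    raise KeyError(f"{key} not found in {props_file}")


def copy_repositories() -> None:
    """Copy each configured repository directory onto the new node."""
    old_props = OLD_NIFI_HOME / "conf" / "nifi.properties"
    for key in REPO_KEYS:
        rel = read_property(old_props, key)  # e.g. ./flowfile_repository
        src = (OLD_NIFI_HOME / rel).resolve()
        dst = (NEW_NIFI_HOME / rel).resolve()
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(src, dst, dirs_exist_ok=True)
        print(f"copied {src} -> {dst}")


if __name__ == "__main__":
    copy_repositories()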
The provenance_repository holds lineage data, and the database_repository holds flow configuration history and some node-specific info. Neither of these is needed to preserve the actual FlowFiles.
Hope this helps,
Matt