Created 11-14-2016 09:27 PM
How do NiFi fault tolerance and, more importantly, FlowFile acking work?
Specifically looking at NiFi from a streaming background:
This post covers the topic at a high level but doesn't talk about the specifics of how failover works. https://community.hortonworks.com/questions/46887/is-nifi-fault-tolerant-against-machine-failures.ht...
Created 11-14-2016 09:32 PM
Hi Ambud,
This section in the Storm documentation provides an excellent explanation of how acking and fault tolerance work together to guarantee at-least-once processing:
http://storm.apache.org/releases/1.0.2/Guaranteeing-message-processing.html
Created 11-14-2016 09:37 PM
Sorry, misread your question as Storm fault tolerance, not NiFi.
Created 11-14-2016 09:34 PM
@Ryan Templeton thanks for your answer, Ryan; the question is how it works in NiFi, not Storm.
Created 11-14-2016 09:47 PM
Each node in a NiFi cluster runs its own copy of the dataflow and works on its own set of FlowFiles. Node A, for example, is unaware of the existence of Node B.
NiFi does persist all FlowFiles (attributes and content) into local repositories on each node in the cluster. That is why it is important to make these repos fault tolerant (for example, by using RAID 10 disks for your repos).
Should a node go down, as long as you have access to those repos and a copy of the flow.xml.gz, you can recover your dataflow where it left off, even if that means spinning up a new NiFi and pointing it at those existing repos.
NiFi comes with no automated, built-in process for this.
While nodes at this time are not aware of other nodes or the data they currently have queued, this is a roadmap item for a future version of NiFi. At this time the HA data plane work has not been committed to any particular release, to the best of my knowledge.
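For reference, that recovery mostly comes down to pointing the replacement instance's nifi.properties at the surviving repositories and flow.xml.gz. A minimal sketch (the directory paths below are just example values, not required locations):

    nifi.flow.configuration.file=/data/nifi/conf/flow.xml.gz
    nifi.flowfile.repository.directory=/data/nifi/flowfile_repository
    nifi.content.repository.directory.default=/data/nifi/content_repository
    nifi.provenance.repository.directory.default=/data/nifi/provenance_repository

As long as the replacement node can read those directories, it will pick up the persisted FlowFiles and resume the flow where the failed node left off.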
Thanks,
Matt
Created 11-14-2016 09:55 PM
So the repos need to be on storage shared between NiFi nodes (like NFS)?
Created 11-14-2016 09:49 PM
FlowFile content is written to the content repository. When a node goes down, the NiFi Cluster Manager will route incoming data to another node. However, NiFi does not replicate data like Kafka. The data queued on the failed node stays queued on that node; it must either be manually sent over to a live node in the cluster, or you can simply bring the failed node back up. Any new data will automatically be routed to the other nodes in the cluster by the NiFi Cluster Manager (NCM).
http://alvincjin.blogspot.com/2016/08/fault-tolerant-in-apache-nifi-07-cluster.html
Created 11-14-2016 09:54 PM
Only the NiFi 0.x or HDF 1.x versions of NiFi use an NCM. NiFi 1.x or HDF 2.x versions have moved to zero-master clustering and no longer have an NCM (the HA control plane).
The routing of data you are referring to is specific to data being sent to your NiFi cluster via Site-to-Site (S2S). S2S does make sure that data continues to route only to the available destination nodes.
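For reference, S2S input is enabled per node in nifi.properties; a minimal sketch using the property names as they appear in NiFi 1.x (host and port values here are just examples):

    nifi.remote.input.host=nifi-node1.example.com
    nifi.remote.input.socket.port=10443
    nifi.remote.input.secure=false

The sending side (a Remote Process Group) learns which destination nodes are currently available and distributes data only to those, which is the routing behavior described above.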
Matt