Created 11-08-2016 09:24 PM
In NiFi 1.0 (HDF 2.x), I see illustrations of ZooKeeper handling failover for the primary node. Is the primary node simply the dataflow manager (UI)? Are there other tasks outside of the UI that the primary node performs?
Created 11-08-2016 09:39 PM
Ah, my bad, I found the answer here:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
It is not the same. The UI runs on all nodes.
Primary Node: Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below). ZooKeeper is used to automatically elect a Primary Node. If that node disconnects from the cluster for any reason, a new Primary Node will automatically be elected. Users can determine which node is currently elected as the Primary Node by looking at the Cluster Management page of the User Interface.
Isolated Processors: In a NiFi cluster, the same dataflow runs on all the nodes. As a result, every component in the flow runs on every node. However, there may be cases when the DFM would not want every processor to run on every node. The most common case is when using a processor that communicates with an external service using a protocol that does not scale well. For example, the GetSFTP processor pulls from a remote directory, and if the GetSFTP processor runs on every node in the cluster and each node tries simultaneously to pull from the same remote directory, there could be race conditions. Therefore, the DFM could configure the GetSFTP processor to run in isolation on the Primary Node, meaning that it only runs on that node. It could pull in data and - with the proper dataflow configuration - load-balance it across the rest of the nodes in the cluster. Note that while this feature exists, it is also very common to simply use a standalone NiFi instance to pull data and feed it to the cluster. It just depends on the resources available and how the Administrator decides to configure the cluster.
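Besides the Cluster Management page, the current Primary Node can also be checked from a script. Below is a minimal sketch, assuming the cluster endpoint of the NiFi 1.x REST API (GET /nifi-api/controller/cluster) returns a JSON body in which each node carries a "roles" list that includes "Primary Node". The sample payload, host names, and the helper name find_primary_node are illustrative assumptions, not taken from the guide, so check the field names against the REST API docs for your version.

```python
import json

# Hypothetical, trimmed response body. The field names ("cluster", "nodes",
# "roles", "address", "apiPort") are assumptions about the NiFi 1.x REST API.
SAMPLE_RESPONSE = """
{
  "cluster": {
    "nodes": [
      {"address": "nifi-1.example.com", "apiPort": 8080,
       "status": "CONNECTED", "roles": ["Cluster Coordinator"]},
      {"address": "nifi-2.example.com", "apiPort": 8080,
       "status": "CONNECTED", "roles": ["Primary Node"]},
      {"address": "nifi-3.example.com", "apiPort": 8080,
       "status": "DISCONNECTED", "roles": []}
    ]
  }
}
"""


def find_primary_node(cluster_entity):
    """Return 'address:apiPort' of the node with the Primary Node role, or None."""
    nodes = cluster_entity.get("cluster", {}).get("nodes", [])
    for node in nodes:
        # A node may hold several roles at once, or none at all.
        if "Primary Node" in (node.get("roles") or []):
            return f"{node['address']}:{node['apiPort']}"
    return None


primary = find_primary_node(json.loads(SAMPLE_RESPONSE))
print(primary)
```

In a real cluster the JSON would come from an HTTP call to any node, since the UI and REST API are exposed on all nodes in NiFi 1.0. Because ZooKeeper can elect a new Primary Node at any time, the result should be treated as a snapshot.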
Created 11-10-2016 06:35 PM
Sunile,
The Dataflow Manager (DFM) is a role assumed by a person who builds, configures, and monitors the dataflow. They can do this via the REST API or the UI. The UI is exposed on all nodes in Apache NiFi 1.0.0+ (HDF 2.0+). Previously, in a clustered environment, only the NCM (NiFi Cluster Manager) exposed the UI. This may be the source of your confusion.
Created 11-10-2016 06:38 PM
I believe the NCM exposed the UI in previous versions (which is not necessarily the same as the primary node).
Created 11-11-2016 06:11 PM
@slachterman you are correct and I have updated my answer to reflect this. Thanks.