Are the NiFi Primary Node and the Dataflow Manager the same?

In NiFi 1.0 (HDF 2.x) I see an illustration of ZooKeeper handling failover for the Primary Node. Is the Primary Node simply the Dataflow Manager (UI)? Are there other tasks outside of the UI that the Primary Node performs?

1 ACCEPTED SOLUTION

Ah, my bad, I found the answer here:

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html

It is not the same. The UI runs on all nodes.

Primary Node: Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below). ZooKeeper is used to automatically elect a Primary Node. If that node disconnects from the cluster for any reason, a new Primary Node will automatically be elected. Users can determine which node is currently elected as the Primary Node by looking at the Cluster Management page of the User Interface.
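Besides the Cluster Management page, the elected Primary Node can also be read programmatically. The following is a minimal sketch, not an official client: it assumes the cluster listing returned by the NiFi REST API (GET /nifi-api/controller/cluster in NiFi 1.x) carries a "roles" list per node, and it parses a hypothetical sample payload instead of calling a live cluster. The host names and the helper name find_primary_node are made up for illustration; check the field names against the REST API documentation for your version.

```python
import json

# Hypothetical sample, shaped like the assumed response of
# GET /nifi-api/controller/cluster (field names are an assumption).
SAMPLE_RESPONSE = """
{
  "cluster": {
    "nodes": [
      {"address": "nifi-1.example.com", "apiPort": 8080,
       "status": "CONNECTED", "roles": ["Cluster Coordinator"]},
      {"address": "nifi-2.example.com", "apiPort": 8080,
       "status": "CONNECTED", "roles": ["Primary Node"]},
      {"address": "nifi-3.example.com", "apiPort": 8080,
       "status": "CONNECTED", "roles": []}
    ]
  }
}
"""


def find_primary_node(payload):
    """Return 'address:apiPort' of the node holding the Primary Node role, or None."""
    for node in payload.get("cluster", {}).get("nodes", []):
        if "Primary Node" in node.get("roles", []):
            return f"{node['address']}:{node['apiPort']}"
    return None


if __name__ == "__main__":
    print(find_primary_node(json.loads(SAMPLE_RESPONSE)))
```

Against a real cluster, the same function could be fed the parsed JSON body of the REST response instead of the sample string.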

Isolated Processors: In a NiFi cluster, the same dataflow runs on all the nodes. As a result, every component in the flow runs on every node. However, there may be cases when the DFM would not want every processor to run on every node. The most common case is when using a processor that communicates with an external service using a protocol that does not scale well. For example, the GetSFTP processor pulls from a remote directory, and if GetSFTP runs on every node in the cluster and each instance tries simultaneously to pull from the same remote directory, there could be race conditions. Therefore, the DFM could configure GetSFTP to run in isolation on the Primary Node, meaning that it only runs on that node. It could pull in data and, with the proper dataflow configuration, load-balance it across the rest of the nodes in the cluster. Note that while this feature exists, it is also very common to simply use a standalone NiFi instance to pull data and feed it to the cluster. It just depends on the resources available and how the Administrator decides to configure the cluster.
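To make the difference between a normal and an isolated processor concrete, here is a toy model in Python. It is not NiFi code: the ToyCluster class, its method names, and the "first connected node wins" rule are all assumptions made for illustration, standing in for the ZooKeeper election described above.

```python
class ToyCluster:
    """Toy model of where a processor runs; not actual NiFi behaviour."""

    def __init__(self, node_names):
        self.connected = list(node_names)
        # Stand-in for the ZooKeeper election: pick the first connected node.
        self.primary = self.connected[0] if self.connected else None

    def disconnect(self, name):
        """Drop a node; if it was the Primary Node, pick a new one."""
        self.connected.remove(name)
        if name == self.primary:
            self.primary = self.connected[0] if self.connected else None

    def nodes_running(self, isolated):
        """List the nodes on which a processor would be scheduled."""
        if isolated:
            return [self.primary] if self.primary else []
        return list(self.connected)


if __name__ == "__main__":
    cluster = ToyCluster(["node-1", "node-2", "node-3"])
    print(cluster.nodes_running(isolated=False))  # every connected node
    print(cluster.nodes_running(isolated=True))   # only the Primary Node
    cluster.disconnect("node-1")                  # the Primary Node drops out
    print(cluster.nodes_running(isolated=True))   # the newly chosen Primary Node
```

The point of the sketch is that an isolated processor such as GetSFTP keeps running on exactly one node even after a failover, while a regular processor runs once per connected node.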

REPLIES

Sunile,

The Dataflow Manager (DFM) is a role assumed by a person who builds, configures, and monitors the dataflow. They can do this via the REST API or the UI. The UI is exposed on all nodes in Apache NiFi 1.0.0+ (HDF 2.0+). Previously, in a clustered environment, only the NiFi Cluster Manager (NCM) exposed the UI. This may be the source of your confusion.

I believe the NCM exposed the UI in previous versions (which is not necessarily the same as the primary node).

@slachterman you are correct and I have updated my answer to reflect this. Thanks.