One of the most highly anticipated features of Apache NiFi 1.0.0 is the introduction
of Zero-Master Clustering. Previous versions of NiFi relied upon a single "Master Node"
(more formally known as the NiFi Cluster Manager) to provide the User Interface. If this
node was lost, data continued to flow, but the application was unable to show the topology
of the flow or any stats. Additionally, Site-to-Site clients could continue to send data
to the cluster but were unable to obtain up-to-date information about the cluster topology,
which resulted in less efficient load balancing.
Version 1.0.0 of NiFi addresses these issues by switching to a Zero-Master Clustering paradigm.
This post will explore the approaches taken to ensure that NiFi provides high availability
of the control plane without sacrificing the User Experience. After all, the User Experience is
what has allowed NiFi to become the go-to solution for providing dataflow management to small
organizations as well as the world's largest enterprises.
The benefit that the master/worker paradigm offered us was a design that was easy to reason
about and understand. All web requests were sent directly to the master, which meant that
coordination of the flow was controlled by the master (e.g., it prevented one user from
modifying a Processor while another user was modifying the same Processor). The entire
cluster topology was stored only at the master, and the "golden copy" of the flow
configuration was held by the master. To the extent possible, we wanted to preserve this
ease of reasoning about how the system works while still overcoming all of these hurdles.
I am happy to say that the NiFi community has accomplished this goal, keeping a simple, easy-to-understand
design with all of the benefits of High Availability. To do this, we leveraged Apache ZooKeeper
to provide automatic election of the clustering-related roles. In NiFi 1.0.0, there are two
roles that are automatically elected. The first is the Primary Node (yes, gone are the days
of having to manually switch which node is the Primary Node!). The second is the Cluster Coordinator.
This new Cluster Coordinator role is responsible for monitoring the nodes in a cluster and marking any
nodes that fail to heartbeat as being "Disconnected." Additionally, the Cluster Coordinator provides a
mechanism to ensure that the flow is consistent across all nodes. This is accomplished by forwarding all
web-based requests to the Coordinator. The Coordinator can then replicate this request to all nodes in
the cluster and merge their responses into a single, unified view, in much the same way that the old
Cluster Manager did. However, with the shift to the Cluster Coordinator, if the node that is elected
Cluster Coordinator drops from the cluster, a new node will automatically pick up these responsibilities.
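To make the election mechanics more concrete, here is a minimal sketch of ZooKeeper-based
leader election using Apache Curator's LeaderLatch recipe, the kind of building block this
design relies on. This is an illustration of the pattern rather than NiFi's actual
implementation, and the ZooKeeper address, latch path, and node identifier are placeholder
values.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class RoleElectionSketch {
    public static void main(String[] args) throws Exception {
        // Every node in the cluster connects to the same ZooKeeper ensemble
        // (placeholder address).
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zookeeper-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Each node registers as a candidate under a well-known path. ZooKeeper
        // guarantees that exactly one candidate holds the latch at a time; if the
        // elected node's session is lost, another candidate wins automatically.
        LeaderLatch latch = new LeaderLatch(
                client, "/nifi/leaders/cluster-coordinator", "node1:8080");
        latch.start();

        latch.await(); // blocks until this node is elected
        System.out.println("This node is now the Cluster Coordinator");
    }
}
```

The same recipe, pointed at a different path, handles the Primary Node election; nothing
about the pattern is specific to either role.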
This approach means that users can navigate to the URL of any node in a NiFi cluster without
concerning themselves with which node is currently elected the Cluster Coordinator. All of the
necessary coordination, such as component locking, is handled at a single point, so there is no need to
introduce expensive and difficult-to-understand distributed locking mechanisms.
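The forward-and-replicate pattern itself is straightforward. The sketch below, which uses
Java's built-in HttpClient (Java 11+) and hypothetical node addresses, shows the shape of
what the Cluster Coordinator does: fan a read request out to every node and merge the
responses into one view. It is not NiFi's internal API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

public class ReplicationSketch {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /** Fans one read request out to every node and merges the response bodies. */
    static String replicateRead(List<String> nodeUrls, String path) {
        return nodeUrls.stream()
                .map(url -> {
                    HttpRequest request =
                            HttpRequest.newBuilder(URI.create(url + path)).GET().build();
                    try {
                        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
                    } catch (Exception e) {
                        // In the real cluster, a node that cannot be reached would be
                        // marked Disconnected by the Cluster Coordinator.
                        return "";
                    }
                })
                // NiFi's real merging is type-aware (e.g., summing stats across nodes);
                // simple joining is enough to show the shape of the pattern.
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Placeholder node addresses and path.
        System.out.println(replicateRead(
                List.of("http://node1:8080", "http://node2:8080"), "/status"));
    }
}
```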
Additionally, these changes provide a solid footing to build upon for the upcoming changes planned
for Data Replication across the nodes in a NiFi cluster. A NiFi Feature Proposal outlines this feature at
a fairly high level at https://cwiki.apache.org/confluence/display/NIFI/Data+Replication. Having an
automatically elected, highly available Cluster Coordinator means that we can develop an
easy-to-understand approach for Data Replication as well, since we are able to elect a single
node to coordinate the failover of data processing.
Also new to NiFi 1.0.0 is an overhaul of the security model and component-level versioning. We refer
to these updates jointly as providing multi-tenancy. NiFi now supports any number of users viewing and
modifying the flow at the same time, without the need to continually refresh the flow. In addition,
users can now be granted permission to read or modify each component individually. Prior to version 1.0.0,
NiFi required that users be given read-only or write access to the entire flow. However, as NiFi
continues to gain adoption, enterprise users have been seeking the ability to restrict access
to specific components for different users. This is now possible, with a simple, intuitive user interface
for granting and configuring access policies. Bryan Bende, an Apache NiFi PMC member, has provided an excellent
overview of this feature at http://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy.
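A useful way to picture how many users can safely edit the flow at once without heavyweight
locks is per-component optimistic versioning: every modification names the revision it was
based on, and a stale modification is rejected so the user can refresh and retry. The sketch
below is a hypothetical illustration of that idea; the class and method names are not part
of NiFi's API.

```java
/**
 * Hypothetical sketch of per-component optimistic locking. Each component
 * tracks a revision number; an update succeeds only if the caller saw the
 * latest revision.
 */
class VersionedComponent {
    private long version = 0;
    private String config = "initial";

    /**
     * Applies an update only if the caller's revision is current. Returns
     * false if another user modified the component first, in which case the
     * caller must fetch the latest revision and retry.
     */
    synchronized boolean update(long expectedVersion, String newConfig) {
        if (version != expectedVersion) {
            return false;
        }
        config = newConfig;
        version++;
        return true;
    }

    synchronized long currentVersion() {
        return version;
    }

    synchronized String currentConfig() {
        return config;
    }
}
```

Two users who both fetch revision 5 will race: the first update(5, ...) succeeds and bumps
the revision to 6, and the second is rejected rather than silently overwriting the first
user's change.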