Created on 09-05-201706:12 PM - edited 08-17-201911:20 AM
Apache NiFi provides the option of starting an embedded ZooKeeper server. However, NiFi can also be configured to run with an external ZooKeeper server. This article describes how to install and configure a 3 host ZooKeeper ensemble to work with a 2 node NiFi cluster.
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
Apache ZooKeeper 3.4.6
Apache NiFi 1.3.0
The version of ZooKeeper chosen for this tutorial is
Note: ZooKeeper 3.4.6 is the version supported by the latest and previous versions of Hortonworks HDF as shown in the "Component Availability In HDF" table of the HDF 18.104.22.168 Release Notes.
http://www.apache.org/dyn/closer.cgi/zookeeper/ to determine the best Apache mirror site to download a stable ZooKeeper distribution. From that mirror site, select the zookeeper-3.4.6 directory and download the zookeeper-3.4.6.tar.gz file.
Unzip the tar.gz file and create 3 copies of the distribution directory, one for each host in the ZooKeeper ensemble. For example:
Note: In this tutorial, we are running multiple servers on the same machine.
Next we need to create three config files. In the
conf directory of zookeeper-1, create a zoo.cfg file with the following contents:
Because we are running multiple ZooKeeper servers on a single machine, we specified the servername as localhost with unique quorum & leader election ports (i.e. 2888:3888, 2889:3889, 2890:3890) for each server.X.
zoo.cfg files in the conf directories of zookeeper-2 and zookeeper-3 with modified values for dataDir and clientPort properties as separate dataDirs and distinct clientPorts are necessary.
Every machine that is part of the ZooKeeper ensemble needs to know about every other machine in the ensemble. As such, we need to attribute a server id to each machine by creating a file named
myid, one for each server, which resides in that server's data directory, as specified by the configuration file parameter dataDir.
For example, create a
myid file in /usr/local/zookeeper1 that consists of a single line with the text "1" and nothing else. Create the other myid files in the /usr/local/zookeeper2 and /usr/local/zookeeper3 directories with the contents of "2" and "3" respectively.
The first property configures NiFi to not use its embedded ZooKeeper. As a result, the
zookeeper.properties and state-management.xml files in the conf directory are ignored. The second property must be specified to join the cluster as it lists all the ZooKeeper instances in the ensemble.
In addition to the other setup steps from the CDC article, since this environment is a cluster, for the CaptureChangeMySQL processor, go to the Scheduling tab on the Configure Processor dialog. Change the Execution setting to "Primary node" from "All nodes":
Run the flow and select "View State" from the CaptureChangeMySQL and/or EnforceOrder processors to verify that state information is managed properly by the external ZooKeeper ensemble: