Created 05-04-2017 11:55 AM
Hello,
I am trying to set up a cluster with 3 nodes without using the embedded ZooKeeper. I installed and started a separate ZooKeeper on port 2181 on one of the nodes (node1). The properties on my other nodes are as follows:
nifi.cluster.is.node=true
nifi.cluster.node.address=IP_MACHINE
nifi.cluster.node.protocol.port=9000
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=3
nifi.zookeeper.connect.string=IP_MACHINE_1:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
I have left the remote (site-to-site) properties empty on the first node and set the following on the other 2 nodes:
nifi.remote.input.host=IP_MACHINE_1
nifi.remote.input.secure=false
nifi.remote.input.socket.port=
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
Then I start all the nodes. I get the following logs on node1:
2017-05-04 11:46:48,843 INFO [Process Cluster Protocol Request-10] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 3fadedb5-c6f0-4fe8-ad02-091799b5c242 (type=NODE_CONNECTION_STATUS_REQUEST, length=97 bytes) from MACHINE_IP_3 in 0 millis
2017-05-04 11:46:50,539 INFO [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 4b414f92-8486-4e7f-9c4c-e184279611b1 (type=HEARTBEAT, length=2458 bytes) from localhost:9993 in 1 millis
2017-05-04 11:46:52,083 INFO [Process Cluster Protocol Request-4] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 82956fb1-7696-41a6-a5de-36e2b4362889 (type=NODE_CONNECTION_STATUS_REQUEST, length=97 bytes) from MACHINE_IP_2 in 0 millis
2017-05-04 11:46:52,421 INFO [Process Cluster Protocol Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request d9f3c705-4feb-452f-b6aa-ff2d01bd3f7f (type=HEARTBEAT, length=2456 bytes) from localhost:9994 in 1 millis
2017-05-04 11:46:53,848 INFO [Process Cluster Protocol Request-6] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 45063482-48a2-4edf-b064-35840f6fcf6e (type=NODE_CONNECTION_STATUS_REQUEST, length=97 bytes) from MACHINE_IP_3 in 0 millis
2017-05-04 11:46:54,262 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to MACHINE_IP_1:9000 due to: java.net.ConnectException: Connection timed out (Connection timed out)
So it received heartbeat messages from localhost:9993 (which actually is node2).
I checked Zookeeper and it shows 2 Primary nodes (the 2 nodes not running zookeeper) and all 3 nodes connected in the Cluster Coordinator.
When I check the UI on Machine 1, I get:
Cluster is still in the process of voting on the appropriate Data Flow.
On Machines 2 and 3, I get the following message in the UI:
Action cannot be performed because there is currently no Cluster Coordinator elected. The request should be tried again after a moment, after a Cluster Coordinator has been automatically elected.
I seem to have configured ZooKeeper and the other properties correctly, yet the cluster is still unable to elect a Cluster Coordinator.
Thanks in advance.
Created 05-04-2017 07:43 PM
There are a few things that do not look right in your nifi.properties configuration above:
On every node the following properties should be configured with the FQDN of the node:
1. nifi.remote.input.host=
2. nifi.cluster.node.address=
3. nifi.web.http.host= or nifi.web.https.host=
I noticed you are configuring nifi.remote.input.host= with the IP of a different node.
It is not clear from the above whether you set a value for nifi.web.http.host= or nifi.web.https.host=. If you did not, Java may be resolving your hostname to localhost. This can be problematic for cluster communications, since nodes may end up trying to talk to themselves rather than to the other nodes.
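One rough way to see whether a hostname resolves to localhost is to look it up and check for loopback addresses. The following is only a minimal Python sketch and not part of NiFi; the function names are made up for illustration, and the operating system resolver it uses may not behave exactly like Java's.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the given IP address string is a loopback address."""
    # Drop any IPv6 scope suffix such as "%lo0" before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def resolves_to_loopback(hostname):
    """Return True if any address the hostname resolves to is loopback."""
    # getaddrinfo returns tuples whose last element is the socket address;
    # the first item of that address is the IP string.
    infos = socket.getaddrinfo(hostname, None)
    return any(is_loopback(info[4][0]) for info in infos)
```

On each node, calling `resolves_to_loopback(socket.getfqdn())` and getting `True` would suggest that the node's own name maps to localhost, which would match the `localhost:9993` heartbeats seen in the log.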
Also make sure that the following ports are open in any firewalls between your nodes:
1. nifi.remote.input.socket.port=
2. nifi.cluster.node.protocol.port=
3. nifi.web.http.port=8080 or nifi.web.https.port=
Also make sure all three of your nodes can talk to zookeeper on port 2181.
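To check these ports from each node, a small TCP connect test can help. This is a minimal Python sketch under assumptions: the hostnames in the usage note are placeholders, and the port numbers are the ones mentioned in this thread (2181 for ZooKeeper, 9000 for the cluster protocol port, 8080 for the web port).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(targets, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(host, port): port_is_open(host, port, timeout) for host, port in targets}
```

For example, running `check_ports([("node1.example.com", 2181), ("node1.example.com", 9000), ("node2.example.com", 8080)])` from each node, with your real FQDNs substituted, shows which pairs cannot be reached. A `False` for port 9000 would be consistent with the "Connection timed out" warning in the log above.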
Thanks,
Matt
Created 05-04-2017 07:04 PM
In nifi.properties on each node of your cluster, is nifi.state.management.embedded.zookeeper.start set to false?
Created 05-05-2017 06:19 AM
@Jeff Storck, yes, the property nifi.state.management.embedded.zookeeper.start is set to false on all 3 nodes.
Created 05-05-2017 07:14 PM
Thanks @Matt Clarke,
I added the FQDNs and it worked like a charm.
Created 05-09-2017 07:28 AM
Hello @Matt Clarke,
I started running the nodes in the cluster. The nodes are shown as 3/3 connected, but something is wrong. When I add a processor on one of the nodes and then configure that processor, the URL redirects to the node's internal IP.
For example, if my public IP is a.b.c.d and my internal IP is a1.b1.c1.d1, then configuring the processor redirects to:
a1.b1.c1.d1:9999/nifi-api/processors/ID_of_Processor
whereas it should be a.b.c.d/nifi-api/processors/ID_of_Processor