Nifi 1.1.1 Clustering without embedded zookeeper

Contributor

Hello,

I am trying to set up a 3-node cluster without using the embedded ZooKeeper. I installed and started a separate ZooKeeper on port 2181 on one of the nodes (node1). The properties on my other nodes are as follows:

nifi.cluster.is.node=true

nifi.cluster.node.address=IP_MACHINE

nifi.cluster.node.protocol.port=9000

nifi.cluster.node.protocol.threads=10

nifi.cluster.node.event.history.size=25

nifi.cluster.node.connection.timeout=5 sec

nifi.cluster.node.read.timeout=5 sec

nifi.cluster.firewall.file=

nifi.cluster.flow.election.max.wait.time=1 mins

nifi.cluster.flow.election.max.candidates=3

nifi.zookeeper.connect.string=IP_MACHINE_1:2181

nifi.zookeeper.connect.timeout=3 secs

nifi.zookeeper.session.timeout=3 secs

nifi.zookeeper.root.node=/nifi

I have left the remote (site-to-site) properties empty on the first node and set the following on the other 2 nodes:

nifi.remote.input.host=IP_MACHINE_1

nifi.remote.input.secure=false

nifi.remote.input.socket.port=

nifi.remote.input.http.enabled=true

nifi.remote.input.http.transaction.ttl=30 sec

Then I start all the nodes. I get the following logs on node1:

2017-05-04 11:46:48,843 INFO [Process Cluster Protocol Request-10] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 3fadedb5-c6f0-4fe8-ad02-091799b5c242 (type=NODE_CONNECTION_STATUS_REQUEST, length=97 bytes) from MACHINE_IP_3 in 0 millis
2017-05-04 11:46:50,539 INFO [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 4b414f92-8486-4e7f-9c4c-e184279611b1 (type=HEARTBEAT, length=2458 bytes) from localhost:9993 in 1 millis
2017-05-04 11:46:52,083 INFO [Process Cluster Protocol Request-4] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 82956fb1-7696-41a6-a5de-36e2b4362889 (type=NODE_CONNECTION_STATUS_REQUEST, length=97 bytes) from MACHINE_IP_2 in 0 millis
2017-05-04 11:46:52,421 INFO [Process Cluster Protocol Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request d9f3c705-4feb-452f-b6aa-ff2d01bd3f7f (type=HEARTBEAT, length=2456 bytes) from localhost:9994 in 1 millis
2017-05-04 11:46:53,848 INFO [Process Cluster Protocol Request-6] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 45063482-48a2-4edf-b064-35840f6fcf6e (type=NODE_CONNECTION_STATUS_REQUEST, length=97 bytes) from MACHINE_IP_3 in 0 millis
2017-05-04 11:46:54,262 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to MACHINE_IP_1:9000 due to: java.net.ConnectException: Connection timed out (Connection timed out)


So node1 received heartbeat messages from localhost:9993 (which is actually node2).

I checked Zookeeper and it shows 2 Primary nodes (the 2 nodes not running zookeeper) and all 3 nodes connected in the Cluster Coordinator.

When I check the UI on Machine 1, I get:

Cluster is still in the process of voting on the appropriate Data Flow.

On Machines 2 and 3, I get the following message in the UI:

Action cannot be performed because there is currently no Cluster Coordinator elected. The request should be tried again after a moment, after a Cluster Coordinator has been automatically elected.

As far as I can tell, ZooKeeper and the other properties are configured properly, yet the cluster is still unable to elect a Cluster Coordinator.

Thanks in advance.

1 ACCEPTED SOLUTION

Master Mentor
@Jatin Kheradiya

There are a few things that do not look right in your nifi.properties configuration above:

On every node the following properties should be configured with the FQDN of the node:

1. nifi.remote.input.host=
2. nifi.cluster.node.address=
3. nifi.web.http.host=  or   nifi.web.https.host=

I noticed you are configuring nifi.remote.input.host= with the IP of a different node.

It is not clear from the above whether you set a value for nifi.web.http.host= or nifi.web.https.host=. If you did not, Java may be resolving your hostname to localhost. This can be problematic for cluster communications, since nodes may end up trying to talk to themselves rather than to the other nodes.
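As a rough way to spot the problems described above, here is a minimal Python sketch that scans the text of a nifi.properties file and flags host properties that are empty, point at loopback, or do not match the node's own FQDN. The function names and the example hostnames (node1.example.com, node2.example.com) are hypothetical, and the parser only handles plain key=value lines, so treat it as an illustration rather than a full properties parser.

```python
import ipaddress

# Host properties the answer says must carry each node's own FQDN.
HOST_KEYS = (
    "nifi.remote.input.host",
    "nifi.cluster.node.address",
    "nifi.web.http.host",
    "nifi.web.https.host",
)
WEB_KEYS = ("nifi.web.http.host", "nifi.web.https.host")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def is_loopback(value):
    """True for 'localhost' or a loopback IP literal such as 127.0.0.1."""
    if value.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(value).is_loopback
    except ValueError:
        return False  # not an IP literal, so treat it as a hostname


def suspicious_hosts(props, expected_fqdn):
    """Return {property: reason} for host properties that look misconfigured."""
    problems = {}
    for key in HOST_KEYS:
        value = props.get(key, "")
        if key in WEB_KEYS and not value:
            continue  # only one of http/https needs a value; checked below
        if not value:
            problems[key] = "empty"
        elif is_loopback(value):
            problems[key] = "loopback"
        elif value != expected_fqdn:
            problems[key] = "not this node's FQDN"
    if not any(props.get(k) for k in WEB_KEYS):
        problems["nifi.web.http(s).host"] = "empty"
    return problems
```

For example, calling suspicious_hosts(parse_properties(open("conf/nifi.properties").read()), "node2.example.com") on node2 would report a nifi.remote.input.host that points at node1, which is the situation in the question.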

Also make sure that the following ports are open in any firewalls between your nodes:

1. nifi.remote.input.socket.port=
2. nifi.cluster.node.protocol.port=
3. nifi.web.http.port=8080   or    nifi.web.https.port=

Also make sure all three of your nodes can talk to zookeeper on port 2181.
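One way to verify the firewall and ZooKeeper checks above is a plain TCP connection test run from each node against the others. The sketch below uses only the Python standard library; the helper names are hypothetical, and a successful connection only shows that the port is reachable, not that NiFi or ZooKeeper is healthy behind it.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to its reachability."""
    return {(host, port): port_open(host, port, timeout) for host, port in endpoints}
```

For the setup in the question, each node could call check_endpoints with the other nodes' FQDNs on port 9000 (nifi.cluster.node.protocol.port), the web port, and the site-to-site port if one is set, plus the ZooKeeper host on port 2181. Any pair that maps to False points at a firewall or address problem to investigate.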

Thanks,

Matt


5 REPLIES

Expert Contributor
@Jatin Kheradiya

Is nifi.state.management.embedded.zookeeper.start set to false in nifi.properties on each node of your cluster?

Contributor

@Jeff Storck, yes, the property nifi.state.management.embedded.zookeeper.start is set to false on all 3 nodes.

Contributor

Thanks @Matt Clarke,

I added FQDNs and it worked like a charm.

Contributor

Hello @Matt Clarke,

I started running the nodes in the cluster, and they are shown as 3/3 connected. But something is wrong: when I add a processor on one of the nodes and then configure it, the URL redirects to the node's internal IP.

For example, if my public IP is a.b.c.d and my internal IP is a1.b1.c1.d1, then configuring the processor redirects to:

a1.b1.c1.d1:9999/nifi-api/processors/ID_of_Processor

while it should be a.b.c.d/nifi-api/processors/ID_of_Processor