
Apache NiFi not working with multiple nodes in ZooKeeper string


I want to set up a cluster of 3 nodes. Suppose these are the 3 nodes:

  1. Node 1 = 192.168.0.10
  2. Node 2 = 192.168.0.20
  3. Node 3 = 192.168.0.30

I am able to set up the cluster by using the following ZooKeeper connect string on the respective nodes:

  1. Node 1 = 192.168.0.10:2181
  2. Node 2 = 192.168.0.10:2181,192.168.0.20:2181
  3. Node 3 = 192.168.0.10:2181,192.168.0.20:2181,192.168.0.30:2181

If Node 1 is up and running and I turn off any node other than Node 1, the cluster keeps working without any issue.

 

The problem with the above configuration is that if I turn off Node 1 (192.168.0.10), the cluster fails with a connection loss status. Node 2 and Node 3 keep waiting for Node 1, which is down, and that turns into a cluster failure.

 

So I am trying to set the same ZooKeeper connect string on all the nodes and start the servers one by one.

  1. Node 1 = 192.168.0.10:2181,192.168.0.20:2181,192.168.0.30:2181
  2. Node 2 = 192.168.0.10:2181,192.168.0.20:2181,192.168.0.30:2181
  3. Node 3 = 192.168.0.10:2181,192.168.0.20:2181,192.168.0.30:2181

Now Node 2 and Node 3 are down (never started), and I am trying to start Node 1 with the ZooKeeper connect string "192.168.0.10:2181,192.168.0.20:2181,192.168.0.30:2181".

 

Node 1 does not start at all.

 

I am fairly new to NiFi and am working on the clustering part. I have been able to get all the clustering work done by following the available resources and doing some POC work. This is the only problem where I am stuck and unable to move ahead.

 

I am using SSL NiFi clustering. Can anyone please help me out with this? Thanks in advance.

 

1 ACCEPTED SOLUTION


This issue is resolved. An external ZK did the trick. I was not able to figure out a solution with the embedded ZK, but with an external ZK I am able to achieve the use case I was after.

 

Below is a link where you will find a few more details on this thread:

https://stackoverflow.com/questions/63339452/apache-nifi-not-working-with-multiple-notes-in-zookeepe... 

 

Thanks.


4 REPLIES 4

Expert Contributor

Hi @yogesh_shisode, awesome that you are exploring NiFi.

 

Just to be clear, Apache ZooKeeper can be considered an external service that helps with state management and NiFi clustering.

That said, to make things "flow" better, NiFi allows us to start an embedded ZooKeeper cluster.

Given your IP examples, it seems to me that is what you are trying to connect to, so you are trying to use NiFi's embedded ZooKeeper capability.

So let's delve a little into ZooKeeper. The ZooKeeper service can be single node or multi node.

In multi-node mode we have a ZooKeeper ensemble, and with an ensemble we need to maintain a quorum.

This is explained very eloquently in this answer: https://stackoverflow.com/questions/25174622/difference-between-ensemble-and-quorum-in-zookeeper
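To make the quorum point concrete, here is a minimal sketch of the majority rule ZooKeeper uses: an ensemble of N servers needs more than half of them running. The function names are made up for illustration and are not part of NiFi or ZooKeeper. With a 3-server ensemble, a single running server is below quorum, which would be consistent with Node 1 failing to come up while Node 2 and Node 3 are still stopped.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True when enough servers are running for the ensemble to serve requests."""
    return servers_up >= quorum_size(ensemble_size)


# Walk through the 3-server case from this thread.
for up in range(0, 4):
    print(f"3-server ensemble, {up} up -> quorum: {has_quorum(3, up)}")
```

So with three ZooKeeper servers listed, at least two of them have to be started before any NiFi node can connect.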

 

With that said, please make sure you follow this guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#embedded_zookeeper

 

It discusses how to configure NiFi to start an embedded ZooKeeper service and the settings needed to accomplish this.

 

For clarity, port 2181 is the ZooKeeper listening port, and how many servers act as your ZooKeeper servers depends on this nifi.properties entry:

 

nifi.state.management.embedded.zookeeper.start=false

 

If it is set to true, then NiFi will start a ZooKeeper service too, and that service will depend on this setting:

nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

 

This is all explained at the link I gave you: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#embedded_zookeeper

 

Once you decide which nodes are the members of your ZooKeeper ensemble, all the NiFi instances, regardless of whether they are ZooKeeper servers or not, should have this property set:

 

nifi.zookeeper.connect.string=

^^^ The same on all NiFi instances

 

Using your IP examples, if you want 3 servers to be ZooKeeper servers, then I would expect this setting to be:

nifi.zookeeper.connect.string=192.168.0.10:2181,192.168.0.20:2181,192.168.0.30:2181

 

And on those servers I would expect this setting to be:

nifi.state.management.embedded.zookeeper.start=true

 

Plus the additional configurations from the Apache NiFi admin guide linked above.
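One of those additional steps, as the linked admin guide describes it, is a myid file in each node's ZooKeeper data directory (the dataDir from zookeeper.properties) holding that node's number from its server.N entry. The thread does not show whether this step was done, so treat the following only as a sketch of what the step looks like; the write_myid helper is hypothetical, not a NiFi tool.

```python
from pathlib import Path


def write_myid(data_dir: str, server_index: int) -> Path:
    """Create <data_dir>/myid containing the index N from this node's server.N entry."""
    # ZooKeeper documents server ids as values between 1 and 255.
    if not 1 <= server_index <= 255:
        raise ValueError("server index must be between 1 and 255")
    myid_path = Path(data_dir) / "myid"
    myid_path.parent.mkdir(parents=True, exist_ok=True)
    myid_path.write_text(f"{server_index}\n")
    return myid_path


# Example call (assumed paths): on the node listed as server.1, with the
# default dataDir from a stock zookeeper.properties:
# write_myid("./state/zookeeper", 1)
```

Each node gets a different number, and the number has to match the server.N line that carries that node's own address.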

 

 

 


Hi @DigitalPlumber ,

 

Thanks for the reply. As I mentioned earlier, I am able to achieve secure NiFi clustering. The problem is that if the first node is removed from the 3-node cluster, the cluster is badly impacted with connection loss.

 

I was also trying one more solution: having the same ZooKeeper connect string across the cluster. So now the nifi.properties, zookeeper.properties and state-management.xml files have the same ZooKeeper entries on every node, like below:

 

nifi.properties

 

nifi.state.management.configuration.file=./conf/state-management.xml
# The ID of the local state provider
nifi.state.management.provider.local=local-provider
# The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
nifi.state.management.provider.cluster=zk-provider
# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=true
# Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

# web properties #
nifi.web.https.host=192.168.0.10
nifi.web.https.port=2520

# security properties #
nifi.security.keystore=./ext-lib/ssl/keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=CskME7zgbZiR1k/vwlPJcOayK4VPl3+gVIq/ZgD9c
nifi.security.keyPasswd=CskME7zgbiR1k/vwlPJc3OayK4VP3+gVIq/ZgUD9c
nifi.security.truststore=./ext-lib/ssl/truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=dHNilK1qAKsX5ee3e7gcbg/yQQsVSbUrxGG4Lhr7Y
nifi.security.needClientAuth=true
nifi.security.user.authorizer=managed-authorizer
nifi.security.user.login.identity.provider=file-identity-provider
nifi.security.ocsp.responder.url=
nifi.security.ocsp.responder.certificate=

# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=true

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=192.168.0.10
nifi.cluster.node.protocol.port=2510
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=10

# cluster load balancing properties #
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6343
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=192.168.0.10:2182,192.168.0.20:2182,192.168.0.30:2182
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi

 

 

zookeeper.properties

 

initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./ext-lib/zookeeper-home/zookeeper
autopurge.snapRetainCount=30

server.1=192.168.0.10:2777:3777;2182
server.2=192.168.0.20:2777:3777;2182
server.3=192.168.0.30:2777:3777;2182

 

 

state-management.xml

 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<stateManagement>
    <local-provider>
        <id>local-provider</id>
        <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
        <property name="Directory">./state/local</property>
        <property name="Always Sync">false</property>
        <property name="Partitions">16</property>
        <property name="Checkpoint Interval">2 mins</property>
    </local-provider>
	
    <cluster-provider>
        <id>zk-provider</id>
        <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
        <property name="Connect String">192.168.0.10:2182,192.168.0.20:2182,192.168.0.30:2182</property>
        <property name="Root Node">/nifi</property>
        <property name="Session Timeout">10 seconds</property>
        <property name="Access Control">Open</property>
    </cluster-provider>
</stateManagement>

 

 

After this configuration, when I go ahead and start the server, I get the errors below continuously.

 

 

2020-08-06 22:55:21,397 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-08-06 22:55:21,397 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:972)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-08-06 22:55:21,399 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)

 

 

The above logs repeat indefinitely, even though NiFi itself starts. When I try to access NiFi from the browser, I get a cluster connection error message. Please see the screenshot below for reference.
[Screenshot: NiFi Flow - Google Chrome 07-08-2020 10_00_32 (2)_LI.jpg]
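A ConnectionLoss like the one in the log means the ZooKeeper client could not keep a session with any server in the connect string. One quick way to narrow that down is to check whether each listed host:port accepts TCP connections at all. The sketch below uses only the Python standard library; the function names are illustrative, and an open port only shows that something is listening, not that the ensemble has a quorum.

```python
import socket
from typing import List, Tuple


def parse_connect_string(connect_string: str) -> List[Tuple[str, int]]:
    """Split 'host:port,host:port' into (host, port) pairs."""
    endpoints = []
    for part in connect_string.split(","):
        host, _, port = part.strip().rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(connect_string: str) -> None:
    """Print one line per ZooKeeper endpoint in the connect string."""
    for host, port in parse_connect_string(connect_string):
        state = "open" if is_reachable(host, port) else "unreachable"
        print(f"{host}:{port} {state}")
```

Running report("192.168.0.10:2182,192.168.0.20:2182,192.168.0.30:2182") from each node would show which of the three client ports are actually listening at that moment.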

 

This is quite surprising to me, because if I start the server in the cluster with a single-entry ZooKeeper connect string, it starts without any issue. The single node joins the cluster properly.

 

nifi.properties
nifi.zookeeper.connect.string=192.168.0.10:2182

zookeeper.properties
server.1=192.168.0.10:2777:3777;2182

state-management.xml
<property name="Connect String">192.168.0.10:2182</property>

 

 

But when I go ahead and add more than one node, it fails.

 

nifi.properties
nifi.zookeeper.connect.string=192.168.0.10:2182,192.168.0.20:2182,192.168.0.30:2182


zookeeper.properties
server.1=192.168.0.10:2777:3777;2182
server.2=192.168.0.20:2777:3777;2182
server.3=192.168.0.30:2777:3777;2182


state-management.xml
<property name="Connect String">192.168.0.10:2182,192.168.0.20:2182,192.168.0.30:2182</property>

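The three entries above are meant to describe the same set of ZooKeeper endpoints. A small sketch for cross-checking them follows; it assumes the simple server.N=host:port:port;clientPort form used in this thread, and the function names are made up for illustration.

```python
def client_endpoints(zookeeper_properties: str) -> set:
    """Collect 'host:clientPort' from every server.N line that has a ';clientPort' part."""
    endpoints = set()
    for line in zookeeper_properties.splitlines():
        line = line.strip()
        if not line.startswith("server.") or "=" not in line:
            continue
        value = line.split("=", 1)[1]
        address, _, client_port = value.partition(";")
        host = address.split(":")[0]
        if client_port:
            endpoints.add(f"{host}:{client_port}")
    return endpoints


def connect_string_endpoints(connect_string: str) -> set:
    """Split a comma-separated connect string into a set of 'host:port' items."""
    return {part.strip() for part in connect_string.split(",") if part.strip()}


def mismatches(zookeeper_properties: str, *connect_strings: str) -> list:
    """Return the connect strings that do not match the server.N client endpoints."""
    expected = client_endpoints(zookeeper_properties)
    return [cs for cs in connect_strings if connect_string_endpoints(cs) != expected]
```

Passing the zookeeper.properties text together with the connect strings from nifi.properties and state-management.xml returns an empty list when all three agree. For the values shown in this post they do agree, so a mismatch between the files does not look like the cause here.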
 

 

I have followed all the steps for this. If I had missed any step, I would not be able to achieve even the first scenario. Please let me know if I am missing anything. If I am missing some configuration, then of course it is a really silly mistake that I am not able to figure out for now. :)

 

Again, thanks @DigitalPlumber for taking the time to reply on this thread. Please let me know what solution will work for this.

 

Thanks in advance.


Hi @DigitalPlumber ,

 

Just to add a few more details to my reply above. Below is the version of NiFi I am using for clustering:

[Screenshot: NiFi Flow - Google Chrome 08-08-2020 09_25_41 (2)_LI.jpg]

Also, as you can see in the screenshot, this NiFi instance is clustered with the following entries in the configuration files:

nifi.properties
nifi.zookeeper.connect.string=192.168.0.10:2182

zookeeper.properties
server.1=192.168.0.10:2777:3777;2182

state-management.xml
<property name="Connect String">192.168.0.10:2182</property>

Below is the screenshot of the clustered node:

[Screenshot: NiFi Flow - Google Chrome 08-08-2020 09_25_56 (2)_LI.jpg]

All the screenshots I have shared in both of my comments are recent (taken the same day or the day before I commented).

 

On top of that, can you please give me more clarity on the node entry in the zookeeper.properties file? I want to know more about the port range we provide in the node entry:

 

zookeeper.properties
server.1=192.168.0.10:2777:3777;2182

 

 

2777:3777 

What is the use of the above port range value?

What are the minimum and maximum values for it?

Is it necessary to have a difference of 1000 between these ports? (3777 - 2777 = 1000)

If we want to change these port values, what range is allowed?
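On the port questions: going by the ZooKeeper administrator documentation, 2777:3777 is not a range but two separate ports. The first is the port followers use to connect to the leader, the second is used for leader election, and the number after the semicolon is the client port. Any free TCP port (1 to 65535) can be used, and no particular gap between them is required; they only need to differ from each other on the same host. The sketch below splits an entry into its parts. The ServerEntry type and parser are illustrative only and ignore optional extras such as server roles.

```python
import re
from typing import NamedTuple, Optional

# Matches the simple form: server.N=host:quorumPort:electionPort[;clientPort]
_SERVER_LINE = re.compile(r"^server\.(\d+)=([^:;\s]+):(\d+):(\d+)(?:;(\d+))?$")


class ServerEntry(NamedTuple):
    server_id: int
    host: str
    quorum_port: int      # followers connect to the leader on this port
    election_port: int    # used for leader election
    client_port: Optional[int]


def parse_server_entry(line: str) -> ServerEntry:
    """Parse one server.N line and check that every port is a valid TCP port."""
    match = _SERVER_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a simple server.N entry: {line!r}")
    server_id, host, quorum, election, client = match.groups()
    ports = [int(quorum), int(election)] + ([int(client)] if client else [])
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return ServerEntry(int(server_id), host, int(quorum), int(election),
                       int(client) if client else None)


entry = parse_server_entry("server.1=192.168.0.10:2777:3777;2182")
print(entry)
```

For the entry from this post, that gives quorum port 2777, election port 3777 and client port 2182. The ZooKeeper documentation uses 2888:3888 in its examples purely as a convention.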

 

Thanks in advance.
