NiFi Clustering Issue ConnectionLoss Error

Rising Star

Hi Everyone!

I keep getting this error:

ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

This is a 3-node cluster. My editable config files are in the corresponding directories in the Google Drive folder below, and are also attached:

https://drive.google.com/drive/folders/11xM-sz8mUvpaiOOS4aiZ94TQGHHzConF?usp=sharing

I made sure firewalld is off and that all the ports in use are free, and I followed these guides:

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_administration/content/clustering.html

AND

https://community.hortonworks.com/articles/135820/configuring-an-external-zookeeper-to-work-with-apa...

Any help would be greatly appreciated!!

Here are my configs in plain text for reference:

#### 10.0.0.89 Server ####

#### ZooKeeper.properties File ####

clientPort=24489
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30

server.1=10.0.0.89:2888:3888
server.2=10.0.0.227:2888:3888
server.3=10.0.0.228:2888:3888

#### nifi.properties cluster section ####
 
# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false


# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=10.0.0.89
nifi.cluster.node.protocol.port=24489
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3


# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.0.0.89:24489,10.0.0.227:24427,10.0.0.28:24428
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi

#### 10.0.0.227 Server ####

#### ZooKeeper.properties File ####

clientPort=24427
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30

server.1=10.0.0.89:2888:3888
server.2=10.0.0.227:2888:3888
server.3=10.0.0.228:2888:3888

#### nifi.properties cluster section ####


# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false


# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=10.0.0.227
nifi.cluster.node.protocol.port=24427
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3


# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.0.0.89:24489,10.0.0.227:24427,10.0.0.28:24428
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi

#### 10.0.0.228 Server ####

#### ZooKeeper.properties File ####
 
clientPort=24428
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30

server.1=10.0.0.89:2888:3888
server.2=10.0.0.227:2888:3888
server.3=10.0.0.228:2888:3888

#### nifi.properties cluster section ####

# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false


# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=10.0.0.228
nifi.cluster.node.protocol.port=24428
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3


# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.0.0.89:24489,10.0.0.227:24427,10.0.0.28:24428
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
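
One detail worth double-checking in the three nifi.properties files above: the third entry in nifi.zookeeper.connect.string is 10.0.0.28:24428, while server.3 and nifi.cluster.node.address for the third node both use 10.0.0.228. Assuming that is a typo rather than a separate host, the connect string would read as follows on all three nodes:

```properties
# zookeeper properties, used for cluster management #
# Assumption: 10.0.0.28 was meant to be 10.0.0.228, matching server.3 in zookeeper.properties
nifi.zookeeper.connect.string=10.0.0.89:24489,10.0.0.227:24427,10.0.0.228:24428
```

Each host:port pair here should match the clientPort of the ZooKeeper server running on that host.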


1 ACCEPTED SOLUTION

Master Mentor
@John T

Note: We do not recommend using the embedded ZK in a production environment.

Aside from that, connection issues can be expected during any NiFi shutdown or restart, because the embedded ZK is shut down as well. Also, the default ZK connect and session timeouts (3 secs each in your config) are very aggressive for anything more than a basic setup in an ideal environment.
-
I recommend changing those to at least 30 secs each.
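
For reference, the change would look roughly like this in nifi.properties on each node. The property names are taken from the configs posted above; 30 secs is the suggested minimum, not a tuned value:

```properties
# zookeeper properties, used for cluster management #
# Raised from the 3 secs shown in the posted configs
nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs
```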

-

I also see that each of your embedded ZK servers is running on a different port (24489, 24427, and 24428). Why? It is unusual, but it should not be an issue.

Also confirm that you created a unique "myid" file in the "./state/zookeeper" directory on each ZK server.

-

Of course, changes to any of NiFi's config files except logback.xml require a restart to take effect. Once all nodes are back up and connected to the cluster, check whether you are still seeing connection issues with ZK.

-

Thank you,

Matt


10 REPLIES

Master Mentor

@John T

Did the timeout changes help with the ZK connection loss error you were seeing in your NiFi app log?

-

Thanks,

Matt

Rising Star

Turns out I had nifi.state.management.embedded.zookeeper.start=false.

Once I changed that to true, it worked.
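
For anyone hitting the same thing, the relevant line lives in conf/nifi.properties. A minimal sketch of the working setting, assuming the embedded ZooKeeper config file is at its stock location:

```properties
# Start the embedded ZooKeeper server together with NiFi on this node
nifi.state.management.embedded.zookeeper.start=true
# Stock default path to the embedded server's config (assumption: not changed in this setup)
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
```

With the value left at false, NiFi never starts a ZooKeeper server on that node, so nothing is listening on the ports named in nifi.zookeeper.connect.string and Curator eventually reports ConnectionLoss.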


@John How many znodes do you have in ZooKeeper? One cause of KeeperExceptions is a large request/response size between ZooKeeper and its client component.

Explorer

@Matt Clarke Hi. I have the same error. I have 6 NiFi nodes and 6 embedded ZK servers.

Thanks in advance.

Master Mentor

@Serhii K

The best recommendation I can make is to stop using the embedded ZK capability in NiFi and stand up an external ZK (ideally on 3 different servers) for your NiFi cluster to use.

-

The issues come from using the embedded ZK:

1. If NiFi is stopped or rebooted, ZK is also stopped or restarted, which can easily cause connection issues across your cluster once you lose ZK quorum.

2. NiFi can be very resource intensive (disk, memory, and CPU) depending on your particular dataflows and data volumes. This can become an issue when both the ZK and NiFi services are competing for the limited server resources.
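
As a rough sketch of what pointing NiFi at an external ensemble looks like in nifi.properties on every node: the host names below are hypothetical placeholders, and 2181 is only ZooKeeper's conventional client port, so substitute your real values.

```properties
# Do not start a ZooKeeper server inside the NiFi process
nifi.state.management.embedded.zookeeper.start=false
# Hypothetical external ensemble (placeholder hosts, conventional client port)
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs
```

The cluster state provider in state-management.xml has its own Connect String property, which would need to point at the same ensemble.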

-

Thanks,

Matt

Explorer

@Matt Clarke Thanks for the answer.

1. Should I remove flow.tar.gz from each node and start again?

2. It's a new cluster, built from scratch. Each node has 8 CPUs and 32 GB of memory.

New Contributor

nifi.remote.input.http.enabled

Specifies whether HTTP Site-to-Site should be enabled on this host. By default, it is set to true.
Whether a Site-to-Site client uses HTTP or HTTPS is determined by nifi.remote.input.secure. If it is set to true, then requests are sent as HTTPS to nifi.web.https.port. If set to false, HTTP requests are sent to nifi.web.http.port.