Created 06-05-2018 04:33 AM
Hi Everyone!
I keep getting this error: ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
My editable config files are in the associated directories in my Google Drive link below; this is a 3-node cluster:
https://drive.google.com/drive/folders/11xM-sz8mUvpaiOOS4aiZ94TQGHHzConF?usp=sharing
I made sure firewalld was off, all ports used were free, and I followed these guides:
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_administration/content/clustering.html
Any help would be greatly appreciated!!
Here are my configs in plain text for reference:
################ 10.0.0.89 Server ################

#### zookeeper.properties ####
clientPort=24489
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30
server.1=10.0.0.89:2888:3888
server.2=10.0.0.227:2888:3888
server.3=10.0.0.228:2888:3888

#### nifi.properties cluster section ####
# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=10.0.0.89
nifi.cluster.node.protocol.port=24489
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.0.0.89:24489,10.0.0.227:24427,10.0.0.28:24428
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi

################ 10.0.0.227 Server ################

#### zookeeper.properties ####
clientPort=24427
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30
server.1=10.0.0.89:2888:3888
server.2=10.0.0.227:2888:3888
server.3=10.0.0.228:2888:3888

#### nifi.properties cluster section ####
# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=10.0.0.227
nifi.cluster.node.protocol.port=24427
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.0.0.89:24489,10.0.0.227:24427,10.0.0.28:24428
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi

################ 10.0.0.228 Server ################

#### zookeeper.properties ####
clientPort=24428
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30
server.1=10.0.0.89:2888:3888
server.2=10.0.0.227:2888:3888
server.3=10.0.0.228:2888:3888

#### nifi.properties cluster section ####
# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=10.0.0.228
nifi.cluster.node.protocol.port=24428
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.0.0.89:24489,10.0.0.227:24427,10.0.0.28:24428
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
Created 06-05-2018 06:14 PM
Note: We do not recommend using the embedded ZK in a production environment.
Aside from that, connection issues can be expected during any NiFi shutdown/restart because the embedded ZK is shut down as well. Also, the default ZK connection and session timeouts are very aggressive for anything more than a basic setup in an ideal environment.
-
I recommend changing those to at least 30 secs each.
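Using the property names from the configs above, the relaxed timeouts would look like this in nifi.properties (a sketch; tune the values to your environment):

```properties
nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs
```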
-
I also see that each of your embedded ZK servers is running on a different client port (24489, 24427, and 24428). Why? Unusual, but it should not be an issue.
Also confirm you created the unique "myid" files in the "./state/zookeeper" directory on each ZK server.
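Creating the myid files can be sketched like this. The id written on each server must match that host's server.N line in zookeeper.properties (1, 2, or 3 here); ZK_STATE_DIR and MYID are placeholder variables, and the default path assumes the poster's dataDir of ./state/zookeeper relative to the NiFi install:

```shell
# Write the per-server myid file that ZooKeeper expects in its dataDir.
ZK_STATE_DIR="${ZK_STATE_DIR:-./state/zookeeper}"
MYID="${MYID:-1}"   # use 2 and 3 on the other two servers

mkdir -p "$ZK_STATE_DIR"
printf '%s\n' "$MYID" > "$ZK_STATE_DIR/myid"
cat "$ZK_STATE_DIR/myid"   # prints the id just written
```

Run it once on each node with the matching MYID value, then restart NiFi.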
-
Of course, any changes to any of NiFi's config files except logback.xml require a restart to take effect. Once all nodes are back up and connected to the cluster, check whether you are still seeing connection issues with ZK.
-
Thank you,
Matt
Created 06-06-2018 07:36 PM
Did the timeout changes help with the error you were seeing in your NiFi app log with regards to ZK connection loss?
-
Thanks,
Matt
Created 06-07-2018 01:43 PM
Turns out I had: nifi.state.management.embedded.zookeeper.start=false
Once I changed that to true, it worked.
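For anyone hitting the same thing, this is the relevant line in nifi.properties on each node that should run an embedded ZK server:

```properties
nifi.state.management.embedded.zookeeper.start=true
```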
Created 06-05-2018 08:36 PM
@John how many znodes do you have in ZooKeeper? One cause of Keeper exceptions is a large request/response size between ZooKeeper and its client component.
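One quick way to check the znode count is ZooKeeper's "mntr" four-letter command. This is a sketch that assumes mntr is enabled on your ZK version and that nc is available; the host/port in the usage comment are from the thread, so adjust them to your own setup:

```shell
# Extract the znode count from ZooKeeper "mntr" output.
# mntr prints tab-separated "key<TAB>value" lines on stdin.
znode_count() {
  awk -F'\t' '$1 == "zk_znode_count" { print $2 }'
}

# Live usage (network required):
#   echo mntr | nc 10.0.0.89 24489 | znode_count
```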
Created 08-30-2018 06:05 PM
@Matt Clarke Hi, I have the same error. I have 6 NiFi nodes and 6 embedded ZK servers.
Thanks in advance.
Created 08-30-2018 07:44 PM
Best recommendation I can make is to stop using the embedded ZK capability in NiFi and stand up an external ZK (on 3 different servers ideally) for your NiFi cluster to use.
-
The issues come from using embedded ZK
1. If NiFi is stopped or rebooted, ZK is also stopped or restarted, which can easily cause connection issues across your cluster once you lose ZK quorum.
2. NiFi can be very resource intensive (disk, memory, and CPU) depending on your particular dataflows and data volumes. This can become an issue when both the ZK and NiFi services are competing for the limited server resources.
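Switching to an external ensemble is mostly a nifi.properties change on every node. A sketch, where the hostnames are placeholders for your external ZK servers:

```properties
nifi.state.management.embedded.zookeeper.start=false
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```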
-
Thanks,
Matt
Created 08-30-2018 07:55 PM
@Matt Clarke Thanks for the answer.
1. Should I remove flow.tar.gz from each node and start again?
2. It's a new cluster, built from scratch. Each node has 8 CPUs and 32 GB of memory.
Created 04-12-2020 10:10 AM
nifi.remote.input.http.enabled - Specifies whether HTTP Site-to-Site should be enabled on this host. By default, it is set to true.