ZooKeeper connection error in NiFi version nifi-1.2.0.3.0.0.0-453
Labels: Apache NiFi
Created 06-27-2017 10:08 PM
Hello,
I have a 3-node cluster, all running NiFi version nifi-1.2.0.3.0.0.0-453. The cluster has been working fine for the last couple of weeks, but today one of the nodes suddenly disconnected from the cluster and won't rejoin. I checked the logs, and the error I see is the following:
ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2017-06-27 17:54:40,179 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
The node itself appears to still be running, but it is disconnected from the cluster. I tried restarting it, but the same error keeps appearing over and over again. The other two nodes in the cluster are working fine. Does anyone have any idea what could be causing this sudden issue? Any insights would be greatly appreciated.
Created 06-28-2017 11:49 PM
It looks like you are seeing this issue: Background retry falls into infinite loop of reconnection after connection loss
Are all of the zookeeper instances running? Are you seeing any messages in the zookeeper logs?
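If it helps, a quick way to check each server from the disconnected node is with ZooKeeper's four-letter-word commands (this assumes nc is installed and the commands are enabled on your servers; replace the host placeholder with each entry from your connect string):
echo ruok | nc <zookeeper-host> 2181    # a healthy server answers "imok"
echo stat | nc <zookeeper-host> 2181    # prints the mode (leader/follower) and connected clients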
Created 06-29-2017 02:47 PM
@Wynner yes, all of my ZooKeeper instances are running. We use an external ZooKeeper, not the NiFi embedded one, and all of the instances have been running fine. On the day this issue started, one of the instances was apparently having problems, but since yesterday all of the instances have been working fine and all the services appear to be running. Still, this node keeps failing to connect to ZooKeeper, while the other two nodes connect to ZooKeeper and join the cluster without any trouble.
Created 06-29-2017 04:37 PM
What do the zookeeper logs show for the node that is having issues? Does it show the node trying to connect to zookeeper?
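For example, on each ZooKeeper server you could filter the log for session activity coming from the problem node (the log path here is an assumption; adjust it to your installation):
grep -E 'Accepted socket connection|Established session|Expiring session' /var/log/zookeeper/zookeeper.log | grep <nifi-node-ip>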
Created 06-29-2017 07:11 PM
@Wynner, nothing shows up in the logs for that node. In the ZooKeeper logs I see entries for the other two nodes in the cluster, but not for the one that is having problems.
Created 06-30-2017 12:30 PM
Are you able to ping the zookeeper systems from the NiFi node that is having the issue?
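For example (the host is a placeholder; use each entry from your connect string):
ping -c 3 <zookeeper-host>
nc -zv <zookeeper-host> 2181    # verifies the client port is reachable, not just the host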
I found an article where another user was seeing this issue. They said they cleared the state in the state/zookeeper directory on all of the nodes (without removing the myid file) and then restarted all of the nodes at the same time; a sketch of that cleanup is below. I don't know if this is an option for you or not. Here is a link to the article I found: Zookeeper error
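If you do try it, the cleanup they describe would look roughly like this (the path assumes the default state directory under the NiFi home; run it with NiFi stopped on every node):
cd /opt/nifi/state/zookeeper
find . -mindepth 1 ! -name myid -delete    # clears the stored state but keeps the myid file
Then start all of the nodes again at the same time.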
Created 06-30-2017 06:04 PM
I just saw the same error in a cluster I have for testing. I was able to make the error occur and the only way it would clear is if I restarted all of the nodes in my cluster at the same time.
Have you tried that with your cluster yet?
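Something like the following could be used to kick off the restart on all nodes at roughly the same time (the host names and the service name are assumptions; adjust them to your environment):
for h in nifinode1 nifinode2 nifinode3; do
  ssh "$h" 'sudo systemctl restart nifi' &
done
wait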
Created 06-30-2017 09:43 PM
@Wynner
I had to try a couple of times, but after a few attempts at restarting the nodes at the same time, the node was able to join the cluster. Thanks for the help!
Created 11-04-2021 08:11 PM
Hi, may I know if you managed to solve this problem? I was configuring a NiFi cluster on VMs with an external ZooKeeper, and I faced this problem as well. I have been struggling with this issue for weeks but still have no idea how to solve it.
Created 02-17-2025 08:38 AM
I have the same issue. I have 3 nodes: zookeepernode1, zookeepernode2, and zookeepernode3.
nifiuser@zookeepernode3:~$ /opt/zookeeper/bin/zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
myid could not be determined, will not able to locate clientPort in the server configs.
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
On all three I get this status, but in NiFi I see the following:
root@nifinode2:~# tail -f /opt/nifi/logs/nifi-app.log
2025-02-17 16:05:41,069 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is disconnected: [id: 0x41871c79, L:/53.13.138.69:55258 ! R:zookeepernode2/53.13.138.72:2181]
2025-02-17 16:05:41,069 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is told closing
2025-02-17 16:05:41,111 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty SSL handler added for channel: [id: 0x2983cd7a]
2025-02-17 16:05:41,113 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is connected: [id: 0x2983cd7a, L:/53.13.138.69:57908 - R:zookeepernode1/53.13.138.71:2181]
2025-02-17 16:05:41,114 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is disconnected: [id: 0x2983cd7a, L:/53.13.138.69:57908 ! R:zookeepernode1/53.13.138.71:2181]
2025-02-17 16:05:41,114 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is told closing
2025-02-17 16:05:41,157 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty SSL handler added for channel: [id: 0x802bf7fc]
2025-02-17 16:05:41,158 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is connected: [id: 0x802bf7fc, L:/53.13.138.69:38350 - R:zookeepernode3/53.13.247.198:2181]
2025-02-17 16:05:41,159 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is disconnected: [id: 0x802bf7fc, L:/53.13.138.69:38350 ! R:zookeepernode3/53.13.247.198:2181]
2025-02-17 16:05:41,159 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is told closing
Here are my configurations:
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=zookeepernode1:2181,zookeepernode2:2181,zookeepernode3:2181
nifi.zookeeper.connect.timeout=10 secs
nifi.zookeeper.session.timeout=10 secs
nifi.zookeeper.root.node=/nifi
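One thing that stands out in the log above is that an SSL handler is added to each client channel right before it disconnects, while zkServer.sh reports "Client SSL: false" on port 2181. If NiFi is configured for a secure ZooKeeper client while the servers only listen in plaintext, every connection would be dropped immediately. This is only a guess about the setup, since these properties are not shown in the snippet above, but in recent NiFi versions they look like this:
# should be false if the ensemble is NOT serving TLS on the client port
nifi.zookeeper.client.secure=false
# only needed when the client connection is secure:
# nifi.zookeeper.security.keystore=
# nifi.zookeeper.security.truststore=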
root@nifinode1:/opt/nifi/conf# cat ./zookeeper.properties
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30
server.1=zookeepernode1:2888:3888;2181
server.2=zookeepernode2:2888:3888;2181
server.3=zookeepernode3:2888:3888;2181
Also, I have the myid file created properly on each node; a quick way to double-check it is sketched below.
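Since zkServer.sh reported "myid could not be determined", it may be worth confirming that each server's myid matches its server.N line (the paths are guesses; use the dataDir from the zoo.cfg that zkServer.sh reports):
grep dataDir /opt/zookeeper/conf/zoo.cfg
cat <dataDir>/myid    # expect 1 on zookeepernode1, 2 on zookeepernode2, 3 on zookeepernode3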
