zookeeper connection error in NiFi version nifi-1.2.0.3.0.0.0-453

avatar
Expert Contributor

Hello,

I have a 3-node cluster, with every node running NiFi version nifi-1.2.0.3.0.0.0-453. The cluster has been working fine for the last couple of weeks, but today one of the nodes suddenly disconnected from the cluster and won't rejoin it. I checked the logs and see the following error:

ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2017-06-27 17:54:40,179 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

The node itself appears to still be running but is disconnected from the cluster. I tried restarting it but the same error keeps appearing over and over again. The other two nodes in the cluster are working fine. Does anyone have any idea of what could be causing this sudden issue? Any insights would be greatly appreciated.

1 ACCEPTED SOLUTION

avatar

@Adda Fuentes

It looks like you are seeing this issue: Background retry falls into infinite loop of reconnection after connection loss

Are all of the zookeeper instances running? Are you seeing any messages in the zookeeper logs?

10 REPLIES

avatar
Expert Contributor

@Wynner Yes, all of my ZooKeeper instances are running. We use an external ZooKeeper, not the one embedded in NiFi. On the day this issue started, one of the instances was apparently having problems, but since yesterday all of the instances have been working fine and all the services seem to be running. The node still cannot connect to ZooKeeper, while the other two nodes connect to ZooKeeper and join the cluster without trouble.

avatar

What do the zookeeper logs show for the node that is having issues? Does it show the node trying to connect to zookeeper?

avatar
Expert Contributor

@Wynner, the ZooKeeper logs show nothing for that node. I see log entries for the other two nodes in the cluster, but none for the one that is having problems.

avatar

@Adda Fuentes

Are you able to ping the zookeeper systems from the NiFi node that is having the issue?
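A ping only shows that the host answers; it does not show that the ZooKeeper client port accepts connections from the NiFi node. A minimal Python sketch of a TCP-level check is below. The function names are made up for illustration, and it assumes every entry in the connect string carries an explicit port and no chroot suffix.

```python
import socket


def parse_connect_string(connect_string):
    """Split 'host1:2181,host2:2181' into a list of (host, port) tuples."""
    endpoints = []
    for entry in connect_string.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # rpartition splits on the last colon, so the port is always the tail.
        host, _, port = entry.rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run it on the affected NiFi node by passing the value of nifi.zookeeper.connect.string to parse_connect_string and calling can_connect on each pair. A successful TCP connection still does not prove that a ZooKeeper session can be established, so treat a True result as necessary rather than sufficient.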

I found an article where another user was seeing this issue. They said they cleared the state in the state/zookeeper directory on all of the nodes, leaving the myid file in place, and then restarted all of the nodes at the same time. I don't know whether this is an option for you. Here is a link to the article I found: Zookeeper error
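The clean-up described in that article could be sketched in Python as follows. This is an illustration under assumptions, not a procedure the thread confirms: the function name is hypothetical, it assumes the ZooKeeper process using the directory has been stopped first, and it defaults to a dry run because the deletion cannot be undone. The state/zookeeper path belongs to a ZooKeeper embedded in NiFi, so it may not apply to an external ensemble like the one in this thread.

```python
import shutil
from pathlib import Path


def clear_zookeeper_state(state_dir, keep=("myid",), dry_run=True):
    """Delete everything directly under state_dir except the names in keep.

    Returns the paths that were removed (or would be removed when dry_run is True).
    """
    removed = []
    for entry in sorted(Path(state_dir).iterdir()):
        if entry.name in keep:
            continue  # never touch the myid file
        removed.append(entry)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return removed
```

Calling clear_zookeeper_state("./state/zookeeper") first lists what would go; passing dry_run=False performs the deletion.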

avatar

@Adda Fuentes

I just saw the same error in a cluster I have for testing. I was able to make the error occur and the only way it would clear is if I restarted all of the nodes in my cluster at the same time.

Have you tried that with your cluster yet?

avatar
Expert Contributor

@Wynner

It took a couple of attempts, but after restarting all the nodes at the same time the node was able to rejoin the cluster. Thanks for the help!

avatar
New Contributor

Hi, may I know if you managed to solve this problem? I was configuring a NiFi cluster on VMs with an external ZooKeeper and ran into this problem as well. I have been struggling with this issue for weeks and still have no idea how to solve it.

avatar
New Contributor

I have the same issue. I have 3 nodes: zookeepernode1, zookeepernode2, and zookeepernode3.


nifiuser@zookeepernode3:~$ /opt/zookeeper/bin/zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
myid could not be determined, will not able to locate clientPort in the server configs.
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower

I get this status on all three nodes, but the NiFi log shows:

root@nifinode2:~# tail -f /opt/nifi/logs/nifi-app.log
2025-02-17 16:05:41,069 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is disconnected: [id: 0x41871c79, L:/53.13.138.69:55258 ! R:zookeepernode2/53.13.138.72:2181]
2025-02-17 16:05:41,069 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is told closing
2025-02-17 16:05:41,111 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty SSL handler added for channel: [id: 0x2983cd7a]
2025-02-17 16:05:41,113 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is connected: [id: 0x2983cd7a, L:/53.13.138.69:57908 - R:zookeepernode1/53.13.138.71:2181]
2025-02-17 16:05:41,114 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is disconnected: [id: 0x2983cd7a, L:/53.13.138.69:57908 ! R:zookeepernode1/53.13.138.71:2181]
2025-02-17 16:05:41,114 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is told closing
2025-02-17 16:05:41,157 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty SSL handler added for channel: [id: 0x802bf7fc]
2025-02-17 16:05:41,158 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is connected: [id: 0x802bf7fc, L:/53.13.138.69:38350 - R:zookeepernode3/53.13.247.198:2181]
2025-02-17 16:05:41,159 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is disconnected: [id: 0x802bf7fc, L:/53.13.138.69:38350 ! R:zookeepernode3/53.13.247.198:2181]
2025-02-17 16:05:41,159 INFO [epollEventLoopGroup-4-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is told closing
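
In the excerpt above, each channel that connects is dropped about one millisecond later, and the client logs "SSL handler added" while zkServer.sh status reports "Client SSL: false". That pairing may be worth checking, though the thread does not confirm it as the cause. To see whether the pattern holds across a longer log, a rough Python sketch like the one below tallies connect and disconnect events per ZooKeeper server. The function names and regular expressions are illustrative and written against the line format shown here only; the sketch also rejoins lines that a terminal wrapped mid-record.

```python
import re
from collections import Counter

# A new log record starts with a timestamp such as "2025-02-17 16:05:41,069 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
# Captures the event type and the remote host name that follows "R:".
EVENT = re.compile(
    r"channel is (connected|disconnected): \[id: \w+, L:\S+ [-!] R:([^/\]]+)/"
)


def rejoin_wrapped(text):
    """Merge terminal-wrapped continuation lines back onto their log record."""
    records = []
    for line in text.splitlines():
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += line
    return records


def tally_channel_events(text):
    """Count (server, event) pairs from ClientCnxnSocketNetty log records."""
    counts = Counter()
    for record in rejoin_wrapped(text):
        match = EVENT.search(record)
        if match:
            event, server = match.groups()
            counts[(server, event)] += 1
    return counts
```

Feeding it the contents of nifi-app.log gives a Counter keyed by (server, event). If every server shows roughly as many disconnects as connects, the problem is probably not specific to one ZooKeeper host.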

Here are my configurations:
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=zookeepernode1:2181,zookeepernode2:2181,zookeepernode3:2181
nifi.zookeeper.connect.timeout=10 secs
nifi.zookeeper.session.timeout=10 secs
nifi.zookeeper.root.node=/nifi

root@nifinode1:/opt/nifi/conf# cat ./zookeeper.properties
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30
server.1=zookeepernode1:2888:3888;2181
server.2=zookeepernode2:2888:3888;2181
server.3=zookeepernode3:2888:3888;2181

I have also created the myid file correctly.
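
Since zkServer.sh status printed "myid could not be determined", it may be worth confirming that the myid file sits in the directory named by dataDir and that its number matches one of the server.N entries. Note that dataDir=./state/zookeeper is a relative path, so it resolves against whatever directory the process was started from. A small Python sketch of the matching check follows. The function names are hypothetical, and the pattern only handles the simple host:port:port;clientPort form shown in this post.

```python
import re

# Matches lines such as "server.1=zookeepernode1:2888:3888;2181".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)(?:;(\d+))?$")


def parse_servers(properties_text):
    """Map server id -> (host, client_port or None) from server.N lines."""
    servers = {}
    for raw in properties_text.splitlines():
        match = SERVER_LINE.match(raw.strip())
        if match:
            server_id, host, _peer, _election, client = match.groups()
            servers[int(server_id)] = (host, int(client) if client else None)
    return servers


def myid_is_declared(myid_text, servers):
    """True if the myid content is an integer with a matching server.N entry."""
    try:
        return int(myid_text.strip()) in servers
    except ValueError:
        # Empty or non-numeric myid content can never match a server id.
        return False
```

Read the properties file and the myid file on each node, then pass their contents to parse_servers and myid_is_declared. A False result on any node points at a mismatch between the file and the ensemble definition.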