Primary Node Changing Often / ConnectionLoss
- Labels: Apache NiFi
Created 09-08-2017 08:44 PM
We are seeing the primary node change fairly frequently, and with DEBUG logging enabled we noted the following error. Any ideas on how to resolve or improve this?
2017-09-08 16:24:48,326 DEBUG [CommitProcessor:2] o.a.z.server.FinalRequestProcessor Processing request:: sessionid:0x25e631231920001 type:getData cxid:0x552 zxid:0xfffffffffffffffe txntype:unknown reqpath:/nifi/leaders/Cluster Coordinator/_c_436facb3-d463-4782-ada5-48d11856bfdf-lock-0000000092
2017-09-08 16:24:48,326 DEBUG [CommitProcessor:2] o.a.z.server.FinalRequestProcessor sessionid:0x25e631231920001 type:getData cxid:0x552 zxid:0xfffffffffffffffe txntype:unknown reqpath:/nifi/leaders/Cluster Coordinator/_c_436facb3-d463-4782-ada5-48d11856bfdf-lock-0000000092
2017-09-08 16:24:48,661 INFO [Process Cluster Protocol Request-8] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 65c0e064-6313-4fed-ae4d-57cbf0fec692 (type=HEARTBEAT, length=4809 bytes) from dcwipphnif005.edc.nam.gm.com:8443 in 331 millis
2017-09-08 16:24:48,771 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2017-09-08 16:24:48,773 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@40da1bc7 Connection State changed to SUSPENDED
2017-09-08 16:24:48,773 DEBUG [Replicate Request Thread-1197] org.apache.curator.RetryLoop Retry-able exception received
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator/_c_436facb3-d463-4782-ada5-48d11856bfdf-lock-0000000092
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
at org.apache.curator.framework.recipes.leader.LeaderSelector.participantForPath(LeaderSelector.java:375)
at org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:346)
at org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:339)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.getLeader(CuratorLeaderElectionManager.java:217)
at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.getElectedActiveCoordinatorAddress(NodeClusterCoordinator.java:174)
at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.getElectedActiveCoordinatorNode(NodeClusterCoordinator.java:460)
at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.getElectedActiveCoordinatorNode(NodeClusterCoordinator.java:454)
at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.isActiveClusterCoordinator(NodeClusterCoordinator.java:542)
at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.afterRequest(NodeClusterCoordinator.java:965)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.onCompletedResponse(ThreadPoolRequestReplicator.java:702)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.lambda$replicate$19(ThreadPoolRequestReplicator.java:382)
at org.apache.nifi.cluster.coordination.http.replication.StandardAsyncClusterResponse.add(StandardAsyncClusterResponse.java:307)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.lambda$replicate$21(ThreadPoolRequestReplicator.java:425)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:831)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2017-09-08 16:24:48,775 DEBUG [Clustering Tasks Thread-1] org.apache.curator.RetryLoop Retry-able exception received
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:215)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
at org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151)
at org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133)
at org.apache.curator.framework.recipes.locks.InterProcessMutex.getParticipantNodes(InterProcessMutex.java:170)
at org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:338)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.getLeader(CuratorLeaderElectionManager.java:217)
at org.apache.nifi.controller.cluster.ClusterProtocolHeartbeater.getHeartbeatAddress(ClusterProtocolHeartbeater.java:63)
at org.apache.nifi.controller.cluster.ClusterProtocolHeartbeater.send(ClusterProtocolHeartbeater.java:75)
at org.apache.nifi.controller.FlowController$HeartbeatSendTask.run(FlowController.java:4245)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2017-09-08 16:24:48,775 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@2df79081 Connection State changed to SUSPENDED
2017-09-08 16:24:48,775 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@40da1bc7 has been interrupted; no longer leader for role 'Cluster Coordinator'
2017-09-08 16:24:48,776 INFO [Leader Election Notification Thread-2] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@2df79081 has been interrupted; no longer leader for role 'Primary Node'
2017-09-08 16:24:48,776 INFO [Leader Election Notification Thread-2] o.a.n.c.l.e.CuratorLeaderElectionManager
Created 09-09-2017 07:22 PM
Check these properties. The default values are 3 seconds; change them to 30 seconds and see if it helps:
- nifi.zookeeper.connect.timeout
- nifi.zookeeper.session.timeout
I would also check these properties. The default values are 5 seconds; change them to 30 seconds as well:
- nifi.cluster.node.connection.timeout
- nifi.cluster.node.read.timeout
Finally, check this property and change it from the default of 10 to 40 or 50:
- nifi.cluster.node.protocol.threads
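Taken together, the suggestions above would look like this in nifi.properties on each node (the values shown are suggested starting points, not definitive tuning; adjust to your environment and restart NiFi for the changes to take effect):

```properties
# ZooKeeper client timeouts (defaults: 3 secs). Raising them lets the node
# ride out brief network hiccups or GC pauses without dropping its ZooKeeper
# session, which is what triggers the leader re-election seen in the log.
nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs

# Node-to-node connection/read timeouts (defaults: 5 secs)
nifi.cluster.node.connection.timeout=30 secs
nifi.cluster.node.read.timeout=30 secs

# Threads available to service cluster protocol requests such as
# heartbeats and request replication (default: 10)
nifi.cluster.node.protocol.threads=40
```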
