Support Questions

wesley_bohannon · ‎09-08-2017

We are seeing the primary node change quite frequently, and with DEBUG logging enabled we noticed the following error. Any ideas on how to resolve this or reduce how often it happens?

2017-09-08 16:24:48,326 DEBUG [CommitProcessor:2] o.a.z.server.FinalRequestProcessor Processing request:: sessionid:0x25e631231920001 type:getData cxid:0x552 zxid:0xfffffffffffffffe txntype:unknown reqpath:/nifi/leaders/Cluster Coordinator/_c_436facb3-d463-4782-ada5-48d11856bfdf-lock-0000000092

2017-09-08 16:24:48,326 DEBUG [CommitProcessor:2] o.a.z.server.FinalRequestProcessor sessionid:0x25e631231920001 type:getData cxid:0x552 zxid:0xfffffffffffffffe txntype:unknown reqpath:/nifi/leaders/Cluster Coordinator/_c_436facb3-d463-4782-ada5-48d11856bfdf-lock-0000000092

2017-09-08 16:24:48,661 INFO [Process Cluster Protocol Request-8] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 65c0e064-6313-4fed-ae4d-57cbf0fec692 (type=HEARTBEAT, length=4809 bytes) from dcwipphnif005.edc.nam.gm.com:8443 in 331 millis

2017-09-08 16:24:48,771 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED

2017-09-08 16:24:48,773 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@40da1bc7 Connection State changed to SUSPENDED

2017-09-08 16:24:48,773 DEBUG [Replicate Request Thread-1197] org.apache.curator.RetryLoop Retry-able exception received

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator/_c_436facb3-d463-4782-ada5-48d11856bfdf-lock-0000000092

at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)

at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)

at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)

at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)

at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)

at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)

at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)

at org.apache.curator.framework.recipes.leader.LeaderSelector.participantForPath(LeaderSelector.java:375)

at org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:346)

at org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:339)

at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.getLeader(CuratorLeaderElectionManager.java:217)

at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.getElectedActiveCoordinatorAddress(NodeClusterCoordinator.java:174)

at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.getElectedActiveCoordinatorNode(NodeClusterCoordinator.java:460)

at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.getElectedActiveCoordinatorNode(NodeClusterCoordinator.java:454)

at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.isActiveClusterCoordinator(NodeClusterCoordinator.java:542)

at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.afterRequest(NodeClusterCoordinator.java:965)

at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.onCompletedResponse(ThreadPoolRequestReplicator.java:702)

at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.lambda$replicate$19(ThreadPoolRequestReplicator.java:382)

at org.apache.nifi.cluster.coordination.http.replication.StandardAsyncClusterResponse.add(StandardAsyncClusterResponse.java:307)

at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.lambda$replicate$21(ThreadPoolRequestReplicator.java:425)

at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:831)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)

2017-09-08 16:24:48,775 DEBUG [Clustering Tasks Thread-1] org.apache.curator.RetryLoop Retry-able exception received

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator

at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)

at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)

at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)

at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)

at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:215)

at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)

at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)

at org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151)

at org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133)

at org.apache.curator.framework.recipes.locks.InterProcessMutex.getParticipantNodes(InterProcessMutex.java:170)

at org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:338)

at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.getLeader(CuratorLeaderElectionManager.java:217)

at org.apache.nifi.controller.cluster.ClusterProtocolHeartbeater.getHeartbeatAddress(ClusterProtocolHeartbeater.java:63)

at org.apache.nifi.controller.cluster.ClusterProtocolHeartbeater.send(ClusterProtocolHeartbeater.java:75)

at org.apache.nifi.controller.FlowController$HeartbeatSendTask.run(FlowController.java:4245)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)

2017-09-08 16:24:48,775 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@2df79081 Connection State changed to SUSPENDED

2017-09-08 16:24:48,775 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@40da1bc7 has been interrupted; no longer leader for role 'Cluster Coordinator'

2017-09-08 16:24:48,776 INFO [Leader Election Notification Thread-2] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@2df79081 has been interrupted; no longer leader for role 'Primary Node'

2017-09-08 16:24:48,776 INFO [Leader Election Notification Thread-2] o.a.n.c.l.e.CuratorLeaderElectionManager

Wynner · ‎09-09-2017

@Wesley Bohannon

Check these properties. Their default value is 3 seconds; change them to 30 seconds and see if it helps:

nifi.zookeeper.connect.timeout 
nifi.zookeeper.session.timeout

I would also check these properties. Their default value is 5 seconds; change them to 30 seconds as well:

nifi.cluster.node.connection.timeout 
nifi.cluster.node.read.timeout

Finally, check this property and change it from the default of 10 to 40 or 50:

nifi.cluster.node.protocol.threads
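
Put together, the suggested changes might look like the fragment below. This is only a sketch: it assumes the settings live in conf/nifi.properties on each node, that the same values are applied to every node in the cluster, and that 50 is picked from the suggested 40 to 50 range. NiFi reads this file at startup, so the nodes need a restart for the new values to take effect.

```
# conf/nifi.properties (assumed location; apply on every node)

# ZooKeeper client timeouts, raised from the 3 second default
nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs

# Node-to-node communication timeouts, raised from the 5 second default
nifi.cluster.node.connection.timeout=30 secs
nifi.cluster.node.read.timeout=30 secs

# Cluster protocol threads, raised from the default of 10
nifi.cluster.node.protocol.threads=50
```

If the Primary Node and Cluster Coordinator roles still move around after this, the SUSPENDED state changes in the log above are the thing to watch for, since each one is followed by the node giving up its leader roles.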

Cloudera Community

Primary Node Changing Often / ConnectionLoss