
NiFi Cluster: Startup exception "Cluster is still in the process of voting on the appropriate Data Flow"


I have a 3-node NiFi cluster, and during startup I am getting the following exception. Please advise.

2017-02-08 04:17:23,356 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2017-02-08 04:17:30,108 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2017-02-08 04:17:30,115 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@7376a113 Connection State changed to SUSPENDED
2017-02-08 04:17:30,126 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_77]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_77]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_77]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]

8 REPLIES

Master Mentor

@milind pandit

Tell us something about your particular NiFi installation method:
1. Was this NiFi cluster installed via Ambari or the command line?
2. Are you using NiFi's internal (embedded) ZooKeeper or an external ZooKeeper?
3. Is this the entire stack trace from the nifi-app log?


@Matt Clarke This is a command-line installation (NiFi 1.1.1). I am using the embedded (internal) ZooKeeper. The stack trace is the first occurrence of the error, which repeats continuously.

Master Mentor
@milind pandit

The errors you are seeing would be expected during startup, since ZooKeeper will not establish quorum until all three nodes have completely started. As a node goes through its startup process, it begins trying to establish ZK quorum with the other ZK nodes. Those other nodes may not be running yet if they are still starting as well, which produces a lot of ERROR messages. Using the embedded ZK is not recommended in a production environment, since it is stopped and started along with NiFi. It is best to use a dedicated external ZooKeeper ensemble in production.

If the errors persist even after all three nodes are fully running, check the following:

1. Verify that you have enabled the embedded ZK on all three of your nodes.

2. Verify that the ZK instance on each of your servers started and bound to the ports configured in your zookeeper.properties file (see the example snippet after this list).

3. Make sure you are using resolvable hostnames for each of your ZK nodes.

4. Make sure you do not have any firewalls that would prevent your NiFi nodes from communicating with each other over the configured ZK hostnames and ports.
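
For reference, here is roughly what the embedded-ZK configuration looks like on each node. The hostnames nifi-node1/2/3 are placeholders for your own, and the exact paths may differ depending on your install:

# conf/zookeeper.properties (identical on all three nodes)
clientPort=2181
dataDir=./state/zookeeper
server.1=nifi-node1:2888:3888
server.2=nifi-node2:2888:3888
server.3=nifi-node3:2888:3888

# conf/nifi.properties (on every node)
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=nifi-node1:2181,nifi-node2:2181,nifi-node3:2181

Each node also needs a myid file under the ZooKeeper data directory (e.g. ./state/zookeeper/myid) containing just that node's server number (1, 2, or 3).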

Thanks,

Matt

Explorer

I had the same message in the nifi-app.log:

2018-01-22 16:06:42,479 INFO [NiFi Web Server-21] o.a.n.w.a.c.IllegalClusterStateExceptionMapper org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: Cluster is still in the process of voting on the appropriate Data Flow.. Returning Conflict response.

But the cause was different:

Caused by: java.lang.RuntimeException: Found Invalid ProcessGroup ID for Destination: 58db74de-f860-3152-9244-819f0bb09e39
    at org.apache.nifi.controller.StandardFlowSynchronizer.addProcessGroup(StandardFlowSynchronizer.java:1186) ~[nifi-framework-core-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
    at org.apache.nifi.controller.StandardFlowSynchronizer.addProcessGroup(StandardFlowSynchronizer.java:1087) ~[nifi-framework-core-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
    at org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:286) ~[nifi-framework-core-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
The flow.xml was indeed corrupted on all nodes: some sort of orphaned connection pointing to a removed group ID.
Solution: restore the second-to-last file from the flow archive directory (on all cluster nodes). The last archived flow is always the same as the current flow in conf, so it carries the same corruption.
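
If you want to script that restore, here is a rough sketch of the idea in Python (stop NiFi on every node first; the paths are placeholders, so check nifi.flow.configuration.file and nifi.flow.configuration.archive.dir in your nifi.properties):

import glob
import os
import shutil

# Placeholder paths; adjust to your install.
conf_dir = "/opt/nifi/conf"
archive_dir = os.path.join(conf_dir, "archive")
current_flow = os.path.join(conf_dir, "flow.xml.gz")

# Archived flows, newest first. The newest archive matches the corrupted
# flow currently in conf, so we restore the one written just before it.
archives = sorted(glob.glob(os.path.join(archive_dir, "*flow.xml.gz")),
                  key=os.path.getmtime, reverse=True)
if len(archives) < 2:
    raise SystemExit("Not enough archived flows to roll back")

shutil.copy2(current_flow, current_flow + ".corrupt")  # keep the bad flow, just in case
shutil.copy2(archives[1], current_flow)                # restore second-newest archive
print("Restored", archives[1], "to", current_flow)

Run it (or do the same by hand) on every node, then start the cluster again.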


I have faced a similar error. In my case NiFi was running fine, but the cluster nodes were not connected. In nifi-app.log I found the errors below.

ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

Solution: the ZK services were not running. I started ZooKeeper first and then started the NiFi cluster. Now the NiFi nodes are connected properly and the cluster is running fine.
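
A quick way to confirm that each ZooKeeper server is actually up before starting NiFi is the "ruok" four-letter-word check. A small sketch in Python (the hostnames and port are placeholders; newer ZooKeeper releases may require "ruok" to be whitelisted via 4lw.commands.whitelist):

import socket

def zk_is_ok(host, port=2181, timeout=5):
    """Send ZooKeeper's 'ruok' command; a healthy server replies 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            return conn.recv(16) == b"imok"
    except OSError:
        return False

for host in ("zk1.example.com", "zk2.example.com", "zk3.example.com"):
    print(host, "is up" if zk_is_ok(host) else "is NOT responding")

Only start the NiFi nodes once all ZooKeeper servers answer.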

Explorer

@Lawand Suraj Hi, could you please provide details on how you fixed the issue? Thank you. I have embedded ZK and 6 nodes.

Explorer

@Matt Clarke

I have faced a similar error.

ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

NiFi version 1.7

Please advise.

Explorer

I think you have to check the log on the ZooKeeper side. My advice is to increase the values of the "nifi.zookeeper.connect.timeout" and "nifi.zookeeper.session.timeout" settings.

Also check the network connection between the ZooKeeper servers and the NiFi servers; network latency can cause this issue.
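
For reference, both settings live in nifi.properties on every node and take a time period value. The values below are only an example of the format (if I remember correctly, the shipped default is 3 secs); restart NiFi on each node after changing them:

# conf/nifi.properties
nifi.zookeeper.connect.timeout=10 secs
nifi.zookeeper.session.timeout=10 secs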