<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Nifi getting hung on invalid Zookeeper hostname in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Nifi-getting-hung-on-invalid-Zookeeper-hostname/m-p/282804#M210207</link>
    <description>&lt;P&gt;I am running a kubernetes cluster with three nodes, each running a Nifi pod (nifi-0, nifi-1, nifi-2) and a Zookeeper pod (zk-0, zk-1, zk-2).&amp;nbsp; Everything worked.&amp;nbsp; These are the relevant lines from nifi.properties:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;nifi.state.management.embedded.zookeeper.start=false
nifi.zookeeper.connect.string=zk-0.nifi.svc.cluster.local:2181,zk-1.nifi.svc.cluster.local:2181,zk-2.nifi.svc.cluster.local:2181&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Today, one of the nodes crashed, taking out nifi-0 and zk-1 which were both running on it.&amp;nbsp; For reasons completely unrelated, kubernetes was unable to spin up a new node to replace it.&amp;nbsp; However, with two Nifi and two Zookeeper pods still running, it is my understanding that this should not have been a problem.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Zookeeper seems to work fine.&amp;nbsp; "/opt/zookeeper/bin/zkServer.sh status" on zk-0 reports that it is the leader, and zk-2 reports that it is the follower.&lt;/P&gt;
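&lt;P&gt;To double-check beyond zkServer.sh, here is a minimal stand-alone sketch of my own (hostnames copied from the connect string above; not part of the nifi deployment) that resolves each entry and sends ZooKeeper's built-in "ruok" four-letter health command, to which a live server replies "imok":&lt;/P&gt;

```java
// My own diagnostic sketch (not from nifi): resolve each connect-string
// entry, then send ZooKeeper's "ruok" command; a live server replies "imok".
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;

public class ZkQuorumCheck {
    // Extract the host part of a "host:port" connect-string entry.
    static String hostOf(String hostPort) {
        return hostPort.split(":")[0];
    }

    public static void main(String[] args) {
        String connect = "zk-0.nifi.svc.cluster.local:2181,"
                + "zk-1.nifi.svc.cluster.local:2181,"
                + "zk-2.nifi.svc.cluster.local:2181";
        for (String hostPort : connect.split(",")) {
            String host = hostOf(hostPort);
            int port = Integer.parseInt(hostPort.split(":")[1]);
            try {
                InetAddress.getAllByName(host); // DNS lookup only
                try (Socket s = new Socket(host, port)) {
                    OutputStream out = s.getOutputStream();
                    out.write("ruok".getBytes());
                    out.flush();
                    s.shutdownOutput();
                    InputStream in = s.getInputStream();
                    byte[] buf = new byte[16];
                    int n = in.read(buf);
                    System.out.println(hostPort + " answered "
                            + new String(buf, 0, Math.max(n, 0)));
                }
            } catch (Exception e) {
                System.out.println(hostPort + " failed: " + e);
            }
        }
    }
}
```

&lt;P&gt;Run from a surviving pod, zk-0 and zk-2 should answer imok, while the zk-1 entry should fail at the DNS lookup. (Note that newer ZooKeeper releases require "ruok" to be enabled via 4lw.commands.whitelist.)&lt;/P&gt;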
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But Nifi is not working.&amp;nbsp; Attempting to connect to the UI just returns the error message: &lt;SPAN&gt;"Action cannot be performed because there is currently no Cluster Coordinator elected. The request should be tried again after a moment, after a Cluster Coordinator has been automatically elected."&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;nifi-2 believes that it is the cluster leader:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;nifi-app_2019-11-12_19.0.log:2019-11-12 19:31:32,875 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@59188519 This node has been elected Leader for Role 'Primary Node'
nifi-app_2019-11-12_19.0.log:2019-11-12 19:31:32,875 INFO [Leader Election Notification Thread-4] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1905a793 This node has been elected Leader for Role 'Cluster Coordinator'
nifi-app_2019-11-12_19.0.log:2019-11-12 19:31:32,876 INFO [Leader Election Notification Thread-4] o.apache.nifi.controller.FlowController This node elected Active Cluster Coordinator
nifi-app_2019-11-12_19.0.log:2019-11-12 19:31:32,876 INFO [Leader Election Notification Thread-1] o.apache.nifi.controller.FlowController This node has been elected Primary Node&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;nifi-1, though, believes that there is no cluster leader.&amp;nbsp; It is stuck in a cycle of trying to connect to &lt;SPAN&gt;zk-1.nifi.svc.cluster.local; however, because that pod no longer exists (it crashed), its hostname is no longer resolvable (kubernetes manages the DNS in this regard).&amp;nbsp; The exact error is:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="java"&gt;2019-11-12 23:15:18,232 ERROR [Leader Election Notification Thread-1] o.a.c.f.imps.CuratorFrameworkImpl Background exception was not retry-able or retry gave up
java.net.UnknownHostException: zk-1.nifi.svc.cluster.local
        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at org.apache.zookeeper.client.StaticHostProvider.&amp;lt;init&amp;gt;(StaticHostProvider.java:61)
        at org.apache.zookeeper.ZooKeeper.&amp;lt;init&amp;gt;(ZooKeeper.java:445)
        at org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
        at org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
        at org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:91)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:507)
        at org.apache.curator.framework.imps.FindAndDeleteProtectedNodeInBackground.execute(FindAndDeleteProtectedNodeInBackground.java:60)
        at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:496)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
        at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:50)
        at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:217)
        at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:232)
        at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:386)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This error repeats as the only entry in nifi-app.log for hours; nifi-1 does not appear to be attempting a connection to zk-0 or zk-2 at all.&amp;nbsp; nifi-2 does not have these errors.&lt;/P&gt;
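&lt;P&gt;If I read the trace right, the resolution happens up front for every host in the connect string (the stack shows InetAddress.getAllByName being called from StaticHostProvider's constructor), so a single dead DNS name can fail the client before any connection is even attempted. A tiny illustration of just that Java behavior (my own example; the hostnames here are placeholders, and the .invalid TLD is reserved per RFC 2606 so it never resolves):&lt;/P&gt;

```java
// Illustration only: InetAddress.getAllByName throws UnknownHostException
// for a name with no DNS record -- the same call my stack trace shows
// inside StaticHostProvider's constructor.
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveAllDemo {
    // True only if every host resolves; one dead name fails the whole
    // batch, loosely mirroring how one bad connect-string entry can
    // abort the client before it tries the healthy servers.
    static boolean allResolve(String... hosts) {
        for (String h : hosts) {
            try {
                InetAddress.getAllByName(h);
            } catch (UnknownHostException e) {
                System.out.println("cannot resolve: " + h);
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "localhost" always resolves; "zk-1.invalid" never does
        // (.invalid is a reserved TLD, RFC 2606).
        System.out.println(allResolve("localhost"));
        System.out.println(allResolve("localhost", "zk-1.invalid"));
    }
}
```

&lt;P&gt;The second call fails on the dead name even though the first host is perfectly reachable, which looks like exactly what nifi-1 is hitting.&lt;/P&gt;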
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So, the questions:&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;1. Should Nifi be able to handle the situation where it can't resolve or connect to a Zookeeper address?&lt;/P&gt;
&lt;P&gt;2. Is there any reason why a Nifi node might get "stuck" on a particular Zookeeper instance, or not attempt to try other instances?&lt;/P&gt;
    <pubDate>Wed, 13 Nov 2019 06:36:36 GMT</pubDate>
    <dc:creator>cmcguigan</dc:creator>
    <dc:date>2019-11-13T06:36:36Z</dc:date>
  </channel>
</rss>

