Issue setting up insecure NiFi clustering for nifi-1.1.1

Contributor

I am trying to set up a cluster with two nodes: one on my local machine and the other on an EC2 instance. When the nodes try to connect, I get the following logs:

2017-04-26 19:10:54,126 INFO [main] /nifi-docs No Spring WebApplicationInitializer types detected on classpath
2017-04-26 19:10:54,157 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@6e4ac3f5{/nifi-docs,file:///home/jatin/Downloads/Softwares/nifi-1.1.1/work/jetty/nifi-web-docs-1.1.1.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.1.1.nar-unpacked/META-INF/bundled-dependencies/nifi-web-docs-1.1.1.war}
2017-04-26 19:10:54,197 INFO [main] / No Spring WebApplicationInitializer types detected on classpath
2017-04-26 19:10:54,232 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@6418075c{/,file:///home/jatin/Downloads/Softwares/nifi-1.1.1/work/jetty/nifi-web-error-1.1.1.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.1.1.nar-unpacked/META-INF/bundled-dependencies/nifi-web-error-1.1.1.war}
2017-04-26 19:10:54,239 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector@2f164fab{HTTP/1.1,[http/1.1]}{localhost:8088}
2017-04-26 19:10:54,239 INFO [main] org.eclipse.jetty.server.Server Started @80506ms
2017-04-26 19:10:55,124 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2017-04-26 19:10:55,131 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 9990
2017-04-26 19:10:55,161 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: localhost:8088
2017-04-26 19:11:01,265 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2017-04-26 19:11:01,265 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2017-04-26 19:11:05,694 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2017-04-26 19:11:05,696 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@65557951 Connection State changed to SUSPENDED
2017-04-26 19:11:05,705 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2017-04-26 19:11:05,707 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) ~[curator-client-2.11.0.jar:na]
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) ~[curator-client-2.11.0.jar:na]
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) ~[curator-client-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2017-04-26 19:11:05,791 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]

My nifi.properties are as follows:

# cluster common properties (all nodes must have same values) #

nifi.cluster.protocol.heartbeat.interval=5 sec

nifi.cluster.protocol.is.secure=false

# cluster node properties (only configure for cluster nodes) #

nifi.cluster.is.node=true

nifi.cluster.node.address=localhost

#nifi.cluster.node.address=107.23.49.252

nifi.cluster.node.protocol.port=9990

nifi.cluster.node.protocol.threads=10

nifi.cluster.node.event.history.size=25

nifi.cluster.node.connection.timeout=5 sec

nifi.cluster.node.read.timeout=5 sec

nifi.cluster.firewall.file=

nifi.cluster.flow.election.max.wait.time=2 mins

nifi.cluster.flow.election.max.candidates=2

nifi.zookeeper.connect.string=localhost:2181,107.23.49.252:2181

nifi.zookeeper.connect.timeout=3 secs

nifi.zookeeper.session.timeout=100 secs

nifi.zookeeper.root.node=/nifi

nifi.remote.input.host=localhost

nifi.remote.input.secure=false

nifi.remote.input.socket.port=9998

nifi.remote.input.http.enabled=true

nifi.remote.input.http.transaction.ttl=30 sec

nifi.state.management.provider.cluster=zk-provider

nifi.state.management.embedded.zookeeper.start=true

nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

My zookeeper.properties file contents are as follows:

clientPort=9001

initLimit=10

autopurge.purgeInterval=24

syncLimit=5

tickTime=10000

dataDir=./state/zookeeper

autopurge.snapRetainCount=30

server.1=localhost:2888:3888

server.2=107.23.49.252:2888:3888

And my state-management.xml file content is as follows:

<cluster-provider>

<id>zk-provider</id>

<class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>

<property name="Connect String">localhost:2181,107.23.49.252:2181</property>

<property name="Root Node">/nifi</property>

<property name="Session Timeout">10 seconds</property>

<property name="Access Control">Open</property>

</cluster-provider>

I also created a myid file with content 1 in the state/zookeeper directory on the local machine.

When I start the node, the service comes up after some time and I can see the UI, but it shows the following message:

This node is currently not connected to the cluster. Any modifications to the data flow made here will not replicate across the cluster.

I also see the state Disconnected in the top-right corner, where it should show the connected cluster nodes.

1 ACCEPTED SOLUTION

@Jatin Kheradiya

The NiFi node running on your local system should be configured with the public IP address of that system instead of localhost, so that the NiFi instance on EC2 can resolve its address. From the EC2 instance, localhost does not point to your local system (it resolves to the EC2 instance itself), so the EC2 node cannot find your local NiFi node to join the cluster.
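
A quick way to spot this kind of problem is to scan the host-related settings for loopback values before starting the node. The following is a minimal Python sketch, assuming the properties text has been read into a string; the helper names (`parse_properties`, `loopback_settings`) and the choice of keys to inspect are illustrative assumptions, not part of NiFi.

```python
# Hypothetical helper (not part of NiFi): flags loopback hosts in nifi.properties text.
LOOPBACK = {"localhost", "127.0.0.1"}

# Keys from the question whose values are handed to other machines (assumed list).
HOST_KEYS = (
    "nifi.cluster.node.address",
    "nifi.remote.input.host",
    "nifi.zookeeper.connect.string",
)


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def loopback_settings(text):
    """Return {key: [entries]} for host settings that point at the loopback interface."""
    props = parse_properties(text)
    flagged = {}
    for key in HOST_KEYS:
        entries = [e.strip() for e in props.get(key, "").split(",") if e.strip()]
        # Each entry is either "host" or "host:port"; compare only the host part.
        bad = [e for e in entries if e.split(":")[0].lower() in LOOPBACK]
        if bad:
            flagged[key] = bad
    return flagged


SAMPLE = """\
nifi.cluster.is.node=true
nifi.cluster.node.address=localhost
#nifi.cluster.node.address=107.23.49.252
nifi.zookeeper.connect.string=localhost:2181,107.23.49.252:2181
nifi.remote.input.host=localhost
"""

if __name__ == "__main__":
    for key, values in loopback_settings(SAMPLE).items():
        print(f"{key}: {', '.join(values)}")
```

Any key this reports is a setting that a remote node would interpret as pointing at itself rather than at the machine that wrote it.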

4 REPLIES

Super Mentor

@Jatin Kheradiya

In addition, ZooKeeper, which is used for cluster elections, will not work well with localhost, since the ZooKeeper servers cannot form a quorum with each other that way. Even once you fix ZooKeeper to use valid public IP addresses or publicly resolvable hostnames, you still need to make sure each NiFi node is configured with a publicly resolvable hostname or IP address as well.

When a node starts, it communicates with ZK to see whether a cluster coordinator has already been elected; if not, it throws its hat in the ring to become the coordinator itself. Assume the node advertising localhost is elected coordinator. All other nodes will be informed of this via ZK and will try to send heartbeats directly to "localhost". This will of course fail.

Dave is correct that you must avoid using localhost anywhere when setting up a cluster.
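
The same check can be applied to the ZooKeeper side. Below is a minimal Python sketch, assuming the `server.N=host:port:port` layout shown in the zookeeper.properties from the question; the function names are hypothetical and the sketch only inspects literal host strings (it does not perform DNS lookups).

```python
import ipaddress
import re

# Matches entries such as "server.1=localhost:2888:3888".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)")


def is_loopback(host):
    """True for loopback IP literals or the name 'localhost' (no DNS resolution)."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        return host.lower() == "localhost"


def loopback_servers(zookeeper_properties):
    """Return {server_id: host} for server.N entries that other machines cannot reach."""
    flagged = {}
    for line in zookeeper_properties.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match and is_loopback(match.group(2)):
            flagged[int(match.group(1))] = match.group(2)
    return flagged


ZK_SAMPLE = """\
clientPort=9001
server.1=localhost:2888:3888
server.2=107.23.49.252:2888:3888
"""

if __name__ == "__main__":
    for server_id, host in loopback_servers(ZK_SAMPLE).items():
        print(f"server.{server_id} uses loopback host {host!r}")
```

An entry flagged here is one that the other ZooKeeper server would resolve to itself, which is the situation described above.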

Thanks,

Matt

Contributor

Hello,

Thanks @Wynner and @Matt Clarke. I tried changing localhost to the public IP of the machine. I am now trying a different approach: instead of using the embedded ZooKeeper, I installed a ZooKeeper server separately. A single NiFi node connects to ZooKeeper, but it does not connect to the cluster. It shows a popup with the following message:

This node is currently not connected to the cluster. Any modifications to the data flow made here will not replicate across the cluster.

Can you please help me understand what this message means? Since the node is not connected to the cluster, how do I connect it?

nifi.state.management.embedded.zookeeper.start=false

nifi.cluster.is.node=true

nifi.cluster.node.address=node2

nifi.cluster.node.protocol.port=9990

nifi.cluster.node.protocol.threads=10

nifi.cluster.node.event.history.size=25

nifi.cluster.node.connection.timeout=5 sec

nifi.cluster.node.read.timeout=5 sec

nifi.cluster.firewall.file=

nifi.cluster.flow.election.max.wait.time=2 mins

nifi.cluster.flow.election.max.candidates=2

nifi.zookeeper.connect.string=107.22.208.210:2181

Here 107.22.208.210 is the IP of another machine, where I am running ZooKeeper. I want two EC2 instances to run NiFi and both to connect to the same ZooKeeper server at that IP, without using the embedded ZooKeeper.

Also, in zookeeper.properties, I disabled the server.1 and server.2 properties.

Super Mentor

@Jatin Kheradiya

A couple of things:

1. ZooKeeper is not going to work very well with a single instance running. To achieve quorum there should be an odd number of ZooKeeper servers (3, 5, 7, etc.), with 3 as the minimum.

2. When NiFi nodes start, they communicate with ZK to find out which node is the currently elected cluster coordinator. They will all request to become the cluster coordinator, and an election process will begin. Until this election completes, the nodes will not join the cluster. While an election is ongoing, you should see messages in nifi-app.log stating when the election will end.
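
On point 1, the recommendation for odd ensemble sizes follows from majority voting: ZooKeeper needs more than half of its servers available to form a quorum. A short Python sketch of that arithmetic (the function names are illustrative):

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that is a strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"servers={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

With one or two servers no failure can be tolerated, and going from three to four servers adds no extra tolerance, which is why 3, 5, 7 are the usual choices.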

There are two properties in the nifi.properties file that control the election process:

nifi.cluster.flow.election.max.candidates=
nifi.cluster.flow.election.max.wait.time=5 mins

By default, the candidates property is left blank, which means the election always runs for the full 5 minutes each time your NiFi cluster is restarted. To reduce how long the election takes, set the candidates property to the number of nodes in your cluster. The election then completes once the configured number of candidates have checked in with ZK or 5 minutes have passed, whichever comes first.
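
The stopping rule described above can be modeled in a few lines. This is only a sketch of the behavior as explained in this reply, not NiFi's actual implementation; the function and parameter names are hypothetical, and the wait time is expressed in seconds for simplicity.

```python
def election_complete(checked_in, elapsed_seconds, max_candidates=None, max_wait_seconds=300):
    """Model of the rule: stop when enough candidates have checked in or the wait time is up.

    max_candidates=None stands for the property being left blank, in which case
    only the wait time can end the election.
    """
    if max_candidates is not None and checked_in >= max_candidates:
        return True
    return elapsed_seconds >= max_wait_seconds


if __name__ == "__main__":
    # Two-node cluster with max.candidates=2: ends as soon as both nodes check in.
    print(election_complete(checked_in=2, elapsed_seconds=10, max_candidates=2))
    # Candidates left blank: the same two nodes still wait out the full period.
    print(election_complete(checked_in=2, elapsed_seconds=10))
```

Under this model, a two-node cluster with the candidates property set to 2 stops waiting as soon as both nodes have checked in.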

Thanks,

Matt