Why is ZooKeeper not re-electing a new leader in an Apache NiFi cluster on failover?

New Contributor
The following is my architecture:
2 servers:
Server 1: running Apache NiFi + ZooKeeper (not embedded)
Server 2: running Apache NiFi + ZooKeeper (not embedded)

To test failover, I shut down the server that has been selected as Cluster Coordinator.

In this case, ZooKeeper should automatically elect the remaining server as leader, but the election keeps failing and the surviving node goes into a continuous loop of trying to connect to the first server.

ZooKeeper logs on Server 2 when the leader (Server 1) went down:

2019-10-22 18:44:01,135 [myid:2] - WARN  [NIOWorkerThread-2:NIOServerCnxn@370] - Exception causing close of session 0x0: ZooKeeperServer not running
2019-10-22 18:44:02,925 [myid:2] - WARN  [NIOWorkerThread-3:NIOServerCnxn@370] - Exception causing close of session 0x0: ZooKeeperServer not running
2019-10-22 18:44:03,320 [myid:2] - WARN  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumCnxManager@677] - 
Cannot open channel to 1 at election address ec2-server-1.compute-1.amazonaws.com/172.xx.x.x:3888
    java.net.ConnectException: Connection refused (Connection refused)
            at java.net.PlainSocketImpl.socketConnect(Native Method)
            at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)

Server 2 Config files:

zoo.cfg

tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/ec2-user/zookeeper
clientPort=2181
server.1=ec2-server-1.compute-1.amazonaws.com:2888:3888
server.2=0.0.0.0:2888:3888

nifi.properties

nifi.cluster.is.node=true
nifi.cluster.node.address=ec2-server-2.compute-1.amazonaws.com
nifi.cluster.node.protocol.port=8082
nifi.cluster.flow.election.max.wait.time=2 mins
nifi.cluster.flow.election.max.candidates=1

# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2181
nifi.zookeeper.root.node=/nifi

Server 1 Config files:

zoo.cfg

tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/ec2-user/zookeeper
clientPort=2181
server.1=0.0.0.0:2888:3888
server.2=ec2-server-2.compute-1.amazonaws.com:2888:3888

nifi.properties

nifi.cluster.is.node=true
nifi.cluster.node.address=ec2-server-1.compute-1.amazonaws.com
nifi.cluster.node.protocol.port=8082
nifi.cluster.flow.election.max.wait.time=2 mins
nifi.cluster.flow.election.max.candidates=1

# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2181
nifi.zookeeper.root.node=/nifi

What am I doing wrong?

Master Mentor

@jspuri

I suggest making the following configuration changes:

1. The zoo.cfg files on both your ZK nodes should be the same:

tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/ec2-user/zookeeper
clientPort=2181
server.1=ec2-server-1.compute-1.amazonaws.com:2888:3888
server.2=ec2-server-2.compute-1.amazonaws.com:2888:3888

Note: For a proper quorum, a ZooKeeper cluster should also have an odd number of servers (3 or 5). With only two servers the majority is still two, so when one server goes down the survivor cannot form a quorum and never becomes leader, which is exactly what the "ZooKeeperServer not running" messages in your log indicate.
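Also make sure each ZooKeeper server has a myid file in its dataDir whose value matches its own server.N entry in zoo.cfg. A minimal sketch, assuming the dataDir shown above:

/home/ec2-user/zookeeper/myid on Server 1 contains:
1

/home/ec2-user/zookeeper/myid on Server 2 contains:
2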

2. In NiFi's state-management.xml, point the cluster provider's Connect String at the ZooKeeper client port (2181) on both servers:

<property name="Connect String">ec2-server-1.compute-1.amazonaws.com:2181,ec2-server-2.compute-1.amazonaws.com:2181</property>
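For reference, the surrounding cluster-provider block in state-management.xml typically looks like the sketch below; the Root Node, Session Timeout, and Access Control values shown are assumed defaults, and only the Connect String needs to change for this issue:

<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">ec2-server-1.compute-1.amazonaws.com:2181,ec2-server-2.compute-1.amazonaws.com:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>

Make sure nifi.state.management.provider.cluster in nifi.properties references this provider's id (zk-provider here).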

3. In the nifi.properties file:

nifi.zookeeper.connect.string=ec2-server-1.compute-1.amazonaws.com:2181,ec2-server-2.compute-1.amazonaws.com:2181
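The other ZooKeeper-related entries in nifi.properties can normally stay at their defaults; a sketch for reference, with assumed default values that you should adjust to your environment:

nifi.state.management.embedded.zookeeper.start=false
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi

Since you are running an external ZooKeeper, nifi.state.management.embedded.zookeeper.start must remain false.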

Hope this helps,

Matt