Created on 03-21-2018 06:05 PM - edited 09-16-2022 06:00 AM
It was noticed today that NiFi re-elected the cluster coordinator node even though the previous coordinator node stayed connected, based on the nifi-app.log. What is the logic that triggers coordinator re-election?
Thanks,
Mark
Created 03-21-2018 06:47 PM
ZooKeeper is responsible for electing both the cluster coordinator and the primary node for a NiFi cluster.
Reasons why ZK may elect a new cluster coordinator and/or primary node include:
The current cluster coordinator or primary node has not heartbeated to ZK within the session timeout:
- possibly because of network issues
- possibly because the current cluster coordinator and/or primary node is having an issue that prevents heartbeats from being sent, such as Java garbage collection. Since a full GC is a stop-the-world event, heartbeats are not sent out while GC is running. By the time GC ends, ZK may have already elected a new cluster coordinator and/or primary node. The node would be notified of the change the next time it successfully talked to ZK.
So you may never see the node actually become disconnected from the cluster. Nodes send heartbeats to the currently elected cluster coordinator. As long as those heartbeats arrive within the configured timeouts, the nodes stay connected to the cluster.
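For reference, the timeouts involved here are configurable in each node's nifi.properties. Below is a hedged sketch of the relevant properties with typical default-style values; the exact defaults vary by NiFi version, so treat the values as illustrative and tune them for your environment (e.g., lengthen the ZK session timeout if long GC pauses are causing spurious re-elections):

```properties
# How long a NiFi node may go without talking to ZooKeeper before
# ZK expires its session (and its election candidacy with it).
# Lengthening this makes elections more tolerant of GC pauses.
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.connect.timeout=3 secs

# How often each node heartbeats to the elected cluster coordinator.
nifi.cluster.protocol.heartbeat.interval=5 sec

# Node-to-node connection/read timeouts used for cluster communication.
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
```

Also consider whether GC tuning (heap size, collector choice) is a better fix than raising timeouts, since the root cause described above is the stop-the-world pause itself.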
Thank you,
Matt