[NiFi 1.3.0]: Unable to rejoin the cluster

Hi all,

This is a weird situation: my node nifi001 was ejected from the cluster, and now it cannot rejoin.

I've removed users.xml, authorizations.xml, and flow.xml.gz.

2017-06-22 16:25:55,434 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at nifi002:11443; will use this address for sending heartbeat messages
2017-06-22 16:25:55,907 INFO [main] o.a.n.c.c.node.NodeClusterCoordinator Resetting cluster node statuses from {} to {nifi001:9443=NodeConnectionStatus[nodeId=nifi001:9443, state=CONNECTING, updateId=103], nifi002:9443=NodeConnectionStatus[nodeId=nifi002:9443, state=CONNECTED, updateId=59], nifi003:9443=NodeConnectionStatus[nodeId=nifi003:9443, state=CONNECTED, updateId=86]}
2017-06-22 16:25:56,016 ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow from cluster due to: org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
        at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:936)
        at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:515)
        at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:800)
        at org.apache.nifi.NiFi.<init>(NiFi.java:160)
        at org.apache.nifi.NiFi.main(NiFi.java:267)
Caused by: org.apache.nifi.controller.UninheritableFlowException: Proposed Authorizer is not inheritable by the flow controller because of Authorizer differences: Proposed Authorizations do not match current Authorizations
        at org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:277)
        at org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1576)
        at org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.load(StandardXMLFlowConfigurationDAO.java:84)
        at org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:722)
        at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:911)
        ... 4 common frames omitted
2017-06-22 16:25:56,017 INFO [main] o.a.n.c.c.node.NodeClusterCoordinator nifi001:9443 requested disconnection from cluster due to org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
2017-06-22 16:25:56,017 INFO [main] o.a.n.c.c.node.NodeClusterCoordinator Status of nifi001:9443 changed from NodeConnectionStatus[nodeId=nifi001:9443, state=CONNECTING, updateId=103] to NodeConnectionStatus[nodeId=nifi001:9443, state=DISCONNECTED, Disconnect Code=Node's Flow did not Match Cluster Flow, Disconnect Reason=org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow., updateId=1]
2017-06-22 16:25:56,137 ERROR [main] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for nifi001:9443 -- Node disconnected from cluster due to org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
2017-06-22 16:25:56,137 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election Role 'Primary Node' becuase that role is not registered
2017-06-22 16:25:56,138 WARN [main] org.apache.nifi.web.server.JettyServer Failed to start web server... shutting down.
java.lang.IllegalStateException: Already closed or has not been started
        at com.google.common.base.Preconditions.checkState(Preconditions.java:173)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.close(LeaderSelector.java:270)
        at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.unregister(CuratorLeaderElectionManager.java:151)
        at org.apache.nifi.controller.FlowController.setClustered(FlowController.java:3744)
        at org.apache.nifi.controller.StandardFlowService.handleConnectionFailure(StandardFlowService.java:554)
        at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:518)
        at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:800)
        at org.apache.nifi.NiFi.<init>(NiFi.java:160)
        at org.apache.nifi.NiFi.main(NiFi.java:267)
2017-06-22 16:25:56,141 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2017-06-22 16:25:56,146 INFO [Thread-1] o.eclipse.jetty.server.AbstractConnector Stopped ServerConnector@785cfd33{SSL,[ssl, http/1.1]}{nifi001:9443}
2017-06-22 16:25:56,146 INFO [Thread-1] org.eclipse.jetty.server.session Stopped scavenging

Thanks for your help.

5 REPLIES

@mayki wogno

To get your node back into the cluster, copy users.xml, authorizations.xml, and flow.xml.gz from one of the nodes that is still connected to the cluster onto the disconnected node, then restart NiFi on the disconnected node. It should rejoin the cluster.
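
The copy step could be scripted roughly as follows. This is only a sketch under assumptions: the three files have already been fetched from a healthy node into a local staging directory, NiFi is stopped on the node being repaired, and `sync_cluster_files` and both directory arguments are hypothetical names, not part of NiFi.

```python
import shutil
import time
from pathlib import Path

# File names taken from this thread; where they live on disk is an assumption.
CLUSTER_FILES = ("users.xml", "authorizations.xml", "flow.xml.gz")


def sync_cluster_files(healthy_dir, local_dir, names=CLUSTER_FILES):
    """Copy each named file from healthy_dir into local_dir.

    An existing local copy is kept as a timestamped backup first.
    Returns the list of file names that were copied.
    """
    healthy_dir, local_dir = Path(healthy_dir), Path(local_dir)
    stamp = time.strftime("%Y%m%d%H%M%S")
    copied = []
    for name in names:
        src = healthy_dir / name
        if not src.is_file():
            continue  # nothing staged for this file name
        dst = local_dir / name
        if dst.exists():
            # Keep the old file so the change can be undone.
            shutil.copy2(dst, dst.with_name(f"{name}.bak-{stamp}"))
        shutil.copy2(src, dst)
        copied.append(name)
    return copied
```

The backup is there so the node's previous files can be restored if the copied ones turn out to be wrong.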

Hi, this workaround works, but the problem occurs fairly frequently in our environment and is very annoying.

Why can't a NiFi node get the latest copy from the primary node and rejoin the cluster automatically?

I would appreciate any insight into why this happens, and a reliable solution if there is one.

@Ravi Papisetti

The specific mismatch is between the authorizations.xml on the disconnected node and the authorizations being used by the cluster coordinator. The Primary node plays no role in this process. You say that this happens often? Is the same reason given every time (is it always an authorizations mismatch)? The authorizations.xml file gets updated any time an access policy is added, removed, or modified. For some reason the authorizations.xml file is not being updated on this node.
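
One quick way to see whether a node's authorizations.xml has drifted is to compare checksums against a copy taken from a connected node. The sketch below assumes both files are reachable as local paths; `sha256_of` and `files_match` are hypothetical helper names. A byte-level difference is only a hint, since NiFi's own comparison may not be byte-for-byte.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(local_path, reference_path):
    """True when both files have identical contents."""
    return sha256_of(local_path) == sha256_of(reference_path)
```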

Verify proper ownership and permissions on the users.xml, authorizations.xml, and flow.xml.gz files and their containing directories. The user that owns the NiFi process must be able to read and write these files.

If ownership is not an issue, you will want to check your nifi-user.log for any issues when replication requests are being made. This occurs when a change is made while logged in to any cluster node. The change must be replicated to all nodes, and there may be an authentication/authorization issue preventing this node from updating.
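
The ownership check above can be sketched as a small script. It is only meaningful when run as the user that owns the NiFi process, and it assumes all three files sit in one directory; `find_access_problems` and `conf_dir` are hypothetical names.

```python
import os
from pathlib import Path

# File names taken from this thread.
CLUSTER_FILES = ("users.xml", "authorizations.xml", "flow.xml.gz")


def find_access_problems(conf_dir, names=CLUSTER_FILES):
    """Return (path, problem) pairs for anything the current user cannot use."""
    conf_dir = Path(conf_dir)
    problems = []
    # The directory must be readable, writable, and searchable.
    if not os.access(conf_dir, os.R_OK | os.W_OK | os.X_OK):
        problems.append((str(conf_dir), "directory not readable/writable"))
    for name in names:
        path = conf_dir / name
        if not path.exists():
            problems.append((str(path), "missing"))
        elif not os.access(path, os.R_OK | os.W_OK):
            problems.append((str(path), "not readable/writable"))
    return problems
```

An empty list means the current user can read and write all three files; it does not say anything about who owns them.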

Thanks,

Matt

The reason is not authorizations.xml (it was not updated during these times). It varies.

Sometimes it is as follows:

2017-10-23 09:02:23,516 INFO [Process Cluster Protocol Request-6] o.a.n.c.c.node.NodeClusterCoordinator Status of creando-qa26.cisco.com:8090 changed from NodeConnectionStatus[nodeId=hostxxx:8090, state=DISCONNECTED, Disconnect Code=Failed to Service Request, Disconnect Reason=Failed to process request POST /nifi-api/process-groups/487c7858-015f-1000-0000-000028594ca6/template-instance, updateId=48] to NodeConnectionStatus[nodeId=nodexxx:8090, state=REMOVED, updateId=51]

Another time it said the node was shut down. We are checking our infrastructure to find out why it was restarted at that time.
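
Since the reason varies, it may help to tally the disconnect codes across the application log. The sketch below assumes the `NodeConnectionStatus[...]` layout seen in the log lines quoted in this thread; the regular expression and `count_disconnect_codes` are illustrative, not a NiFi API.

```python
import re
from collections import Counter

# Matches a DISCONNECTED status as it appears in the log lines quoted above.
STATUS = re.compile(
    r"nodeId=(?P<node>[^,\]]+), state=DISCONNECTED, "
    r"Disconnect Code=(?P<code>.*?), Disconnect Reason=(?P<reason>.*?), "
    r"updateId=(?P<update>\d+)\]"
)


def count_disconnect_codes(lines):
    """Count disconnect codes, counting each (node, updateId) status once."""
    seen = set()
    counts = Counter()
    for line in lines:
        for match in STATUS.finditer(line):
            key = (match.group("node"), match.group("update"))
            if key in seen:
                continue  # same status already reported on an earlier line
            seen.add(key)
            counts[match.group("code")] += 1
    return counts
```

For example, `count_disconnect_codes(open("nifi-app.log"))` would give a per-code tally, where the log file name is an assumption about the local setup.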

@Ravi Papisetti

My suggestion here would be to make the following changes in your nifi.properties file on all nodes:

nifi.cluster.node.protocol.threads=15
nifi.cluster.node.connection.timeout=30 secs
nifi.cluster.node.read.timeout=30 secs

This will give nodes longer to respond to change requests before they get dropped by the cluster coordinator.

You may also want to keep an eye out for any OOM or GC issues on your nodes that may be occurring at the time these changes are made.
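
To confirm that every node really carries these values, something like the following could be run against each node's nifi.properties. It is a minimal sketch: the parser handles only plain `key=value` lines, and `parse_properties` and `settings_to_change` are hypothetical helpers.

```python
# Values suggested in this reply.
WANTED = {
    "nifi.cluster.node.protocol.threads": "15",
    "nifi.cluster.node.connection.timeout": "30 secs",
    "nifi.cluster.node.read.timeout": "30 secs",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def settings_to_change(text, wanted=WANTED):
    """Map each differing key to (current value or None, wanted value)."""
    props = parse_properties(text)
    return {
        key: (props.get(key), value)
        for key, value in wanted.items()
        if props.get(key) != value
    }
```

An empty result means the file already matches the suggested values. Values are compared as exact strings after trimming whitespace.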

Thanks,

Matt