Node disconnects from NiFi cluster

New Contributor

Dears,

 

We have a two-node cluster using the embedded ZooKeeper. The first node runs fine, but the second node always disconnects from the cluster with this error:


 

 

CONNECTING to DISCONNECTED due to Proposed configuration is not inheritable by the flow controller because of flow differences: Found difference in Flows: Local Fingerprint: s.email.ConsumeIMAPNO_VALUEorg.apache.nifinifi-email-nar1.13.210 sec30 sec1 secWARNfalseTIMER_DRIVENPRIMARY0Mark Messages as Read=truefolder=Inboxhost=outlook.office365.commail.imap.socketFactory.clas Cluster Fingerprint: s.email.ConsumeIMAPNO_VALUEorg.apache.nifinifi-email-nar1.13.210 sec30 sec1 secWARNfalseTIMER_DRIVENALL0Mark Messages as Read=truefolder=Inboxhost=outlook.office365.commail.imap.socketFactory.class=ja

 

 

 

BTW, we are using NiFi 1.13.2.

1 ACCEPTED SOLUTION

Master Collaborator

Hello @IAJ

 

Good Day. 

 

From the error in the community post, I see "Proposed configuration is not inheritable by the flow controller because of flow differences: Found difference in Flows: Local Fingerprint".

 

- The flow.xml.gz on this node is not in sync with the one on the coordinator (NiFi node).

- You can follow the steps below to get the node reconnected to the cluster.

      1) SSH to the NiFi node that is disconnected from the NiFi cluster.

      2) Take a backup of the existing flow.xml.gz and move it to a different location.

      3) Remove the flow.xml.gz after taking the backup (make sure you note the permissions and ownership of the flow.xml.gz first).

      4) SSH into the coordinator NiFi node. You can identify the coordinator on the cluster page in the NiFi UI, where the nodes are listed as connected/disconnected.

      5) SCP the flow.xml.gz from the coordinator node to the disconnected NiFi node (into your home folder).

      6) Copy the flow.xml.gz to the exact location from which you removed the original flow.xml.gz (step 3). Once it is in the original location, make sure the permissions and ownership match what you noted.

      7) Once these steps are done, restart the node from the backend using ./nifi.sh start (you need to find where those scripts are located in your installation). Don't connect the node from the NiFi UI.
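For illustration, steps 2, 3 and 6 can be sketched in Python as below. This is only a rough sketch under assumptions, not an official NiFi procedure: the function name `replace_flow` and all paths are hypothetical, it assumes a Unix-like node, that NiFi is already stopped on that node, and that the coordinator's flow.xml.gz has already been copied over (step 5).

```python
import os
import shutil
import stat
from pathlib import Path


def replace_flow(local_flow: Path, coordinator_copy: Path, backup_dir: Path) -> Path:
    """Back up local_flow, then put coordinator_copy in its place.

    Returns the path of the backup file. Hypothetical helper for illustration.
    """
    # Step 3 (first half): note permissions and ownership before removing the file.
    original = local_flow.stat()

    # Steps 2 and 3: move the existing flow.xml.gz to a backup location.
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / (local_flow.name + ".bak")
    shutil.move(str(local_flow), str(backup))

    # Step 6: copy the coordinator's flow.xml.gz to the original location.
    shutil.copyfile(coordinator_copy, local_flow)

    # Step 6: restore the permissions and ownership noted above.
    os.chmod(local_flow, stat.S_IMODE(original.st_mode))
    os.chown(local_flow, original.st_uid, original.st_gid)
    return backup
```

A call might look like `replace_flow(Path("/path/to/nifi/conf/flow.xml.gz"), Path.home() / "flow.xml.gz", Path.home() / "flow-backup")`, where the paths are placeholders for your own installation. After that, start NiFi with ./nifi.sh start as in step 7.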

 

Another way is to take a backup of the flow.xml.gz on the disconnected node, remove the flow.xml.gz from its location, and start the NiFi service, so that the node inherits the flow from the cluster when it joins. Make sure there is no defunct/zombie NiFi process already running on the disconnected node.

 

===

Reason for disconnection and reconnection:

- Can you please confirm whether there are multiple processors in a disabled state? And how many templates are there in your NiFi registry?

 


4 REPLIES

Expert Contributor

Hi @IAJ, typically this happens when the local flow is different from the cluster flow. This can be due to a mismatch in processors, connections, etc.

 

Does this persist after the following steps?

1. Stop NiFi on this node
2. Copy the flow.xml.gz from the current coordinator to the failing node
3. Start NiFi 

 


Super Mentor

@IAJ 

 

Specific to your case, we see the following difference:
Local Fingerprint (this is the fingerprint of the flow on the node trying to join the cluster):

s.email.ConsumeIMAPNO_VALUEorg.apache.nifinifi-email-nar1.13.210 sec30 sec1 secWARNfalseTIMER_DRIVENPRIMARY0Mark Messages as Read=truefolder=Inboxhost=outlook.office365.commail.imap.socketFactory.clas 

Cluster Fingerprint (this is the fingerprint of the flow currently being used by the cluster):

s.email.ConsumeIMAPNO_VALUEorg.apache.nifinifi-email-nar1.13.210 sec30 sec1 secWARNfalseTIMER_DRIVENALL0Mark Messages as Read=truefolder=Inboxhost=outlook.office365.commail.imap.socketFactory.clas


If we look closely, we see that you have a ConsumeIMAP processor in your cluster flow configured to execute on "ALL" nodes. On the node trying to connect, the same ConsumeIMAP processor is configured to execute on the "Primary" node only.
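Spotting that difference by eye in a long fingerprint string is tedious, so here is a small Python sketch that strips the common prefix and suffix of two strings and returns what is left. The function name `differing_span` is made up for this example, and the two sample strings are just short excerpts of the fingerprints quoted above. It only locates a single differing region, which is enough for a case like this one.

```python
from typing import Tuple


def differing_span(local: str, cluster: str) -> Tuple[str, str]:
    """Return the parts of two strings left after removing the shared prefix and suffix."""
    limit = min(len(local), len(cluster))

    # Length of the common prefix.
    start = 0
    while start < limit and local[start] == cluster[start]:
        start += 1

    # Length of the common suffix, without overlapping the prefix.
    end = 0
    while end < limit - start and local[-1 - end] == cluster[-1 - end]:
        end += 1

    return local[start:len(local) - end], cluster[start:len(cluster) - end]


# Excerpts of the fingerprints from the error message.
LOCAL = "TIMER_DRIVENPRIMARY0Mark Messages as Read=truefolder=Inbox"
CLUSTER = "TIMER_DRIVENALL0Mark Messages as Read=truefolder=Inbox"

print(differing_span(LOCAL, CLUSTER))  # ('PRIMARY', 'ALL')
```

The result lines up with the execution setting of the processor: "PRIMARY" on the node trying to join versus "ALL" in the cluster flow.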


 

A common scenario where this can happen is if the node was "Disconnected" from the cluster and a user edited the configuration of this processor at that time. Now that node cannot rejoin the cluster because of the mismatch. As others have said, the flow loaded from the flow.xml.gz into NiFi JVM heap memory must match across all nodes. Since all connected nodes run the same flow.xml.gz, you can copy this file from any node connected to the cluster to the node failing to connect.

NOTE: Understand that by doing this, you will lose any local changes made while that node was disconnected.
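If you want a quick check of whether two copies of flow.xml.gz hold the same flow before or after copying, one option is to compare a hash of the decompressed content. The sketch below is an assumption-laden illustration, not something NiFi provides: `flow_digest` and `same_flow` are hypothetical names. It hashes the decompressed XML rather than the .gz bytes because the gzip container can carry metadata such as a timestamp. Also keep in mind that, as far as I understand, NiFi compares a fingerprint of selected parts of the flow rather than the whole file, so a differing hash does not by itself prove that the node cannot join.

```python
import gzip
import hashlib
from pathlib import Path


def flow_digest(path: Path) -> str:
    """SHA-256 hex digest of the decompressed content of a .gz file."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as handle:
        # Read in chunks so a large flow does not have to fit in memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_flow(first: Path, second: Path) -> bool:
    """True if both files decompress to identical bytes."""
    return flow_digest(first) == flow_digest(second)
```

For example, `same_flow(Path("flow-from-coordinator.xml.gz"), Path("flow-local.xml.gz"))` (placeholder file names) returns True only when both copies decompress to identical content.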

Another scenario is that the configuration change was made while both nodes were connected, which means that the change was replicated to all nodes in the cluster. If, in your 2-node cluster, only one node responded to that replication request confirming that the change was made, the node(s) that did not respond would get disconnected. Failure to respond to a replication request most often happens because of server resource issues and/or network issues.


Thank you,

Matt



Community Manager

@IAJ Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thank you.


Regards,

Diana Torres,
Community Moderator

