Nifi can't perform delete operation if one node is down

Contributor

Hi Team,

 

I am using a 3-node NiFi cluster, version 1.23.2. I observed that NiFi throws an error when deleting any processor or parameter context if any node is down. Can you please share the reason for this? Is there any Apache documentation that clearly lists which operations are restricted when a node is down?

Thanks,

Priyanka

1 ACCEPTED SOLUTION

Master Mentor

@PriyankaMondal 

In versions of Apache NiFi older than 1.16, NiFi does not allow any edits within the NiFi cluster while a node is disconnected.  Changes are only allowed on the disconnected node itself.

Apache NiFi 1.16.0 introduced a new flow inheritance feature: a joining node whose existing flow.xml.gz/flow.json.gz does not match the cluster elected flow can still join the cluster by inheriting the cluster elected flow.  A joining node is only blocked from doing so if inheriting the cluster flow would result in data loss (meaning the joining node's flow contains a connection holding queued FlowFiles and the cluster elected flow does not have that connection).
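To make that rule concrete, here is a minimal sketch of the check as I described it. This is not NiFi's actual code; the function name and the two inputs (a map of connection id to queued FlowFile count on the joining node, and the set of connection ids in the cluster elected flow) are hypothetical and only model the condition.

```python
def connections_blocking_join(node_queue_counts, cluster_connection_ids):
    """Return ids of connections that would lose data if the cluster flow were inherited.

    node_queue_counts: {connection_id: queued FlowFile count} on the joining node (hypothetical input)
    cluster_connection_ids: set of connection ids in the cluster elected flow (hypothetical input)
    """
    return sorted(
        conn_id
        for conn_id, queued in node_queue_counts.items()
        # Blocked only when the connection holds data AND the cluster flow no longer has it.
        if queued > 0 and conn_id not in cluster_connection_ids
    )


if __name__ == "__main__":
    node = {"c1": 10, "c2": 3, "c3": 0}
    cluster = {"c1"}
    # c2 holds FlowFiles and is gone from the cluster flow, so it blocks the join.
    # c3 is also gone but empty, so inheriting the flow loses nothing there.
    print(connections_blocking_join(node, cluster))
```

An empty result means the node can inherit the cluster flow; any returned id is a connection whose queued data would be lost.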

Later it was determined that this change can make it difficult to handle the outcome of the above issue: https://issues.apache.org/jira/browse/NIFI-11333  So it was decided that the best course of action was to not allow any component deletion while a node is disconnected.

When a NiFi node is started, it attempts to join the cluster.  If the node fails to join the cluster, it shuts back down to prevent users from mistakenly using it as a standalone node.  That means the user has no easy way to handle the queued data in the connection that is preventing the rejoin.  Of course, users could configure the node to come up standalone, but that does not make things any easier on the end user.  The node loads up standalone, loads its FlowFiles and, depending on whether auto-resume (nifi.flowcontroller.autoResumeState) is set, starts processing FlowFiles.  This still leaves the user with FlowFiles queued in many connections throughout the flow, and the user would have a very difficult time determining which connection(s) were removed from the cluster flow and need to be processed out in order to rejoin the cluster.  So the decision was made to stop allowing deletion when a node is disconnected.
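If you do end up in that situation, one way to narrow it down is to compare the flow file from the node that cannot rejoin against a copy from a connected node. The sketch below assumes the flow.json.gz layout I have seen in 1.16+ (a top-level "rootGroup" with nested "processGroups" and "connections", each connection carrying "instanceIdentifier", "identifier" and "name"); treat those field names and the helper names as assumptions and check them against your own files. It only reports which connections are missing from the cluster flow, not whether they still hold FlowFiles.

```python
import gzip
import json


def load_flow(path):
    """Read a gzip-compressed flow.json.gz into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def collect_connections(group, path=""):
    """Walk a process group recursively and map connection id -> readable location."""
    here = f"{path}/{group.get('name', '?')}"
    found = {}
    for conn in group.get("connections", []):
        # Assumption: instanceIdentifier is the runtime UUID; fall back to identifier.
        key = conn.get("instanceIdentifier") or conn.get("identifier")
        found[key] = f"{here} :: {conn.get('name') or '(unnamed)'}"
    for child in group.get("processGroups", []):
        found.update(collect_connections(child, here))
    return found


def connections_missing_from_cluster(node_flow, cluster_flow):
    """Connections present in the node's flow but absent from the cluster flow."""
    node_conns = collect_connections(node_flow["rootGroup"])
    cluster_conns = collect_connections(cluster_flow["rootGroup"])
    return {k: v for k, v in node_conns.items() if k not in cluster_conns}


if __name__ == "__main__":
    import sys

    # Usage: python flow_diff.py <node flow.json.gz> <cluster flow.json.gz>
    node_flow, cluster_flow = load_flow(sys.argv[1]), load_flow(sys.argv[2])
    for conn_id, where in connections_missing_from_cluster(node_flow, cluster_flow).items():
        print(conn_id, where)
```

The connections it prints are the candidates to check for queued FlowFiles on the node that failed to join.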

That being said, when a NiFi cluster has a disconnected node, users can navigate to the cluster UI and drop the disconnected node(s) from the cluster.  The cluster will then have full functionality again, since it will report all remaining nodes as connected.  The dropped node(s) must be restarted before they will attempt to connect to the cluster again.  But keep in mind that when a dropped node attempts to rejoin the cluster and inherit the cluster flow, you may run into the problem described above.
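The same drop can be scripted against the REST API instead of the cluster UI. This is a rough sketch, assuming the `/nifi-api/controller/cluster` endpoints and a bearer token you have already obtained; the host name, token and helper names are placeholders, and the request has to go to a node that is still connected. Check the REST API docs for your version before relying on it.

```python
import json
import urllib.request


def fetch_cluster(base_url, token):
    """GET the cluster summary from a connected node (performs a network call)."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/nifi-api/controller/cluster",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def disconnected_nodes(cluster_entity):
    """Return the node entries whose status is DISCONNECTED."""
    nodes = cluster_entity.get("cluster", {}).get("nodes", [])
    return [n for n in nodes if n.get("status") == "DISCONNECTED"]


def build_drop_request(base_url, node_id, token):
    """Build (but do not send) the DELETE request that drops a node from the cluster."""
    url = f"{base_url.rstrip('/')}/nifi-api/controller/cluster/nodes/{node_id}"
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Bearer {token}"}
    )


if __name__ == "__main__":
    # Placeholder values: replace with a connected node's URL and a valid token.
    base, token = "https://nifi.example.com:8443", "REPLACE_ME"
    for node in disconnected_nodes(fetch_cluster(base, token)):
        print("dropping", node.get("address"), node.get("nodeId"))
        urllib.request.urlopen(build_drop_request(base, node["nodeId"], token)).close()
```

Building the request separately from sending it makes it easy to print and review what would be dropped before actually doing it.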

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt

