Created 09-20-2022 08:51 AM
Hello everyone,
I'm running an 11-node NiFi 1.15.3 cluster. One of the process groups is versioned on NiFi Registry and for some reason the local flowfile does not reflect the versioned configuration, so now the process group is stuck: I cannot do anything with it, not even move it on the canvas, because it always returns an error:
Node XXXXXXXXX is unable to fulfill this request due to: [15, xxxxx-xxxxx-xxxxxx] is not the most up-to-date revision. This component appears to have been modified
The local configuration shows no changes, and nothing I have tried so far has worked (deleting the flow file, restarting the cluster node, etc.), so I just want to delete the process group and deploy it again from the registry, but the web interface won't let me, throwing the same error.
Is there a way to force the deletion of the process group?
Thanks
Created 09-20-2022 10:16 AM
@wasabipeas
The revision is incremented anytime a change occurs on a component to make sure that all nodes are running the exact same dataflow. Revisions have nothing directly to do with version controlled dataflows. If you were to restart your entire cluster (not a rolling restart, but a shutdown all and start all nodes), component revisions will start over.
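To make the revision check concrete, here is a toy sketch of the idea in Python. It is not NiFi's actual implementation; the `Component` class, `apply_change` method, and `StaleRevisionError` are hypothetical names used only for illustration. Each node keeps its own revision counter for a component and rejects a request whose revision does not match.

```python
from dataclasses import dataclass


class StaleRevisionError(Exception):
    """Raised when a request carries an outdated revision (hypothetical name)."""


@dataclass
class Component:
    component_id: str
    version: int = 0  # revision counter held by this node

    def apply_change(self, client_version: int) -> int:
        # Reject the request unless the caller has seen the latest revision.
        if client_version != self.version:
            raise StaleRevisionError(
                f"[{client_version}, {self.component_id}] "
                "is not the most up-to-date revision"
            )
        self.version += 1  # every accepted change bumps the revision
        return self.version


# Two nodes each hold their own copy of the same component's revision.
node_a = Component("pg-1", version=15)
node_b = Component("pg-1", version=16)  # this node has drifted ahead

node_a.apply_change(15)  # accepted: node_a moves to revision 16
try:
    node_b.apply_change(15)  # rejected: node_b expects revision 16
    rejected = False
except StaleRevisionError:
    rejected = True
```

In this model a single node that disagrees with the others is enough to fail the whole request, which is consistent with the error naming one specific node.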
"for some reason the local flowfile does not reflect the versioned configuration"
Are you saying that if you access the NiFi UI from a different node in your 11 node cluster, this process group renders differently?
Screenshots would be helpful in understanding your descriptions.
Does the process group indicate it is under version control?
Does it report "local changes"?
Revision issues can happen when a NiFi node is not running the same version as other nodes in the cluster.
Let's say some processor component you are using has a newer version on other nodes and the newer version of the processor introduced a new property. So on some nodes the property exists and on others it does not.
I suggest verifying that all nodes in your cluster are running the same version of NiFi. Additionally, compare the contents of the NiFi lib directory (or directories) to make sure they are the same on all nodes. This includes any custom lib directories or anything you may have added to the extensions directory.
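One way to do that comparison is to build a checksum manifest of the lib directory on each node and diff the manifests. The following is a minimal sketch in Python; the function names `lib_manifest` and `manifest_diff` are made up for this example, and the directory you pass in is whatever your installation uses.

```python
import hashlib
from pathlib import Path


def lib_manifest(lib_dir: str) -> dict[str, str]:
    """Map each file's path (relative to lib_dir) to its SHA-256 digest."""
    root = Path(lib_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for typical library files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def manifest_diff(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return relative paths that are missing on one side or differ in content."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Run `lib_manifest` on each node, collect the results in one place (for example as JSON), and pass any two of them to `manifest_diff`. An empty list means the two directories hold identical files.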
If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.
Thank you,
Matt
Created 09-21-2022 03:01 AM
Hi Matt,
Thank you for your help. I'll try to give you more context about the issue.
Yes, all the nodes of the cluster run the same version: 1.15.3 and the same libs.
Unfortunately, shutting down the entire cluster is not an option because it is a production environment and it receives live streaming data we cannot afford to lose.
We operate the environment this way:
This is why I'm sure that the configurations stored in the registry are valid and working: the same process group on the staging cluster has no issues.
"Are you saying that if you access the NiFi UI from a different node in your 11 node cluster, this process group renders differently?"
No, the process group renders the same on all nodes, but every attempt to do anything on it results in the same error message:
We cannot edit it, delete it, change its version, detach it from version control, or even move it. It always returns the same error, although no modification is present locally.
In fact, the version control menu seems to "believe" that there are local changes to the PG:
But then, if I select "Show local changes", nothing is shown, as expected:
Same if I select "Revert local changes":
My assumption is that the process group definition in the local flow file is corrupted or not in sync with the version control of the registry.
So I think that the best solution is to just force-delete the PG and then re-create it.
Is there a way to do this?
Thank you
S
Created on 09-22-2022 05:32 AM - edited 09-22-2022 05:59 AM
@wasabipeas
What version of NiFi-Registry is being used as well?
In your NiFi UI, search for component UUID (a8db3982-1350-1b8b-ffff-fffff988699d).
What kind/type of component is it? What is the current state of the component (enabled, disabled, running, stopped, enabling, disabling, starting, stopping)?
Share screenshot of its current configuration.
Thanks,
Matt
Created 09-22-2022 07:55 AM
Hi Matt,
The registry is also version 1.15.3. Some more context: the cluster contains dozens of other process groups that have been managed in the same way and versioned on the same registry for months. This is the first time we have experienced this issue.
The component with UUID a8db3982-1350-1b8b-ffff-fffff988699d is the "frozen" process group itself:
This is its configuration:
And its associated controller services:
I still think that the fastest and safest solution is to delete and re-deploy.
Is it possible?
S
Created 09-22-2022 11:04 AM
@wasabipeas
I can think of no way to force delete if it is blocking on a revision mismatch between nodes. Nothing here has anything to do with version control.
Is it always the same node reported in the pop-up message that fails to process the request?
If so, have you verified the libs and version running on that one node match rest of cluster?
If you go to the cluster UI and select "VERSIONS" tab, they all reflect same version?
You could manually disconnect the one node that it keeps complaining about from the "NODES" tab.
After it is disconnected, you could delete it from the cluster (deleting the node does nothing to the flows or data on that node; it will require a restart of that one node to get it to rejoin the cluster).
Once the node is removed from your cluster (temporarily), your cluster should reflect 10/10 connected nodes in the status bar of the canvas UI.
Check to see if you are still having revision issues with the process group after reloading the page.
If all looks good, you could access the filesystem of the currently disconnected and deleted node, stop the NiFi service on that node, and delete/rename the flow.xml.gz and flow.json.gz files. Then start this node again. On startup, NiFi will inherit the flow from the cluster and in doing so get the cluster flow's current revision for the problematic process group.
If problem persists, restart node that was deleted so that it rejoins the cluster.
Then disconnect the currently elected cluster coordinator. A new cluster coordinator will then be elected by ZooKeeper. Check to see if the issue with the process group is resolved. Reload your browser to force a page refresh.
If the issue is resolved, rejoin the node to the cluster via the cluster UI to see if the issue returns. If so, we at least know which node is problematic. You can, of course, disconnect it, delete it, rename flow.xml.gz and flow.json.gz, and then restart the node, just as before, so that the flow is pulled from the cluster on startup. If the issue still persists, there is something unique about this node. Is disk space OK? Are there any exceptions in the logs? While the node may report the same NiFi version, something may differ in the contents of the lib folder(s) (get a checksum and compare against other nodes).
Hope this helps without needing to restart entire cluster,
Matt
Created 09-21-2022 03:24 PM
Is your dev cluster running the exact same version of NiFi as production, including the NiFi lib folder?
Created 09-22-2022 07:04 AM
Hi, yes same version and libs.
Created on 08-28-2024 06:21 PM - edited 08-28-2024 06:21 PM
Hi,
I'm currently having the exact same issue. Did you already solve this problem? If you did, could you tell me how you resolved it? It's affecting my deployment process in the production environment.
Regards,
Muhammad Dharmawan
Created 09-03-2024 01:34 PM
@wasabipeas @Adhitya
In the thrown exception it reports which node has the mismatched revision.
Is that node the currently elected cluster coordinator?
Have you tried on just that reported node:
1. Stopping NiFi
2. Remove or rename both the flow.xml.gz and flow.json.gz files (only deleting one will not work)
3. Restart that NiFi node. It will inherit the flow from the cluster coordinator when it joins.
If this node was the elected cluster coordinator, when you shut it down another node will assume the cluster coordinator role.
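Step 2 above can be scripted so that both files are always handled together and nothing is lost. This is only a sketch under assumptions: the helper name `set_aside_flow_files` is made up, and `conf_dir` must point at wherever your installation actually keeps its flow files. Run it only while NiFi is stopped on that node.

```python
import time
from pathlib import Path

FLOW_FILES = ("flow.xml.gz", "flow.json.gz")


def set_aside_flow_files(conf_dir: str) -> list[Path]:
    """Rename both flow files in conf_dir with a timestamped .bak suffix.

    Renaming (rather than deleting) keeps a copy to restore if needed.
    """
    suffix = time.strftime(".%Y%m%d-%H%M%S.bak")
    renamed = []
    for name in FLOW_FILES:
        source = Path(conf_dir) / name
        if source.exists():
            target = source.with_name(source.name + suffix)
            source.rename(target)
            renamed.append(target)
    return renamed
```

The function returns the new paths, so a result with fewer than two entries is a hint that one of the files was not where you expected it.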
----
Another option is to disconnect the node reporting the mismatch in revision. Then from the same cluster UI used to disconnect that node, select to drop/delete it from cluster. Your cluster will now report one less node. See if you can then move the process group or if it reports another node with mismatched revision?
NOTE: Deleting/Dropping a node from cluster using the Cluster UI does nothing to that node. If you restart that node that was deleted/dropped it will rejoin the cluster again.
Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt