NiFi load balancing causes loss of data
Labels: Apache NiFi
Created ‎06-30-2022 07:37 AM
Hi,
I was changing the load balancing configuration of a connection from single node to round robin, and then two nodes went down.
I then tried to offload the nodes from the NiFi UI, but that seemed to have no effect. Next I tried to decommission a node, but it failed after 30 iterations:
Iteration: 27
Successfully executed get-node command
Node 898f9 still not offloaded
Retry after 10 sec
Iteration: 28
Successfully executed get-node command
Node 898f9 still not offloaded
Retry after 10 sec
Iteration: 29
Successfully executed get-node command
Node 898f9 still not offloaded
Retry after 10 sec
ERROR: Nifi node offload with id 898f9, failed!
Does anybody know how to offload a node in this case?
Created ‎06-30-2022 12:34 PM
@pandav
You cannot offload a NiFi node that is down. Can you clarify what you mean by "down"? Was the NiFi service not running on the nodes you attempted to offload?
The offload option from the cluster UI sends a request to the disconnected (not down) node to offload its queued FlowFiles to nodes still connected to the cluster.
If your nodes are down, you'll need to start the service on those nodes again. On startup (assuming no issues), these nodes will rejoin your cluster. If you plan to decommission a node later, you can use the NiFi cluster UI to manually disconnect the node and then offload that node's FlowFiles. Once the FlowFiles have been successfully offloaded, the node can be deleted from the cluster using the NiFi cluster UI.
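To make the order of operations concrete, here is a minimal Python sketch of the disconnect, offload, delete sequence with the same kind of retry loop your decommission log shows (30 attempts, 10 seconds apart). Treat it as an illustration only: `get_status`, `set_status` and `delete_node` are hypothetical callables you would have to wire to your own tooling (UI automation, CLI, or REST calls), and the status strings are assumed names for the node states shown in the cluster UI, so verify them against your NiFi version.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], target: str,
                    retries: int = 30, delay_sec: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status() until it returns target; give up after `retries` checks."""
    for attempt in range(retries):
        if get_status() == target:
            return True
        if attempt < retries - 1:
            sleep(delay_sec)  # wait before the next check
    return False


def decommission(node_id: str,
                 get_status: Callable[[str], str],
                 set_status: Callable[[str, str], None],
                 delete_node: Callable[[str], None],
                 **wait_kwargs) -> None:
    """Disconnect, then offload, then delete a node (hypothetical helpers).

    The node's NiFi service must be running: a node that is down cannot offload.
    """
    # Step 1: disconnect the node from the cluster.
    set_status(node_id, "DISCONNECTING")  # assumed status name
    if not wait_for_status(lambda: get_status(node_id), "DISCONNECTED", **wait_kwargs):
        raise RuntimeError(f"node {node_id} did not disconnect")

    # Step 2: ask the disconnected node to offload its queued FlowFiles.
    set_status(node_id, "OFFLOADING")  # assumed status name
    if not wait_for_status(lambda: get_status(node_id), "OFFLOADED", **wait_kwargs):
        raise RuntimeError(f"node {node_id} still not offloaded")

    # Step 3: only delete the node once the offload has completed.
    delete_node(node_id)
```

The point of the sketch is the ordering: the delete step is only reached after the offload has been confirmed, and a node that never reports the offloaded state (for example because its service is not running) ends in an error rather than being removed with FlowFiles still queued on it.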
Note: restarting a node that has been dropped/deleted from the cluster will cause that node to start heartbeating to the cluster again and thus reconnect, unless you edit the node's configuration so that it does not use the same ZooKeeper znode as the current cluster (the nifi.zookeeper.root.node property in the nifi.properties file). See https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#basic-cluster-setup
As for your nodes going down on a configuration change, you'll want to inspect the NiFi logs for any exceptions or timeouts that may have occurred. Network issues, long Garbage Collection (GC) pauses, and resource congestion/exhaustion can lead to nodes not responding to or not receiving the replicated change request. As a result, a node can get disconnected. In scenarios like this, if you are using the latest Apache NiFi release, those nodes should automatically reconnect. Upon reconnect, if the node's flow does not match the cluster flow, the node will automatically take the cluster's flow and join. In older releases, a flow mismatch between the connecting node and the cluster flow would require manual intervention (copying the flow.xml.gz from a node still in the cluster to the node not connecting).
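If you do end up copying flow.xml.gz by hand on an older release, a quick check that the two copies really hold the same flow can help. The following is a rough sketch, not NiFi's own mismatch logic: it simply compares SHA-256 digests of the decompressed files, which is stricter than whatever comparison NiFi performs internally. The function names and the file paths you pass in are my own placeholders.

```python
import gzip
import hashlib


def flow_digest(path: str) -> str:
    """Return the SHA-256 hex digest of the decompressed contents of a .gz file."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as handle:
        # Read in chunks so large flows do not have to fit in memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flows_match(path_a: str, path_b: str) -> bool:
    """True if both gzip files decompress to byte-identical content."""
    return flow_digest(path_a) == flow_digest(path_b)
```

Hashing the decompressed content rather than the .gz file itself avoids false differences from gzip header fields such as the embedded timestamp. You would call it as `flows_match(copy_from_cluster_node, copy_on_failing_node)` after fetching both files to one machine.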
If you found this response assisted with your query, please take a moment to log in and click on "Accept as Solution" below this post.
Thank you,
Matt
