Created on 11-01-2024 06:26 AM - edited 11-01-2024 06:27 AM
We have a NiFi cluster with 3 nodes running in Docker containers that are always processing something. I added keys to the truststore.jks file of each node, and for the changes to take effect I need to restart the nodes one at a time, so that NiFi keeps running and we don't lose any data. Is there a way to restart them safely? ChatGPT said that it's possible to execute the command nifi.sh stop inside the container, but I don't know if this will cause data loss.
Created 11-01-2024 08:26 AM
@HenriqueAX
It is safe to restart the NiFi service without encountering any data loss.
NiFi is designed to protect against data loss for FlowFiles traversing the connections between processor components added to the NiFi canvas.
FlowFiles are persisted to disk: content is stored in content claims within the "content_repository", and the metadata/attributes associated with a FlowFile are stored in the "flowfile_repository". These repositories should be protected against loss through RAID or some other form of protected storage.

When a processor is scheduled to execute, it begins processing a FlowFile from an inbound connection. Only when the processor has completed execution is the FlowFile moved to one of the processor's outbound relationships. If you shut down NiFi, or NiFi dies abruptly, FlowFiles are loaded on restart into their last known connection and processing starts over at that processor. Under some race conditions data duplication can occur (NiFi happens to die just after processing of a FlowFile is complete, but before it is committed to the downstream relationship, resulting in the FlowFile being reprocessed by that component). But this only matters where a specific processor writes content out external to NiFi, or in some ingest scenarios (for example, consuming from a topic and dying after consumption but before the offset is written, so the same messages are consumed again).
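For reference, those repository locations come from nifi.properties. A quick way to confirm where they live inside a container, so the paths can be mapped to persistent volumes, is sketched below; the container name nifi-0 and the /opt/nifi/nifi-current path are assumptions based on the stock apache/nifi image, so adjust to your own setup.

# Check where the FlowFile/content/provenance repositories are written,
# so those paths can be mounted on protected, persistent storage.
# Container name and conf path are assumptions (stock apache/nifi image).
docker exec nifi-0 grep -E \
  'nifi\.(flowfile|content|provenance)\.repository\.directory' \
  /opt/nifi/nifi-current/conf/nifi.properties
# Typical defaults (relative to NIFI_HOME):
#   nifi.flowfile.repository.directory=./flowfile_repository
#   nifi.content.repository.directory.default=./content_repository
#   nifi.provenance.repository.directory.default=./provenance_repository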
With a normal NiFi shutdown, NiFi has a configurable shutdown grace period. During that grace period NiFi no longer schedules any processors to execute new threads, and it waits up to the configured grace period for existing running threads to complete before killing them.
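That grace period is the graceful.shutdown.seconds setting in conf/bootstrap.conf. Putting it together, a rolling restart done one node at a time might look like the sketch below. It is only a sketch under assumptions: the container names nifi-0 through nifi-2 are made up, and it assumes your image handles the stop signal by performing a graceful NiFi shutdown (the stock apache/nifi image does this in its start script); verify against your own environment.

# Rolling restart: one node at a time so the other nodes keep processing.
for node in nifi-0 nifi-1 nifi-2; do
  # Optional: confirm the shutdown grace period configured for this node.
  docker exec "$node" grep graceful.shutdown.seconds \
    /opt/nifi/nifi-current/conf/bootstrap.conf

  # Give Docker up to 120s before it force-kills the container, leaving
  # room for NiFi's graceful shutdown to finish in-flight threads.
  docker restart -t 120 "$node"

  # Wait until this node shows as connected to the cluster again (check the
  # UI or poll /nifi-api/controller/cluster) before moving to the next node.
  sleep 120
done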
IMPORTANT: Keep in mind that each node in a NiFi cluster executes the dataflows on the NiFi canvas against only the FlowFiles present on that individual node. One node has no knowledge of the FlowFiles on another node.
NiFi also persists state (for those components that use local or cluster state), either in a local state directory or in ZooKeeper for cluster state. Even in a NiFi cluster some components still use local state (example: ListFile). So protecting the local state directory via RAID or other means of protected storage is important. Loss of state would not result in data loss, but rather the potential for a lot of data duplication through ingestion of the same data again (depending on the processors used).
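Where that local state lives is defined in conf/state-management.xml (referenced from nifi.properties). A quick way to check it, under the same assumptions about image layout and container name as above:

# Show the local state provider configuration; the Directory it points to
# (./state/local by default) should also sit on persistent, protected storage.
docker exec nifi-0 grep -A 5 '<local-provider>' \
  /opt/nifi/nifi-current/conf/state-management.xml
# Typical entry (other properties omitted):
#   <local-provider>
#     <id>local-provider</id>
#     <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
#     <property name="Directory">./state/local</property>
#   </local-provider>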
Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 11-27-2024 11:22 AM
Running the restart shell script inside the container did it for me! Thanks for the help @MattWho!