Created on 09-19-2022 02:05 AM - edited 09-19-2022 02:08 AM
Hello,
I want to understand what happens to the processes and FlowFiles that are currently running when NiFi is restarted.
Please share the details so I can better understand NiFi.
Created 09-19-2022 05:57 AM
@Sanchari
NiFi FlowFiles reside in the connections between NiFi processor components. When a processor gets a thread to execute, it takes the highest-priority FlowFile from an inbound connection queue and executes the processor code using that FlowFile's metadata/attributes and content (if the processor needs the content). The FlowFile is not transferred to the processor's outbound connection(s) until execution is complete.
When NiFi is shut down gracefully (meaning a user has initiated the shutdown), NiFi stops scheduling any further component executions. NiFi then gives the threads that are already executing a grace period to complete. At the end of that grace period, any threads still running are killed along with the JVM. Since FlowFiles do not transfer to an outbound connection until code execution has completed, any FlowFile that was owned by a thread at the time the thread was killed still remains on the inbound connection. When NiFi is started again and the dataflows are started, processing starts over: the processor executes again against the highest-priority FlowFile in the connection.
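To make the behavior above concrete, here is a minimal sketch in Python. It is a toy model, not NiFi code: `Connection`, `run_once`, and `JvmKilled` are hypothetical names I made up for illustration, and the queue is plain FIFO with no prioritizers. It only shows that a FlowFile leaves the inbound queue after execution completes, so a kill mid-execution leaves it where it was.

```python
from collections import deque


class JvmKilled(Exception):
    """Stands in for the JVM being killed while a processor thread is running."""


class Connection:
    """Toy stand-in for a connection queue (FIFO here, no prioritizers)."""

    def __init__(self, flowfiles=()):
        self.queue = deque(flowfiles)


def run_once(inbound, outbound, process):
    """Run one processor execution against the next FlowFile in `inbound`.

    The FlowFile is removed from `inbound` and added to `outbound` only
    after `process` returns, mimicking a transfer that happens on completion.
    """
    if not inbound.queue:
        return
    flowfile = inbound.queue[0]       # peek: the FlowFile stays queued while we work
    result = process(flowfile)        # may raise JvmKilled part-way through
    inbound.queue.popleft()           # only reached if process() finished
    outbound.queue.append(result)


def killed_mid_execution(flowfile):
    raise JvmKilled("grace period expired")


inbound = Connection(["ff-1", "ff-2"])
outbound = Connection()

try:
    run_once(inbound, outbound, killed_mid_execution)
except JvmKilled:
    pass  # the instance is down; nothing was transferred

# After the "restart", the same FlowFile is processed again from the start.
run_once(inbound, outbound, str.upper)

print(list(inbound.queue))
print(list(outbound.queue))
```

After the simulated kill, "ff-1" is still at the head of the inbound queue, and it only moves to the outbound queue once a later execution runs to completion.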
That being said, NiFi will favor data duplication over data loss every time. There is a small window of time in which duplication can occur. Suppose a processor executes and part of that execution is to write a file to a remote server. The transfer to the remote system may complete, but the NiFi JVM is killed before NiFi internally receives the acknowledgment back from the target server. The FlowFile would then be processed again after the restart, potentially resulting in duplicate data on the target server. These are rare race conditions, but they are possible.
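The race condition above can be sketched the same way. Again this is a toy Python model under my own assumptions, not NiFi internals: `remote_server`, `put_file`, and `KilledBeforeAck` are hypothetical names, and the "remote server" is just a list.

```python
class KilledBeforeAck(Exception):
    """Stands in for the JVM dying after the remote write but before the ack."""


remote_server = []               # files that have landed on the hypothetical target
inbound_queue = ["report.csv"]   # FlowFiles waiting on the inbound connection


def put_file(flowfile, kill_before_ack=False):
    """Send a FlowFile to the remote server; return True once the ack is seen."""
    remote_server.append(flowfile)        # the remote write itself succeeds
    if kill_before_ack:
        raise KilledBeforeAck("ack never received")
    return True


def run_once(kill_before_ack=False):
    flowfile = inbound_queue[0]           # FlowFile stays queued until the ack arrives
    if put_file(flowfile, kill_before_ack):
        inbound_queue.pop(0)              # only now does it leave the inbound queue


try:
    run_once(kill_before_ack=True)        # first attempt: data sent, then the kill
except KilledBeforeAck:
    pass

run_once()                                # after restart: the same FlowFile is sent again

print(remote_server)
print(inbound_queue)
```

The target ends up with two copies of the same file while the inbound queue is empty, which is the duplication-over-loss trade-off described above.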
A restart is nothing more than a standard shutdown followed by a start, so the shutdown behavior described above applies equally when a restart is performed.
If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.
Thank you,
Matt
Created 08-05-2024 11:15 PM
Hi @MattWho
We are continuously streaming data to our NiFi PutDatabaseRecord processor.
Whenever NiFi restarts or crashes, the data streamed during that time fails with the exception below.
We can see the FlowFiles in the failure queue, but when we select View to see the data, it shows the message below.
Please help me with this.
Created 08-06-2024 05:33 AM
@CDC-
I encourage you to start a new community question rather than adding to an existing question that already has an accepted answer. Your query is really unrelated to this question. Something appears to be happening to your content before it even reaches the PutDatabaseRecord processor. I say this because the exception you shared indicates the processor is looking for the content in an "archived" content claim. Content claims are only moved to archive once the claimant count is zero (meaning no actively queued FlowFiles still reference content in that claim). Any content claims moved to archive are subject to removal/deletion by the background archive clean-up thread, so I am not surprised the content is missing. The real question here is what the lineage of this FlowFile is, and at what point upstream from your PutDatabaseRecord processor the problem developed.
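As a rough illustration of the claimant-count behavior described above, here is a toy Python model. It is a simplification under my own assumptions, not how the content repository is implemented: `ToyContentRepository` and its methods are hypothetical names, and the real clean-up thread removes archived claims according to retention settings rather than all at once.

```python
class ToyContentRepository:
    """Toy model of content claims, claimant counts, and the archive."""

    def __init__(self):
        self.claimants = {}      # claim name -> number of queued FlowFiles referencing it
        self.archived = set()    # claims with zero claimants, awaiting clean-up
        self.deleted = set()     # claims removed by the clean-up pass

    def add_claimant(self, claim):
        self.claimants[claim] = self.claimants.get(claim, 0) + 1

    def release(self, claim):
        self.claimants[claim] -= 1
        if self.claimants[claim] == 0:
            self.archived.add(claim)     # no queued FlowFile references it any more

    def run_archive_cleanup(self):
        # Simplification: every archived claim is removed in one pass.
        self.deleted |= self.archived
        self.archived.clear()

    def read(self, claim):
        if claim in self.deleted:
            raise FileNotFoundError(f"content for {claim} is gone")
        return f"<content of {claim}>"


repo = ToyContentRepository()
repo.add_claimant("claim-1")     # two FlowFiles share the same claim
repo.add_claimant("claim-1")

repo.release("claim-1")
still_live = "claim-1" not in repo.archived     # one claimant left, not archived

repo.release("claim-1")
now_archived = "claim-1" in repo.archived       # count reached zero, moved to archive

repo.run_archive_cleanup()
try:
    repo.read("claim-1")
    content_missing = False
except FileNotFoundError:
    content_missing = True

print(still_live, now_archived, content_missing)
```

A FlowFile that still points at a claim after that claim has been archived and cleaned up would hit the same kind of missing-content error, which is why the upstream lineage matters.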
Please start a new community question and provide as much detail as possible.
Thanks,
Matt