Created 01-31-2024 10:58 PM
Hi Team,
I am using NiFi (3-node cluster) version 1.23.2 in my environment. My question: is there any case if Nifi processor (any processor within a process group) stops suddenly due to load/any other issue? If that happens, what is the solution? Does it lead to data loss?
Created 02-01-2024 07:09 AM
@PriyankaMondal
I am not clear on your statement:
if Nifi processor (any processor within a process group) stops suddenly due to load/any other issue
You are saying you see a NiFi processor transition to a stopped state unexpectedly?
This should never happen.
Or are you saying the processor seems to stop processing FlowFiles even though it is currently in a running/started state?
NiFi queues FlowFiles in the connections between processor components. A FlowFile is not removed from the inbound connection to a processor component until that FlowFile has been successfully processed by the consuming processor.
A FlowFile consists of two parts:
1. FlowFile attributes/metadata that is persisted in the NiFi flowfile_repository.
2. FlowFile content persisted within claims inside the content_repository.
To protect against data loss, these repositories should be on protected storage such as RAID.
So if NiFi or the server itself were to suddenly crash, then when NiFi is restarted on that node it will load its flow and then load the FlowFiles back into the connections. Processing will then begin again against those FlowFiles by the downstream processor components.
NiFi's design favors data duplication over data loss. For example: let's assume that a NiFi processor completed execution against a FlowFile, resulting in something being written out to an external endpoint. In response to that successful operation, the processor would then move the FlowFile from the inbound connection to a downstream relationship. If NiFi were to crash in that very moment, before the FlowFile was moved, then on startup the same FlowFile would load in the inbound connection and get processed again.
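The duplication-over-loss behavior described above can be illustrated with a small simulation. This is only a sketch in plain Python, not NiFi code: the lists standing in for the inbound connection, the downstream connection, and the external endpoint, as well as the names `run_processor` and `CrashBeforeCommit`, are made up for illustration.

```python
class CrashBeforeCommit(Exception):
    """Simulates the node dying after the external write but before the transfer."""


def run_processor(inbound, downstream, external_sink, crash_on=None):
    """Process FlowFiles in order; an entry leaves `inbound` only after success."""
    while inbound:
        flowfile = inbound[0]              # peek: the FlowFile stays queued
        external_sink.append(flowfile)     # side effect on the external endpoint
        if flowfile == crash_on:
            raise CrashBeforeCommit(flowfile)
        downstream.append(inbound.pop(0))  # "commit": move to the next connection


inbound = ["ff-1", "ff-2", "ff-3"]   # stands in for the persisted inbound queue
downstream, external_sink = [], []

try:
    run_processor(inbound, downstream, external_sink, crash_on="ff-2")
except CrashBeforeCommit:
    pass  # the "restart": the queue contents survived the crash

run_processor(inbound, downstream, external_sink)

print(downstream)      # every FlowFile arrives downstream exactly once
print(external_sink)   # but the external endpoint received ff-2 twice
```

Nothing is lost, but the external endpoint sees one FlowFile twice, so the receiving system should tolerate duplicates if that matters for your use case.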
Also keep in mind that you are running a 3-node NiFi cluster, and within a NiFi cluster each connected node runs its own copy of the flow, its own set of repositories, and its own local state. So each node is unaware of the FlowFiles being processed by another node in the same cluster.
Generally speaking, when you have a processor that shows an active threads indicator on it and zeroed-out stats, you either have a very long running thread or a hung thread (only examination of a series of thread dumps can make the determination). Most commonly this is a resource utilization problem, but it could also be a dataflow design issue, a client library issue, or a network issue.
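As a rough aid for comparing a series of thread dumps, the sketch below lists threads whose stack is identical in every dump. It assumes each thread entry starts with a line holding the quoted thread name, followed by indented `at ...` frames (the general shape of `jstack` output and, as far as I know, of `nifi.sh dump` output). The helper names and the sample dumps, including the `com.example` classes, are invented for illustration.

```python
import re

THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def parse_dump(text):
    """Map thread name -> tuple of stack frames for one thread dump."""
    stacks, current = {}, None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            stacks[current] = []
        elif current is not None and line.strip().startswith("at "):
            stacks[current].append(line.strip())
    return {name: tuple(frames) for name, frames in stacks.items()}


def unchanged_threads(dump_texts):
    """Names of threads whose non-empty stack is identical in every dump."""
    parsed = [parse_dump(text) for text in dump_texts]
    first, rest = parsed[0], parsed[1:]
    return sorted(
        name for name, frames in first.items()
        if frames and all(other.get(name) == frames for other in rest)
    )


# Invented sample dumps, taken some minutes apart in this imagined scenario.
dump_a = '''
"Timer-Driven Process Thread-1" Id=50 RUNNABLE
    at com.example.Client.read(Client.java:10)
    at com.example.MyProcessor.onTrigger(MyProcessor.java:42)

"Timer-Driven Process Thread-2" Id=51 RUNNABLE
    at com.example.Other.stepOne(Other.java:5)
'''
dump_b = '''
"Timer-Driven Process Thread-1" Id=50 RUNNABLE
    at com.example.Client.read(Client.java:10)
    at com.example.MyProcessor.onTrigger(MyProcessor.java:42)

"Timer-Driven Process Thread-2" Id=51 RUNNABLE
    at com.example.Other.stepTwo(Other.java:9)
'''

candidates = unchanged_threads([dump_a, dump_b])
print(candidates)
```

An unchanged stack only marks a candidate: idle pool threads parked waiting for work also look identical across dumps, so the frames themselves still need to be read to tell a hung thread from a long-running or idle one.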
If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt