Support Questions

Find answers, ask questions, and share your expertise

What if nifi fails to write data?

avatar
New Contributor

If nifi is writing to a disk using PutHDFS or PutFile, and it is killed or process goes down, after writing 4 blocks out of 10. After restoring will nifi write from 5th block onwards or rewrite the whole file again creating duplicates?

1 ACCEPTED SOLUTION

avatar
Master Mentor
@Sharoon Babu

NiFi processors like these execute against FlowFiles on inbound connections to the processor. The FlowFile is only removed from the inbound connection when that code execution results in that FlowFile being transitioned to an outbound connection.

-

There are two types of scenarios here:

1. NiFi is shutdown or dies in the middle of a processors execution. This means the FlowFile was never transferred to an outbound connection. When NiFi is restarted, NiFi will reload FlowFiles in to the last connection they were recorded as belong to. In this case that would be an inbound connection. The consuming processor of that connection will then be scheduled to run/execute again. Processors do not record and intermediate phase fo processing and thus will begin executing against the entire FlowFile again.
-
2. Some network failure results in execution not being able to complete. NiFi processors should acknowledge failures in such case which would result in the the FlowFile(s) being moved from the inbound connection to an outbound connection (like a "failure" relationship). It is the responsibility of the dataflow designer to account for such unexpected failures and route those outbound failure relationships accordingly. Often times failure type relationships may be just looped back on the same processor for retry. Wherever this FlowFile is routed (even if in a loop), Execution will again be against the entire Flowfiles content again.
-
The target systems should handle such scenarios and not except unconfirmed file transfers.

-

For example: PutFile will write the file using a "dot" rename strategy. The FlowFiles content is originally written as a ".<filename>" and then upon successful completion of writing the data, the filename is renamed from ".<filename>" to just "<filename>". Since dot files are in most cases considered hidden files and ignored by source systems that incomplete transfer would be ignored by destination system. Upon recover and re-attempt (depending on processor configuration) NiFi will repeat this process.
-
There are some unavoidable scenarios that at times can lead to some data duplication. Considering NiFi's design architecture, NiFi has always favored data duplication over data loss in such rare scenarios.

-

Thank you,

Matt

-

If you found this answer has addressed your question, please take a moment to log in and click the "accept" link on the answer.

View solution in original post

1 REPLY 1

avatar
Master Mentor
@Sharoon Babu

NiFi processors like these execute against FlowFiles on inbound connections to the processor. The FlowFile is only removed from the inbound connection when that code execution results in that FlowFile being transitioned to an outbound connection.

-

There are two types of scenarios here:

1. NiFi is shutdown or dies in the middle of a processors execution. This means the FlowFile was never transferred to an outbound connection. When NiFi is restarted, NiFi will reload FlowFiles in to the last connection they were recorded as belong to. In this case that would be an inbound connection. The consuming processor of that connection will then be scheduled to run/execute again. Processors do not record and intermediate phase fo processing and thus will begin executing against the entire FlowFile again.
-
2. Some network failure results in execution not being able to complete. NiFi processors should acknowledge failures in such case which would result in the the FlowFile(s) being moved from the inbound connection to an outbound connection (like a "failure" relationship). It is the responsibility of the dataflow designer to account for such unexpected failures and route those outbound failure relationships accordingly. Often times failure type relationships may be just looped back on the same processor for retry. Wherever this FlowFile is routed (even if in a loop), Execution will again be against the entire Flowfiles content again.
-
The target systems should handle such scenarios and not except unconfirmed file transfers.

-

For example: PutFile will write the file using a "dot" rename strategy. The FlowFiles content is originally written as a ".<filename>" and then upon successful completion of writing the data, the filename is renamed from ".<filename>" to just "<filename>". Since dot files are in most cases considered hidden files and ignored by source systems that incomplete transfer would be ignored by destination system. Upon recover and re-attempt (depending on processor configuration) NiFi will repeat this process.
-
There are some unavoidable scenarios that at times can lead to some data duplication. Considering NiFi's design architecture, NiFi has always favored data duplication over data loss in such rare scenarios.

-

Thank you,

Matt

-

If you found this answer has addressed your question, please take a moment to log in and click the "accept" link on the answer.