Can NiFi guarantee that each FlowFile is processed correctly, with no missing data and no duplicates?

Rising Star

Can NiFi guarantee that each FlowFile is processed correctly, with no missing data and no duplicates?

How does NiFi achieve this?

Thanks for your reply.

1 ACCEPTED SOLUTION

Expert Contributor

@David DN, what you are describing here is the notion of "Exactly Once Delivery." I would refer you to http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/ to understand why this is not actually possible in any distributed system.

What people often discuss instead is the notion of "Exactly Once semantics." However, Exactly Once semantics can be achieved between two systems only if the sending system guarantees At Least Once delivery and the receiving side provides a mechanism for data de-duplication.
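To make that composition concrete, here is a minimal Python sketch, not NiFi code. A hypothetical sender resends each message until it sees an acknowledgement (At Least Once), a simulated channel drops some acknowledgements according to a fixed script, and the receiver de-duplicates on a message ID. The names `Receiver`, `send_at_least_once`, `ack_lost`, and the use of a per-message ID as the de-duplication key are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiving system that de-duplicates on a message ID."""
    seen_ids: set = field(default_factory=set)
    processed: list = field(default_factory=list)
    deliveries: int = 0  # counts every arrival, duplicates included

    def receive(self, msg_id: str, payload: str) -> None:
        self.deliveries += 1
        if msg_id in self.seen_ids:
            return  # duplicate: already processed, just acknowledge again
        self.seen_ids.add(msg_id)
        self.processed.append(payload)


def send_at_least_once(receiver: Receiver, messages, ack_lost) -> None:
    """Resend each message until its acknowledgement gets through.

    ack_lost holds (msg_id, attempt) pairs whose acknowledgement the simulated
    channel drops; the message itself still reaches the receiver.
    """
    for msg_id, payload in messages:
        attempt = 0
        while True:
            receiver.receive(msg_id, payload)
            if (msg_id, attempt) not in ack_lost:
                break  # acknowledgement arrived, move on
            attempt += 1  # no acknowledgement seen: send the same message again


receiver = Receiver()
messages = [("a", "first"), ("b", "second"), ("c", "third")]
# Drop the acknowledgement for "b" twice, so "b" is delivered three times.
send_at_least_once(receiver, messages, ack_lost={("b", 0), ("b", 1)})
print(receiver.deliveries)  # 5
print(receiver.processed)   # ['first', 'second', 'third']
```

The channel delivers five messages, but each payload is processed once: the retries supply the "at least once" half and the ID check supplies the de-duplication half.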

When NiFi receives data from an external source, it does provide the capability for data de-duplication via the DetectDuplicate processor, so you can construct your flow so that data received multiple times is processed only once. However, this works only if you are receiving data over a reliable channel (for instance, ListenUDP may drop data, because the UDP protocol is inherently lossy).
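DetectDuplicate itself is configured in the NiFi UI rather than written as code. As a rough analogue of the idea only, the Python sketch below keeps a cache of identifiers with an age-off window and routes each item to a "duplicate" or "non-duplicate" outcome, loosely mirroring the processor's relationships and its age-off setting. The class name, the `age_off_seconds` parameter, the explicit `now` timestamps, and the choice of a SHA-256 content hash as the identifier are assumptions for illustration.

```python
import hashlib


class DuplicateDetector:
    """Hypothetical de-duplication step with an age-off window."""

    def __init__(self, age_off_seconds: float):
        self.age_off_seconds = age_off_seconds
        self._first_seen = {}  # identifier -> time it was first seen

    def route(self, content: bytes, now: float) -> str:
        # Assumption: identify an item by the SHA-256 hash of its content.
        key = hashlib.sha256(content).hexdigest()
        # Forget identifiers that are older than the age-off window.
        self._first_seen = {
            k: t for k, t in self._first_seen.items()
            if now - t < self.age_off_seconds
        }
        if key in self._first_seen:
            return "duplicate"
        self._first_seen[key] = now
        return "non-duplicate"


detector = DuplicateDetector(age_off_seconds=60)
routes = [
    detector.route(b"order-17", now=0),    # first sighting
    detector.route(b"order-17", now=10),   # redelivered within the window
    detector.route(b"order-17", now=100),  # window expired, treated as new
]
print(routes)  # ['non-duplicate', 'duplicate', 'non-duplicate']
```

The age-off window bounds memory at the cost of treating very late redeliveries as new data. No step of this kind can recover a message that never arrived, which is why the channel itself has to be reliable.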

NiFi will generally guarantee At Least Once delivery of your data when sending to an external system. I say "generally" because it depends on the processor: for instance, the PutKafka processor will provide At Least Once delivery if configured to do so, but if configured for Best Effort delivery, it may not. However, to ensure that data is not duplicated on the receiving system, the receiving system must also have some way to de-duplicate data.
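The trade-off between Best Effort and At Least Once delivery, and why the receiving system still needs its own de-duplication, shows up in a small Python simulation. This is not how PutKafka is implemented; the `deliver` function, its `retry` flag, and the scripted loss pattern are hypothetical. A lost message never reaches the receiver; a lost acknowledgement means the message arrived but the sender never heard back.

```python
from collections import Counter


def deliver(messages, lost_msgs, lost_acks, retry):
    """Count arrivals per message at a receiver that does NOT de-duplicate.

    lost_msgs / lost_acks: sets of (msg_id, attempt) pairs dropped by the
    simulated channel. retry=False models Best Effort (one attempt only);
    retry=True models At Least Once (resend until acknowledged).
    """
    arrivals = Counter()
    for msg_id in messages:
        attempt = 0
        while True:
            if (msg_id, attempt) not in lost_msgs:
                arrivals[msg_id] += 1
                if (msg_id, attempt) not in lost_acks:
                    break  # sender received the acknowledgement
            if not retry:
                break  # Best Effort: give up after a single attempt
            attempt += 1


    return arrivals


messages = ["a", "b", "c"]
lost_msgs = {("a", 0)}  # the first send of "a" is dropped
lost_acks = {("b", 0)}  # "b" arrives, but its acknowledgement is dropped
best_effort = deliver(messages, lost_msgs, lost_acks, retry=False)
at_least_once = deliver(messages, lost_msgs, lost_acks, retry=True)
print(dict(best_effort))    # {'b': 1, 'c': 1}  ("a" is missing)
print(dict(at_least_once))  # {'a': 1, 'b': 2, 'c': 1}  ("b" is duplicated)
```

Retrying removes the gap but introduces a duplicate of "b", which only the receiving side can filter out.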


2 REPLIES

Rising Star

Thanks very much!