Can NiFi guarantee that each FlowFile is processed correctly, with nothing missing and nothing duplicated?

Contributor

Can NiFi guarantee that every FlowFile is processed correctly, with nothing missing and nothing duplicated?

How does NiFi achieve this?

Thanks for your reply.

Accepted Solution

Rising Star

@David DN what you are describing here is the notion of "Exactly Once Delivery." I would refer you to http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/ to understand why this is not actually possible in any distributed system.
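
To make the core dilemma concrete, here is a toy Python model (not NiFi code; the function name and the two scenarios are hypothetical). A sender that receives no acknowledgement cannot tell whether its message was lost or only the acknowledgement was, so whichever retry policy it picks, one of the two cases ends up wrong:

```python
def deliveries(scenario: str, retry_on_timeout: bool) -> int:
    """Count how many times the receiver gets the message in a toy model.

    scenario is "message_lost" (first attempt never arrives) or
    "ack_lost" (first attempt arrives, but its ack never comes back).
    """
    received = 0
    if scenario == "ack_lost":
        received += 1  # the receiver got it; only the ack was lost
    elif scenario != "message_lost":
        raise ValueError(f"unknown scenario: {scenario}")
    # In both scenarios the sender observes exactly the same thing: no ack.
    if retry_on_timeout:
        received += 1  # assume the retry and its ack both get through
    return received


for retry in (False, True):
    outcomes = {s: deliveries(s, retry) for s in ("message_lost", "ack_lost")}
    print(f"retry={retry}: {outcomes}")
```

Without retries the sender gets At Most Once behaviour (the "message_lost" case yields 0 deliveries); with retries it gets At Least Once behaviour (the "ack_lost" case yields 2). Neither policy yields exactly 1 in both cases.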

To work around this, people often talk instead about "Exactly Once semantics." However, Exactly Once semantics can be achieved between two systems only if the sending system guarantees At Least Once delivery and the receiving side provides a mechanism for de-duplicating data.
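
A minimal sketch of that combination, assuming every message carries a unique ID and using an in-memory set on the receiving side as the de-duplication mechanism. The channel with its scripted failures, and all class and function names, are hypothetical; the script just makes the run deterministic:

```python
from collections import deque


class ScriptedChannel:
    """Hypothetical unreliable channel whose failures follow a fixed script.
    Each entry is "ok", "drop_msg" (message lost) or "drop_ack" (ack lost)."""

    def __init__(self, script):
        self.script = deque(script)

    def next_fate(self):
        return self.script.popleft() if self.script else "ok"


class DedupReceiver:
    """Processes each message ID once, however many times it is delivered."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []
        self.deliveries = 0

    def on_message(self, msg_id, payload):
        self.deliveries += 1
        if msg_id in self.seen_ids:
            return  # duplicate: skip processing (returning still acts as an ack)
        self.seen_ids.add(msg_id)
        self.processed.append(payload)


def send_at_least_once(channel, receiver, msg_id, payload, max_attempts=10):
    """Resend until an ack arrives; returns False if attempts run out."""
    for _ in range(max_attempts):
        fate = channel.next_fate()
        if fate == "drop_msg":
            continue  # never delivered; the sender times out and retries
        receiver.on_message(msg_id, payload)
        if fate == "drop_ack":
            continue  # delivered, but the sender cannot know that; it retries
        return True
    return False


channel = ScriptedChannel(["drop_ack", "ok", "drop_msg", "ok", "drop_ack", "drop_ack", "ok"])
receiver = DedupReceiver()
for msg_id, payload in enumerate(["a", "b", "c"]):
    assert send_at_least_once(channel, receiver, msg_id, payload)

print(receiver.deliveries)  # 6 deliveries reached the receiver...
print(receiver.processed)   # ...but each payload was processed once: ['a', 'b', 'c']
```

Neither half is enough alone: without retries message "b" would have been lost, and without the `seen_ids` check "a" and "c" would have been processed more than once.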

When NiFi receives data from an external source, it does provide the capability for data de-duplication via the DetectDuplicate processor, so you can construct your flow so that data received multiple times is processed only once. However, this works only if you are receiving data over a reliable channel (for instance, ListenUDP may drop data because the UDP protocol is inherently lossy).
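
The idea behind DetectDuplicate can be sketched as a cache lookup keyed on an identifier derived from the data. This Python class is only an illustration, not the processor's implementation: the real processor stores entries in a configurable distributed cache service and lets you choose the identifier, whereas the class name, the SHA-256 content hash, and the in-memory dict here are assumptions. It routes each input to "duplicate" or "non-duplicate" and forgets entries after an optional age-off period:

```python
import hashlib
import time


class DuplicateDetector:
    """Hypothetical in-memory analogue of a DetectDuplicate-style cache check."""

    def __init__(self, age_off_seconds=None, clock=time.monotonic):
        self.age_off_seconds = age_off_seconds  # None means entries never expire
        self.clock = clock
        self.first_seen = {}  # identifier -> time it was cached

    def route(self, content: bytes) -> str:
        identifier = hashlib.sha256(content).hexdigest()
        now = self.clock()
        seen_at = self.first_seen.get(identifier)
        still_cached = seen_at is not None and (
            self.age_off_seconds is None or now - seen_at < self.age_off_seconds
        )
        if still_cached:
            return "duplicate"
        self.first_seen[identifier] = now  # cache (or re-cache) the identifier
        return "non-duplicate"


# Usage with a fake clock so the age-off behaviour is deterministic.
fake_now = [0.0]
detector = DuplicateDetector(age_off_seconds=60, clock=lambda: fake_now[0])
print(detector.route(b"order-42"))  # non-duplicate
print(detector.route(b"order-42"))  # duplicate
fake_now[0] = 120.0
print(detector.route(b"order-42"))  # non-duplicate: the earlier entry aged off
```

The age-off window is a trade-off: a shorter window bounds cache size but lets late repeats through. And de-duplication only removes repeats of data that actually arrived; it cannot recover a datagram that ListenUDP never received.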

NiFi will generally guarantee At Least Once delivery of your data when sending to an external system. I say "generally" because it depends on the processor: for instance, the PutKafka processor provides At Least Once delivery if configured to do so, but if configured for Best Effort delivery, it may not. However, ensuring that data is not duplicated on the receiving system requires that the receiving system also have some way to de-duplicate data.
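
On the receiving side, one common de-duplication mechanism is an idempotent write keyed on a unique record ID. The sketch below uses Python's built-in sqlite3 module as a stand-in for the external system (an assumption; it says nothing about how PutKafka or Kafka behave) and simulates a sender that resends a record because the acknowledgements for its earlier writes were lost. The same `INSERT OR IGNORE` statement duplicates the record in a table without a key, but stores it once in a table whose `id` is a `PRIMARY KEY`:

```python
import sqlite3


def deliver_with_retries(conn: sqlite3.Connection, table: str, record: tuple, lost_acks: int) -> None:
    """Hypothetical At Least Once sender: the first `lost_acks` writes succeed
    on the receiver, but their acks never arrive, so the same record is resent
    until one ack gets through."""
    for _ in range(lost_acks + 1):
        # `table` is one of the two fixed names below, never user input.
        conn.execute(f"INSERT OR IGNORE INTO {table} (id, payload) VALUES (?, ?)", record)


def row_count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE naive (id TEXT, payload TEXT)")              # no de-duplication
conn.execute("CREATE TABLE keyed (id TEXT PRIMARY KEY, payload TEXT)")  # repeats are ignored

for table in ("naive", "keyed"):
    deliver_with_retries(conn, table, ("evt-1", "hello"), lost_acks=2)
    print(table, row_count(conn, table))  # naive 3, keyed 1
```

The sender's behaviour is identical in both cases; only the receiver's ability to recognise a repeated ID decides whether At Least Once delivery turns into Exactly Once semantics.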

Contributor

Thanks very much!