03-02-2016 11:38 PM
I have a scenario where I need to transfer file data over an Apache Kafka queue.
If I get a big file, it is split into multiple chunks and each chunk is sent to a Kafka topic. This works fine as long as all the messages are sent to Kafka and processed from the topic.
But say I have split the file into 10 chunks, and after sending 6 of the 10 there is some issue and I am not able to send the remaining 4. When the file is resent, it is split into 10 chunks again and the 6 chunks sent earlier are sent a second time, which creates duplicates.
That said, is there any transaction management in Apache Kafka, so that if I am not able to send all the chunks, whatever chunks were sent before the commit are discarded from the Kafka queue?
Any help would be greatly appreciated.
03-08-2016 01:03 PM
It sounds like you are asking for transaction management across a group of Kafka messages. That would currently be up to you to write, as Kafka only guarantees that each individual message is delivered at least once. See here for more information. You are correct that each of the 10 messages (the pieces of the original file in your example) would be delivered at least once but could be delivered multiple times. It also sounds like you have additional logic on top of that, which delivers the file pieces again if the transfer is interrupted.
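One way to write that logic yourself is to make the chunks safe to receive more than once, so a resend does no harm. Below is a minimal sketch in plain Python, with no Kafka client involved. The names Chunk, split_file and Reassembler are made up for illustration, and using a SHA-256 hash of the file contents as the file ID is an assumption, not anything Kafka provides. Each chunk carries the file ID, its index and the total count, and the consumer only hands over a file once every index has arrived.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Chunk:
    """Hypothetical message payload: one piece of a file plus its position."""
    file_id: str   # assumption: hash of the file contents, stable across resends
    index: int     # position of this piece, starting at 0
    total: int     # number of pieces the file was split into
    payload: bytes


def split_file(data: bytes, chunk_size: int) -> List[Chunk]:
    """Split data into chunks that all share the same content-derived file ID."""
    file_id = hashlib.sha256(data).hexdigest()
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not pieces:
        pieces = [b""]  # an empty file still produces one chunk
    return [Chunk(file_id, i, len(pieces), p) for i, p in enumerate(pieces)]


class Reassembler:
    """Consumer-side buffer that ignores duplicates and emits only complete files."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[int, bytes]] = {}
        self._done: Set[str] = set()

    def accept(self, chunk: Chunk) -> Optional[bytes]:
        """Store one chunk; return the whole file once, when its last piece arrives."""
        if chunk.file_id in self._done:
            return None  # late duplicate of a file that was already emitted
        parts = self._pending.setdefault(chunk.file_id, {})
        parts[chunk.index] = chunk.payload  # a repeated index just overwrites itself
        if len(parts) < chunk.total:
            return None  # still incomplete, nothing is handed downstream
        del self._pending[chunk.file_id]
        self._done.add(chunk.file_id)
        return b"".join(parts[i] for i in range(chunk.total))
```

With this in place, sending 6 of 10 chunks and then resending all 10 yields the file exactly once, because the repeated chunks land on indexes that are already filled. Two caveats: both dictionaries grow without bound here, so a real consumer would need to expire incomplete and completed entries, and a content hash as the ID also means that the same file sent twice on purpose is treated as a duplicate.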