I have a Kafka producer running to receive Server-Sent Events (SSE).
Unfortunately the events cannot be buffered at the source for re-retrieval, so I have to make sure that my Kafka producer is always available to receive them. This approach implies the risk that if my producer goes down, I will definitely lose events.
What would be the best approach, from an architectural perspective, to make this mechanism fail-safe?
I assume running two producers against the same source would generate duplicate events, which I would like to avoid ...
Thanks in advance,
Kafka's mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. The MirrorMaker tool mirrors a source Kafka cluster into a target (mirror) Kafka cluster: it uses a Kafka consumer to consume messages from the source cluster and re-publishes them to the local (target) cluster using an embedded Kafka producer.
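For reference, a minimal MirrorMaker invocation (the classic tool shipped with Kafka) looks roughly like this; the two properties file names are placeholders:

```shell
# Mirror every topic from the source cluster (connection details in
# sourceCluster.consumer.properties) into the target cluster
# (connection details in targetCluster.producer.properties).
bin/kafka-mirror-maker.sh \
  --consumer.config sourceCluster.consumer.properties \
  --producer.config targetCluster.producer.properties \
  --whitelist ".*"
```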
Thanks Neeraj, very good blog. Though my problem is less on the Kafka cluster/broker side and more on the Unix process / Kafka producer side ... I cannot afford to have my producer go down (because the source sends each event only once) ... this leads to the idea of running two producers against the same source ... but then I have duplicate messages ... of course two active/active redundant producers could write into different topics ... but either way I have to sort out the deduplication manually afterwards, which is what I would like to avoid ... I am looking for ideas for highly available Kafka producers ... any ideas?
Thanks for clarifying that the problem is less on the Kafka side. Not sure why such good information deserved a downvote.
So the question is how to achieve HA for the original source feed. You can solve this by having true DR: I have seen architectures where the customer first lands the data in a safe/HA zone to avoid data loss.
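If the source really can be consumed by two producers at once, one common pattern (a sketch, not something stated in this thread) is to let both active/active producers publish every event under a deterministic key and deduplicate on the consuming side. Below is a minimal stdlib-only Python sketch of that dedup logic; the `id` field and the event shape are assumptions for illustration:

```python
import hashlib

def event_key(event: dict) -> str:
    """Derive a deterministic key for an event.

    Prefer an explicit event id; otherwise hash the payload, so that
    both redundant producers derive the same key independently.
    """
    if "id" in event:
        return str(event["id"])
    return hashlib.sha256(repr(sorted(event.items())).encode()).hexdigest()

def deduplicate(events):
    """Yield only the first occurrence of each event key.

    In a real deployment the 'seen' set would live in a bounded,
    persistent store (e.g. a Kafka Streams state store), not in memory.
    """
    seen = set()
    for event in events:
        key = event_key(event)
        if key not in seen:
            seen.add(key)
            yield event

# Two active/active producers delivering the same stream twice:
stream_a = [{"id": 1, "data": "x"}, {"id": 2, "data": "y"}]
stream_b = [{"id": 1, "data": "x"}, {"id": 2, "data": "y"}]
unique = list(deduplicate(stream_a + stream_b))
```

Keying by event id also means the duplicates land in the same partition, so a single downstream consumer sees them back-to-back and the dedup state stays small.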