Support Questions

Sharing a topic between different Kafka connectors

Explorer

Here is the scenario: I need to set up two Confluent JDBC source connectors in Kafka Connect. Both connectors will source data from the same table. The first connector will run in timestamp+incrementing mode, and the second will run in bulk mode, to be used on demand in cases like a complete reload after data corruption.
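For reference, here is a rough sketch of what I have in mind (the connector names, connection URL, table, and column names below are placeholders, and a Connect worker is assumed at localhost:8083; topic.prefix is what determines the target topic, so sharing a topic would mean giving both connectors the same prefix):

    # Connector 1: incremental pulls using timestamp+incrementing mode
    curl -X POST http://localhost:8083/connectors \
      -H "Content-Type: application/json" \
      -d '{
        "name": "orders-incremental",
        "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
          "connection.url": "jdbc:postgresql://dbhost:5432/mydb",
          "mode": "timestamp+incrementing",
          "timestamp.column.name": "updated_at",
          "incrementing.column.name": "id",
          "table.whitelist": "orders",
          "topic.prefix": "jdbc-"
        }
      }'

    # Connector 2: full-table reload using bulk mode, created only on demand
    curl -X POST http://localhost:8083/connectors \
      -H "Content-Type: application/json" \
      -d '{
        "name": "orders-bulk-reload",
        "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
          "connection.url": "jdbc:postgresql://dbhost:5432/mydb",
          "mode": "bulk",
          "table.whitelist": "orders",
          "topic.prefix": "jdbc-"
        }
      }'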

Is it possible, and desirable, to have both connectors share the same topic?

Thanks,

Mark

1 ACCEPTED SOLUTION

@Mark Lin

You shouldn't use the same topic for both of these operations, IMHO. Here is a scenario I can think of. Let's say we are using a topic named topic1 for the incremental fetch. Kafka retains data for 7 days by default, so assume the topic already holds the last 7 days of data.
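(The 7-day figure is the broker default, log.retention.hours=168. As a rough sketch, assuming a broker at localhost:9092 and our topic1, you can check or override retention per topic like this:)

    # Show any topic-level retention override currently set on topic1
    kafka-configs --bootstrap-server localhost:9092 \
      --entity-type topics --entity-name topic1 --describe

    # Set topic1 retention to 7 days explicitly (retention.ms is in milliseconds)
    kafka-configs --bootstrap-server localhost:9092 \
      --entity-type topics --entity-name topic1 \
      --alter --add-config retention.ms=604800000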

Now you get a request to do the "bulk" load again, and suddenly the entire table is pulled into this topic, including the 7 days of data that are already there. That causes data duplication in Kafka, and every downstream consumer has to do extra work to make sure duplicate records are not persisted in the final storage.

Also, a bulk load can bypass Kafka altogether; you can use either NiFi or Sqoop for it whenever needed.
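For example, a one-off reload with Sqoop could look roughly like this (the connection URL, table name, target directory, and mapper count below are placeholders):

    # One-off full-table export to HDFS, bypassing Kafka entirely
    sqoop import \
      --connect jdbc:postgresql://dbhost:5432/mydb \
      --table orders \
      --target-dir /data/reload/orders \
      --num-mappers 4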

Hope that helps!

2 REPLIES

Explorer

Thanks, Rahul. Using a separate topic gives good isolation, and using NiFi directly for the bulk reloads is a good option.

Mark