
Share topics for different kafka connectors

Solved

New Contributor

Here is the scenario. I need to set up two Confluent JDBC source connectors in Kafka Connect. Both connectors will source data from the same table. The first connector will be set up in timestamp+incrementing mode, and the second will be set up in bulk mode and used on demand, for example to do a complete reload after data corruption.

Is it possible, and desirable, for both connectors to share the same topic?
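For reference, here is a minimal sketch of the two connector configurations described above (the connection URL, table, and column names are hypothetical placeholders, not from the original post):

```properties
# Connector 1: incremental pulls
name=source-incremental
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://db-host:3306/mydb
table.whitelist=orders
mode=timestamp+incrementing
timestamp.column.name=updated_at
incrementing.column.name=id
topic.prefix=jdbc-

# Connector 2: on-demand full reload
name=source-bulk
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://db-host:3306/mydb
table.whitelist=orders
mode=bulk
topic.prefix=jdbc-bulk-
```

Note the distinct `topic.prefix` values, which is what the accepted answer below recommends: each connector writes to its own topic.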

Thanks,

Mark

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Share topics for different kafka connectors

@Mark Lin

You shouldn't use the same topic for both of these operations, IMHO. Here is a scenario to consider. Say we are using a topic, topic1, to fetch the data incrementally. Kafka retains data for 7 days by default, so the topic may already hold the last 7 days of data.

Now you get a request to do the "bulk" load, and suddenly all of the table's data is pulled into this topic on top of the 7 days of data already there. That causes duplication in Kafka, and your downstream consumers have to do extra work to make sure duplicate records don't end up in the final storage.
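To illustrate the extra consumer-side effort, here is a hedged sketch of primary-key deduplication (assuming each record carries an `id` field; the record shapes and names are illustrative, not from the original thread):

```python
def deduplicate(records, seen_keys):
    """Keep only the first record observed for each primary key."""
    unique = []
    for rec in records:
        key = rec["id"]
        if key not in seen_keys:
            seen_keys.add(key)
            unique.append(rec)
    return unique

# Simulate incremental records already consumed from the topic,
# followed by a bulk reload that re-sends the same rows plus a new one.
incremental = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
bulk = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

seen = set()
out = deduplicate(incremental, seen) + deduplicate(bulk, seen)
# out holds ids 1, 2, 3 exactly once each
```

In practice the seen-key state would need durable storage (or an upsert into the final store keyed by primary key), which is exactly the extra work described above.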

Also, a bulk load can bypass Kafka altogether; we can use either NiFi or Sqoop to do it whenever needed.
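For example, a one-off Sqoop import along these lines could handle the reload without touching Kafka (host, database, table, and target directory are hypothetical):

```
sqoop import \
  --connect jdbc:mysql://db-host:3306/mydb \
  --table orders \
  --target-dir /data/reload/orders \
  --num-mappers 4
```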

Hope that helps!

2 Replies


Re: Share topics for different kafka connectors

New Contributor

Thanks Rahul. Using a separate topic gives good isolation. Using NiFi directly is a good option.

Mark