Support Questions › Kafka checkpoint

roshanbi · 06-28-2022

Hello Team,

We are doing CDC by pushing data to Kafka, and a second pipeline reads that data from Kafka and writes it to Kudu. Whenever we restart the second pipeline, I notice thousands of records coming through again.

I would like to know how Kafka keeps track of checkpoints (how far a consumer has read). Is there a setting to change this behavior?

Thanks,

Roshan

araujo · 06-29-2022

@roshanbi,

You need to configure your Kafka consumer to use a consumer group and to commit its offsets. The client then periodically saves the offset it has read up to in Kafka itself (in the internal __consumer_offsets topic), so that after a restart it picks up where it left off.
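To make the mechanism concrete, here is a toy in-memory model of per-group offset commits in Python. It is not the Kafka client API: OffsetStore and consume are hypothetical names, and committing every N records is a simplification (with enable.auto.commit, the Java client commits on a timer controlled by auto.commit.interval.ms, 5000 ms by default, during poll()). It shows why a restarted consumer resumes from the last committed offset, and why records processed after that commit are delivered again.

```python
# Toy, in-memory model of Kafka consumer-group offset commits.
# NOT the Kafka client API: OffsetStore and consume are hypothetical names
# used only to illustrate how committed offsets let a restarted consumer resume.


class OffsetStore:
    """Stands in for Kafka's internal __consumer_offsets topic."""

    def __init__(self):
        self._committed = {}  # (group_id, partition) -> next offset to read

    def commit(self, group_id, partition, next_offset):
        self._committed[(group_id, partition)] = next_offset

    def committed(self, group_id, partition):
        return self._committed.get((group_id, partition))


def consume(log, store, group_id, partition, max_records, commit_every):
    """Process up to max_records from log, committing every commit_every records.

    Returns the records processed in this run.
    """
    start = store.committed(group_id, partition)
    if start is None:
        start = 0  # no committed offset yet: start at the beginning (like "earliest")
    processed = []
    offset = start
    while offset < len(log) and len(processed) < max_records:
        processed.append(log[offset])
        offset += 1
        if len(processed) % commit_every == 0:
            # Kafka commits the offset of the NEXT record to read.
            store.commit(group_id, partition, offset)
    return processed


log = [f"change-{i}" for i in range(10)]
store = OffsetStore()

# First run processes 7 records but commits only after every 5th,
# then stops before records 5 and 6 are committed.
first = consume(log, store, "kudu-writer", 0, max_records=7, commit_every=5)
# Restart with the same group id: resumes at the committed offset 5,
# so change-5 and change-6 are processed a second time.
second = consume(log, store, "kudu-writer", 0, max_records=10, commit_every=5)
print(first)   # change-0 .. change-6
print(second)  # change-5 .. change-9
```

Because the committed offset can trail what was actually processed, this setup gives at-least-once delivery, so the writer should tolerate a small number of re-read records after a restart. A consumer using a different group id has no committed offset in this model and would start from the beginning.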

Please check the Kafka documentation for the meaning of the properties below:

group.id
enable.auto.commit
auto.offset.reset
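How these three properties interact can be sketched as follows. starting_offset is a hypothetical helper that mirrors the documented auto.offset.reset rule for the Java client (values earliest, latest and none, with latest as the default), and LookupError stands in for the exception the real client raises. The broker address and group name in consumer_config are placeholders. The key point is that auto.offset.reset is only consulted when the group has no valid committed offset, so group.id should stay the same across restarts.

```python
# Example consumer settings using the standard Kafka property names.
# The broker address and group name are placeholders.
consumer_config = {
    "bootstrap.servers": "broker1:9092",  # placeholder
    "group.id": "kafka-to-kudu",          # placeholder; keep it stable across restarts
    "enable.auto.commit": True,           # commit offsets periodically in the background
    "auto.offset.reset": "earliest",      # used only when no valid committed offset exists
}


def starting_offset(committed, log_start, log_end, auto_offset_reset="latest"):
    """Return the offset a consumer would begin reading from (hypothetical helper).

    committed:         committed offset for this group/partition, or None
    log_start:         earliest offset still retained in the partition
    log_end:           offset the next produced record will get
    auto_offset_reset: "earliest", "latest" (the default) or "none"
    """
    if committed is not None and log_start <= committed <= log_end:
        return committed  # normal restart: resume where the group left off
    # No committed offset, or it points at data that no longer exists.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    if auto_offset_reset == "none":
        raise LookupError("no valid committed offset and auto.offset.reset=none")
    raise ValueError(f"unknown auto.offset.reset value: {auto_offset_reset!r}")


print(starting_offset(42, 0, 100))               # resumes at the committed offset
print(starting_offset(None, 10, 100, "earliest"))  # new group: oldest retained record
```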

Cheers,

André


DianaTorres · 07-01-2022

@roshanbi Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks

Regards,

Diana Torres,
Community Moderator
