Created 07-19-2020 06:38 AM
Newbie question, apologies. We need to back up a Kafka cluster so that we can restore it to a given point in time (as closely as the backup granularity allows) in case of problems, e.g. bad data. Replication would not help here, since the bad data would be replicated too.
Does anyone out there have such a use case, and did you solve it (with Cloudera or open-source tools)?
Thanks in advance.
Created 07-19-2020 09:47 PM
There's an open-source tool, kafka-backup, that sounds like what you are looking for. I'm not sure I follow your granularity point, though.
Created 07-20-2020 08:04 AM
Thanks
Yes, I came across kafka-backup when searching around this area. But I was hoping there would be support from Cloudera itself, as a vendor that wraps Kafka with value-added services.
Regarding granularity, I meant that if I took a backup every six hours, I would presumably be able to return to a point in time only at that granularity, e.g. to the state at 13:00, 19:00, 01:00, 07:00, etc., unless the backup capability included a continuous log that allowed fine-grained point-in-time recovery.
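To make the granularity point concrete, here is a minimal sketch in plain Python (no Kafka involved). It assumes snapshots taken every six hours starting at 13:00 and picks the newest snapshot at or before a desired restore time. The function names and the dates are made up for illustration and are not part of any backup tool.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def snapshot_times(first, interval, count):
    """Build a sorted list of `count` snapshot timestamps, `interval` apart."""
    return [first + i * interval for i in range(count)]


def latest_restore_point(snapshots, target):
    """Return the newest snapshot taken at or before `target`, or None.

    `snapshots` must be sorted in ascending order.
    """
    idx = bisect_right(snapshots, target)
    return snapshots[idx - 1] if idx else None


# Hypothetical schedule: 13:00, 19:00, 01:00, 07:00 (six hours apart).
snapshots = snapshot_times(datetime(2020, 7, 19, 13, 0), timedelta(hours=6), 4)

# Suppose bad data arrived just after 22:30 and we want the state at 22:30.
target = datetime(2020, 7, 19, 22, 30)
restore_point = latest_restore_point(snapshots, target)

# Everything written between the restore point and the target is not recoverable
# from snapshots alone.
lost = target - restore_point
print(restore_point, lost)
```

With snapshot-only backups, the gap between the restore point and the desired time can approach the full backup interval, which is why a continuous log would be needed for finer-grained recovery.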
Created 07-20-2020 08:59 AM
Ok, I get your granularity point. Thanks for clarifying.
Unfortunately, we don't have a Cloudera-supported tool that can do a simple backup of a Kafka cluster. I can only speculate on the reason, but it is likely because a backup (rather than replication) is rarely required.