Member since: 03-17-2016 · Posts: 2 · Kudos Received: 0 · Solutions: 1

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 17099 | 03-17-2016 08:30 AM
03-23-2016 05:51 AM
To be clear, the old receiver-based consumer doesn't solve the problem for you either. You'll get data loss unless you're using HDFS as a write-ahead log, and even then it won't allow for exactly-once output semantics.
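A rough illustration of the point above: a write-ahead log upgrades you from data loss to at-least-once delivery, but replayed records after a failure still duplicate output unless the sink is idempotent or transactional. A minimal language-agnostic sketch (the record shape and sinks here are hypothetical, not Spark APIs):

```python
# Hypothetical records, each carrying a unique id.
records = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]

append_sink = []   # non-idempotent sink: blind appends
upsert_sink = {}   # idempotent sink: keyed upserts

def deliver(batch):
    for r in batch:
        append_sink.append(r["v"])     # duplicated on replay
        upsert_sink[r["id"]] = r["v"]  # replay is a no-op

deliver(records)  # first attempt
deliver(records)  # replay after a simulated failure (at-least-once)

assert len(append_sink) == 4  # duplicates: at-least-once, not exactly-once
assert len(upsert_sink) == 2  # exactly-once effect via idempotent writes
```

The same reasoning applies to any at-least-once source, which is why the output side has to cooperate to get end-to-end exactly-once semantics.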
03-17-2016 08:30 AM
Yeah, checkpoints aren't great for that; I honestly don't rely on them. There are examples of how to save offsets yourself in https://github.com/koeninger/kafka-exactly-once/ , specifically https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerPartition.scala

The newer Kafka consumer will allow committing offsets to Kafka, which still isn't transactional, but should be better than ZooKeeper in most ways. There's work toward making the new consumer work with Spark at https://issues.apache.org/jira/browse/SPARK-12177 and https://github.com/koeninger/spark-1/tree/kafka-0.9/external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka
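The core of the transactional-per-partition idea linked above is that results and offsets are written in the same database transaction, so a replayed batch either skips already-processed messages or rolls back entirely. A minimal sketch of that pattern, using sqlite3 in place of a real Kafka consumer and sink (the topic name, partition, and table layout are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (msg TEXT)")
db.execute("CREATE TABLE offsets (topic TEXT, part INTEGER, off INTEGER, "
           "PRIMARY KEY (topic, part))")
db.execute("INSERT INTO offsets VALUES ('t', 0, 0)")
db.commit()

def process_batch(topic, part, messages):
    """Write results and the new offset atomically, so replaying the
    same batch after a crash leaves both tables unchanged."""
    (current,) = db.execute(
        "SELECT off FROM offsets WHERE topic=? AND part=?",
        (topic, part)).fetchone()
    with db:  # one transaction: results and offset commit together
        for off, msg in messages:
            if off < current:
                continue  # already processed; skip on replay
            db.execute("INSERT INTO results VALUES (?)", (msg,))
        new_off = messages[-1][0] + 1
        if new_off > current:
            db.execute("UPDATE offsets SET off=? WHERE topic=? AND part=?",
                       (new_off, topic, part))

batch = [(0, "a"), (1, "b")]
process_batch("t", 0, batch)
process_batch("t", 0, batch)  # simulated replay: no duplicate results
```

After both calls the results table holds exactly two rows and the stored offset is 2; committing offsets to Kafka or ZooKeeper can't give you this, because those stores can't participate in the sink's transaction.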