
Spark Streaming Offset Lost


We are on HDP 2.6.0.3 and use Spark Streaming (version 2.1.0) to read data from Kafka and persist it to Hive. We are seeing unusual behavior. The Spark job failed with an error and we had to restart it. Upon restart, it read everything from the beginning, ignoring the offsets previously committed for the group ID. This happens about once a month. Spark Streaming stores its offsets in Kafka, and Kafka Manager shows that the last committed offset is properly reflected in Kafka. We are not sure why Spark Streaming does not pick it up from Kafka and instead starts reading from the very beginning. Can you please help?
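To make the expected behavior concrete, here is a minimal sketch of how a Kafka consumer generally chooses its starting position for a partition on restart. It is a simplified model of standard consumer semantics, not Spark or Kafka code: the function name `resolve_start_offset` and its parameters are hypothetical, and the actual job configuration (in particular its `auto.offset.reset` value) is not shown in this post.

```python
from typing import Optional


def resolve_start_offset(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str = "latest",
) -> int:
    """Simplified model of a consumer's start position for one partition.

    committed:  last offset committed for the group, or None if the broker
                has no committed offset for this group/partition.
    log_start:  earliest offset still retained in the partition.
    log_end:    next offset to be written to the partition.
    """
    # A committed offset that is still inside the retained log is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed

    # Otherwise (no commit found, or the commit is out of range) the
    # auto.offset.reset policy decides where reading starts.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError("no usable committed offset and auto.offset.reset=none")


if __name__ == "__main__":
    # Valid commit: resume where the group left off.
    print(resolve_start_offset(500, 100, 1000, "earliest"))
    # No commit visible to the consumer: fall back to the reset policy.
    print(resolve_start_offset(None, 100, 1000, "earliest"))
```

Under this model, a restart that begins at the earliest offset would mean the consumer did not find a usable committed offset for its group and partition and fell back to an `earliest` reset policy. Whether that is what happens in this job is an assumption that would need to be checked against the driver logs and the consumer configuration.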
