I have a Kafka Spark Streaming job that runs a consumer every 10 minutes, reading data from a producer. The job runs fine for a while, up to about 25 hours, but then it fails with the error message attached below.
Each time it fails after roughly the same interval, with the same error message.
I have also attached a more detailed error message; the shutdown hook seems to be called on its own.
Looks like the error message you highlighted is just a consequence of the preceding 'driver terminated or disconnected' event. I think your client gets disconnected, and the next 'read' operation then fails. The root cause of the disconnect is what needs to be identified.
I can only guess here, as nothing relevant is in the screenshot, but since you write it fails after 24 to 25 hours, I would investigate in these directions:
Thanks for the reply. I've looked through my logs (YARN logs, stdout, and stderr) and have not found anything that explains this. Now I am facing an issue where it may time out after only a few iterations of the Spark Streaming job. I am trying to pinpoint the cause; do you have any suggestions on how I might go about it? It is very unpredictable: it may run well for 5 hours, or it might crash on the first run, after about 5 minutes.
I guess you still see the same log entries, 'driver terminated or disconnected'? Is there any hint in the log before that message? What happens if you try to reconnect directly after the disconnect?
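To test that reconnect idea quickly, you could wrap the read in a small retry loop with backoff and see whether the very next attempt succeeds. This is only a sketch: `read_batch` is a hypothetical stand-in for whatever call performs your 10-minute Kafka read, and the attempt counts and delays are arbitrary, not tuned for your job.

```python
import time

def read_with_retry(read_batch, max_attempts=3, base_delay=1.0):
    """Call read_batch, retrying on connection errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return read_batch()
        except ConnectionError as exc:
            if attempt == max_attempts:
                raise  # give up so the failure still surfaces in the logs
            # Back off before reconnecting: base_delay, 2x, 4x, ...
            print(f"attempt {attempt} failed ({exc}); retrying")
            time.sleep(base_delay * 2 ** (attempt - 1))

# Usage example: a fake batch reader that fails twice, then succeeds.
calls = {"n": 0}
def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("driver terminated or disconnected")
    return ["record-1", "record-2"]

result = read_with_retry(flaky_read, base_delay=0.01)
print(result)
```

If an immediate reconnect always works, that points toward a transient network or timeout issue rather than a problem in the job itself.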
How is your client connected to the cluster, i.e. via LAN or via the Internet?
Thanks for your help. I just added a more detailed error message; sorry for the delay. Let me know if you have seen anything like that before. Thanks.