Spark streaming kafka job runs successfully then fails after about 24 hours

I have a Kafka Spark Streaming job whose consumer reads data from a producer every 10 minutes. The job runs fine for a while, up to about 25 hours, but then for some reason it fails with the error message below.

Each time, it fails after about that long with the same error message, attached below.

I have also attached a more detailed error message. The shutdown hook seems to be called on its own.

45714-shutdowsparkstreaming.png

44395-errorkafkacode.png

Re: Spark streaming kafka job runs successfully then fails after about 24 hours

Super Collaborator

It looks like the error message you highlighted is just a consequence of the earlier 'driver terminated or disconnected' event. I think your client gets disconnected and the next 'read' operation then fails. The root cause of the disconnect is what needs to be identified.

I can only guess, as the screenshot shows nothing more, but since you mention 24 to 25 hours, I would investigate in these directions:

  • Are you using Kerberos, with your client authenticating via a ticket (password)? Kerberos tickets typically expire after about 24 hours and have to be renewed.
  • Are you connecting via a network that forces a reconnect after 24 hours (as is usually the case for home DSL lines in Europe)?
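
If Kerberos does turn out to be the cause, one common approach for long-running jobs on YARN is to hand Spark a keytab so it can obtain fresh tickets itself. The following is only a sketch of how such a launch command could be assembled. The application file name, principal, and keytab path are placeholders, and it assumes the job is submitted to YARN in cluster mode, which the thread does not confirm.

```python
from typing import List


def build_spark_submit(app: str, principal: str, keytab: str) -> List[str]:
    """Assemble a spark-submit command line that passes a keytab to YARN.

    --principal and --keytab are spark-submit options for Kerberized YARN
    clusters; every value passed in here is a placeholder (assumption).
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",   # assumption: cluster mode
        "--principal", principal,     # Kerberos principal to log in as
        "--keytab", keytab,           # keytab file for that principal
        app,
    ]


# Hypothetical values, for illustration only.
cmd = build_spark_submit(
    "streaming_job.py", "user@EXAMPLE.COM", "/path/to/user.keytab"
)
print(" ".join(cmd))
```

Whether this helps depends on the cluster actually being Kerberized; if it is not, these options are irrelevant and the network direction is the one to follow.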

Re: Spark streaming kafka job runs successfully then fails after about 24 hours

Thanks for the reply. I've looked through my YARN logs, stdout and stderr, and have not found anything that explains this. Now I am facing an issue where the job may time out after only a few iterations of the Spark Streaming run. I am trying to pinpoint the cause; do you have any suggestions for how I might go about it? It is very unpredictable: it may run well for 5 hours, or it might crash on the first run after about 5 minutes.


Re: Spark streaming kafka job runs successfully then fails after about 24 hours

Super Collaborator

I guess you still have the same log entries ('driver terminated or disconnected')? Is there any hint in the log before that message? What happens if you try to reconnect directly after the disconnect?

How is your client connected to the cluster, i.e. via LAN or via the Internet?
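
To look for hints before the disconnect, something like the following may help. It is a minimal sketch, not a finished tool: it keeps a rolling window of log lines and returns the ones immediately preceding the first line that contains the marker text. The function name and the sample log lines are made up for illustration; only the marker string comes from this thread.

```python
from collections import deque
from typing import Iterable, List


def lines_before_marker(
    lines: Iterable[str], marker: str, context: int = 20
) -> List[str]:
    """Return up to `context` lines preceding the first line containing `marker`."""
    recent = deque(maxlen=context)  # rolling window of the latest lines
    for line in lines:
        if marker.lower() in line.lower():
            return list(recent)
        recent.append(line.rstrip("\n"))
    return []  # marker never appeared


# Made-up sample log, for illustration only.
sample = [
    "INFO batch 41 finished",
    "WARN heartbeat to broker timed out",
    "WARN retrying connection",
    "ERROR Driver terminated or disconnected",
    "INFO shutdown hook called",
]
for line in lines_before_marker(
    sample, "driver terminated or disconnected", context=2
):
    print(line)
```

For a real run, the same function could be fed an open file handle over the saved output of `yarn logs -applicationId <appId>`, with a larger `context` value.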


Re: Spark streaming kafka job runs successfully then fails after about 24 hours

Thanks for your help, and sorry for the delay. I just added a more detailed error message, in case you have seen anything like it before. Thanks.
