Created 02-01-2023 03:48 AM
We are running a Spark Streaming job which reads data from Kafka and writes to an RDBMS. We don't want this job to fail easily due to minor fluctuations in cluster health. Right now the Spark job is configured to retry 5 attempts before the whole job fails, but all of these retries happen in quick succession, one after another. Is there a way we can put some delay/sleep time between the retry attempts for this job?
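For reference, a minimal sketch of the kind of retry settings being described; the exact properties the job uses are not stated above, so `spark.task.maxFailures` and `spark.yarn.maxAppAttempts` here are assumptions:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: which of these properties the job actually sets is an assumption.
val spark = SparkSession.builder()
  .appName("kafka-to-rdbms-streaming")
  // Number of times a single task is retried before the stage (and job) fails.
  .config("spark.task.maxFailures", "5")
  // On YARN, how many times the whole application is re-attempted after failure.
  .config("spark.yarn.maxAppAttempts", "5")
  .getOrCreate()
```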
Created 02-08-2023 11:00 PM
Hi @sat_046
I don't think there is a specific configuration parameter to add a delay between task retry attempts. However, there are parameters to blacklist a node once a task has failed on it a certain number of times; see the sketch below.
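A hedged sketch of those blacklisting settings (Spark 2.x/3.0 property names; from Spark 3.1 onward they are available under the `spark.excludeOnFailure.*` equivalents). The values shown are illustrative, not recommendations for this job:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; tune per workload.
val spark = SparkSession.builder()
  .appName("kafka-to-rdbms-streaming")
  // Enable executor/node blacklisting on repeated task failures.
  .config("spark.blacklist.enabled", "true")
  // Failures of one task on a single executor before that executor is blacklisted for the task.
  .config("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  // Failures of one task on a single node before the whole node is blacklisted for the task.
  .config("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  // How long an executor/node stays blacklisted before it is tried again.
  .config("spark.blacklist.timeout", "1h")
  .getOrCreate()
```

Note that this excludes unhealthy executors/nodes from scheduling rather than spacing out the retries themselves.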
References:
- https://www.waitingforcode.com/apache-spark/failed-tasks-resubmit/read
Created 02-12-2023 11:20 PM
@sat_046 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur

Created on 03-30-2023 06:41 AM - edited 03-30-2023 06:43 AM
Hi @sat_046
As I mentioned in my earlier comment, unfortunately it is not possible to add a delay between task retry attempts. You can look at the Spark source code that handles failed tasks to see how they are resubmitted.
Please accept the solution if you liked my answer.