
Spark 2 - attemptFailuresValidityInterval issue

New Contributor

Hi!

 

We are running spark-submit with the following options:

--deploy-mode cluster
--conf "spark.yarn.maxAppAttempts=3"
--conf "spark.yarn.am.attemptFailuresValidityInterval=30s"
--conf...

and our application intentionally throws an exception on the driver after 70 seconds, in order to force a failure.
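
For reference, the full invocation looks roughly like the sketch below (the main class and jar names are placeholders, and the rest of our real --conf flags are omitted):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.yarn.maxAppAttempts=3" \
  --conf "spark.yarn.am.attemptFailuresValidityInterval=30s" \
  --class com.example.FailAfterSeventySeconds \
  failing-app.jar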

 

With these settings we expected the application to keep being retried indefinitely: the exception is only thrown after 70 seconds, so successive attempt failures are more than 30 seconds apart, and attemptFailuresValidityInterval should have expired each earlier failure before the next one counts against maxAppAttempts. Instead, the application stops after 3 failures.
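
The driver side is essentially the sketch below (simplified, and the object and message names are made up, but the 70-second timing matches what we run):

import org.apache.spark.sql.SparkSession

object FailAfterSeventySeconds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("attempt-validity-test").getOrCreate()
    // A trivial action just to confirm the session is up.
    spark.range(1).count()
    // Keep this attempt alive well past the 30-second validity interval...
    Thread.sleep(70 * 1000L)
    // ...then fail the driver so YARN has to start a new application attempt.
    throw new RuntimeException("intentional failure after 70 seconds")
  }
}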

 

Our installation:

- SPARK2-2.1.0.cloudera2
- CDH 5.11

 

Any ideas are more than welcome!

1 ACCEPTED SOLUTION

Master Collaborator

Sorry, this is a bug, described in SPARK-22876, which suggests that the current logic of spark.yarn.am.attemptFailuresValidityInterval is flawed.

The JIRA is still being worked on, but looking at the comments I don't foresee a fix anytime soon.

