Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant.

Spark 2 - attemptFailuresValidityInterval issue

Frequent Visitor

Hi!

 

We are running spark-submit with the following options:

--deploy-mode cluster
--conf "spark.yarn.maxAppAttempts=3"
--conf "spark.yarn.am.attemptFailuresValidityInterval=30s"
--conf ...

 

and our application intentionally throws an exception on the driver after 70 seconds, in order to force a failure.
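
The driver logic is essentially the following (a minimal sketch; the object name, app name, and message are illustrative, not our real code):

import org.apache.spark.sql.SparkSession

// Minimal driver that fails intentionally after 70 seconds, so that
// each run records one AM failure with YARN (deploy-mode cluster runs
// the driver inside the application master).
object FailingApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("attempt-failures-test").getOrCreate()
    try {
      Thread.sleep(70 * 1000L) // simulate 70 seconds of normal work
      throw new RuntimeException("intentional failure to trigger a new attempt")
    } finally {
      spark.stop()
    }
  }
}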

 

With these parameters we expected the application to be retried indefinitely, because attemptFailuresValidityInterval should age out each failure before the next one happens. Instead, the application stops for good after 3 failures, as the timeline below shows.
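
Concretely, here is the behaviour we expected (times approximate, ignoring attempt startup overhead): attempt 1 fails at t = 70 s; attempt 2 fails at about t = 140 s, by which point the first failure is roughly 70 s old and therefore outside the 30 s validity window, so only one failure should be counted; the same holds for every later attempt, so the counter should never reach spark.yarn.maxAppAttempts = 3.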

 

Our installation:

- SPARK2-2.1.0.cloudera2
- CDH 5.11

 

Any ideas are more than welcome!

1 ACCEPTED SOLUTION

Master Collaborator

Sorry, this is a known bug, tracked as SPARK-22876, which reports that the current logic of spark.yarn.am.attemptFailuresValidityInterval is flawed.

The JIRA is still being worked on, but judging by the comments I don't foresee a fix anytime soon.
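
For context, Spark hands both of these settings off to YARN when it submits the application, roughly like this (a simplified sketch of the idea, not the actual Spark 2.1 source, which goes through reflection so it can also run against pre-2.6 Hadoop; the method name configureAppAttempts is illustrative):

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext

// Sketch: Spark's YARN client copies both settings onto the
// ApplicationSubmissionContext it submits. YARN is then supposed to
// count only AM failures that occurred within the validity interval;
// SPARK-22876 is about this logic not behaving as users expect.
def configureAppAttempts(
    appContext: ApplicationSubmissionContext,
    maxAppAttempts: Option[Int],     // spark.yarn.maxAppAttempts
    validityIntervalMs: Option[Long] // spark.yarn.am.attemptFailuresValidityInterval
): Unit = {
  maxAppAttempts.foreach(appContext.setMaxAppAttempts)
  validityIntervalMs.foreach(appContext.setAttemptFailuresValidityInterval)
}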

