04-03-2018 07:30 AM - last edited on 04-03-2018 12:59 PM by cjervis
We are running a spark-submit with options:
and our application intentionally throws an exception on the driver after 70 seconds, in order to force a failure.
We expected our application, with these parameters, to run forever, because the attemptFailuresValidityInterval should reset the attempt counter checked against maxAppAttempts before each custom exception is thrown. Instead, the application stops after 3 failures.
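To make the expectation concrete, here is a minimal sketch of the counting behavior we assumed. It is not YARN's or Spark's actual code. The function name, the 60-second window, and the limit of 3 are hypothetical values chosen for illustration; only the 70-second failure spacing comes from our test.

```python
from typing import Sequence


def attempts_exhausted(failure_times: Sequence[float], now: float,
                       validity_interval: float, max_attempts: int) -> bool:
    """Return True when the failures inside the validity window reach max_attempts.

    Hypothetical model of the expected behavior: failures older than
    validity_interval seconds are ignored when counting attempts.
    """
    recent = [t for t in failure_times if now - t <= validity_interval]
    return len(recent) >= max_attempts


# Assumed scenario: the driver fails every 70 s, the window is 60 s, the limit is 3.
failure_times = [70.0 * n for n in range(1, 6)]  # 70, 140, 210, 280, 350

# Evaluate the check at the moment of each failure, seeing only failures so far.
windowed = [
    attempts_exhausted(failure_times[: i + 1], t, 60.0, 3)
    for i, t in enumerate(failure_times)
]
print(windowed)

# Without a window (lifetime counter), the third failure already hits the limit.
print(attempts_exhausted(failure_times[:3], 210.0, float("inf"), 3))
```

Under this model only one failure ever falls inside the window, so the limit is never reached. What we observe instead matches the second check, where every failure since submission is counted.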
- CDH 5.11
Any ideas are more than welcome!
04-10-2018 11:08 PM
Sorry, this is a bug described in SPARK-22876, which suggests that the current logic of spark.yarn.am.attemptFailuresValidityInterval is flawed.
The JIRA is still being worked on, but judging by the comments, I don't foresee a fix anytime soon.