New Contributor
Posts: 1
Registered: ‎04-03-2018
Accepted Solution

Spark 2 - attemptFailuresValidityInterval issue


Hi!

 

We are running spark-submit with the following options:

--deploy-mode cluster

--conf "spark.yarn.maxAppAttempts=3"
--conf "spark.yarn.am.attemptFailuresValidityInterval=30s"

--conf...

 

and our application intentionally throws an exception on the driver after 70 seconds, in order to force a failure.

 

We expected the application to run forever with these parameters, because attemptFailuresValidityInterval (30s) should expire each earlier failure before the next exception is thrown (after 70s), so the failure count should never reach maxAppAttempts. Instead, the application stops after 3 failures.
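To make the expectation concrete, here is a small Python sketch of the windowed counting we assumed. It is only a model of our expectation, not Spark's or YARN's actual code: the function names are made up for illustration, time is in whole seconds, and it assumes each new attempt starts immediately after the previous failure.

```python
def failures_in_window(failure_times, now, validity_interval):
    """Count failures that happened less than `validity_interval` seconds ago."""
    return sum(1 for t in failure_times if now - t < validity_interval)


def attempts_until_stop(fail_after, validity_interval, max_attempts, limit=10):
    """Return the attempt number on which the app would stop, or None if it
    is still being restarted after `limit` attempts (hypothetical model)."""
    failure_times = []
    now = 0
    for attempt in range(1, limit + 1):
        # Assumption: every attempt fails `fail_after` seconds after it starts,
        # and the next attempt starts right away.
        now += fail_after
        failure_times.append(now)
        if failures_in_window(failure_times, now, validity_interval) >= max_attempts:
            return attempt
    return None


# Our settings: exception after 70s, 30s validity interval, 3 max attempts.
print(attempts_until_stop(70, 30, 3))
# Same run if old failures were never forgotten.
print(attempts_until_stop(70, float("inf"), 3))
```

With a 30s window, only the most recent failure is ever counted, so the model never reaches 3. With an unbounded window it stops on the third attempt, which matches what we observe numerically. This does not say anything about the actual cause inside Spark.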

 

Our installation:

- SPARK2-2.1.0.cloudera2
- CDH 5.11

 

Any ideas are more than welcome!

Cloudera Employee
Posts: 40
Registered: ‎11-16-2015

Re: Spark 2 - attemptFailuresValidityInterval issue

Sorry, this is a bug described in SPARK-22876, which suggests that the current logic of spark.yarn.am.attemptFailuresValidityInterval is flawed.

While the JIRA is still being worked on, judging by the comments, I don't foresee a fix anytime soon.
