
Spark 2 - attemptFailuresValidityInterval issue


New Contributor

Hi!

We are running spark-submit with these options:

spark-submit \
  --deploy-mode cluster \
  --conf "spark.yarn.maxAppAttempts=3" \
  --conf "spark.yarn.am.attemptFailuresValidityInterval=30s" \
  --conf ...

and our application intentionally throws an exception on the driver after 70 seconds, to force a failure.
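
For reference, the driver boils down to something like this (a simplified sketch; the object and message names here are illustrative, not our real code):

import org.apache.spark.sql.SparkSession

object ValidityIntervalRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ValidityIntervalRepro").getOrCreate()
    // Outlive the 30s validity interval, then fail so YARN schedules a new attempt.
    Thread.sleep(70 * 1000L)
    throw new RuntimeException("intentional failure after 70 seconds")
  }
}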

 

With these parameters we expected the application to be retried indefinitely: attemptFailuresValidityInterval is supposed to make YARN forget failures older than the interval, and since each attempt runs for 70 seconds before failing, no 30-second window should ever contain more than one failure. Instead, the application stops after 3 failures.
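
Concretely, the timeline we expected (assuming each attempt fails 70 seconds after it starts):

t = 0s      attempt 1 starts
t = 70s     attempt 1 fails (first failure)
t = ~140s   attempt 2 fails, but attempt 1's failure is now older than 30s,
            so only one failure should count against maxAppAttempts
... and so on, with the recent-failure count never reaching 3.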

 

Our installation:

- SPARK2-2.1.0.cloudera2
- CDH 5.11

 

Any ideas are more than welcome!

1 ACCEPTED SOLUTION

Re: Spark 2 - attemptFailuresValidityInterval issue

Expert Contributor

Sorry, this is a bug, described in SPARK-22876, which shows that the current logic of spark.yarn.am.attemptFailuresValidityInterval is flawed.

The JIRA is still being worked on, but judging by the comments I don't foresee a fix anytime soon.
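
For context, my reading of the JIRA discussion (an illustrative sketch, not the actual Spark source) is that the application master derives "last attempt" from the raw YARN attempt id, which only ever increases, so the validity interval never gets a chance to reset anything:

// Sketch of the flawed check as described in SPARK-22876 (illustrative,
// not the actual Spark source). The YARN attempt id is monotonically
// increasing, so failures that have already aged out of the validity
// interval still push the application over maxAppAttempts.
val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts
if (isLastAttempt) {
  // unregister with a FAILED status instead of letting YARN retry
}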
