Support Questions
Find answers, ask questions, and share your expertise

Spark 2 - attemptFailuresValidityInterval issue

SOLVED

New Contributor

Hi!

 

We are running a spark-submit with options:

--deploy-mode cluster

--conf "spark.yarn.maxAppAttempts=3"
--conf "spark.yarn.am.attemptFailuresValidityInterval=30s"

--conf...

 

and our application intentionally throws an exception on the driver after 70 seconds, in order to force a failure.
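A minimal sketch of the failure injection (the function name and message are illustrative, not our actual job code):

```python
import time

def fail_after(seconds):
    """Sleep for the given number of seconds, then raise,
    simulating a driver that dies after some uptime."""
    time.sleep(seconds)
    raise RuntimeError("intentional failure after %ds" % seconds)

# In the real job this runs on the driver with seconds=70:
# fail_after(70)
```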

 

With these parameters we expected the application to retry indefinitely: each attempt survives 70 seconds, which is longer than the 30-second attemptFailuresValidityInterval, so the previous failure should have aged out of the validity window and the maxAppAttempts counter should have been reset before each new failure. Instead, the application stops after 3 failures.
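To illustrate the semantics we expected (a sketch of the documented behavior, not YARN's actual implementation): only failures that fall inside the trailing validity window should count toward maxAppAttempts.

```python
def attempts_exhausted(failure_times, max_attempts, validity_interval):
    """Return True if, at the moment of any failure, the number of
    failures inside the trailing validity window reaches max_attempts.
    failure_times are seconds since application start, in order."""
    for t in failure_times:
        recent = [f for f in failure_times
                  if t - validity_interval < f <= t]
        if len(recent) >= max_attempts:
            return True
    return False

# Failures every 70 s with a 30 s window: at most one failure ever
# sits inside the window, so the counter should never reach 3.
every_70s = [70 * i for i in range(1, 10)]
print(attempts_exhausted(every_70s, 3, 30))   # → False

# By contrast, three quick failures 10 s apart would exhaust it:
quick = [10, 20, 30]
print(attempts_exhausted(quick, 3, 30))       # → True
```

Under these semantics our job should never exhaust its attempts, which is why the observed behavior looks wrong.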

 

Our installation:

- SPARK2-2.1.0.cloudera2
- CDH 5.11

 

Any ideas are more than welcome!

1 ACCEPTED SOLUTION


Re: Spark 2 - attemptFailuresValidityInterval issue

Expert Contributor

Sorry, this is a bug, described in SPARK-22876, which notes that the current logic behind spark.yarn.am.attemptFailuresValidityInterval is flawed.

The JIRA is still open and, judging from the comments, I don't foresee a fix anytime soon.
