Hi!
We are running spark-submit with these options:
--deploy-mode cluster
--conf "spark.yarn.maxAppAttempts=3"
--conf "spark.yarn.am.attemptFailuresValidityInterval=30s"
--conf...
and our application intentionally throws an exception on the driver after 70 seconds, to force a failure.
We expected the application, with these parameters, to run forever: since the failures are 70 seconds apart and the validity interval is only 30 seconds, each earlier failure should have expired from the attempt count before the next exception is thrown, so maxAppAttempts should never be reached. But after 3 failures the application stops.
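To make the expectation concrete, here is a small toy model (not YARN's actual code, just our reading of how attemptFailuresValidityInterval is documented to work): a failure only counts against maxAppAttempts while it is younger than the validity interval. The function name and structure are hypothetical, for illustration only.

```python
def should_fail_app(failure_times, max_attempts, validity_interval, now):
    """Toy model of our understanding of YARN's attempt counting:
    only failures within the last `validity_interval` seconds count."""
    recent = [t for t in failure_times if now - t <= validity_interval]
    return len(recent) >= max_attempts

# Our scenario: driver dies every 70s, window is 30s. At each failure,
# the previous one (70s ago) has already expired, so the recent-failure
# count never exceeds 1 and the app should keep restarting.
failures = []
for attempt in range(5):
    now = attempt * 70  # seconds since application start (simulated)
    failures.append(now)
    print(should_fail_app(failures, max_attempts=3,
                          validity_interval=30, now=now))  # always False
```

Under this model the application should never be killed, which is why the observed stop after 3 failures surprised us.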
Our installation:
- SPARK2-2.1.0.cloudera2
- CDH 5.11
Any ideas are more than welcome!