I am trying to achieve the following flow:
Submit a grid application (about 300 jobs).
Jobs sometimes fail, and we would like to retry them.
But after 20 retries (or 20 failures in total), we would like to mark the entire application as failed and stop all running and pending jobs.
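As a rough sketch of the behaviour described above, the following plain-Python snippet shows a shared failure budget across all jobs. It is only an illustration of the intended logic, not Spark or YARN API: `run_with_failure_budget`, `max_total_failures` and `workers` are hypothetical names, and each job is assumed to be a zero-argument callable that raises on failure.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_with_failure_budget(jobs, max_total_failures=20, workers=8):
    """Run every job, retrying failed ones, until all succeed or the
    total number of failures across ALL jobs reaches the budget.

    jobs: dict mapping a job id to a zero-argument callable (hypothetical shape).
    Returns a dict of job id -> result; raises RuntimeError when the budget is hit.
    """
    failures = 0
    results = {}
    pool = ThreadPoolExecutor(max_workers=workers)
    try:
        # Map each in-flight future back to the job it belongs to.
        pending = {pool.submit(fn): job_id for job_id, fn in jobs.items()}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                job_id = pending.pop(fut)
                try:
                    results[job_id] = fut.result()
                except Exception:
                    failures += 1
                    if failures >= max_total_failures:
                        # Budget exhausted: fail the whole application.
                        raise RuntimeError(
                            f"{failures} failures in total, aborting application"
                        )
                    # Still within budget: resubmit the same job.
                    pending[pool.submit(jobs[job_id])] = job_id
        return results
    finally:
        # Cancels jobs that have not started yet (Python 3.9+). Jobs already
        # running in a thread cannot be interrupted this way; a real cluster
        # would need its own kill mechanism for those.
        pool.shutdown(wait=False, cancel_futures=True)
```

In this sketch the counter is global rather than per job, so one job failing 20 times and 20 jobs failing once each both trip the same limit, which is the behaviour asked for here.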
I have set yarn.resourcemanager.am.max-attempts=2.
How can I limit the total number of failures before the entire submission is marked as failed?
My setup is based on Spark on YARN.