Configuring failures and retries

I am trying to get the following flow:

Submit a grid app (with about 300 jobs).

Sometimes jobs fail, and we would like to retry them.

But after 20 retries (or 20 failures in total), we would like to mark the entire application as failed and stop all running/pending jobs.

I have set yarn.resourcemanager.am.max-attempts=2.

How can I limit the total number of errors before marking the entire submitted application as failed?

I am running on Spark/YARN.
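
To make the intended flow concrete, here is a minimal driver-side sketch of a shared failure budget. It is plain Python and does not use any Spark or YARN setting; the names FailureBudget, run_job, and run_application are hypothetical, and each "job" is assumed to be a callable that raises an exception on failure.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FailureBudgetExceeded(Exception):
    """Raised once the shared failure budget is used up."""


class FailureBudget:
    """Thread-safe failure counter shared by all jobs of one application."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0
        self.exhausted = threading.Event()
        self._lock = threading.Lock()

    def record_failure(self):
        with self._lock:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.exhausted.set()


def run_job(job, budget):
    """Retry one job until it succeeds or the shared budget is exhausted."""
    while not budget.exhausted.is_set():
        try:
            return job()
        except Exception:
            budget.record_failure()
    # Pending jobs land here immediately once the budget is gone.
    raise FailureBudgetExceeded()


def run_application(jobs, max_failures=20, workers=8):
    """Run all jobs; fail the whole application after max_failures failures."""
    budget = FailureBudget(max_failures)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_job, job, budget) for job in jobs]
        # result() re-raises FailureBudgetExceeded from any job that gave up.
        # In a real Spark driver, this is where one might also call
        # SparkContext.cancelAllJobs() to stop work that is already running.
        return [f.result() for f in futures]
```

With this sketch, run_application(jobs, max_failures=20) either returns every job's result or raises FailureBudgetExceeded. Attempts that are already running finish their current try before they notice the budget is exhausted, so the total number of failures can slightly overshoot the limit.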

1 REPLY

Anyone?