I am trying to find an option in Cloudera Manager that will send an alert when a YARN job fails. The apps_failed_cumulative_rate stream does not seem to show the metrics I would expect based on the actual number of failed jobs.
I never found a good way to do this through the Cloudera Manager UI.
What I ended up doing was creating a script that polls the Cloudera Manager API for info on the state of YARN jobs and alerts based on unwanted states. I just sent an email out with info on the failed jobs.
(URL for the API clients: http://cloudera.github.io/cm_api/)
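A rough sketch of that kind of polling script, in Python with only the standard library, might look like the following. It is not the exact script described above. It assumes the Cloudera Manager REST endpoint `/clusters/{cluster}/services/{service}/yarnApplications` and its `applications` response field; the API version (`v19`), port (`7180`), service name (`yarn`), the set of unwanted states, and all function names are placeholders to adjust for your deployment. Check the API documentation for the time-window and filter parameters, which are left at their defaults here.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: these are the states worth alerting on; adjust as needed.
UNWANTED_STATES = {"FAILED", "KILLED"}


def build_url(host, cluster, service="yarn", api_version="v19", port=7180, limit=100):
    """Build the yarnApplications URL (version, port, service name are assumptions)."""
    path = "/api/{}/clusters/{}/services/{}/yarnApplications".format(
        api_version, urllib.parse.quote(cluster), urllib.parse.quote(service)
    )
    return "http://{}:{}{}?{}".format(
        host, port, path, urllib.parse.urlencode({"limit": limit})
    )


def fetch_applications(url, user, password):
    """Call the Cloudera Manager API with HTTP basic auth and return the app list."""
    request = urllib.request.Request(url)
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("applications", [])


def select_unwanted(applications, states=UNWANTED_STATES):
    """Keep only the applications whose state is in the unwanted set."""
    return [app for app in applications if app.get("state") in states]


def format_alert(applications):
    """Render a plain-text alert body listing each unwanted application."""
    lines = ["{} YARN application(s) in an unwanted state:".format(len(applications))]
    for app in applications:
        lines.append(
            "{} {} name={} user={}".format(
                app.get("applicationId"),
                app.get("state"),
                app.get("name"),
                app.get("user"),
            )
        )
    return "\n".join(lines)
```

Run from cron or a similar scheduler, the script would call `fetch_applications(build_url(...), user, password)`, pass the result through `select_unwanted`, and, if the list is non-empty, send the output of `format_alert` by email (for example with Python's `smtplib`).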
Is there any configuration or expression we can set so that we get an email alert when a job fails?
Can you give the details of that expression?