Monitoring jobs for failure in the cluster (over HUE or other alternatives)

Hi,

We are having trouble monitoring the workflows/jobs in our cluster effectively. We run many Cascading-based workflows (which typically run for many hours), and these workflows spawn multiple jobs. When one of these workflows/jobs fails (due to a data error or a code error), there are only 2 ways to catch it:

1. Constantly monitoring the Hue console for any failed workflows or jobs. The challenge here is that somebody has to keep looking at Hue, which gets tedious and is prone to misses for jobs that run for 5 or 6 hours or more.

2. Depending on the status email (success/failure) that is sent out at the end of the workflow. The challenge here is that, with a lot of success emails being sent out, people tend to miss the one-off failure email.

This is proving to be a problem: we sometimes find out that a job has failed only a week later. Is there a way to have a custom dashboard view in Hue that shows only the failed jobs or workflows (say, for the past week)? Or are there any other ways to achieve effective monitoring (either through Hue or otherwise)?

Thanks,

Raga

3 REPLIES
Re: Monitoring jobs for failure in the cluster (over HUE or other alternatives)

Contributor

Not sure if this is what you are looking for, but in the job browser view you can click on 'Failed' to view only the failed jobs. See the screenshot screen-shot-2016-06-16-at-33907-pm.png
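If a scripted check is preferable to clicking through the job browser, the same filter can be sketched outside Hue. The example below is only a rough sketch under a few assumptions: that the jobs run on YARN, that the ResourceManager REST API (`/ws/v1/cluster/apps`) is reachable from where the script runs, and that filtering on `finalStatus=FAILED` over the past week is what is wanted. The host name and the function names are made up for illustration.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# One week in milliseconds; the YARN REST API expects epoch milliseconds.
WEEK_MS = 7 * 24 * 60 * 60 * 1000


def build_failed_apps_url(rm_base, now_ms=None, window_ms=WEEK_MS):
    """Build the ResourceManager query for apps that finished as FAILED
    within the last `window_ms` milliseconds (hypothetical helper)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    query = urlencode({
        "finalStatus": "FAILED",
        "finishedTimeBegin": now_ms - window_ms,
    })
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps?{query}"


def summarize_failed(payload):
    """Reduce the JSON response to (id, name, user) tuples.
    The API returns {"apps": null} when nothing matches."""
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a.get("id"), a.get("name"), a.get("user")) for a in apps]


def fetch_failed_apps(rm_base):
    """Query the ResourceManager and return the failed applications."""
    with urlopen(build_failed_apps_url(rm_base), timeout=30) as resp:
        return summarize_failed(json.load(resp))


# Example call (assumed host name, replace with your ResourceManager):
# for app_id, name, user in fetch_failed_apps("http://resourcemanager.example.com:8088"):
#     print(app_id, name, user)
```

Run from cron, a script like this could send a single email only when the list is non-empty, which would avoid both watching the Hue view and sifting through success emails. Killed applications are not covered by this query and would need a second request.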

Re: Monitoring jobs for failure in the cluster (over HUE or other alternatives)

Thanks for your reply. We see the following challenges with this approach:

1. The usability is pretty bad. Even if we click on 'Failed', the view shows the failed jobs for a while and then auto-refreshes back to all jobs.

2. Somebody still has to keep coming back to this view and watching it.

3. This view also slows down as the number of jobs grows. If we have 100 workflows running daily, the view will hold several hundred jobs and will become very slow. Because of this we have been forced to set a limit on the number of rows shown in this view.

Is there a simple way of having a custom view on the workflow/jobs dashboard?

Regards,

Raga

Re: Monitoring jobs for failure in the cluster (over HUE or other alternatives)

New Contributor

Hi Raga,

Any findings on this? I have the same problem monitoring many application jobs. Any idea how to create a custom dashboard for failed jobs or workflows?

Regards

Shivakumar.M
