Created on 11-23-2015 03:16 AM - edited 09-16-2022 02:50 AM
I have an Oozie coordinator that runs a workflow every hour. The workflow is composed of two sequential actions: a shell action and a Java action. When I run the coordinator, the shell action seems to execute successfully. When it's time for the Java action, however, the Job Browser in Hue always shows:
There was a problem communicating with the server: Job application_<java-action-id> has expired.
When I click on the application_id, here's the snapshot:
This seems to point to views.py and api.py. When I looked into the server logs, I found:
[23/Nov/2015 02:25:22 -0800] middleware INFO Processing exception: Job application_1448245438537_0010 has expired.: Traceback (most recent call last):
  File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/usr/lib/hue/apps/jobbrowser/src/jobbrowser/views.py", line 67, in decorate
    raise PopupException(_('Job %s has expired.') % jobid, detail=_('Cannot be found on the History Server.'))
PopupException: Job application_1448245438537_0010 has expired.
When I run the workflow standalone, the Java action succeeds about half the time and expires the other half. Under the coordinator, every Java action expires.
I'm using Cloudera Quickstart CDH 5.4.0
Created 12-08-2015 06:09 PM
Thanks for your response. The problem is already solved. My Java action uses an instance (say, a variable fs) of the org.apache.hadoop.fs.FileSystem class. At the end of the Java action I called fs.close(), which caused the problem on the next period of the Oozie job. Once I removed that line, everything went well again.
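A likely explanation, though it is not confirmed in this thread: FileSystem.get() normally returns an instance that is cached and shared within the JVM, so closing it in user code can also close it for any other code in the same JVM that still holds it. The sketch below is a toy model of that pitfall using only the JDK. Handle, get, newInstance and isBroken are hypothetical stand-ins, not Hadoop classes, and the model skips details such as cache eviction on close.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedHandleDemo {

    /** Hypothetical stand-in for a file system client; not a Hadoop class. */
    static class Handle {
        private boolean open = true;

        String read(String path) {
            if (!open) {
                throw new IllegalStateException("Filesystem closed");
            }
            return "contents of " + path;
        }

        void close() {
            open = false;
        }
    }

    /** Toy per-JVM cache keyed by file system URI. */
    private static final Map<String, Handle> CACHE = new HashMap<>();

    /** Returns the shared, cached handle for the URI. */
    static Handle get(String uri) {
        return CACHE.computeIfAbsent(uri, k -> new Handle());
    }

    /** Returns a private, uncached handle that the caller owns and may close. */
    static Handle newInstance(String uri) {
        return new Handle();
    }

    /** True if reading through the handle fails because it was closed. */
    static boolean isBroken(Handle h) {
        try {
            h.read("/tmp/example");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Handle framework = get("hdfs://example"); // held by surrounding code
        Handle mine = get("hdfs://example");      // same object, used by user code
        mine.close();                              // user code "cleans up"
        System.out.println("shared handle broken: " + isBroken(framework));

        Handle owned = newInstance("hdfs://example");
        owned.close();                             // affects only the private handle
        System.out.println("other cached handle broken: " + isBroken(get("hdfs://other")));
    }
}
```

In real Hadoop code, the two usual options are to leave the shared instance open, as was done here, or to obtain a private instance with FileSystem.newInstance(conf) and close only that one.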
Created 12-08-2015 04:33 AM
Hi,
Do you have any detailed error messages in stdout.log or stderr.log under
/var/run/cloudera-scm-agent/process/xx-hue-HUE_SERVER/logs
and
/var/run/cloudera-scm-agent/process/yy-oozie-OOZIE_SERVER/logs?
If you run the Java action through the workflow only, without the coordinator, is the error message the same?