Created 07-18-2024 11:16 PM
I'm running Oozie 5.2.1 in HA mode on EMR and I have an issue with its temporary directory. My workflow is start node -> action node -> end node. The job starts running -> runs for 10-15 minutes -> is initially marked as successful -> is then marked as failed. The error is JA008 (File or directory not found) for the /user/oozie/oozie-oozi/"oozie_job_id"/"ActionName"--java directory. In parallel I run a script that watches this directory as well as the state of the workflow. While the job is running, the directory contains the action.xml and launcher.xml files -> just before the workflow is marked as SUCCEEDED, the action-data.seq file is added -> after it is marked as SUCCEEDED, the directory is deleted -> 1-2 minutes later the workflow is marked as failed with the error above. It looks like Oozie checks for that directory again after it has already deleted it.
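A minimal sketch of a watcher like the one described above, for anyone who wants to reproduce the timeline. It is an illustration, not the script used here. It assumes the `oozie` and `hdfs` CLIs are on the PATH, that the `OOZIE_URL` environment variable points at the Oozie server, that `oozie job -info` prints a `Status : ...` line, and that `hdfs dfs -ls` prints eight whitespace-separated columns per entry (so paths containing spaces are not handled). The function names `parse_ls`, `parse_status` and `watch` are made up for this example.

```python
import subprocess
import sys
import time


def parse_ls(ls_output):
    """Return the file names from `hdfs dfs -ls` output (assumed 8 columns per entry)."""
    names = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skips the "Found N items" header, which has fewer columns.
        if len(fields) >= 8:
            names.append(fields[-1].rsplit("/", 1)[-1])
    return names


def parse_status(info_output):
    """Return the value of the 'Status : ...' line from `oozie job -info`, or None."""
    for line in info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Status":
            return value.strip()
    return None


def watch(job_id, action_dir, interval=5):
    """Print workflow status and directory contents every `interval` seconds (Ctrl-C to stop)."""
    while True:
        info = subprocess.run(["oozie", "job", "-info", job_id],
                              capture_output=True, text=True)
        ls = subprocess.run(["hdfs", "dfs", "-ls", action_dir],
                            capture_output=True, text=True)
        status = parse_status(info.stdout)
        # `hdfs dfs -ls` exits non-zero when the path does not exist.
        files = parse_ls(ls.stdout) if ls.returncode == 0 else "<directory missing>"
        print(time.strftime("%H:%M:%S"), status, files)
        time.sleep(interval)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        watch(sys.argv[1], sys.argv[2])
    else:
        print("usage: watch.py <oozie_job_id> <hdfs_action_dir>")
```

Run it with the workflow job ID and the full HDFS path of the action directory. The timestamped output shows when the directory disappears relative to each status change.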
Although the job is marked as failed from Oozie's perspective, in reality it is successful (I checked the job status and the logs in the RM UI).
This error occurs multiple times a day. Sometimes the workflow succeeds; other times Oozie marks it as failed even though it actually succeeded.
Any idea why this happens? It looks like a race condition.
Created 07-22-2024 01:41 PM
Hello @StefanSs
This does not appear to be a Cloudera CDP distribution or a supported Oozie version.
We suggest reaching out to the Oozie mailing list with simple repro steps for the community to review, or contacting Amazon Support.
Hope this helps
-JMP
Created 07-19-2024 01:40 PM
@StefanSs Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Oozie experts @JoseManuel @jphelps who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres