Member since
07-18-2024
07-18-2024
11:16 PM
1 Kudo
I'm running Oozie 5.2.1 in HA mode on EMR and I have an issue with the temporary action directory. My workflow is simply start node -> action node -> end node. The job starts running, runs for 10-15 minutes, is initially marked as successful, and is then marked as failed. The error is JA008 (file or directory not found) for the /user/oozie/oozie-oozi/"oozie_job_id"/"ActionName"--java directory.

In parallel I run a script that watches this directory together with the state of the workflow. While the job is running, the directory contains the action.xml and launcher.xml files. Just before the workflow is marked SUCCEEDED, the action-data.seq file is added. After the workflow is marked SUCCEEDED, the directory is deleted. 1-2 minutes later the workflow is marked as failed with the error above. It looks like Oozie checks for that directory again after it has already deleted it.

Although Oozie marks the job as failed, the job actually succeeds (I checked the job status and the logs in the RM UI). This happens multiple times a day: sometimes the workflow succeeds, other times Oozie marks it as failed even though it really succeeded. Any idea why this happens? It looks like a race condition.
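To make the observation reproducible, here is a minimal sketch of the analysis half of such a watcher script. It assumes each poll has already been recorded as a workflow status plus a flag for whether the action directory exists. The `Observation` type, its field names, and `find_status_flips` are hypothetical names chosen for this illustration, not part of Oozie, and how the polls are collected is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """One poll of the workflow (hypothetical structure for illustration)."""
    seconds: float    # time elapsed since the first poll
    status: str       # workflow status as reported by Oozie, e.g. "RUNNING"
    dir_exists: bool  # whether the action directory was present on HDFS


def find_status_flips(
    observations: Iterable[Observation],
) -> List[Tuple[Observation, Observation]]:
    """Return (first_succeeded, later_failed) pairs from a poll timeline.

    A pair is reported when a workflow that was already seen as SUCCEEDED
    is later seen as FAILED or KILLED.
    """
    flips: List[Tuple[Observation, Observation]] = []
    first_success: Optional[Observation] = None
    for obs in observations:
        if obs.status == "SUCCEEDED":
            # Remember only the earliest SUCCEEDED poll of the current run.
            if first_success is None:
                first_success = obs
        elif obs.status in ("FAILED", "KILLED") and first_success is not None:
            flips.append((first_success, obs))
            first_success = None
    return flips
```

Feeding it the recorded timeline gives the delay between the SUCCEEDED and FAILED polls, and the `dir_exists` flag on each side shows whether the directory was already gone when the status flipped.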
Labels: Apache Oozie