Created on 03-13-2024 05:08 AM - edited 03-13-2024 05:11 AM
I am trying to kill an existing coordinator job and redeploy it on Oozie. I am running this command:
oozie job -run --config siftcuration.properties
Here is my Oozie properties file:
nameNode=hdfs://ip-<ip>:8020
jobTracker=ip-<ip>:8032
master=yarn
mode=client
queueName=default
#oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
runtimeEnvironment=aws_prod
appName=SiftCuration
appBaseDir=${nameNode}/user/${user.name}/oozieJobs/${appName}
oozie.coord.application.path=${appBaseDir}/workflow/coordinator.xml
etl.OverwriteOrAppend=Overwrite
spark.driver.memory=4g
spark.executor.memory=4g
spark.num.executors=3
spark.executor.cores=1
etl.default.date.format=yyyy-MM-dd
etl.run.date=yesterday
coordinatorKickDate=2024-03-14
When I run this command, it returns a coordinator job ID. But when I check the coordinator status, I see that it is still in PREP state, and it has been there for over 4 hours.
0000002-240312195311040-oozie-oozi-C SiftCuration PREP 1 DAY 2024-03-14 03:00 GMT
I checked the logs and found this warning:
2024-03-13 11:59:34,030 WARN KillXCommand:523 - SERVER[ip-<ip>.ec2.internal] USER[hadoop] GROUP[-] TOKEN[] APP[SiftCuration] JOB[0043858-220322202819429-oozie-oozi-W] ACTION[] E0725: Workflow instance can not be killed, 0043858-220322202819429-oozie-oozi-W, Error Code: E0725
This workflow ID belongs to the previous coordinator job that was already killed.
On checking the YARN logs for this workflow, I see this:
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                     Status  Ext ID                            Ext Status  Err Code
------------------------------------------------------------------------------------------------------------------------------------
0043858-220322202819429-oozie-oozi-W@:start:                           OK      -                                 OK          -
------------------------------------------------------------------------------------------------------------------------------------
0043858-220322202819429-oozie-oozi-W@SnowflakeJobStartStoredProcedure  OK      application_1647980654756_172599  SUCCEEDED   -
------------------------------------------------------------------------------------------------------------------------------------
0043858-220322202819429-oozie-oozi-W@ParameterGenerator                OK      application_1647980654756_172603  SUCCEEDED   -
------------------------------------------------------------------------------------------------------------------------------------
0043858-220322202819429-oozie-oozi-W@SiftCurationSparkETL              KILLED  application_1647980654756_172607  RUNNING     JA009
------------------------------------------------------------------------------------------------------------------------------------
It says application_1647980654756_172607 is running, but I cannot find this application ID when I run yarn logs -applicationId application_1647980654756_172607:
Could not locate application logs for application_1647980654756_172607
How do I fix this issue?
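The row that stands out in the listing above is the last one: Oozie has marked the SiftCurationSparkETL action KILLED, while the external status recorded for it still says RUNNING. For anyone triaging a similar listing, here is a minimal Python sketch that flags such rows in pasted output. The regular expression, the set of terminal states, and the function name stale_actions are assumptions based only on the four rows shown in this thread, not on Oozie's documented output format.

```python
import re

# Sample rows copied from the action listing in this thread. In the pasted
# output the Ext ID and Ext Status columns can run together, so the pattern
# below does not require whitespace between them.
LISTING = """\
0043858-220322202819429-oozie-oozi-W@:start: OK - OK -
0043858-220322202819429-oozie-oozi-W@SnowflakeJobStartStoredProcedure OK application_1647980654756_172599SUCCEEDED -
0043858-220322202819429-oozie-oozi-W@ParameterGenerator OK application_1647980654756_172603SUCCEEDED -
0043858-220322202819429-oozie-oozi-W@SiftCurationSparkETL KILLED application_1647980654756_172607RUNNING JA009
"""

# Hypothetical row pattern: action id, Oozie status, external id,
# external status, error code.
ROW = re.compile(
    r"^(?P<action>\S+@\S+)\s+"
    r"(?P<status>[A-Z_]+)\s+"
    r"(?P<ext_id>application_\d+_\d+|-)\s*"
    r"(?P<ext_status>[A-Z_]+|-)\s+"
    r"(?P<err>\S+)\s*$"
)

# Assumed set of action states that Oozie treats as finished.
TERMINAL = {"OK", "KILLED", "FAILED", "ERROR"}


def stale_actions(listing):
    """Return (action, ext_id) pairs whose Oozie status is terminal
    while the recorded external status is still RUNNING."""
    stale = []
    for line in listing.splitlines():
        m = ROW.match(line.strip())
        # Header and separator lines do not match the pattern and are skipped.
        if m and m["status"] in TERMINAL and m["ext_status"] == "RUNNING":
            stale.append((m["action"], m["ext_id"]))
    return stale


if __name__ == "__main__":
    for action, ext_id in stale_actions(LISTING):
        print(f"{action}: terminal in Oozie, but {ext_id} is recorded as RUNNING")
```

Run against the listing in this thread, the sketch reports only the SiftCurationSparkETL action and its application ID, which is the same application that yarn logs could not locate.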
Created 03-13-2024 05:55 AM
@MrBeasr, Welcome to our community! To help you get the best possible answer, I have tagged in our Oozie experts @pvishnu @mszurap @ShankerSharma who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,
Created 03-20-2024 08:10 AM
Hi @MrBeasr
Please check the status of the job in the Oozie database:
# select count(*) from WF_JOBS where id like '%0043858-220322202819429-oozie-oozi-W%';
If required, you may set the status of the job directly in the database:
# update wf_jobs set status='FAILED' where id like '%0043858-220322202819429-oozie-oozi-W%';
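Since the update edits Oozie's state by hand, it may be worth reading the row back first and updating by exact ID rather than with a LIKE pattern. Below is a minimal Python sketch of that inspect-then-update flow. It uses an in-memory SQLite table as a stand-in for the real Oozie database, and the two-column WF_JOBS schema, the helper names mark_failed and demo, and the starting status RUNNING are all assumptions for illustration only. The real table has more columns, and the actual stored status of this workflow is not shown in the thread.

```python
import sqlite3

WF_ID = "0043858-220322202819429-oozie-oozi-W"


def mark_failed(conn, wf_id):
    """Set one workflow row to FAILED, but only if exactly one row
    matches the exact id. Returns the status the row had before."""
    cur = conn.cursor()
    cur.execute("SELECT id, status FROM WF_JOBS WHERE id = ?", (wf_id,))
    rows = cur.fetchall()
    if len(rows) != 1:
        raise LookupError(f"expected exactly one row for {wf_id}, found {len(rows)}")
    old_status = rows[0][1]
    # Note: sqlite3 uses '?' placeholders; other database drivers may use '%s'.
    cur.execute("UPDATE WF_JOBS SET status = 'FAILED' WHERE id = ?", (wf_id,))
    conn.commit()
    return old_status


def demo():
    # In-memory stand-in for the Oozie database (hypothetical minimal schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE WF_JOBS (id TEXT PRIMARY KEY, status TEXT)")
    # The starting status here is made up for the demonstration.
    conn.execute("INSERT INTO WF_JOBS VALUES (?, ?)", (WF_ID, "RUNNING"))
    conn.commit()
    old = mark_failed(conn, WF_ID)
    new = conn.execute(
        "SELECT status FROM WF_JOBS WHERE id = ?", (WF_ID,)
    ).fetchone()[0]
    return old, new


if __name__ == "__main__":
    print(demo())
```

Matching on the exact ID and refusing to proceed unless one row is found keeps a typo in the ID from silently touching other workflows. Taking a backup of the Oozie database before any manual update is a sensible precaution.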