Oozie workflow is stuck at RUNNING without error?

Explorer

Hello,

I am new to HDP. I have set up a very simple Oozie job to run Spark on the sandbox. The job started but never continues to the next step; it just sits there with a "running" status. I am able to run spark-submit from the terminal, so I know the Spark script works. Any idea why the job is stuck, or what other steps I can take to troubleshoot this?

Here is the log:

[Screenshot: 88542-oozie.jpg]

2018-09-10 02:24:30,295  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] Start action [0000014-180908024054802-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,296  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action status=DONE
2018-09-10 02:24:30,296  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action updated in DB!
2018-09-10 02:24:30,328  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@:start:
2018-09-10 02:24:30,328  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W
2018-09-10 02:24:30,343  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Start action [0000014-180908024054802-oozie-oozi-W@spark_1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,370  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Added into spark action configuration mapred.child.env=SPARK_HOME=.,HDP_VERSION=2.6.5.0-292
2018-09-10 02:24:31,425  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:24:31,450  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:24:31,453  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action status=RUNNING
2018-09-10 02:24:31,453  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action updated in DB!
2018-09-10 02:24:31,456  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@spark_1
2018-09-10 02:35:19,084  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:35:26,289  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:46:19,088  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:46:19,127  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
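
One way to follow up on a job that keeps reporting RUNNING, as in the log above, is to pull the Hadoop job ID out of the Oozie log and look at the matching YARN application. The following is a minimal Python sketch, assuming the log line format shown above. The function name `summarize_polls` is hypothetical, and the `job_` to `application_` prefix swap relies on the usual YARN naming convention rather than anything stated in this thread.

```python
import re

# Matches the status-poll lines written by SparkActionExecutor, e.g.
# "... checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]"
POLL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+.*"
    r"hadoop job ID \[(?P<job>job_\d+_\d+)\] status \[(?P<status>\w+)\]"
)


def summarize_polls(log_text):
    """Return (timestamp, yarn_application_id, status) for each poll line."""
    polls = []
    for line in log_text.splitlines():
        match = POLL_RE.search(line)
        if match:
            # Assumption: the YARN application ID is the job ID with the
            # "job_" prefix replaced by "application_".
            app_id = match.group("job").replace("job_", "application_", 1)
            polls.append((match.group("ts"), app_id, match.group("status")))
    return polls


# Three lines copied from the log above; only the last two are poll lines.
sample = """\
2018-09-10 02:24:31,425  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:24:31,450  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:35:26,289  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
"""

for timestamp, app_id, status in summarize_polls(sample):
    print(timestamp, app_id, status)
```

The resulting application ID can then be passed to `yarn application -status <application id>` or `yarn logs -applicationId <application id>` to see what the job behind the Oozie action is actually doing.
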
11 REPLIES

Contributor

Can Oozie successfully schedule other kinds of jobs, e.g. MapReduce?

@A C

Just to confirm: did you run spark-submit with YARN as the master and cluster as the deploy mode?
If so, check the job properties for the following parameter:
${resourceManager}
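
A quick way to run that check is sketched below in Python. It assumes a plain `key=value` job.properties file and only verifies that `resourceManager` is defined and looks like `host:port`. The helper names and the sample file contents are hypothetical; in particular the port numbers are placeholders, so use whatever `yarn.resourcemanager.address` is set to on your cluster.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments.

    This is a simplification: it does not handle every feature of the
    Java properties format (e.g. ':' separators or line continuations).
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_resource_manager(props, key="resourceManager"):
    """Return None if the value looks like host:port, else a message."""
    value = props.get(key, "")
    if not value:
        return f"{key} is not defined"
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        return f"{key}={value!r} does not look like host:port"
    return None


# Hypothetical job.properties contents; the ports are placeholders.
sample_properties = """\
# job.properties (illustrative only)
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
resourceManager=sandbox-hdp.hortonworks.com:8032
"""

problem = check_resource_manager(parse_properties(sample_properties))
print(problem or "resourceManager looks well-formed")
```

A well-formed value does not prove it is the right address; it only rules out a missing or malformed entry before comparing it with the cluster's YARN configuration.
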

Also, here is another example of PySpark + Oozie (using a shell action to submit Spark).
https://github.com/hgrif/oozie-pyspark-workflow

Hope this helps