Oozie workflow is stuck at RUNNING without any error?



I am new to HDP. I have set up a very simple Oozie job to run Spark on the sandbox. The job started but never continues to the next step; it just stays in "RUNNING" status. I am able to run spark-submit from the terminal, so I know the Spark script works. Any idea why the job is stuck, or what other steps I can take to troubleshoot this?

Here is the log:


2018-09-10 02:24:30,295  INFO ActionStartXCommand:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] Start action [0000014-180908024054802-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,296  INFO ActionStartXCommand:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action status=DONE
2018-09-10 02:24:30,296  INFO ActionStartXCommand:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action updated in DB!
2018-09-10 02:24:30,328  INFO WorkflowNotificationXCommand:520 - SERVER[] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@:start:
2018-09-10 02:24:30,328  INFO WorkflowNotificationXCommand:520 - SERVER[] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W
2018-09-10 02:24:30,343  INFO ActionStartXCommand:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Start action [0000014-180908024054802-oozie-oozi-W@spark_1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,370  INFO SparkActionExecutor:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Added into spark action configuration mapred.child.env=SPARK_HOME=.,HDP_VERSION=
2018-09-10 02:24:31,425  INFO SparkActionExecutor:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:24:31,450  INFO SparkActionExecutor:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:24:31,453  INFO ActionStartXCommand:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action status=RUNNING
2018-09-10 02:24:31,453  INFO ActionStartXCommand:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action updated in DB!
2018-09-10 02:24:31,456  INFO WorkflowNotificationXCommand:520 - SERVER[] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@spark_1
2018-09-10 02:35:19,084  INFO SparkActionExecutor:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:35:26,289  INFO SparkActionExecutor:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:46:19,088  INFO SparkActionExecutor:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:46:19,127  INFO SparkActionExecutor:520 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]


Hello @A C.
What do you see in the YARN UI? Is there any application_id running for your Oozie workflow or Spark job?
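
In case it helps to find the right entry in YARN: the Hadoop job ID that Oozie prints in the log (job_1536377145689_0006) shares its numeric parts with the YARN application ID of the launcher, so it can be looked up as application_1536377145689_0006. Here is a minimal Python sketch that pulls those IDs out of an Oozie log. The function name `yarn_application_ids` is just something I made up for illustration, and the sample line is a shortened copy of one line from the log above.

```python
import re

# Matches the launcher job ID that Oozie prints,
# e.g. "hadoop job ID [job_1536377145689_0006]".
JOB_ID_RE = re.compile(r"hadoop job ID \[job_(\d+)_(\d+)\]")


def yarn_application_ids(log_text):
    """Return the YARN application IDs for the launcher jobs named in an Oozie log.

    Hypothetical helper: it only rewrites the "job_" prefix to "application_",
    keeping the cluster timestamp and sequence number unchanged.
    """
    ids = []
    for cluster_ts, seq in JOB_ID_RE.findall(log_text):
        app_id = f"application_{cluster_ts}_{seq}"
        if app_id not in ids:  # the same job is logged on every status check
            ids.append(app_id)
    return ids


# Shortened copy of a line from the log in the question.
sample = (
    "ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] "
    "checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]"
)
print(yarn_application_ids(sample))  # ['application_1536377145689_0006']
```

With that ID you can search the YARN UI, or try `yarn application -status <application_id>` and `yarn logs -applicationId <application_id>` from a terminal on the sandbox.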


Hi @Vinicius Higa Murakami

I guess this is the YARN UI (http://[**My HDP Sandbox**]:8188/applicationhistory)? If it's not, please let me know. I am not sure what the message means or how to resolve it.



Hi @A C.
You're right, this is the YARN WEB UI 🙂
Hm, from what I can see, it looks like YARN didn't launch your Spark application.
Would you mind sharing your Oozie workflow XML with us?
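
If you want to double-check what YARN has actually accepted or started, the ResourceManager REST API can list applications by state. Below is a rough Python sketch, not a definitive recipe. I am assuming the ResourceManager web address is on port 8088 (the usual default, but please verify on your sandbox); `fetch_apps` and `summarize_apps` are names I made up, and the sample payload is invented just to show the shape of the response. An application that stays in ACCEPTED has been queued by the scheduler but has not been given a container yet.

```python
import json
from urllib.request import urlopen


def fetch_apps(rm_url, states="ACCEPTED,RUNNING"):
    """Query the ResourceManager REST API for applications in the given states.

    rm_url is an assumption, e.g. "http://<resourcemanager-host>:8088".
    """
    url = f"{rm_url}/ws/v1/cluster/apps?states={states}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_apps(payload):
    """Return (id, name, state) tuples from a /ws/v1/cluster/apps response."""
    # "apps" is null in the response when no application matches the filter.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a.get("id"), a.get("name"), a.get("state")) for a in apps]


# Invented sample payload, only to illustrate the response structure.
sample_payload = {
    "apps": {
        "app": [
            {"id": "application_1_0001", "name": "launcher", "state": "RUNNING"},
            {"id": "application_1_0002", "name": "spark app", "state": "ACCEPTED"},
        ]
    }
}

for app_id, name, state in summarize_apps(sample_payload):
    print(app_id, name, state)
```

On the sandbox you would call `summarize_apps(fetch_apps("http://<resourcemanager-host>:8088"))` with your own host name filled in.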



Hi @Vinicius Higa Murakami.

<workflow-app name="spark test"
	<start to="spark_1"/>
	<action name="spark_1">
		<ok to="end"/>
		<error to="kill"/>
	<kill name="kill">
	<end name="end"/>


Hi @A C.
At first glance, I can't see anything misconfigured.
Take a look at this article to see if it helps you with something:

Hope this helps


Still doesn't work. 😞


Quick question: does it work when run outside of Oozie, e.g. directly with spark-submit?


I agree with you! You are right about this problem. First, try running the app with spark-submit; then you can use Oozie to schedule it.


It ran successfully outside of Oozie using spark-submit, just not in Oozie.