Oozie workflow is stuck at RUNNING without any error?

avatar
Explorer

Hello,

I am new to HDP. I have set up a very simple Oozie job to run Spark on the sandbox. The job started but never continues to the next step; it just stays stuck with a "running" status. I am able to run spark-submit from the terminal, so I know the Spark script works. Any idea why the job is stuck, or what other steps I can take to troubleshoot this?

Here is the log:

88542-oozie.jpg

2018-09-10 02:24:30,295  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] Start action [0000014-180908024054802-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,296  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action status=DONE
2018-09-10 02:24:30,296  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action updated in DB!
2018-09-10 02:24:30,328  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@:start:
2018-09-10 02:24:30,328  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W
2018-09-10 02:24:30,343  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Start action [0000014-180908024054802-oozie-oozi-W@spark_1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,370  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Added into spark action configuration mapred.child.env=SPARK_HOME=.,HDP_VERSION=2.6.5.0-292
2018-09-10 02:24:31,425  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:24:31,450  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:24:31,453  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action status=RUNNING
2018-09-10 02:24:31,453  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action updated in DB!
2018-09-10 02:24:31,456  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@spark_1
2018-09-10 02:35:19,084  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:35:26,289  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:46:19,088  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:46:19,127  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
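
To make the pattern in the log above easier to see, here is a minimal Python sketch that pulls out only the status polls. The function name `status_polls` and the regular expression are illustrative assumptions, not part of Oozie; the three sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Matches the "checking action" lines that SparkActionExecutor writes on each poll.
STATUS_RE = re.compile(r"hadoop job ID \[(?P<job>job_\d+_\d+)\] status \[(?P<status>\w+)\]")

def status_polls(log_text):
    """Return (timestamp, hadoop_job_id, status) for every status poll in the log."""
    polls = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match is None:
            continue
        # The first 19 characters are the timestamp without milliseconds.
        ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        polls.append((ts, match.group("job"), match.group("status")))
    return polls

SAMPLE = "\n".join([
    "2018-09-10 02:24:31,450  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]",
    "2018-09-10 02:35:26,289  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]",
    "2018-09-10 02:46:19,127  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]",
])

polls = status_polls(SAMPLE)
for ts, job, status in polls:
    print(ts, job, status)
# Time between the first and the last poll in the sample.
print("observed for:", polls[-1][0] - polls[0][0])
```

Read this way, the log shows the same Hadoop job ID being polled repeatedly with no status change and no error line, so the Oozie server log alone does not explain the hang; the YARN side is the next place to look.
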
11 REPLIES

avatar

Hello @A C.
What do you see in the YARN UI? Is there any application_id running for your Oozie workflow or Spark job?
Thanks.

avatar
Explorer

Hi @Vinicius Higa Murakami

I guess this is the YARN UI (http://[**My HDP Sandbox**]:8188/applicationhistory)? If that is not correct, please let me know. I am not sure what the message means or how to resolve it.

90382-oozie1.jpg

avatar

Hi @A C.
You're right, this is the YARN WEB UI 🙂
Hm, so from what I can see, it looks like YARN didn't launch your Spark application.
Would you mind sharing your Oozie workflow XML with us?

Thanks.
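
One way to double-check whether YARN launched anything is to list applications and their states through the ResourceManager REST API. The sketch below is only an illustration under assumptions: the helper names are made up, and the ResourceManager address (typically port 8088, which is different from the 8188 URL mentioned in this thread) has to be adjusted to your sandbox.

```python
import json
from urllib.request import urlopen

def fetch_apps(rm_url):
    """Fetch the application list from the ResourceManager REST API.

    rm_url is an assumption, e.g. "http://<your-sandbox-host>:8088".
    """
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/apps", timeout=10) as resp:
        return json.load(resp)

def summarize(payload):
    """Return (id, name, state) for each application in the API response."""
    # The API returns {"apps": null} when there are no applications.
    apps = (payload.get("apps") or {}).get("app", [])
    return [(app["id"], app["name"], app["state"]) for app in apps]

def waiting_apps(payload):
    """Applications that YARN has accepted but not yet started running."""
    return [row for row in summarize(payload) if row[2] == "ACCEPTED"]

# Usage (needs network access to the ResourceManager, so it is left commented out):
# payload = fetch_apps("http://<your-sandbox-host>:8088")
# for app_id, name, state in summarize(payload):
#     print(app_id, name, state)
```

If the Oozie launcher shows up as RUNNING while a second application sits in ACCEPTED, or no second application appears at all, that would match the observation that the Spark application was never started.
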

avatar
Explorer

Hi @Vinicius Higa Murakami.

<workflow-app name="spark test"
	xmlns="uri:oozie:workflow:0.5">
	<start to="spark_1"/>
	<action name="spark_1">
		<spark
			xmlns="uri:oozie:spark-action:0.2">
			<job-tracker>${resourceManager}</job-tracker>
			<name-node>${nameNode}</name-node>
			<master>yarn-cluster</master>
			<name>pySpark</name>
			<jar>/tmp/pySparkTest.py</jar>
		</spark>
		<ok to="end"/>
		<error to="kill"/>
	</action>
	<kill name="kill">
		<message>${wf:errorMessage(wf:lastErrorNode())}</message>
	</kill>
	<end name="end"/>
</workflow-app>
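
For a quick sanity check of what the workflow above asks Oozie to run, here is a small Python sketch that parses the XML and prints the Spark action settings. The function name `spark_actions` is a hypothetical helper; the namespaces and values are taken from the workflow as posted.

```python
import xml.etree.ElementTree as ET

# Namespaces exactly as declared in the posted workflow.
NS = {"wf": "uri:oozie:workflow:0.5", "sp": "uri:oozie:spark-action:0.2"}

WORKFLOW_XML = """
<workflow-app name="spark test" xmlns="uri:oozie:workflow:0.5">
  <start to="spark_1"/>
  <action name="spark_1">
    <spark xmlns="uri:oozie:spark-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>pySpark</name>
      <jar>/tmp/pySparkTest.py</jar>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

def spark_actions(workflow_xml):
    """Return one dict per Spark action with its main settings and transitions."""
    root = ET.fromstring(workflow_xml.strip())
    found = []
    for action in root.findall("wf:action", NS):
        spark = action.find("sp:spark", NS)
        if spark is None:
            continue  # not a Spark action
        found.append({
            "action": action.get("name"),
            "master": spark.findtext("sp:master", namespaces=NS),
            "name": spark.findtext("sp:name", namespaces=NS),
            "jar": spark.findtext("sp:jar", namespaces=NS),
            "ok": action.find("wf:ok", NS).get("to"),
            "error": action.find("wf:error", NS).get("to"),
        })
    return found

for settings in spark_actions(WORKFLOW_XML):
    print(settings)
```

The point of the check is only to confirm the values Oozie will use (master `yarn-cluster`, script `/tmp/pySparkTest.py`), so that a manual test run can use the same ones.
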

avatar

Hi @A C.
At first glance, I can't see anything misconfigured.
Take a look at this article to see if it helps you with something:
https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-...

Hope this helps

avatar
Explorer

Still doesn't work. 😞

avatar

Quick question: does it work when run outside of Oozie, e.g. directly with spark-submit?
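
For the comparison to be fair, the manual run should use the same master and deploy mode as the workflow (`yarn-cluster`), since a plain spark-submit from the terminal may run in a different mode. A minimal sketch of building that command follows; `spark_submit_argv` is a hypothetical helper, and it assumes `/tmp/pySparkTest.py` is reachable from where the command runs.

```python
import shlex

def spark_submit_argv(master, name, app, extra_args=()):
    """Build a spark-submit argument list from Oozie-style Spark action values."""
    if master in ("yarn-cluster", "yarn-client"):
        # Split the combined master value into --master yarn plus --deploy-mode.
        deploy_mode = master.split("-", 1)[1]
        argv = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    else:
        argv = ["spark-submit", "--master", master]
    argv += ["--name", name, app]
    argv += list(extra_args)
    return argv

# Values taken from the workflow posted in this thread.
argv = spark_submit_argv("yarn-cluster", "pySpark", "/tmp/pySparkTest.py")
print(" ".join(shlex.quote(arg) for arg in argv))
```

If the script succeeds this way but still hangs under Oozie, the difference is more likely in how Oozie launches the job on YARN than in the script itself.
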

avatar
Contributor

I agree with you! You are right about this problem. First, try running the app with spark-submit; then you can use Oozie to schedule it.

avatar
Explorer

It ran successfully outside of Oozie using spark-submit, just not in Oozie.