Oozie workflow is stuck at RUNNING without an error?

Explorer

Hello,

I am new to HDP. I have set up a very simple Oozie job to run Spark on the sandbox. The job started but never continues to the next step; it just stays in "RUNNING" status. I am able to run the script with spark-submit from the terminal, so I know the Spark script works. Any idea why the job is stuck, or what other steps I can take to troubleshoot this?

Here is the log:

88542-oozie.jpg

2018-09-10 02:24:30,295  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] Start action [0000014-180908024054802-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,296  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action status=DONE
2018-09-10 02:24:30,296  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action updated in DB!
2018-09-10 02:24:30,328  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@:start:
2018-09-10 02:24:30,328  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W
2018-09-10 02:24:30,343  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Start action [0000014-180908024054802-oozie-oozi-W@spark_1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,370  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Added into spark action configuration mapred.child.env=SPARK_HOME=.,HDP_VERSION=2.6.5.0-292
2018-09-10 02:24:31,425  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:24:31,450  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:24:31,453  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action status=RUNNING
2018-09-10 02:24:31,453  INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action updated in DB!
2018-09-10 02:24:31,456  INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@spark_1
2018-09-10 02:35:19,084  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:35:26,289  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:46:19,088  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:46:19,127  INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]

Hello @A C.
What do you see in the YARN UI? Is there any application_id running for your oozie workflow/Spark Job?
Thanks.
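
To find the entry to look for in the YARN UI, the job ID that Oozie prints in the log above can be turned into an application ID: the two share the same numeric suffix. Below is a minimal Python sketch of that mapping. The function name `launcher_application_ids` is hypothetical, and the sketch assumes the log text has been saved or pasted as a plain string. Note that the ID Oozie logs belongs to its launcher job; a Spark application started by that launcher would normally show up under a separate application ID.

```python
import re

# Matches MapReduce-style job IDs such as job_1536377145689_0006.
JOB_ID_RE = re.compile(r"\bjob_(\d+)_(\d+)\b")


def launcher_application_ids(log_text):
    """Return YARN application IDs for each job ID found in an Oozie log.

    IDs are returned in first-seen order with duplicates removed.
    (Hypothetical helper, not part of Oozie or YARN.)
    """
    seen = []
    for cluster_ts, seq in JOB_ID_RE.findall(log_text):
        app_id = f"application_{cluster_ts}_{seq}"
        if app_id not in seen:
            seen.append(app_id)
    return seen


if __name__ == "__main__":
    sample = (
        "ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, "
        "hadoop job ID [job_1536377145689_0006] status [RUNNING]"
    )
    for app_id in launcher_application_ids(sample):
        print(app_id)
```

Each resulting ID can then be passed to `yarn application -status <id>` or `yarn logs -applicationId <id>`, and `yarn application -list -appStates ALL` shows whether a second application for the Spark job was ever submitted.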

Explorer

Hi @Vinicius Higa Murakami

I guess this is the YARN UI ( http://[**My HDP Sandbox**]:8188/applicationhistory )? If that's not correct, please let me know. I am not sure what the message means or how to resolve it.

90382-oozie1.jpg

Hi @A C.
You're right, this is the YARN WEB UI 🙂
Hm, from what I can see, it looks like YARN didn't launch your Spark application.
Would you mind sharing your Oozie workflow XML with us?

Thanks.

Explorer

Hi @Vinicius Higa Murakami.

<workflow-app name="spark test"
	xmlns="uri:oozie:workflow:0.5">
	<start to="spark_1"/>
	<action name="spark_1">
		<spark
			xmlns="uri:oozie:spark-action:0.2">
			<job-tracker>${resourceManager}</job-tracker>
			<name-node>${nameNode}</name-node>
			<master>yarn-cluster</master>
			<name>pySpark</name>
			<jar>/tmp/pySparkTest.py</jar>
		</spark>
		<ok to="end"/>
		<error to="kill"/>
	</action>
	<kill name="kill">
		<message>${wf:errorMessage(wf:lastErrorNode())}</message>
	</kill>
	<end name="end"/>
</workflow-app>

Hi @A C.
At first glance, I can't see anything misconfigured.
Take a look at this article to see if it helps with something:
https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-...

Hope this helps

Explorer

Still doesn't work. 😞

Quick question: does it work when run outside of Oozie, e.g. directly with spark-submit?

Contributor

I agree with you, you are right about this problem. First, try running the app with spark-submit; once that works, use Oozie to schedule it.

Explorer

It ran successfully outside of Oozie using spark-submit, just not in Oozie.

Contributor

Can Oozie successfully schedule other kinds of jobs, e.g. MapReduce?

@A C

Just to understand: did you run spark-submit with YARN as the master and cluster as the deploy mode?
If so, let's check the job properties for the following parameter:
${resourceManager}

Also, here is another example of PySpark + Oozie (using a shell action to submit Spark).
https://github.com/hgrif/oozie-pyspark-workflow

Hope this helps
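
As a rough way to do the job properties check suggested above, here is a small Python sketch that reads the text of a job.properties file and reports whether `resourceManager` and `nameNode` (the two variables referenced in the workflow XML) are set, and whether `resourceManager` looks like `host:port`. The function names are hypothetical, and the host and ports in the usage example are assumptions; compare them with `yarn.resourcemanager.address` in yarn-site.xml and `fs.defaultFS` in core-site.xml on your own cluster.

```python
import re

# Variables referenced by the workflow XML in this thread.
REQUIRED_KEYS = ("nameNode", "resourceManager")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_job_properties(text):
    """Return a list of problems found; an empty list means the basics look fine.

    (Hypothetical helper; it only checks the shape of the values,
    not whether the address is actually reachable.)
    """
    props = parse_properties(text)
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not props.get(key)]
    rm = props.get("resourceManager", "")
    if rm and not re.fullmatch(r"[^\s:/]+:\d+", rm):
        problems.append("resourceManager should look like host:port")
    return problems


if __name__ == "__main__":
    # Host taken from the log above; both ports are assumptions.
    sample = (
        "nameNode=hdfs://sandbox-hdp.hortonworks.com:8020\n"
        "resourceManager=sandbox-hdp.hortonworks.com:8050\n"
    )
    print(check_job_properties(sample))
```

An empty list only means the values are present and well formed; a wrong host or port would still have to be caught by comparing against the cluster configuration.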