Created on 09-10-2018 12:08 PM - edited 08-18-2019 01:13 AM
Hello,
I am new to HDP. I have set up a very simple Oozie job to run Spark on the sandbox. The job started, but it never moves on to the next step; it just stays stuck with a "running" status. I can run the same script with spark-submit from the terminal, so I know the Spark script works. Any idea why the job is stuck, or what other steps I can take to troubleshoot this?
Here is the log:
2018-09-10 02:24:30,295 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] Start action [0000014-180908024054802-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,296 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action status=DONE
2018-09-10 02:24:30,296 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action updated in DB!
2018-09-10 02:24:30,328 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@:start:
2018-09-10 02:24:30,328 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W
2018-09-10 02:24:30,343 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Start action [0000014-180908024054802-oozie-oozi-W@spark_1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,370 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Added into spark action configuration mapred.child.env=SPARK_HOME=.,HDP_VERSION=2.6.5.0-292
2018-09-10 02:24:31,425 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:24:31,450 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:24:31,453 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action status=RUNNING
2018-09-10 02:24:31,453 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action updated in DB!
2018-09-10 02:24:31,456 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@spark_1
2018-09-10 02:35:19,084 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:35:26,289 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:46:19,088 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:46:19,127 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
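For anyone reading a log like this one, the useful signal is in the "checking action" entries, where Oozie polls the Hadoop job behind the spark_1 action. Below is a minimal Python sketch (not part of the original post) that pulls those polls out of the log text, whether the entries are on separate lines or run together. The function name `status_polls` is hypothetical, and the sample entries are trimmed copies of the log above with the SERVER/USER/JOB fields removed for brevity.

```python
import re
from datetime import datetime

# Oozie log timestamps look like "2018-09-10 02:24:31,450".
TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
# Zero-width split point in front of every timestamp, so entries that were
# pasted onto one line are still separated correctly (needs Python 3.7+).
ENTRY_START = re.compile("(?=" + TIMESTAMP + " )")
# One "checking action" entry: timestamp, Hadoop job ID, reported status.
STATUS = re.compile(
    "^(" + TIMESTAMP + r") .*?hadoop job ID \[(job_\d+_\d+)\] status \[(\w+)\]",
    re.S,
)


def status_polls(log_text):
    """Return (time, hadoop_job_id, status) for each status poll in the log."""
    polls = []
    for entry in ENTRY_START.split(log_text):
        match = STATUS.search(entry)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            polls.append((when, match.group(2), match.group(3)))
    return polls


# Trimmed sample entries (assumption: fields between "-" and the message
# were dropped here only to keep the example short).
SAMPLE_LOG = (
    "2018-09-10 02:24:31,425 INFO SparkActionExecutor:520 - "
    "Trying to get job [job_1536377145689_0006], attempt [1] "
    "2018-09-10 02:24:31,450 INFO SparkActionExecutor:520 - "
    "checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING] "
    "2018-09-10 02:35:26,289 INFO SparkActionExecutor:520 - "
    "checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING] "
    "2018-09-10 02:46:19,127 INFO SparkActionExecutor:520 - "
    "checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]"
)

if __name__ == "__main__":
    polls = status_polls(SAMPLE_LOG)
    for when, job_id, status in polls:
        print(when.isoformat(sep=" "), job_id, status)
    print("time between first and last poll:", polls[-1][0] - polls[0][0])
```

If every poll reports RUNNING over a long stretch, as it does here, the Oozie side is only waiting; the next place to look is the Hadoop job itself rather than the Oozie log.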
Created 09-10-2018 02:26 PM
Hello @A C.
What do you see in the YARN UI? Is there an application_id running for your Oozie workflow or Spark job?
Thanks.
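A small aid for that check, as a hedged sketch: the Oozie log reports a Hadoop job ID (job_1536377145689_0006), and YARN lists the same job under an application ID that shares the two numeric parts. The Python below only builds the ID and the command lines; it does not run anything. The helper names `to_application_id` and `yarn_commands` are hypothetical, and it assumes the `yarn` CLI is available on the sandbox. Note that this ID most likely belongs to the Oozie launcher job, so the Spark application itself, if it was started at all, would show up under a separate application ID.

```python
import re


def to_application_id(hadoop_job_id):
    """Map a Hadoop job ID (job_<cluster ts>_<seq>) to its YARN application ID."""
    match = re.fullmatch(r"job_(\d+)_(\d+)", hadoop_job_id)
    if match is None:
        raise ValueError("not a Hadoop job ID: " + repr(hadoop_job_id))
    return "application_{}_{}".format(*match.groups())


def yarn_commands(hadoop_job_id):
    """Build (but do not run) yarn CLI calls for inspecting that application."""
    app_id = to_application_id(hadoop_job_id)
    return [
        # Current state, queue, and tracking URL of the application.
        ["yarn", "application", "-status", app_id],
        # Aggregated container logs; may be empty while the app still runs.
        ["yarn", "logs", "-applicationId", app_id],
    ]


if __name__ == "__main__":
    for command in yarn_commands("job_1536377145689_0006"):
        print(" ".join(command))
```

The printed commands can be pasted into a terminal on the sandbox, or passed to `subprocess.run` if you prefer to script the check.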
Created on 09-10-2018 04:10 PM - edited 08-18-2019 01:13 AM
I guess this is the YARN UI (http://[**My HDP Sandbox**]:8188/applicationhistory)? If that's not correct, please let me know. I am not sure what the message means or how to resolve it.
Created 09-10-2018 04:14 PM
Hi @A C.
You're right, this is the YARN WEB UI 🙂
Hmm, from what I can see, it looks like YARN didn't launch your Spark application.
Would you mind sharing your Oozie workflow XML with us?
Thanks.
Created 09-10-2018 04:23 PM
<workflow-app name="spark test" xmlns="uri:oozie:workflow:0.5">
    <start to="spark_1"/>
    <action name="spark_1">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${resourceManager}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>pySpark</name>
            <jar>/tmp/pySparkTest.py</jar>
        </spark>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
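When comparing a workflow like this against a working spark-submit command, it can help to list exactly what the spark action sets. The following is a minimal Python sketch, not a diagnosis: it parses the workflow XML posted above with the standard library and prints each element of the spark action. The function name `spark_action_settings` is hypothetical; the XML and namespaces are copied from the post.

```python
import xml.etree.ElementTree as ET

# The workflow exactly as posted in this thread.
WORKFLOW_XML = """
<workflow-app name="spark test" xmlns="uri:oozie:workflow:0.5">
    <start to="spark_1"/>
    <action name="spark_1">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${resourceManager}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>pySpark</name>
            <jar>/tmp/pySparkTest.py</jar>
        </spark>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""

# Namespace URIs taken from the xmlns attributes in the workflow above.
NS = {
    "wf": "uri:oozie:workflow:0.5",
    "spark": "uri:oozie:spark-action:0.2",
}


def spark_action_settings(xml_text):
    """Return {action name: {element name: text}} for every spark action."""
    root = ET.fromstring(xml_text)
    settings = {}
    for action in root.findall("wf:action", NS):
        spark = action.find("spark:spark", NS)
        if spark is None:
            continue  # not a spark action
        settings[action.get("name")] = {
            # Tags arrive as "{namespace}local-name"; keep the local name.
            child.tag.split("}", 1)[1]: (child.text or "").strip()
            for child in spark
        }
    return settings


if __name__ == "__main__":
    for name, elements in spark_action_settings(WORKFLOW_XML).items():
        print("action:", name)
        for key, value in elements.items():
            print("  {} = {}".format(key, value))
```

For this workflow the listing shows only job-tracker, name-node, master, name, and jar, so any options passed on the spark-submit command line (memory, cores, and so on) have no counterpart here, for example in a spark-opts element.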
Created 09-10-2018 04:54 PM
Hi @A C.
At first glance, I can't see anything misconfigured.
Take a look at this article to see if it helps you with anything:
https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-...
Hope this helps
Created 09-15-2018 05:04 PM
Still doesn't work. 😞
Created 09-16-2018 05:01 AM
Quick question: does it work when run outside of Oozie, e.g. directly with spark-submit?
Created 09-16-2018 06:42 AM
I agree with you, you are right about this problem. First, try running the app with spark-submit; once that works, you can use Oozie to schedule it.
Created 09-16-2018 06:54 PM
It ran successfully outside of Oozie using spark-submit, just not in Oozie.