Created on 09-10-2018 12:08 PM - edited 08-18-2019 01:13 AM
Hello,
I am new to HDP. I have set up a very simple Oozie job to run Spark on the sandbox. The job starts but never moves on to the next step; it just stays stuck with a "running" status. I am able to run spark-submit from the terminal, so I know the Spark script works. Any idea why the job is stuck, or what other steps I can take to troubleshoot this?
Here is the log:
2018-09-10 02:24:30,295 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] Start action [0000014-180908024054802-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,296 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action status=DONE
2018-09-10 02:24:30,296 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] [***0000014-180908024054802-oozie-oozi-W@:start:***]Action updated in DB!
2018-09-10 02:24:30,328 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@:start:
2018-09-10 02:24:30,328 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W
2018-09-10 02:24:30,343 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Start action [0000014-180908024054802-oozie-oozi-W@spark_1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-09-10 02:24:30,370 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Added into spark action configuration mapred.child.env=SPARK_HOME=.,HDP_VERSION=2.6.5.0-292
2018-09-10 02:24:31,425 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:24:31,450 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:24:31,453 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action status=RUNNING
2018-09-10 02:24:31,453 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] [***0000014-180908024054802-oozie-oozi-W@spark_1***]Action updated in DB!
2018-09-10 02:24:31,456 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] No Notification URL is defined. Therefore nothing to notify for job 0000014-180908024054802-oozie-oozi-W@spark_1
2018-09-10 02:35:19,084 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:35:26,289 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:46:19,088 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:46:19,127 INFO SparkActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[admin] GROUP[-] TOKEN[] APP[spark test] JOB[0000014-180908024054802-oozie-oozi-W] ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
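One way to read this log: Oozie writes a "checking action" entry each time it polls the Spark action's launcher, and every poll over roughly 22 minutes reports the same Hadoop job ID as RUNNING. That suggests the launcher is still waiting or running in YARN rather than Oozie having lost track of it. The minimal Python sketch below (a hypothetical helper, not part of Oozie) pulls those polls out of the log and maps the Hadoop job ID to the matching YARN application ID, which you could then inspect with `yarn application -status <appId>` or `yarn logs -applicationId <appId>`, or through the ResourceManager REST endpoint `/ws/v1/cluster/apps/<appId>`. It assumes one log entry per line, as in oozie.log; the RM web address `sandbox-hdp.hortonworks.com:8088` is an assumption (8088 is the usual default web port), so check `yarn.resourcemanager.webapp.address` on your cluster.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumption: one log entry per line. Matches SparkActionExecutor polls such as
#   ... checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
CHECK_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO SparkActionExecutor:\d+ .*"
    r"checking action, hadoop job ID \[(?P<job>job_\d+_\d+)\] status \[(?P<status>\w+)\]"
)


class Check(NamedTuple):
    when: datetime
    hadoop_job_id: str
    status: str


def parse_checks(log_text: str) -> list[Check]:
    """Return every status poll Oozie logged for the Spark action, in log order."""
    checks = []
    for line in log_text.splitlines():
        m = CHECK_RE.search(line.strip())
        if m:
            when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            checks.append(Check(when, m["job"], m["status"]))
    return checks


def to_yarn_app_id(hadoop_job_id: str) -> str:
    """Map job_<clusterTs>_<seq> to the YARN ID application_<clusterTs>_<seq>."""
    prefix, cluster_ts, seq = hadoop_job_id.split("_")
    if prefix != "job":
        raise ValueError(f"not a Hadoop job ID: {hadoop_job_id}")
    return f"application_{cluster_ts}_{seq}"


def rm_app_url(rm_web_address: str, app_id: str) -> str:
    """ResourceManager REST URL for one application (rm_web_address is an assumption)."""
    return f"http://{rm_web_address}/ws/v1/cluster/apps/{app_id}"


# Shortened copies of entries from the log above (SERVER/USER/GROUP/... fields dropped).
SAMPLE_LOG = """\
2018-09-10 02:24:31,450 INFO SparkActionExecutor:520 - ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:35:19,084 INFO SparkActionExecutor:520 - ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] Trying to get job [job_1536377145689_0006], attempt [1]
2018-09-10 02:35:26,289 INFO SparkActionExecutor:520 - ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
2018-09-10 02:46:19,127 INFO SparkActionExecutor:520 - ACTION[0000014-180908024054802-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1536377145689_0006] status [RUNNING]
"""

if __name__ == "__main__":
    checks = parse_checks(SAMPLE_LOG)
    elapsed = checks[-1].when - checks[0].when
    app_id = to_yarn_app_id(checks[-1].hadoop_job_id)
    print(f"{len(checks)} polls over {elapsed}, statuses: {sorted({c.status for c in checks})}")
    print(f"yarn application -status {app_id}")
    print(rm_app_url("sandbox-hdp.hortonworks.com:8088", app_id))
```

If the YARN application itself sits in ACCEPTED or RUNNING with no progress, the YARN application log and the ResourceManager UI are probably more informative than the Oozie log.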
Created 09-17-2018 01:43 AM
Can Oozie successfully schedule other types of jobs, e.g. MapReduce?
Created 09-17-2018 05:46 AM
Just to understand: did you run spark-submit with YARN as the master in cluster deploy mode?
If so, let's check what the following parameter is set to in your job properties:
${resourceManager}
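To see exactly what `${resourceManager}` resolves to, a minimal Python sketch like the one below could parse job.properties, expand `${...}` references between properties, and compare the result with the ResourceManager address your cluster uses (the `yarn.resourcemanager.address` setting in yarn-site.xml). It handles only simple `key=value` / `key: value` lines, not the full Java .properties syntax. `check_resource_manager` is an illustrative helper, not an Oozie API, and every host, port, and path in `SAMPLE_PROPERTIES` is a hypothetical placeholder.

```python
import re

# Matches ${name} references inside property values.
VAR_RE = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple 'key=value' or 'key: value' lines, skipping blanks and #/! comments.

    Assumption: no backslash continuations or escapes (real .properties syntax is richer).
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        parts = re.split(r"\s*[=:]\s*", line, maxsplit=1)  # split on the first separator only
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props


def resolve(value: str, props: dict[str, str], depth: int = 0) -> str:
    """Expand ${name} from other properties; unknown names (e.g. EL functions) stay as-is."""
    if depth > 10:
        raise ValueError("property references nest too deeply (cycle?)")

    def substitute(m: re.Match) -> str:
        name = m.group(1)
        if name in props:
            return resolve(props[name], props, depth + 1)
        return m.group(0)

    return VAR_RE.sub(substitute, value)


def check_resource_manager(props: dict[str, str], expected: str) -> tuple[str, bool]:
    """Return the resolved resourceManager value and whether it equals `expected`."""
    actual = resolve(props.get("resourceManager", ""), props)
    return actual, actual == expected


# Hypothetical job.properties; copy the real host:port values from your cluster configs.
SAMPLE_PROPERTIES = """\
# hypothetical values for the HDP sandbox
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
resourceManager=sandbox-hdp.hortonworks.com:8050
oozie.wf.application.path=${nameNode}/user/admin/spark-test
"""

if __name__ == "__main__":
    props = parse_properties(SAMPLE_PROPERTIES)
    # Assumption: the expected value is read from yarn.resourcemanager.address in yarn-site.xml.
    actual, ok = check_resource_manager(props, "sandbox-hdp.hortonworks.com:8050")
    print(f"resourceManager={actual} matches yarn-site.xml: {ok}")
    print("app path:", resolve(props["oozie.wf.application.path"], props))
```

Leaving unknown `${...}` names untouched keeps Oozie EL expressions such as `${wf:id()}` visible instead of silently blanking them.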
Also, here is another example of PySpark + Oozie (using a shell action to submit Spark):
https://github.com/hgrif/oozie-pyspark-workflow
Hope this helps