Created 06-12-2015 02:59 PM
Hello,
I am trying to use my self-built Spark jar with the Oozie scheduler in CM --> HUE --> Workflow Editor --> Oozie Editor.
I ran the Oozie Spark example and it works fine.
For the jar file, I am using an SBT build for my Scala work: Scala version 2.11.6, Spark 1.3.0. It compiles and runs on my local Mac machine, and I can also run it using spark-submit on the Cloudera server I deployed.
However, when I try to run my own WordCount in the HUE --> Oozie UI, it fails with this error:
2015-06-12 17:39:09,699 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[ec2-52-24-84-183.us-west-2.compute.amazonaws.com] USER[hue] GROUP[-] TOKEN[] APP[My_Workflow] JOB[0000000-150612022451385-oozie-oozi-W] ACTION[0000000-150612022451385-oozie-oozi-W@spark-d4b6] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [101]
2015-06-12 17:39:09,756 INFO org.apache.oozie.command.wf.ActionEndXCommand: SERVER[ec2-52-24-84-183.us-west-2.compute.amazonaws.com] USER[hue] GROUP[-] TOKEN[] APP[My_Workflow] JOB[0000000-150612022451385-oozie-oozi-W] ACTION[0000000-150612022451385-oozie-oozi-W@spark-d4b6] ERROR is considered as FAILED for SLA
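(For what it's worth, when SparkMain exits with code 101 the actual stack trace usually lands in the YARN container logs of the launcher job, not in the Oozie server log. A sketch of how to pull them; the application id below is a placeholder, not a real id from this job:)

```shell
# APP_ID is a placeholder -- take the real id of the Oozie launcher job
# from the ResourceManager UI or `yarn application -list`.
APP_ID=application_XXXXXXXXXXXXX_XXXX
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$APP_ID" | grep -B2 -A20 'Exception'
fi
```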
Here is the config:
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-d4b6"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-d4b6">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[*]</master>
            <mode>client</mode>
            <name>WorldCount</name>
            <class>com.analytics.spark.scala.WordCount</class>
            <jar>/user/hue/FuhuSparkStatistics-assembly-1.0.jar</jar>
            <arg>/user/hue/test.txt</arg>
            <arg></arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
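(For comparison, a sketch of the spark-submit invocation that mirrors the workflow's master/class/jar settings above; paths are the ones from the post, and this assumes the jar and input are reachable from where spark-submit runs:)

```shell
# Mirrors the <master>, <class>, <jar> and <arg> elements of the workflow.
MASTER="local[*]"
MAIN_CLASS=com.analytics.spark.scala.WordCount
APP_JAR=FuhuSparkStatistics-assembly-1.0.jar
if command -v spark-submit >/dev/null 2>&1; then
  spark-submit --master "$MASTER" --class "$MAIN_CLASS" \
    "$APP_JAR" /user/hue/test.txt
fi
```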
Here is my build.sbt:
name := "SparkStatistics"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
"org.apache.oozie" % "oozie-client" % "4.1.0",
"org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
"org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
"net.liftweb" % "lift-json_2.11" % "3.0-M5-1",
"org.scalaz" %% "scalaz-core" % "7.1.1",
"com.github.nscala-time" %% "nscala-time" % "1.8.0",
"com.typesafe" % "config" % "1.3.0"
)
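(One thing worth double-checking: the build above targets Scala 2.11, and with spark-core marked "provided" the Spark classes at run time come from the cluster's own Spark build, which must use the same Scala binary version as the jar. A quick sanity check, run in the project directory:)

```shell
# Confirm the Scala version the build actually uses; compare it with the
# Scala version of the Spark distribution installed on the cluster.
EXPECTED_SCALA=2.11.6   # from the build.sbt above
if command -v sbt >/dev/null 2>&1; then
  sbt 'show scalaVersion'
fi
```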
Any pointers and suggestions would be a great help :)
Created on 12-16-2015 03:32 PM - edited 12-16-2015 03:33 PM
I have the same issue.
1209101029984-oozie-oozi-W] ACTION[0000010-151209101029984-oozie-oozi-W@spark-4345] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [101]
2015-12-16 22:54:50,736 INFO org.apache.oozie.command.wf.ActionEndXCommand: SERVER[ip-172-30-0-133] USER[admin] GROUP[-] TOKEN[] APP[My_Workflow] JOB[0000010-151209101029984-oozie-oozi-W] ACTION[0000010-151209101029984-oozie-oozi-W@spark-4345] ERROR is considered as FAILED for SLA
The job is failing before the Spark job even launches. It runs fine when submitted via the spark-submit script.
Oozie Workflow:
<workflow-app name="Message Parquet Job" xmlns="uri:oozie:workflow:0.3">
    <start to="JStreamMerger" />
    <action name="JStreamMerger">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${jobOutput}/2015-12-16/output" />
            </prepare>
            <master>${sparkMaster}</master>
            <mode>${sparkMode}</mode>
            <name>${sparkJobName}</name>
            <class>${sparkMainClass}</class>
            <jar>${sparkJars}</jar>
            <spark-opts>${sparkOpts}</spark-opts>
            <arg>${jobInput}</arg>
            <arg>${timestamp()}</arg>
            <arg>${mergeInterval}</arg>
            <arg>${jobOutput}</arg>
            <arg>${nameNode}</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>
            Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
    <end name='end' />
</workflow-app>
And the job.properties:
nameNode=hdfs://ip-172-30-0-133:8020
jobTracker=http://ip-172-30-0-133:50030/
jobOutput=/clickstream/message-merge-output
sparkMaster=yarn-client
sparkMode=cluster
sparkJobName=Message Parquet File Merger
sparkMainClass=com.spotdy.jmessage.mergers.JMessageParquetMerger
sparkJars=original-spotdy-spark-offlinemerger-0.0.1.jar
sparkOpts=--driver-java-options "-Dlog4j.configuration=file:/root/spotdy-sparkmessageprocessor/src/main/resources/log4j.properties -Ddm.logging.level=INFO" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Ddm.logging.name=myapp -Ddm.logging.level=INFO" --conf "spark.ui.port=4050"
jobInput=/clickstream/events
mergeInterval=60
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs://ip-172-30-0-133:8020/offline-jobs/message-parquet-merger
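(For reference, a sketch of launching this workflow from the Oozie CLI with the properties file above. The Oozie server URL is an assumption based on the host in the properties; 11000 is Oozie's default HTTP port:)

```shell
# OOZIE_URL is assumed -- substitute your actual Oozie server address.
OOZIE_URL=http://ip-172-30-0-133:11000/oozie
if command -v oozie >/dev/null 2>&1; then
  oozie job -oozie "$OOZIE_URL" -config job.properties -run
fi
```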
Can we please get some help here?
Created 12-17-2015 01:04 PM
Did you try placing the Spark jars used for your spark-submit in the lib folder of the workflow?
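(A sketch of that suggestion, using the application path and jar name from the properties above; paths are illustrative:)

```shell
# Oozie adds everything under the workflow's lib/ directory on HDFS to
# the action classpath, so the application jar can live there.
WF_DIR=/offline-jobs/message-parquet-merger
APP_JAR=original-spotdy-spark-offlinemerger-0.0.1.jar
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$WF_DIR/lib"
  hdfs dfs -put -f "$APP_JAR" "$WF_DIR/lib/"
fi
```

With the jar under lib/, the workflow's jar reference can typically use just the bare file name.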