New Contributor
Posts: 1
Registered: ‎06-12-2015

CDH-5.4.2 Run Oozie with self-built spark jar.

Hello,

 

I am trying to run my self-built Spark jar with the Oozie scheduler via CM --> HUE --> Workflow Editor --> Oozie Editor.

I ran the Oozie Spark example and it works fine.

 

For the jar file, I am using an SBT build for my Scala work: Scala version 2.11.6, Spark 1.3.0. It compiles and runs on my local Mac. I can also run it using spark-submit on the Cloudera server I deployed.

 

However, when I try to run my own WordCount in HUE --> Oozie UI, I get an error:

 

2015-06-12 17:39:09,699 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[ec2-52-24-84-183.us-west-2.compute.amazonaws.com] USER[hue] GROUP[-] TOKEN[] APP[My_Workflow] JOB[0000000-150612022451385-oozie-oozi-W] ACTION[0000000-150612022451385-oozie-oozi-W@spark-d4b6] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [101]
2015-06-12 17:39:09,756 INFO org.apache.oozie.command.wf.ActionEndXCommand: SERVER[ec2-52-24-84-183.us-west-2.compute.amazonaws.com] USER[hue] GROUP[-] TOKEN[] APP[My_Workflow] JOB[0000000-150612022451385-oozie-oozi-W] ACTION[0000000-150612022451385-oozie-oozi-W@spark-d4b6] ERROR is considered as FAILED for SLA

 

Here is the config:

 

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-d4b6"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-d4b6">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[*]</master>
            <mode>client</mode>
            <name>WorldCount</name>
            <class>com.analytics.spark.scala.WordCount</class>
            <jar>/user/hue/FuhuSparkStatistics-assembly-1.0.jar</jar>
            <arg>/user/hue/test.txt</arg>
            <arg></arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

 

 

 

name := "SparkStatistics"

version := "1.0"

scalaVersion := "2.11.6"

libraryDependencies ++= Seq(
  "org.apache.oozie" % "oozie-client" % "4.1.0",
  "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
  "net.liftweb" % "lift-json_2.11" % "3.0-M5-1",
  "org.scalaz" %% "scalaz-core" % "7.1.1",
  "com.github.nscala-time" %% "nscala-time" % "1.8.0",
  "com.typesafe" % "config" % "1.3.0"
)

 

Any pointers and suggestions would be a great help :)

 

Posts: 1,903
Kudos: 435
Solutions: 307
Registered: ‎07-31-2013

Re: CDH-5.4.2 Run Oozie with self-built spark jar.

Could you post the full map task logs of your failed job?
New Contributor
Posts: 1
Registered: ‎12-16-2015

Re: CDH-5.4.2 Run Oozie with self-built spark jar.


I have the same issue. 

 

1209101029984-oozie-oozi-W] ACTION[0000010-151209101029984-oozie-oozi-W@spark-4345] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [101]
2015-12-16 22:54:50,736 INFO org.apache.oozie.command.wf.ActionEndXCommand: SERVER[ip-172-30-0-133] USER[admin] GROUP[-] TOKEN[] APP[My_Workflow] JOB[0000010-151209101029984-oozie-oozi-W] ACTION[0000010-151209101029984-oozie-oozi-W@spark-4345] ERROR is considered as FAILED for SLA

 

The job fails before the Spark job is launched. The same jobs run fine through the spark-submit script.

Oozie workflow:

 

<workflow-app name="Message Parquet Job" xmlns="uri:oozie:workflow:0.3">
    <start to="JStreamMerger" />
    <action name="JStreamMerger">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${jobOutput}/2015-12-16/output" />
            </prepare>
            <master>${sparkMaster}</master>
            <mode>${sparkMode}</mode>
            <name>${sparkJobName}</name>
            <class>${sparkMainClass}</class>
            <jar>${sparkJars}</jar>
            <spark-opts>${sparkOpts}</spark-opts>
            <arg>${jobInput}</arg>
            <arg>${timestamp()}</arg>
            <arg>${mergeInterval}</arg>
            <arg>${jobOutput}</arg>
            <arg>${nameNode}</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>
            Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
    <end name='end' />
</workflow-app>

 

 

nameNode=hdfs://ip-172-30-0-133:8020
jobTracker=http://ip-172-30-0-133:50030/
jobOutput=/clickstream/message-merge-output
sparkMaster=yarn-client
sparkMode=cluster
sparkJobName=Message Parquet File Merger
sparkMainClass=com.spotdy.jmessage.mergers.JMessageParquetMerger
sparkJars=original-spotdy-spark-offlinemerger-0.0.1.jar
sparkOpts=--driver-java-options "-Dlog4j.configuration=file:/root/spotdy-sparkmessageprocessor/src/main/resources/log4j.properties -Ddm.logging.level=INFO" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Ddm.logging.name=myapp -Ddm.logging.level=INFO" --conf "spark.ui.port=4050"
jobInput=/clickstream/events
mergeInterval=60
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs://ip-172-30-0-133:8020/offline-jobs/message-parquet-merger

Can we please get some help here?

Explorer
Posts: 21
Registered: ‎09-09-2015

Re: CDH-5.4.2 Run Oozie with self-built spark jar.

Did you try placing the Spark jars used for your spark-submit in the lib folder of the workflow?
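
For reference, Oozie adds every jar found under the lib/ directory of the workflow application path to the action's classpath. A rough sketch of what that could look like for the second workflow posted above, assuming the application jar has been uploaded to a lib/ directory under the oozie.wf.application.path from that post. The lib/ location of the jar is an assumption for illustration and is not confirmed anywhere in this thread:

```xml
<!-- Hypothetical HDFS layout (assumed, not confirmed in this thread):
       /offline-jobs/message-parquet-merger/workflow.xml
       /offline-jobs/message-parquet-merger/lib/original-spotdy-spark-offlinemerger-0.0.1.jar -->
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>${sparkMaster}</master>
    <mode>${sparkMode}</mode>
    <name>${sparkJobName}</name>
    <class>${sparkMainClass}</class>
    <!-- Fully qualified path to the jar inside the workflow's lib/ directory -->
    <jar>${nameNode}/offline-jobs/message-parquet-merger/lib/original-spotdy-spark-offlinemerger-0.0.1.jar</jar>
</spark>
```

Any additional dependency jars that are not part of the assembly would go into the same lib/ directory.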

New Contributor
Posts: 3
Registered: ‎07-11-2016

Re: CDH-5.4.2 Run Oozie with self-built spark jar.

Did you resolve this problem? Unfortunately, I have the same issue. Another question, if you know: how do I set the library dependencies in the Oozie graphical tool? I have a cluster running Cloudera 5.7.
New Contributor
Posts: 3
Registered: ‎07-11-2016

Re: CDH-5.4.2 Run Oozie with self-built spark jar.

In case anyone knows the solution or has any other suggestions, I opened a thread on Stack Overflow: http://stackoverflow.com/q/38746995/4866657

Please, help me!!