
Oozie SparkAction failing

Rising Star

I'm currently exploring Oozie's SparkAction, but I'm running into errors.

The code is straightforward: it just does a simple SELECT from a Hive table and then counts the records of the resulting DataFrame. It's dummy code to use while I learn how to work with Oozie:

val tbl = sqlContext.sql("SELECT * FROM tbl")
val count = tbl.count   
log.info(s"The table has ${count} records.")
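
For completeness, the full driver is essentially just that snippet wrapped in a small object; a simplified version looks like this (the boilerplate around the three lines above is approximate):

package com.myCompany

import org.apache.log4j.Logger
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SparkActionTest {
  private val log = Logger.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Testing Spark Action"))
    // HiveContext so that sqlContext.sql can resolve tables registered in the Hive metastore
    val sqlContext = new HiveContext(sc)

    val tbl = sqlContext.sql("SELECT * FROM tbl")
    val count = tbl.count
    log.info(s"The table has ${count} records.")

    sc.stop()
  }
}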

It works as expected when I submit it directly with spark-submit, but it fails when I try to run it as an Oozie SparkAction.
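
The direct submission that works is roughly the following (options shortened to what matters; the jar is the same one that later goes into the workflow's lib/ directory):

spark-submit \
  --class com.myCompany.SparkActionTest \
  --master yarn-cluster \
  sparkaction-test_2.10-1.0.jar

When the same jar is launched through the Oozie SparkAction, the launcher fails and the following error shows up in the logs: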

Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
Testing Spark Action
--jar
hdfs://myhost.com:8020/user/bigdata/workflows/sparkaction-test/lib/sparkaction-test_2.10-1.0.jar
--class
com.myCompany.SparkActionTest
System properties:
SPARK_SUBMIT -> true
spark.app.name -> Testing Spark Action
spark.submit.deployMode -> cluster
spark.master -> yarn-cluster
Classpath elements:



Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1454025267777_0681 finished with failed status
org.apache.spark.SparkException: Application application_1454025267777_0681 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1020)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.

The project directory is arranged as follows:

sparkaction-test
-workflow.xml
-hive-site.xml
-job.properties
-lib/
  -sparkaction-test_2.10-1.0.jar

The content of job.properties:

nameNode=hdfs://myhost.com:8020
jobTracker=myhost.com:8032
queueName=default
projectRoot=user/${user.name}/workflows/sparkaction-test

master=yarn-cluster
mode=cluster
class=com.myCompany.SparkActionTest
hiveSite=hive-site.xml
jars=${nameNode}/${projectRoot}/lib/sparkaction-test_2.10-1.0.jar


oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${projectRoot}
spark.yarn.historyServer.address=http://myhost.com:18080/
spark.eventLog.dir=${nameNode}/user/spark/applicationHistory
spark.eventLog.enabled=true
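
For reference, the whole sparkaction-test directory is uploaded to HDFS and the workflow is started with the standard Oozie CLI, along these lines (11000 is just the default Oozie server port on my side):

# upload the project directory to HDFS (matches projectRoot above)
hdfs dfs -put -f sparkaction-test /user/bigdata/workflows/

# submit and start the workflow
oozie job -oozie http://myhost.com:11000/oozie -config sparkaction-test/job.properties -run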

workflow.xml:

<workflow-app name="spark-test-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="spark-test"/>
    <action name="spark-test">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>Testing Spark Action</name>
            <class>${class}</class>
            <jar>${jars}</jar>
        </spark>
        <ok to="end"/>
        <error to="errorcleanup" />
    </action>

    <kill name="errorcleanup">
      <message>Spark Test WF failed. [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

These are the jars in the Oozie sharelib:

  • datanucleus-api-jdo-3.2.6.jar
  • datanucleus-core-3.2.10.jar
  • datanucleus-rdbms-3.2.9.jar
  • oozie-sharelib-spark-4.2.0.2.3.4.0-3485.jar
  • spark-1.5.2.2.3.4.0-3485-yarn-shuffle.jar
  • spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
  • spark-examples-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
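
(For reference, the same list can be pulled with the Oozie admin CLI; this is roughly the command, using the default Oozie port:)

# list the jars registered under the spark sharelib keyword
oozie admin -oozie http://myhost.com:11000/oozie -shareliblist spark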

Environment:

  • HDP 2.3.4
  • Spark 1.5.2
  • Oozie 4.2.0

What could be the problem?

11 Replies

@Luis Antonio Torres

Can you please share the jar below?

  1. spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

I am trying to schedule a Spark 1.5.2 job on Oozie 4.2.0 (HDP 2.3.x). Spark 1.5.2 has been installed externally; I am not using the default Spark version provided by Hortonworks.

http://stackoverflow.com/questions/38770545/needed-spark-assembly-1-5-2-hadoop2-7-jar-for-spark-oozi...

Explorer

Hi @Luis Antonio Torres,

Is the workaround changing the port from 8050 to 8032? Please point it out.

Thanks