
Oozie SparkAction failing

Expert Contributor

I'm currently exploring Oozie's SparkAction, but I'm running into errors.

The code is pretty straightforward: it just does a simple SELECT from a Hive table and then counts the records of the resulting DataFrame. It's just some simple dummy code to use while I learn how to work with Oozie:

val tbl = sqlContext.sql("SELECT * FROM tbl")
val count = tbl.count   
log.info(s"The table has ${count} records.")

It works as expected when run with `spark-submit`, but when I try to run it as an Oozie SparkAction, I get the following error in the logs:

Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
Testing Spark Action
--jar
hdfs://myhost.com:8020/user/bigdata/workflows/sparkaction-test/lib/sparkaction-test_2.10-1.0.jar
--class
com.myCompany.SparkActionTest
System properties:
SPARK_SUBMIT -> true
spark.app.name -> Testing Spark Action
spark.submit.deployMode -> cluster
spark.master -> yarn-cluster
Classpath elements:



Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1454025267777_0681 finished with failed status
org.apache.spark.SparkException: Application application_1454025267777_0681 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1020)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.

The project directory is arranged as follows:

sparkaction-test
-workflow.xml
-hive-site.xml
-job.properties
-lib/
  -sparkaction-test_2.10-1.0.jar

The content of job.properties:

nameNode=hdfs://myhost.com:8020
jobTracker=myhost.com:8032
queueName=default
projectRoot=user/${user.name}/workflows/sparkaction-test

master=yarn-cluster
mode=cluster
class=com.myCompany.SparkActionTest
hiveSite=hive-site.xml
jars=${nameNode}/${projectRoot}/lib/sparkaction-test_2.10-1.0.jar


oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${projectRoot}
spark.yarn.historyServer.address=http://myhost.com:18080/
spark.eventLog.dir=${nameNode}/user/spark/applicationHistory
spark.eventLog.enabled=true

workflow.xml:

<workflow-app name="spark-test-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="spark-test"/>
    <action name="spark-test">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>Testing Spark Action</name>
            <class>${class}</class>
            <jar>${jars}</jar>
        </spark>
        <ok to="end"/>
        <error to="errorcleanup" />
    </action>

    <kill name="errorcleanup">
      <message>Spark Test WF failed. [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
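
A side note on the workflow above (an observation, not a confirmed fix for the failure): hive-site.xml sits in the project directory but is never referenced by the action, and the spark.* entries in job.properties are not substituted anywhere in workflow.xml, so it is not clear they ever reach the Spark driver. Settings like these are typically passed through the optional <spark-opts> element of the Spark action, placed after <jar>; a sketch, reusing the values from job.properties:

<!-- Sketch only: ships hive-site.xml to the containers and forwards the event-log settings -->
<spark-opts>--files ${nameNode}/${projectRoot}/hive-site.xml --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=${nameNode}/user/spark/applicationHistory --conf spark.yarn.historyServer.address=http://myhost.com:18080/</spark-opts>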

These are the jars in the Oozie sharelib:

  • datanucleus-api-jdo-3.2.6.jar
  • datanucleus-core-3.2.10.jar
  • datanucleus-rdbms-3.2.9.jar
  • oozie-sharelib-spark-4.2.0.2.3.4.0-3485.jar
  • spark-1.5.2.2.3.4.0-3485-yarn-shuffle.jar
  • spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
  • spark-examples-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

Environment:

  • HDP 2.3.4
  • Spark 1.5.2
  • Oozie 4.2.0

What could be the problem?

1 ACCEPTED SOLUTION

11 REPLIES

Contributor

@Luis Antonio Torres

Can you please share the jar below?

  1. spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

I am trying to schedule a Spark 1.5.2 job on Oozie 4.2.0 (HDP 2.3.x). Spark 1.5.2 was installed externally; I am not using the default Spark version provided by Hortonworks.

http://stackoverflow.com/questions/38770545/needed-spark-assembly-1-5-2-hadoop2-7-jar-for-spark-oozi...

Explorer

Hi @Luis Antonio Torres,

Is the workaround changing the jobTracker port from 8050 to 8032? Please clarify.

Thanks.
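
For reference, the change being asked about would just be the jobTracker entry in job.properties (a sketch; whether 8032 or 8050 is correct depends on the yarn.resourcemanager.address configured for the cluster):

# Workaround being asked about: point jobTracker at the ResourceManager's
# yarn.resourcemanager.address port (assumed here to be 8032) instead of 8050
jobTracker=myhost.com:8032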