Oozie SparkAction failing

Expert Contributor

I'm currently exploring Oozie's SparkAction, but I'm running into errors.

The code is pretty straightforward: it's just a simple SELECT from a Hive table, after which I count the records in the resulting DataFrame. It's just dummy code to use while I learn how to work with Oozie:

val tbl = sqlContext.sql("SELECT * FROM tbl")
val count = tbl.count   
log.info(s"The table has ${count} records.")

It works as expected when run with `spark-submit`, but when I try to run it as an Oozie SparkAction, I get the following error in the logs:

Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
Testing Spark Action
--jar
hdfs://myhost.com:8020/user/bigdata/workflows/sparkaction-test/lib/sparkaction-test_2.10-1.0.jar
--class
com.myCompany.SparkActionTest
System properties:
SPARK_SUBMIT -> true
spark.app.name -> Testing Spark Action
spark.submit.deployMode -> cluster
spark.master -> yarn-cluster
Classpath elements:



Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1454025267777_0681 finished with failed status
org.apache.spark.SparkException: Application application_1454025267777_0681 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1020)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.

The project directory is arranged as follows:

sparkaction-test
-workflow.xml
-hive-site.xml
-job.properties
-lib/
  -sparkaction-test_2.10-1.0.jar

The content of job.properties:

nameNode=hdfs://myhost.com:8020
jobTracker=myhost.com:8032
queueName=default
projectRoot=user/${user.name}/workflows/sparkaction-test

master=yarn-cluster
mode=cluster
class=com.myCompany.SparkActionTest
hiveSite=hive-site.xml
jars=${nameNode}/${projectRoot}/lib/sparkaction-test_2.10-1.0.jar


oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${projectRoot}
spark.yarn.historyServer.address=http://myhost.com:18080/
spark.eventLog.dir=${nameNode}/user/spark/applicationHistory
spark.eventLog.enabled=true

The content of workflow.xml:

<workflow-app name="spark-test-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="spark-test"/>
    <action name="spark-test">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>Testing Spark Action</name>
            <class>${class}</class>
            <jar>${jars}</jar>
        </spark>
        <ok to="end"/>
        <error to="errorcleanup" />
    </action>

    <kill name="errorcleanup">
      <message>Spark Test WF failed. [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
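
For reference, the workflow is submitted with the standard Oozie CLI (the Oozie server URL below is just a placeholder):

oozie job -oozie http://myhost.com:11000/oozie -config job.properties -run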

These are the jars in the Oozie sharelib:

  • datanucleus-api-jdo-3.2.6.jar
  • datanucleus-core-3.2.10.jar
  • datanucleus-rdbms-3.2.9.jar
  • oozie-sharelib-spark-4.2.0.2.3.4.0-3485.jar
  • spark-1.5.2.2.3.4.0-3485-yarn-shuffle.jar
  • spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
  • spark-examples-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

Environment:

  • HDP 2.3.4
  • Spark 1.5.2
  • Oozie 4.2.0

What could be the problem?

1 ACCEPTED SOLUTION

Super Collaborator

11 REPLIES

Super Collaborator

You need to sort out the Oozie SparkAction lib directory. These are the only jars I have in there (or the equivalent, depending on your HDP version); the jars that were already in there are all rubbish:

datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar
oozie-sharelib-spark-4.2.0.2.3.4.0-3485.jar
spark-1.5.2.2.3.4.0-3485-yarn-shuffle.jar
spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
spark-examples-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

I am on HDP 2.3.4; on my manually upgraded sandbox the location in HDFS is /user/oozie/share/lib/lib_20160115164104/spark.
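
If you want to double-check what Oozie is actually picking up, you can list the sharelib contents; the commands below are just an illustration (adjust the path and host for your cluster):

# Spark sharelib directory in HDFS
hdfs dfs -ls /user/oozie/share/lib/lib_20160115164104/spark

# Ask the Oozie server which spark sharelib jars it has loaded
oozie admin -oozie http://myhost.com:11000/oozie -shareliblist spark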

Hope this helps.

Super Collaborator

@Luis Antonio Torres:

Was the job in "PREP" state for 20 minutes before being killed?

The port 8032 (job.properties -> jobTracker) needs to match the YARN setting at Ambari -> YARN -> Configs -> Advanced -> Advanced yarn-site -> yarn.resourcemanager.address (i.e. set it to 8032).

Also, try yarn-client first - I haven't got yarn-cluster to work (and I think yarn-client is better, but if there is any reason why yarn-cluster is better for running an Oozie job, please let me know).

Expert Contributor

@David Tam no, it was in "Running" state before getting killed. The yarn.resourcemanager.address setting in our YARN configs is set to port 8050, so I'm not really sure why there was an attempt to connect to 8032. I tried yarn-client mode, but I still get the same error.

Contributor

I'm having the same issue. The jobTracker port in my workflow/job.properties is set to 8050 (to match the YARN setting), and I can see in the Oozie UI (click on job > action > action configuration) that 8050 is being used:

...
<job-tracker>mydomain:8050</job-tracker>
...

But when I drill down into the Hadoop job history logs, I see the error:

Call From mydomain to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused

Where is it pulling 8032 from? Why does it not use the port configured in the job.properties?

"The workflow only started to work once I changed both of these to 8032"

I'd rather not do this; is there a way to get it to respect the port in job.properties?

Master Mentor

@Breandán Mac Parland, please create a new question.

Super Collaborator
@Luis Antonio Torres

Regarding port 8032: absolutely! See this thread.

Expert Contributor

This fixed it.

New Contributor

A quick workaround for the problem is as follows, in workflow.xml:

```
<spark-opts>--conf spark.hadoop.yarn.resourcemanager.address=your-rm:8050</spark-opts>
```

The YARN client will then connect to the correct ResourceManager (RM).

More documentation on spark-opts:

https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html

You just need to append the string in the above <spark-opts> XML element to your existing spark-opts, if any.
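
For example, assuming you already had an --executor-memory option (purely illustrative), the combined element would look like:

```
<spark-opts>--executor-memory 2G --conf spark.hadoop.yarn.resourcemanager.address=your-rm:8050</spark-opts>
```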

The problem occurs when the YARN client tries to connect to the RM and get cluster metrics:

https://github.com/apache/spark/blob/f47dbf27fa034629fab12d0f3c89ab75edb03f86/yarn/src/main/scala/or...

but fails to pick up the configuration for the RM address.

Actually, once you fix this, you will realize that the RM's address is not the only configuration that Spark's YARN client is unable to pick up, so your misery won't end there.

A proper fix, which I haven't found yet, would probably tell Oozie/Spark to take the YARN configuration from the Hadoop configuration that already exists on the cluster. If anyone can point out a Spark option that can do that, please let me know.

@hortonworks: Please include a preview mode for answers so that we can check how the formatting looks.