<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark application fails on slaves when launching from Oozie on Yarn in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172113#M37476</link>
    <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/12021/shariyarmurtaza.html" nodeid="12021"&gt;@Shary M&lt;/A&gt; for providing the workflow. It looks like the arguments you passed may not be reaching the Java application. The application should read them as args[0] .. args[n], where args[0] is the first argument passed in the Oozie workflow. In the workflow above,&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;args[0] --&amp;gt; -logtype&lt;/LI&gt;&lt;LI&gt;args[1] --&amp;gt; adraw&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You can refer to the following examples:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Sample workflow: &lt;A href="https://github.com/apache/oozie/blob/master/examples/src/main/apps/spark/workflow.xml"&gt;https://github.com/apache/oozie/blob/master/examples/src/main/apps/spark/workflow.xml&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Sample Java application: &lt;A href="https://github.com/apache/oozie/blob/master/examples/src/main/java/org/apache/oozie/example/SparkFileCopy.java"&gt;https://github.com/apache/oozie/blob/master/examples/src/main/java/org/apache/oozie/example/SparkFileCopy.java&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Please let us know if you need more information. If it fails again, please also share a snippet of your application.&lt;/P&gt;</description>
    <pubDate>Thu, 11 Aug 2016 20:54:08 GMT</pubDate>
    <dc:creator>mramasami</dc:creator>
    <dc:date>2016-08-11T20:54:08Z</dc:date>
    <item>
      <title>Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172107#M37470</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am trying to launch a Spark application that works perfectly well from the shell, but the executors fail when it is launched from Oozie. On the slave (executor) side, I see the following:&lt;/P&gt;&lt;PRE&gt;Error: Could not find or load main class org.apache.spark.executor.CoarseGrainedExecutorBackend&lt;/PRE&gt;&lt;P&gt;On the driver side I see the following, but there is no real null pointer in my code. My code works fine when I launch Spark directly from the shell. It has something to do with the executors.&lt;/P&gt;&lt;PRE&gt;[Driver] ERROR logminer.main.LogMinerMain - null
java.lang.InterruptedException
at java.lang.Object.wait(Native Method) ~[?:1.8.0_66]
at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_66]
at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:513) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1466) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1484) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1498) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at org.apache.spark.rdd.RDD.collect(RDD.scala:813) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:320) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:46) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
at logminer.main.LogSparkTester.test(LogSparkTester.java:214) ~[__app__.jar:?]
at logminer.main.LogMinerMain.testTrainOnHdfs(LogMinerMain.java:232) ~[__app__.jar:?]
at com.telus.argus.logminer.main.LogMinerMain.main(LogMinerMain.java:159) [__app__.jar:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_66]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_66]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:484) [spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
&lt;/PRE&gt;&lt;P&gt;I am not sure how to solve this issue. I have put all the Spark-related JARs in the lib folder for this Oozie job. Here is my directory structure on HDFS for this Oozie job:&lt;/P&gt;&lt;PRE&gt;oozie/
oozie/workflow.xml
oozie/job.properties
oozie/lib/argus-logminer-1.0.jar
oozie/lib/core-site.xml
oozie/lib/hdfs-site.xml
oozie/lib/kms-site.xml
oozie/lib/mapred-site.xml
oozie/lib/oozie-sharelib-spark-4.2.0.2.3.0.0-2557.jar
oozie/lib/spark-1.3.1.2.3.0.0-2557-yarn-shuffle.jar
oozie/lib/spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar
oozie/lib/yarn-site.xml&lt;/PRE&gt;&lt;P&gt;Does anyone know how to solve this? Any idea which JAR contains the CoarseGrainedExecutorBackend class?&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 02:14:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172107#M37470</guid>
      <dc:creator>shariyar_murtaz</dc:creator>
      <dc:date>2016-08-11T02:14:42Z</dc:date>
    </item>
    <item>
      <title>Re: Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172108#M37471</link>
      <description>&lt;P&gt;Which versions of Spark and HDP are you using? Can you list all the JARs under the SPARK_HOME directory on a worker machine in the cluster?&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 02:49:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172108#M37471</guid>
      <dc:creator>rreddy</dc:creator>
      <dc:date>2016-08-11T02:49:50Z</dc:date>
    </item>
    <item>
      <title>Re: Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172109#M37472</link>
      <description>&lt;P&gt;Spark: 1.3.1
HDP: 2.3.0.0-2557&lt;/P&gt;&lt;P&gt;I don't see a SPARK_HOME variable in my shell, but here is the list of JARs from /usr/hdp/current/spark-client/lib:&lt;/P&gt;&lt;PRE&gt;datanucleus-api-jdo-3.2.6.jar
datanucleus-rdbms-3.2.9.jar
spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar
datanucleus-core-3.2.10.jar
spark-1.3.1.2.3.0.0-2557-yarn-shuffle.jar
spark-examples-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar&lt;/PRE&gt;</description>
      <pubDate>Thu, 11 Aug 2016 03:15:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172109#M37472</guid>
      <dc:creator>shariyar_murtaz</dc:creator>
      <dc:date>2016-08-11T03:15:40Z</dc:date>
    </item>
    <item>
      <title>Re: Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172110#M37473</link>
      <description>&lt;P&gt;CoarseGrainedExecutorBackend should be in the spark-assembly JAR.&lt;/P&gt;&lt;P&gt;These might be relevant to you:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/OOZIE-2482" target="_blank"&gt;https://issues.apache.org/jira/browse/OOZIE-2482&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/49479/how-to-use-oozie-shell-action-to-run-a-spark-job-i.html" target="_blank"&gt;https://community.hortonworks.com/articles/49479/how-to-use-oozie-shell-action-to-run-a-spark-job-i.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://developer.ibm.com/hadoop/2015/11/05/run-spark-job-yarn-oozie/" target="_blank"&gt;https://developer.ibm.com/hadoop/2015/11/05/run-spark-job-yarn-oozie/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Try setting the SPARK_HOME variable in hadoop-env.sh.&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Andrew&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 08:39:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172110#M37473</guid>
      <dc:creator>andrew_sears</dc:creator>
      <dc:date>2016-08-11T08:39:42Z</dc:date>
    </item>
    <item>
      <title>Re: Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172111#M37474</link>
      <description>&lt;P&gt;Can you tell me which mode you are using: yarn-cluster or yarn-client?&lt;/P&gt;&lt;P&gt;Also, can you share the workflow.xml you are using?&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 12:57:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172111#M37474</guid>
      <dc:creator>mramasami</dc:creator>
      <dc:date>2016-08-11T12:57:03Z</dc:date>
    </item>
    <item>
      <title>Re: Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172112#M37475</link>
      <description>&lt;P&gt;yarn-cluster &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&amp;lt;workflow-app name="${wf_name}" xmlns="uri:oozie:workflow:0.4"&amp;gt;
  &amp;lt;start to="spark"/&amp;gt;
  &amp;lt;action name="spark"&amp;gt;
  &amp;lt;spark xmlns="uri:oozie:spark-action:0.1"&amp;gt;
  &amp;lt;job-tracker&amp;gt;${job_tracker}&amp;lt;/job-tracker&amp;gt;
  &amp;lt;name-node&amp;gt;${name_node}&amp;lt;/name-node&amp;gt;
  &amp;lt;master&amp;gt;${master}&amp;lt;/master&amp;gt;
  &amp;lt;mode&amp;gt;cluster&amp;lt;/mode&amp;gt;
  &amp;lt;name&amp;gt;logminer&amp;lt;/name&amp;gt;
  &amp;lt;class&amp;gt;logminer.main.LogMinerMain&amp;lt;/class&amp;gt;
  &amp;lt;jar&amp;gt;${filesystem}/${baseLoc}/oozie/lib/argus-logminer-1.0.jar&amp;lt;/jar&amp;gt;
  &amp;lt;spark-opts&amp;gt;--driver-memory 4G --executor-memory 4G --num-executors 3 --executor-cores 5&amp;lt;/spark-opts&amp;gt;
  &amp;lt;arg&amp;gt;-logtype&amp;lt;/arg&amp;gt; &amp;lt;arg&amp;gt;adraw&amp;lt;/arg&amp;gt;
  &amp;lt;arg&amp;gt;-inputfile&amp;lt;/arg&amp;gt; &amp;lt;arg&amp;gt;/user/inputfile-march-3.txt&amp;lt;/arg&amp;gt;
  &amp;lt;arg&amp;gt;-configfile&amp;lt;/arg&amp;gt; &amp;lt;arg&amp;gt;${filesystem}/${baseLoc}/oozie/logminer.properties&amp;lt;/arg&amp;gt;
  &amp;lt;arg&amp;gt;-mode&amp;lt;/arg&amp;gt; &amp;lt;arg&amp;gt;test&amp;lt;/arg&amp;gt;
  &amp;lt;/spark&amp;gt;
  &amp;lt;ok to="success_email"/&amp;gt;
  &amp;lt;error to="fail_email"/&amp;gt;
  &amp;lt;/action&amp;gt;
  &amp;lt;action name="success_email"&amp;gt;
  &amp;lt;email xmlns="uri:oozie:email-action:0.1"&amp;gt;
  &amp;lt;to&amp;gt;${emailTo}&amp;lt;/to&amp;gt;
  &amp;lt;cc&amp;gt;${emailCC}&amp;lt;/cc&amp;gt;
  &amp;lt;subject&amp;gt;${wf_name}: Successful run at ${wf:id()}&amp;lt;/subject&amp;gt;
  &amp;lt;body&amp;gt;The workflow [${wf:id()}] ran successfully.&amp;lt;/body&amp;gt;
  &amp;lt;/email&amp;gt;
  &amp;lt;ok to="end"/&amp;gt;
  &amp;lt;error to="fail_email"/&amp;gt;
  &amp;lt;/action&amp;gt;
  &amp;lt;action name="fail_email"&amp;gt;
  &amp;lt;email xmlns="uri:oozie:email-action:0.1"&amp;gt;
  &amp;lt;to&amp;gt;${emailTo}&amp;lt;/to&amp;gt;
  &amp;lt;cc&amp;gt;${emailCC}&amp;lt;/cc&amp;gt;
  &amp;lt;subject&amp;gt;${wf_name}: Failed at ${wf:id()}&amp;lt;/subject&amp;gt;
  &amp;lt;body&amp;gt;The workflow [${wf:id()}] failed at [${wf:lastErrorNode()}] with the following message: ${wf:errorMessage(wf:lastErrorNode())}&amp;lt;/body&amp;gt;
  &amp;lt;/email&amp;gt;
  &amp;lt;ok to="fail"/&amp;gt;
  &amp;lt;error to="fail"/&amp;gt;
  &amp;lt;/action&amp;gt;
  &amp;lt;kill name="fail"&amp;gt;
  &amp;lt;message&amp;gt;Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]&amp;lt;/message&amp;gt;
  &amp;lt;/kill&amp;gt;
  &amp;lt;end name="end"/&amp;gt;
&amp;lt;/workflow-app&amp;gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 11 Aug 2016 20:26:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172112#M37475</guid>
      <dc:creator>shariyar_murtaz</dc:creator>
      <dc:date>2016-08-11T20:26:48Z</dc:date>
    </item>
    <item>
      <title>Re: Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172113#M37476</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/12021/shariyarmurtaza.html" nodeid="12021"&gt;@Shary M&lt;/A&gt; for providing the workflow. It looks like the arguments you passed may not be reaching the Java application. The application should read them as args[0] .. args[n], where args[0] is the first argument passed in the Oozie workflow. In the workflow above,&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;args[0] --&amp;gt; -logtype&lt;/LI&gt;&lt;LI&gt;args[1] --&amp;gt; adraw&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You can refer to the following examples:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Sample workflow: &lt;A href="https://github.com/apache/oozie/blob/master/examples/src/main/apps/spark/workflow.xml"&gt;https://github.com/apache/oozie/blob/master/examples/src/main/apps/spark/workflow.xml&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Sample Java application: &lt;A href="https://github.com/apache/oozie/blob/master/examples/src/main/java/org/apache/oozie/example/SparkFileCopy.java"&gt;https://github.com/apache/oozie/blob/master/examples/src/main/java/org/apache/oozie/example/SparkFileCopy.java&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Please let us know if you need more information. If it fails again, please also share a snippet of your application.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 20:54:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172113#M37476</guid>
      <dc:creator>mramasami</dc:creator>
      <dc:date>2016-08-11T20:54:08Z</dc:date>
    </item>
    <item>
      <title>Re: Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172114#M37477</link>
      <description>&lt;P&gt;No, the arguments are passed correctly. This is how my application accepts them, since I am using org.apache.commons.cli.BasicParser. I verified it multiple times by printing them inside the application. There is nothing wrong there. Thanks for your help.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 21:05:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172114#M37477</guid>
      <dc:creator>shariyar_murtaz</dc:creator>
      <dc:date>2016-08-11T21:05:22Z</dc:date>
    </item>
    <item>
      <title>Re: Spark application fails on slaves when launching from Oozie on Yarn</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172115#M37478</link>
      <description>&lt;P&gt;Setting SPARK_HOME in hadoop-env.sh solved the issue.&lt;/P&gt;&lt;P&gt;For others who hit the same issue: just add the following line to hadoop-env.sh under /usr/hdp/your_version_number/hadoop/conf:&lt;/P&gt;&lt;PRE&gt;export SPARK_HOME=/usr/hdp/current/spark-client&lt;/PRE&gt;</description>
      <pubDate>Sat, 13 Aug 2016 01:16:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-application-fails-on-slaves-when-launching-from-Oozie/m-p/172115#M37478</guid>
      <dc:creator>shariyar_murtaz</dc:creator>
      <dc:date>2016-08-13T01:16:37Z</dc:date>
    </item>
  </channel>
</rss>