Spark application fails on slaves when launching from Oozie on Yarn
Labels: Apache Oozie, Apache Spark, Apache YARN
Created 08-10-2016 07:14 PM
Hi,
I am trying to launch a Spark application that works perfectly well from the shell, but the executors fail when it is launched from Oozie. On the slave (executor) side, I see the following:
Error: Could not find or load main class org.apache.spark.executor.CoarseGrainedExecutorBackend
On the driver side I see the following, but there is no null pointer problem in my code; the code works fine when I launch Spark directly from the shell. It seems to have something to do with the executors.
[Driver] ERROR logminer.main.LogMinerMain - null
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method) ~[?:1.8.0_66]
    at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_66]
    at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:513) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1466) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1484) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1498) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at org.apache.spark.rdd.RDD.collect(RDD.scala:813) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:320) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:46) ~[spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
    at logminer.main.LogSparkTester.test(LogSparkTester.java:214) ~[__app__.jar:?]
    at logminer.main.LogMinerMain.testTrainOnHdfs(LogMinerMain.java:232) ~[__app__.jar:?]
    at com.telus.argus.logminer.main.LogMinerMain.main(LogMinerMain.java:159) [__app__.jar:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_66]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_66]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:484) [spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar:?]
I am not sure how to solve this issue. I have put all the Spark-related jars in the lib folder for this Oozie job. Here is my directory structure on HDFS for this Oozie job:
oozie/
oozie/workflow.xml
oozie/job.properties
oozie/lib/argus-logminer-1.0.jar
oozie/lib/core-site.xml
oozie/lib/hdfs-site.xml
oozie/lib/kms-site.xml
oozie/lib/mapred-site.xml
oozie/lib/oozie-sharelib-spark-4.2.0.2.3.0.0-2557.jar
oozie/lib/spark-1.3.1.2.3.0.0-2557-yarn-shuffle.jar
oozie/lib/spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar
oozie/lib/yarn-site.xml
Does anyone know how to solve this? Any idea which jar contains the CoarseGrainedExecutorBackend class?
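For reference, one quick way to check is to scan each jar for the class entry. A minimal sketch (the FindClassInJar name is hypothetical, not part of this job); run it against each jar in oozie/lib:

import java.util.jar.JarFile;

// Hypothetical helper, not part of the Oozie job: reports whether each jar
// given on the command line contains the missing executor class.
public class FindClassInJar {
    public static void main(String[] args) throws Exception {
        String entry = "org/apache/spark/executor/CoarseGrainedExecutorBackend.class";
        for (String path : args) {
            try (JarFile jar = new JarFile(path)) {
                boolean found = jar.getJarEntry(entry) != null;
                System.out.println(path + (found ? " contains " : " does not contain ") + entry);
            }
        }
    }
}

Running it against spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar should show whether the class is really in the assembly.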
Created 08-10-2016 07:49 PM
What versions of Spark and HDP are you on? Can you list all the jars under the SPARK_HOME directory on a worker machine in the cluster?
Created 08-10-2016 08:15 PM
Spark: 1.3.1, HDP: 2.3.0.0-2557
I don't see any SPARK_HOME variable in my shell, but here is the list of jars from /usr/hdp/current/spark-client/lib:
datanucleus-api-jdo-3.2.6.jar
datanucleus-rdbms-3.2.9.jar
spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar
datanucleus-core-3.2.10.jar
spark-1.3.1.2.3.0.0-2557-yarn-shuffle.jar
spark-examples-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar
Created 08-11-2016 01:39 AM
CoarseGrainedExecutorBackend should be in the spark-assembly jar.
These might be relevant to you:
https://issues.apache.org/jira/browse/OOZIE-2482
https://developer.ibm.com/hadoop/2015/11/05/run-spark-job-yarn-oozie/
Try setting the SPARK_HOME variable in hadoop-env.sh.
Cheers,
Andrew
Created 08-12-2016 06:16 PM
Setting SPARK_HOME in hadoop-env.sh solved the issue.
For others who hit the same problem: add the following line to hadoop-env.sh under /usr/hdp/your_version_number/hadoop/conf:
export SPARK_HOME=/usr/hdp/current/spark-client
Created 08-11-2016 05:57 AM
Which mode are you using, yarn-cluster or yarn-client?
Also, can you share the workflow.xml you are using?
Created 08-11-2016 01:26 PM
yarn-cluster
<workflow-app name="${wf_name}" xmlns="uri:oozie:workflow:0.4">
    <start to="spark"/>
    <action name="spark">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${job_tracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <master>${master}</master>
            <mode>cluster</mode>
            <name>logminer</name>
            <class>logminer.main.LogMinerMain</class>
            <jar>${filesystem}/${baseLoc}/oozie/lib/argus-logminer-1.0.jar</jar>
            <spark-opts>--driver-memory 4G --executor-memory 4G --num-executors 3 --executor-cores 5</spark-opts>
            <arg>-logtype</arg>
            <arg>adraw</arg>
            <arg>-inputfile</arg>
            <arg>/user/inputfile-march-3.txt</arg>
            <arg>-configfile</arg>
            <arg>${filesystem}/${baseLoc}/oozie/logminer.properties</arg>
            <arg>-mode</arg>
            <arg>test</arg>
        </spark>
        <ok to="success_email"/>
        <error to="fail_email"/>
    </action>
    <action name="success_email">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailTo}</to>
            <cc>${emailCC}</cc>
            <subject>${wf_name}: Successful run at ${wf:id()}</subject>
            <body>The workflow [${wf:id()}] ran successfully.</body>
        </email>
        <ok to="end"/>
        <error to="fail_email"/>
    </action>
    <action name="fail_email">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailTo}</to>
            <cc>${emailCC}</cc>
            <subject>${wf_name}: Failed at ${wf:id()}</subject>
            <body>The workflow [${wf:id()}] failed at [${wf:lastErrorNode()}] with the following message: ${wf:errorMessage(wf:lastErrorNode())}</body>
        </email>
        <ok to="fail"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Created 08-11-2016 01:54 PM
Thanks @Shary M for providing the workflow. It looks like the arguments you pass in the workflow may not be reaching the Java application as expected. The application receives them as args[0] .. args[n], where args[0] is the first argument passed in the Oozie workflow. In the workflow above (see the sketch after this list):
- args[0] --> -logtype
- args[1] --> adraw
You can refer to the following examples:
- Sample workflow: https://github.com/apache/oozie/blob/master/examples/src/main/apps/spark/workflow.xml
- Sample Java application: https://github.com/apache/oozie/blob/master/examples/src/main/java/org/apache/oozie/example/SparkFil...
Please let us know if you need more information. If it fails again, please also share a snippet of your application.
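To illustrate the mapping, a minimal sketch (the ArgEcho class name is hypothetical, not from the poster's application) of how each <arg> element arrives in the driver's main method:

// Hypothetical example: each <arg> in the spark action becomes one element
// of args[], in the order it appears in workflow.xml.
public class ArgEcho {
    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            // With the workflow above this prints: args[0] = -logtype, args[1] = adraw, ...
            System.out.println("args[" + i + "] = " + args[i]);
        }
    }
}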
Created 08-11-2016 02:05 PM
No, the arguments are passed correctly. This is how my application accepts them: I am using org.apache.commons.cli.BasicParser. I verified it multiple times by printing the arguments inside the application; there is nothing wrong there. Thanks for your help.
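For reference, the parsing looks roughly like this. A trimmed sketch, not the real LogMinerMain: the option names come from the workflow's <arg> list, while the class name and option descriptions are assumed.

import org.apache.commons.cli.BasicParser;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.Options;

// Trimmed sketch of the argument handling with commons-cli's BasicParser.
public class LogMinerArgsSketch {
    public static void main(String[] args) throws Exception {
        Options options = new Options();
        options.addOption("logtype", true, "type of log to mine");
        options.addOption("inputfile", true, "HDFS path of the input file");
        options.addOption("configfile", true, "HDFS path of the properties file");
        options.addOption("mode", true, "run mode, e.g. test");

        CommandLine cmd = new BasicParser().parse(options, args);
        // With the workflow's <arg> list this prints "adraw".
        System.out.println(cmd.getOptionValue("logtype"));
    }
}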