Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

What dependencies to submit Spark jobs programmatically (not via spark-submit)?

avatar
 SparkConf conf = new SparkConf();
conf.setAppName("Test");
conf.setMaster("yarn-cluster"); // or with "yarn-client"
JavaSparkContext context = new JavaSparkContext(conf);

I am trying to run the above code against CDH 5.2 and CDH 5.3. I receive the errors below (note I am trying both "yarn-client" and "yarn-cluster"). I noticed in the Cloudera maven repo there doesn't seem to be a stable (non SNAPSHOT) spark-yarn [1] or yarn-parent [2] for CDH 5.2 and 5.3. Is this supported? Any tips?

 

[1] https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-yarn_2.10/

[2] https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/yarn-parent_2.10/

 

// with yarn-client

java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/conf/YarnConfiguration
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:171)
	at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:102)
	at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:101)
	at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:228)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
	at com.bretthoerner.TestSpark.testSpark(TestSpark.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runners.Suite.runChild(Suite.java:127)
	at org.junit.runners.Suite.runChild(Suite.java:26)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.conf.YarnConfiguration
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	... 44 more

 

// with yarn-cluster

org.apache.spark.SparkException: YARN mode not available ?
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1561)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
	at com.bretthoerner.TestSpark.testSpark(TestSpark.java:116)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runners.Suite.runChild(Suite.java:127)
	at org.junit.runners.Suite.runChild(Suite.java:26)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClusterScheduler
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:171)
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1554)
	... 39 more

 

 

 

1 ACCEPTED SOLUTION

avatar
Master Collaborator

Nah it's not crazy, just means you have to do some of the work that spark-submit does. Almost all of that is dealing with the classpath. If you're just trying to get a simple app running I think spark-submit is the way to go. But if you're building a more complex product or service you might have to embed Spark and deal with this. 

 

Example, I had to do just this recently, and here's what I came up with: https://github.com/OryxProject/oryx/tree/master/bin

 

In future versions (like 1.4+) there's going to be a more proper programmatic submission API. I know Marcelo here has been working on that.

View solution in original post

17 REPLIES 17

avatar
Explorer

Sweet! Thanks! Gonna try this out when we get a chance! 

avatar
Rising Star
i'm using sbt, should i use spark-submit everytime we need to run a project? SBT run, is catering my needs for now, as im using it in local mode.

avatar
Rising Star
Hi, the link is unavailabe now. Could you provide a new link?

avatar
Master Collaborator

(Which link are you referring to?)

avatar
Rising Star

avatar
Master Collaborator

avatar
Rising Star

i'm running my spark job using sbt,i.e., "sbt run".

You think this can be a problem?

Or should we stick to the spark-submit all the time?

avatar
Rising Star

i'm running my spark job using sbt,i.e., "sbt run".

You think this can be a problem?

Or should we stick to the spark-submit all the time?