Created on 02-16-2015 06:59 AM - edited 09-16-2022 02:21 AM
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf();
conf.setAppName("Test");
conf.setMaster("yarn-cluster"); // or "yarn-client"
JavaSparkContext context = new JavaSparkContext(conf);
I am trying to run the above code against CDH 5.2 and CDH 5.3. I receive the errors below (note that I tried both "yarn-client" and "yarn-cluster"). I also noticed that in the Cloudera Maven repo there doesn't seem to be a stable (non-SNAPSHOT) spark-yarn [1] or yarn-parent [2] artifact for CDH 5.2 and 5.3. Is this supported? Any tips?
[1] https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-yarn_2.10/
[2] https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/yarn-parent_2.10/
// with yarn-client
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/conf/YarnConfiguration
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:171)
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:102)
    at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:101)
    at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:228)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
    at com.bretthoerner.TestSpark.testSpark(TestSpark.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runners.Suite.runChild(Suite.java:127)
    at org.junit.runners.Suite.runChild(Suite.java:26)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.conf.YarnConfiguration
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 44 more
// with yarn-cluster
org.apache.spark.SparkException: YARN mode not available ?
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1561)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
    at com.bretthoerner.TestSpark.testSpark(TestSpark.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runners.Suite.runChild(Suite.java:127)
    at org.junit.runners.Suite.runChild(Suite.java:26)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClusterScheduler
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:171)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1554)
    ... 39 more
Created 02-16-2015 07:25 AM
In general, you do not run Spark applications directly as Java programs. You have to run them with spark-submit, which sets up the classpath for you. Otherwise you have to set it up yourself, and that's the problem here: you didn't put all of the many YARN / Hadoop / Spark jars on your classpath.
spark-yarn and yarn-parent were discontinued upstream in 1.2.0, but then brought back very recently for 1.2.1. You can see they don't exist upstream for 1.2.0: http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.spark%22%20AND%20a%3A%22yarn-parent_2....
CDH 5.3 is based on 1.2.0, which is why those artifacts are missing there.
That said, that is not the artifact you are missing here. You don't even have YARN code on your classpath.
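A quick way to see whether the classes Spark needs are actually visible is to probe for them before constructing the context. This is just a diagnostic sketch; the class names below are taken straight from the two stack traces above:

// Minimal diagnostic: probe for the classes named in the stack traces.
// If a probe fails, the corresponding hadoop-yarn / spark-yarn jars are
// not on the application classpath.
public class ClasspathCheck {
    public static void main(String[] args) {
        String[] needed = {
            "org.apache.hadoop.yarn.conf.YarnConfiguration",           // from hadoop-yarn-api
            "org.apache.spark.scheduler.cluster.YarnClusterScheduler"  // from spark-yarn
        };
        for (String className : needed) {
            try {
                Class.forName(className);
                System.out.println("OK: " + className);
            } catch (ClassNotFoundException e) {
                System.out.println("MISSING: " + className);
            }
        }
    }
}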
Created 02-16-2015 07:32 AM
Thanks,
I guess my follow-up is... am I insane for wanting to do it that way? Do other people with existing JVM-based apps that need to submit jobs actually use a ProcessBuilder to run spark-submit?
Created 02-16-2015 07:44 AM
Nah, it's not crazy; it just means you have to do some of the work that spark-submit does, and almost all of that is dealing with the classpath. If you're just trying to get a simple app running, I think spark-submit is the way to go. But if you're building a more complex product or service, you may have to embed Spark and deal with this yourself.
For example, I had to do just this recently, and here's what I came up with: https://github.com/OryxProject/oryx/tree/master/bin
In future versions (1.4+) there will be a proper programmatic submission API; I know Marcelo here has been working on that.
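For the simple case, the ProcessBuilder-around-spark-submit approach you describe looks roughly like this. A minimal sketch only: the spark-submit path, jar, and class name are hypothetical placeholders, not anything from this thread:

import java.io.IOException;

// Sketch: drive spark-submit from an existing JVM app via ProcessBuilder.
public class SubmitViaProcess {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
            "/opt/cloudera/parcels/CDH/bin/spark-submit", // hypothetical location
            "--master", "yarn-cluster",
            "--class", "com.example.MySparkJob",          // hypothetical main class
            "/path/to/my-spark-job.jar");                 // hypothetical app jar
        pb.inheritIO(); // stream spark-submit's output to this process's console
        Process process = pb.start();
        int exitCode = process.waitFor();
        System.out.println("spark-submit exited with " + exitCode);
    }
}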
Created 04-08-2015 01:41 AM
Hi
I am trying to do the same thing, but the code link you posted has moved. Can you point me to the location within the Oryx code base where I should look?
Thanks in advance
Created 04-08-2015 01:58 AM
Created 04-08-2015 05:40 PM
@sowen how do you manage Oryx dependency conflicts with those of Spark? Which Spark jars should one include to launch a Spark job programmatically from Java code? I have been running into a few conflicts, the latest being
java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
Created 04-09-2015 03:41 AM
The Spark, Hadoop and Kafka dependencies are 'provided' by the cluster at runtime and are not included in the app. Other dependencies you must bundle with your app. If they conflict with dependencies that leak from Spark, you can usually use the user-classpath-first properties to work around them.
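As a concrete illustration of the user-classpath-first approach, something like the following should work. Note the property names here are the Spark 1.3+ spellings; older releases used spark.files.userClassPathFirst and spark.yarn.user.classpath.first, so check the docs for your version:

import org.apache.spark.SparkConf;

// Sketch: prefer classes bundled in the application jar over the versions
// that leak from Spark's own classpath (e.g. a conflicting Guava).
// Property names are the Spark 1.3+ spellings; verify for your version.
SparkConf conf = new SparkConf()
    .setAppName("Test")
    .set("spark.driver.userClassPathFirst", "true")
    .set("spark.executor.userClassPathFirst", "true");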
Created 07-10-2015 11:53 AM
Out of curiosity, is there a JIRA task for any work relating to the programmatic API for submitting Spark jobs?
Created 07-10-2015 11:57 AM
It's already complete and in 1.4. This was the initial JIRA: https://issues.apache.org/jira/browse/SPARK-4924
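For reference, the API that came out of that work is org.apache.spark.launcher.SparkLauncher (in the spark-launcher module). A minimal sketch of how it is used in 1.4; the jar path and class name below are hypothetical placeholders:

import org.apache.spark.launcher.SparkLauncher;

// Sketch of the Spark 1.4+ programmatic submission API (SPARK-4924).
public class LaunchExample {
    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
            .setAppResource("/path/to/my-spark-job.jar") // hypothetical app jar
            .setMainClass("com.example.MySparkJob")      // hypothetical main class
            .setMaster("yarn-cluster")
            .setConf(SparkLauncher.DRIVER_MEMORY, "1g")
            .launch(); // returns a java.lang.Process wrapping spark-submit
        // NB: in a real app, consume spark.getInputStream() so the child
        // process doesn't block on a full output pipe.
        int exitCode = spark.waitFor();
        System.out.println("Spark application finished with exit code " + exitCode);
    }
}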