Created on 06-30-2014 06:27 AM - edited 09-16-2022 02:01 AM
I have a newly installed CDH5 cluster with Spark configured and installed. I have verified that I can log into the Spark interactive shell, but so far I have been unable to submit any Spark application via spark-class. Whenever I do, I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/client/api/impl/YarnClientImpl
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
The instructions I am following are here:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/c...
Created 06-30-2014 06:43 AM
Can you share the exact command you are running?
The link you supplied goes through a redirector we can't access, and the important part got cut off. Maybe you can clarify what page you are looking at under the installation guide.
Created 06-30-2014 07:05 AM
Here is the command I am running:
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.yarn.Client --jar ~/samples/syn-spark-project-0.0.1-SNAPSHOT.jar --class spark.WorkCountJob --args yarn-standalone --args input/hello.txt --args output.txt
My cluster is set up with YARN, and syn-spark-project is a jar that I assembled against the Spark jar stored in the $SPARK_HOME directory on my cluster.
I am looking at the "Running Spark Applications" entry in the documentation.
Created 06-30-2014 07:09 AM
(I think you have a typo in "WorkCountJob" but that's not the issue yet)
Did you run:
source /etc/spark/conf/spark-env.sh
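The docs also have you point SPARK_JAR at the Spark assembly before launching. A minimal sketch, assuming a default parcel install (the assembly jar name varies by release, hence the ls):
source /etc/spark/conf/spark-env.sh
# illustrative: pick up whatever assembly jar ships with your parcel
export SPARK_JAR=$(ls /opt/cloudera/parcels/CDH/lib/spark/assembly/lib/spark-assembly_*.jar)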
Created 06-30-2014 07:23 AM
Yep, I sourced spark-env.sh and set the SPARK_JAR environment variable as the instructions suggested (and I used the same jar when compiling mine).
Created 06-30-2014 07:30 AM
How did you compile your jar file -- against which Spark and Hadoop deps?
It seems like something is missing from the classpath.
Try executing this first and then re-running:
export SPARK_PRINT_LAUNCH_COMMAND=1
That ought to make it print the full launch command, including the classpath.
The error is from the local driver, not the app on the cluster, right?
Created on 06-30-2014 07:35 AM - edited 06-30-2014 07:36 AM
I am compiling the jar against the Hadoop 0.20.2 dependency and the Spark jar that is loaded on the cluster (the same jar I am pointing the spark-class command at).
I do not think this particular error has anything to do with my compiled jar, though: when I run spark-class while leaving off the jar argument, it fails with the same error (i.e., it looks like it never even gets to parsing the arguments).
I executed it with the debug flag you supplied; here is what I got:
10:32 AM ~/lib/spark-0.9.0-incubating/bin: export SPARK_PRINT_LAUNCH_COMMAND=1
10:34 AM ~/lib/spark-0.9.0-incubating/bin: $SPARK_HOME/bin/spark-class org.apache.spark.deploy.yarn.Client
Spark Command: java -cp :/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/spark/conf:/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/spark/assembly/lib/*:/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/spark/examples/lib/*:/etc/hadoop/conf:/home/tclay/tools/hadoop-0.20.2/*:/home/tclay/tools/hadoop-0.20.2/../hadoop-hdfs/*:/home/tclay/tools/hadoop-0.20.2/../hadoop-yarn/*:/home/tclay/tools/hadoop-0.20.2/../hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/spark/lib/jline.jar -Djava.library.path=/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/spark/lib:/home/tclay/tools/hadoop-0.20.2/lib/native -Xms512m -Xmx512m org.apache.spark.deploy.yarn.Client
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/client/api/impl/YarnClientImpl
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Created 06-30-2014 07:38 AM
OK, I meant: what are the Maven / SBT deps? But in any event I think Hadoop 0.20.2 is the problem. CDH5 is Hadoop 2.3, and the supplied Spark works with that. Your classpath on your cluster shows you've got Hadoop 0.20.2 classes in the mix somehow. I don't know where those are coming from, but that is the problem.
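A quick way to confirm, assuming the parcel layout shown in your launch command:
# a stray standalone install here would explain the 0.20.2 entries on the classpath
echo $HADOOP_HOME
# the missing YARN client class should come from the parcel's hadoop-yarn jars instead, e.g.:
ls /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop-yarn/hadoop-yarn-client*.jar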
Created 06-30-2014 07:44 AM
Perfect - HADOOP_HOME was pointing to the wrong place, and that install was being picked up on the classpath. I am able to submit applications just fine now. Thanks.
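For anyone who hits the same error, the fix amounts to repointing HADOOP_HOME at the parcel's Hadoop before launching; a sketch, assuming the parcel path from the launch command above:
# illustrative: adjust to your parcel version
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.yarn.Client ...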